Toward bridging the gap to humans: a human-inspired scene perception model for multi-functional service robots
Abstract
The long-term vision in robotics is to empower multi-functional service robots with human-like capabilities, enabling them to handle multiple complex tasks in open-world settings. Perception provides the information basis for every cognitive function, making a holistic and long-term understanding of the environment essential. This raises research questions about how the necessary scene information can be determined, fused, organized, and analyzed so that robots can perceive their environment holistically. This thesis aims to develop a human-inspired scene perception model to close the gap toward human-like skills. The methodology draws inspiration from neuroscience, transferring fundamental concepts of perception to mobile robots: relevant principles from the robot and human perception domains are selected and combined into a holistic solution.

The developed approach emulates the three-part division of perception presented in popular neuroscience studies: recognition, which makes sensory data understandable; knowledge representation, which stores perceived information in a structured manner; and interpretation, which enables the meaningful use of the acquired scene knowledge. The approach employs 3D segmentation to separate the scene background from the foreground, mimicking humans' preattentive and postattentive processing. While the background is initially reconstructed from simple 3D shapes, the foreground receives concentrated processing of distinct regions, i.e., segmented instances. Newly developed components are integrated with a popular 2D bounding-box-based object detector and a feature-based 3D SLAM to leverage their performance. Specifically, detected objects are merged with 3D foreground segments and fed into a multi-object tracking pipeline that consolidates detections into spatially located instances. The entire scene knowledge, including instance detections, aggregated instances, and the spatial map, is represented in a knowledge base. Scene analysis techniques use this comprehensive knowledge to adapt perception at runtime. Two key features have been implemented: the determination of object properties and a spatio-temporal analysis that extracts the spatial distribution of objects from heatmaps.

Experiments in two simulated environments and one real-world environment deploy a novel benchmark for repeatedly executing fetch-and-carry tasks, with a single-setting ablation study to pinpoint the perceptual contribution of each component. The background-foreground split accelerates recognition by 70%. The multi-object tracking pipeline obtains valuable object knowledge that raises success rates from 20% to up to 100% and enables scene analysis techniques to accelerate the scenario by up to 61% when objects of interest are located in distinct areas. The experiments highlight that human-inspired perception enhances the robot's efficiency and safety, underlining the potential of the approach to empower multi-functional service robots with human-like capabilities.
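To make the consolidation step concrete, the following is a minimal sketch of merging per-frame detections into persistent, spatially located instances. It assumes greedy centroid-based association; the class names (`Detection`, `InstanceTracker`) and the 0.3 m distance gate are illustrative assumptions, not the thesis implementation.

```python
"""Sketch (assumed, not the thesis code): consolidating per-frame object
detections into spatially located instances via greedy nearest-neighbour
association on 3D centroids."""
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Detection:
    label: str            # class name from the 2D detector
    position: np.ndarray  # 3D centroid of the merged foreground segment


@dataclass
class Instance:
    label: str
    positions: list = field(default_factory=list)

    @property
    def centroid(self) -> np.ndarray:
        # Aggregated position over all associated detections.
        return np.mean(self.positions, axis=0)


class InstanceTracker:
    """Aggregates detections over time; one Instance per physical object."""

    def __init__(self, max_dist: float = 0.3):
        self.max_dist = max_dist  # association gate in metres (assumed value)
        self.instances: list[Instance] = []

    def update(self, detections: list[Detection]) -> None:
        for det in detections:
            # Candidates: same class label and within the distance gate.
            candidates = [
                inst for inst in self.instances
                if inst.label == det.label
                and np.linalg.norm(inst.centroid - det.position) < self.max_dist
            ]
            if candidates:
                best = min(candidates,
                           key=lambda i: np.linalg.norm(i.centroid - det.position))
                best.positions.append(det.position)
            else:
                # No match: a new physical object enters the knowledge base.
                self.instances.append(
                    Instance(label=det.label, positions=[det.position]))


if __name__ == "__main__":
    tracker = InstanceTracker()
    # Two frames observing the same cup, plus a newly appearing book.
    tracker.update([Detection("cup", np.array([1.0, 0.5, 0.8]))])
    tracker.update([Detection("cup", np.array([1.02, 0.48, 0.8])),
                    Detection("book", np.array([2.5, 1.0, 0.8]))])
    for inst in tracker.instances:
        print(inst.label, inst.centroid.round(2))
```

A fuller pipeline would replace the greedy gate with a proper assignment step and track confidence over time; the sketch only conveys how repeated detections collapse into one located instance.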
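The heatmap-based spatial analysis can likewise be pictured as an occupancy grid over past instance observations, from which the most promising search area for an object class is read off. The grid resolution, map extent, and use of `numpy.histogram2d` below are illustrative assumptions.

```python
"""Sketch (assumed, not the thesis code): a 2D heatmap of past object
observations, from which the most likely search area is extracted."""
import numpy as np

BINS = 20      # grid cells per axis (assumed)
EXTENT = 5.0   # map side length in metres (assumed)


def object_heatmap(xy_positions: np.ndarray) -> np.ndarray:
    """Count observations of one object class on a BINS x BINS grid
    covering [0, EXTENT] x [0, EXTENT] in the map frame."""
    heat, _, _ = np.histogram2d(
        xy_positions[:, 0], xy_positions[:, 1],
        bins=BINS, range=[[0.0, EXTENT], [0.0, EXTENT]])
    return heat


if __name__ == "__main__":
    # Past "cup" sightings clustered around a table at roughly (1.2, 3.4) m.
    rng = np.random.default_rng(0)
    sightings = rng.normal(loc=[1.2, 3.4], scale=0.2, size=(50, 2))
    heat = object_heatmap(sightings)
    ix, iy = np.unravel_index(np.argmax(heat), heat.shape)
    cell = EXTENT / BINS  # metres per grid cell
    print(f"search first near ({(ix + 0.5) * cell:.2f}, "
          f"{(iy + 0.5) * cell:.2f}) m")
```

Directing the search to the hottest cell first is what allows the scenario speed-ups reported above when objects of interest concentrate in distinct areas.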