Bayesian Cognitive Models for 3D Structure and Motion Multimodal Perception
PhD thesis of João Filipe Ferreira (2010), Universidade de Coimbra
Humans use various sensory cues to extract crucial information from the environment. With a view to having robots as human companions, we are motivated to help develop a knowledge representation system along the lines of what we know about ourselves. While recent research has shown interesting results, we are still far from having concepts and algorithms that interpret space while coping with the complexity of the environment.
By understanding how animals (humans in particular) navigate and build their own spatial representations, the observed phenomena can be applied in robotics. In order to have a robust and reliable framework for navigation (i.e. in order to move within an environment, manipulate objects in it, avoid undesirable mishaps such as collisions, etc.), space representation, localisation, mapping and perception are all needed.
The goal of this work was to research Bayesian models that deal with fusion, multimodality, conflicts, and ambiguities in perception, while simultaneously drawing inspiration from human perceptual processes and their respective behaviours.
We will present a Bayesian framework for active multimodal perception of 3D structure and motion which, while not strictly neuromimetic, finds its roots in the role of the dorsal perceptual pathway of the human brain. Its component models build upon a common egocentric spatial configuration that naturally lends itself to the integration of readings from multiple sensors using a Bayesian approach. At its most basic level, these models provide efficient and robust probabilistic solutions for cyclopean-geometry-based stereovision and for auditory perception based only on binaural cues, defined using a consistent formalisation that allows their use as building blocks for the multimodal sensor fusion framework, explicitly or implicitly addressing the most important challenges of sensor fusion for vision, audition and proprioception (including vestibular sensing). In parallel, baseline research on human multimodal motion perception presented in this text provides the support for future work on new sensor models for the framework. The framework is then extended in a hierarchical fashion by incrementally implementing active perception strategies, such as active exploration based on the entropy of the perceptual map that constitutes the basis of the framework, and sensory saliency-based behaviours.
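To make the core idea concrete, the sketch below illustrates, in schematic form, what Bayesian fusion over a common egocentric map and entropy-driven active exploration can look like: independent sensor likelihoods (e.g. from vision and binaural audition) update the occupancy belief of a single map cell, and the entropy of that belief flags cells worth exploring. This is a minimal illustrative example only, not the thesis implementation; the function names, the binary occupancy state, and all numerical values are assumptions introduced here for clarity.

```python
import numpy as np

def fuse_occupancy(prior_occ, likelihoods_occ, likelihoods_empty):
    """Posterior P(occupied | all sensor readings) for one map cell,
    assuming sensors are conditionally independent given the cell state."""
    p_occ = prior_occ * np.prod(likelihoods_occ)
    p_empty = (1.0 - prior_occ) * np.prod(likelihoods_empty)
    return p_occ / (p_occ + p_empty)

def cell_entropy(p_occ):
    """Shannon entropy (bits) of the binary occupancy belief; high-entropy
    cells are natural targets for active exploration behaviours."""
    p = np.clip(p_occ, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

# Hypothetical readings: two sensors (say, vision and binaural audition)
# report on the same egocentric cell; values are purely illustrative.
posterior = fuse_occupancy(prior_occ=0.5,
                           likelihoods_occ=np.array([0.8, 0.6]),
                           likelihoods_empty=np.array([0.3, 0.5]))
print(posterior, cell_entropy(posterior))
```

In this reading, repeating the update over every cell of the egocentric grid yields the perceptual map, and ranking cells by their entropy is one simple way to realise the entropy-based active exploration strategy mentioned above.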
The computational models described herein support a real-time robotic implementation of multimodal active perception to be used in real-world applications, such as human-machine interaction or mobile robot navigation.
With this work, we also hope to be able to address questions such as: Where are the limits of optimal sensory integration behaviour? What are the temporal aspects of sensory integration? How do we solve the “correspondence problem” for sensory integration? How can the combination versus integration debate be answered? How can the switching versus weighting controversy be resolved? What are the limits of crossmodal plasticity?