Abstract:
Perceptual organization is a core process of human vision that transforms the raw visual input into a structured, object-centric scene representation.
Motion information plays a central role in this process:
On the one hand, perceiving motion requires a perceptual organization process to establish an identity behind percepts at different time points.
On the other hand, motion information has been shown to be a dominant cue that humans use to infer scene structure, for example by the Gestalt principle of common fate.
Moreover, motion has been shown to enable more efficient processing by guiding attention to relevant areas of scenes, and to contribute to learning perceptual organization during infancy.
In this thesis, we combine insights from psychology and neuroscience with recent advances in machine learning in order to study how motion promotes different aspects of dynamic scene perception.
First, we study the role of motion in guiding eye movements as a basis for more efficient scene perception.
Our data-driven analysis reveals several strong effects of temporal patterns on eye movements, but also identifies the scarcity of such patterns in common benchmarks as a key limitation for modeling this process.
We propose a new benchmark that combines the relevant cases from several existing benchmarks to support future research on this topic.
In our second project, we take inspiration from developmental psychology and study the role of motion for learning how to decompose a scene into objects.
Trained this way, our model reflects central capabilities of scene perception in humans, such as the ability to complete partial objects and to generate novel scenes that systematically generalize beyond the training distribution.
Finally, we study the neural basis of motion segmentation using a combination of computational modeling and experimental psychophysics.
We find striking differences between state-of-the-art computer vision models and human perception in terms of appearance-independent segmentation of moving random dot patterns.
Furthermore, we show that a neuroscience-inspired motion energy approach allows matching human perception and thus provide a compelling link between the neural mechanisms of motion perception and the Gestalt principle of common fate.
In summary, the projects in this thesis contribute to our understanding of how motion information promotes perceptual organization from an interdisciplinary NeuroAI perspective.
Deep neural networks (DNNs) allow building more capable scientific models of human vision, and thus enable novel insights into the perception of natural scenes.
Conversely, we show that insights from human vision can be successfully transferred to a computer vision setting.
Our work therefore contributes to a more holistic understanding of human vision and provides insights that may inspire more capable machine vision in the future.