NSF/CRCNS Project: Object and Action Recognition in Time Sequences of Images
A collaboration between the laboratories of Tomaso Poggio at MIT and David Sheinberg at Brown University.

Abstract:
Normal vision is not static: time is a key dimension of the natural world we see. An eventual understanding of biological vision requires understanding the neural mechanisms used to recognize objects and actions over time. The focus of the proposed research is therefore to study how the primate visual system recognizes objects and actions in time sequences of images. A meta-goal of this project is to exploit the synergies between computational approaches and physiological experiments, both to reach a better understanding of brain function and to develop better computer vision algorithms.

Object recognition in time sequences of images presents a significant challenge for recognition systems, because it requires both selectivity to shape and invariance to changes of appearance in time. Our main scientific hypothesis is that the same neural mechanisms involved in learning selectivity to actions are responsible for learning selectivity and invariance to objects that change appearance in time. A computational model (developed in one of our groups) of the initial, feedforward flow of information in the primate visual system for the task of recognizing static images (1) agrees with a variety of physiological findings in different cortical areas of the ventral visual pathway, including V1, V4, and IT, (2) is consistent with human psychophysics on rapid categorization, and (3) performs well on difficult recognition tasks.

We propose to extend this modeling work to the recognition of dynamic scenes and objects, and in particular to the recognition of actions, through a coordinated effort based on computational and experimental methods. First, we will extend the computational model of the ventral stream by adding temporal dynamics to its model neurons and the ability to process video sequences. We will also expand a working model of the dorsal stream to understand the relative roles that it and the ventral stream play in dynamic visual recognition.
Second, we will record from single units, and multiple single units, in high-level visual areas including IT and regions of the STS to characterize the tuning of single neurons to the shape dynamics of specific image sequences. Guided by the model, we will design stimuli and physiology experiments with behaving monkeys in order to understand the role of motion signals compared to combined static shape signals in action perception. Are these representations integrated in the anterior regions of the visual pathway (e.g., in anterior STS), or are they independent paths to action? What role do sensory areas that encode persistent stimuli play in the maintained representation of image sequences?

By combining modeling and physiology, we propose to find a computational explanation for how the higher areas of the visual cortex recognize objects and actions over time and how they can learn invariances (for instance, to viewpoint) from different types of temporal sequences. To test our refined model, we will collect and process a large database of natural videos and compare the model's performance on it with that of state-of-the-art computer vision systems.

Intellectual Merit. Understanding how the brain works and reproducing its central capabilities in computers is arguably one of the greatest problems in science and engineering. The study of vision is a key step in understanding one of the most advanced aspects of the primate brain and one of the most computationally difficult competences of our nervous system. This proposal builds upon cooperative work between computational neuroscience and neurophysiology of the visual cortex. It directly addresses one of the major challenges for understanding normal and abnormal sensory perception: how does the brain represent, process, and recognize time sequences of images? This is a key question in the quest to understand how the visual system operates.

Broader Impact. A better understanding of the computations performed by the brain as it processes complex streams of visual information will have a major impact in the fields of neuroscience, education, and computer science, including artificial intelligence, robotics, and surveillance systems. In addition, establishing a quantitative, mechanistic link between behavior and neural processing can yield important insights into mental disorders. Our integrative effort, which is focused on the processing of dynamic perceptual information, can have a significant and direct impact on current theories of autism, dyslexia, and the effects of stroke.

As in previous projects, the research we propose is tightly coupled to education and teaching, to the theses of graduate students, and to the training of a new generation of interdisciplinary researchers combining a background in computer science and neuroscience at two institutions, MIT and Brown. One of us [DS] is a physiologist; the other [TP] works in computer science and neuroscience. All the animal work will be done at Brown, the computer simulations at MIT, and the data analysis jointly. The databases of videos, the stimuli, the modeling software, and the experimental data will be made available to the broad scientific community.
Supported by: NSF