2008 Recipient: Shimon Ullman

Throughout his career, Shimon Ullman has combined computational methods with experimental investigation, leading to key insights into how we perceive the three-dimensional structure of the world and recognize objects by sight. He is a fitting recipient of the David E. Rumelhart Prize, since his research addresses the theoretical foundations of perception and draws heavily on both mathematical and experimental investigation, as did the research of David Rumelhart.

Dr. Ullman did his undergraduate work in Mathematics, Physics and Biology at the Hebrew University in Israel. He received his Ph.D. from MIT in Electrical Engineering and Computer Science in 1977, becoming David Marr’s first Ph.D. student. Remaining at MIT, he became an Associate Professor in the Department of Brain and Cognitive Sciences in 1981 and a Full Professor in 1985. Simultaneously, he took a position in applied mathematics at the Weizmann Institute of Science in Israel. While employed at both institutions, he also became the chief scientist at Orbotech, a position he held until 2004. In 1994, he left MIT to be the Head of the Department of Applied Mathematics and Computer Science at the Weizmann Institute, where he is now the Samy & Ruth Cohn Professor of Computer Science.

Dr. Ullman has developed elegant and well-grounded computational models of vision and carefully compared them to human visual processes. This comparison has proven valuable for furthering research in both natural and artificial vision. The computational models are working systems that account for how humans recognize objects, perceive motion, probe their visual world for task-relevant information, and create coherent representations of their environments. These models, by reproducing many impressive feats of human vision as well as its occasional illusory percepts, provide satisfying theories of how humans perceive their world [13]. Reciprocally, close consideration of human vision has provided Dr. Ullman with inspiration for his computational models, leading to solutions of difficult problems in artificial intelligence [5]. By learning from natural intelligence, Dr. Ullman has created artificial intelligence systems that would otherwise most likely never have been constructed.

Dr. Ullman’s contributions have spanned low-level [1, 3, 4, 14] and high-level [6, 7, 9, 10, 11, 12, 15] vision. Low-level vision is associated with the extraction of physical properties of the visible environment, such as depth, three-dimensional shape, object boundaries, and surface material. High-level vision concerns object recognition, classification, and determining spatial relations among objects. By conducting pioneering research on both fronts, Dr. Ullman has been able to create complete models of vision that begin with raw, unprocessed visual inputs and produce as outputs categorizations such as “car,” “face,” and “Winston Churchill.”

In the 1970s, Dr. Ullman pioneered research on motion perception. In his dissertation, he developed computational mechanisms able to perceive the motion of objects in noisy and complex scenes. These models assumed only the presence of patterns of light intensity that changed over time. They did not presume coherent, stable objects. In the book stemming from his dissertation [1], Dr. Ullman showed that the perception of stable objects depends on solving the “correspondence problem” – determining which elements of one movie frame correspond to the elements of the next frame. Dr. Ullman’s solution to this problem employed several constraints for determining correspondences, including a drive to create one-to-one correspondences, and the proximity, light similarity, and shape similarity of the elements across frames. None of these constraints is decisive by itself. People can see two elements as belonging to the same object even if they do not share the same color, darkness, shape, or location. However, when these sources of information are combined, correspondences emerge over time that are coherent and globally harmonious. Once established, these correspondences determine which elements across frames will be deemed as belonging to the same object, as well as the motion of the hypothesized objects.
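The correspondence computation can be sketched as cost minimization over candidate pairings. The particular weights and the greedy one-to-one assignment below are illustrative assumptions for the sketch, not Ullman's actual minimal-mapping algorithm:

```python
import math

def match_elements(frame_a, frame_b, w_prox=1.0, w_light=0.5):
    """Greedy one-to-one matching of elements (x, y, brightness) across two
    movie frames. Each candidate pairing is scored by combining proximity and
    light-intensity similarity; no single cue is decisive on its own."""
    candidates = []
    for i, (xa, ya, ba) in enumerate(frame_a):
        for j, (xb, yb, bb) in enumerate(frame_b):
            cost = w_prox * math.hypot(xb - xa, yb - ya) + w_light * abs(bb - ba)
            candidates.append((cost, i, j))
    candidates.sort()  # cheapest pairings are claimed first
    used_a, used_b, matches = set(), set(), {}
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:  # enforce one-to-one mapping
            matches[i] = j
            used_a.add(i)
            used_b.add(j)
    return matches
```

Combining the cues this way lets a weak proximity match be rescued by a strong brightness match, echoing the point that no constraint is decisive by itself.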

Dr. Ullman went on to show that it is possible to determine the three-dimensional structure of an object from its motion [1, 2]. It is not necessary to have a pre-established object representation, but only identifiable points from the object projected on a two-dimensional image plane akin to the human retina. The representation of the object itself can be computed rather than assumed. Assuming that a moving object is rigid, Dr. Ullman formally proved that it is possible to deduce both the three-dimensional structure and the motion of the object from only three distinct views of four identified non-coplanar points. This work was an early influential example of a computationally driven approach to human vision, adding to a growing corpus of algorithmically formulated solutions for enabling artificial cognitive systems to see and interpret the world.
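The setting of the theorem (though not the recovery algorithm itself) can be illustrated numerically: four non-coplanar points are moved rigidly and projected orthographically, so the 2D views change while every 3D inter-point distance is preserved. The rotation axis and angles below are arbitrary choices for the sketch:

```python
import math

def rotate_y(p, theta):
    """Rigid rotation of a 3D point about the y-axis."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)

def project(p):
    """Orthographic projection onto the image plane (depth is discarded)."""
    return (p[0], p[1])

# Four non-coplanar points, as the theorem requires.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Three distinct 2D views of the rigidly rotating configuration.
views = [[project(rotate_y(p, t)) for p in points] for t in (0.0, 0.3, 0.6)]
```

The rigidity assumption is what makes recovery possible: the unknown depths are constrained to keep all pairwise 3D distances constant across the three views.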

In the same way that the elements of temporally adjacent frames can be placed into alignment with one another to reveal motion, Dr. Ullman used another alignment process to recognize objects. He and his students developed techniques to align two-dimensional images with three-dimensional models or previously stored two-dimensional views in order to classify the images [6, 7, 10, 12]. This approach has had noteworthy success in recognizing faces and other difficult-to-describe objects [9, 10, 11].
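The flavor of the linear-combination-of-views result [10] can be demonstrated numerically: for rotations about a fixed axis under orthographic projection, the x-coordinates of a novel view are a fixed linear combination of the x-coordinates of two stored views of the same object. The axis, angles, and point set below are arbitrary choices for this sketch:

```python
import math

def x_view(points, theta):
    """x-coordinates of an orthographic view after rotation about the y-axis."""
    return [math.cos(theta) * x + math.sin(theta) * z for x, _, z in points]

def combination_coefficients(t1, t2, t_new):
    """Solve a*(c1, s1) + b*(c2, s2) = (c3, s3), a 2x2 system, via Cramer's rule.
    The coefficients depend only on the viewpoints, not on the object."""
    c1, s1 = math.cos(t1), math.sin(t1)
    c2, s2 = math.cos(t2), math.sin(t2)
    c3, s3 = math.cos(t_new), math.sin(t_new)
    det = c1 * s2 - c2 * s1
    a = (c3 * s2 - c2 * s3) / det
    b = (c1 * s3 - s1 * c3) / det
    return a, b

points = [(0.0, 0.0, 1.0), (1.0, 0.5, 0.2), (-0.4, 1.0, 0.7), (0.3, -0.2, -0.9)]
t1, t2, t_new = 0.0, 0.4, 0.9
a, b = combination_coefficients(t1, t2, t_new)
v1, v2 = x_view(points, t1), x_view(points, t2)
predicted = [a * x1 + b * x2 for x1, x2 in zip(v1, v2)]
actual = x_view(points, t_new)
```

Because the same coefficients work for every point on the object, recognition can proceed by finding coefficients that align stored views with the incoming image.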

Dr. Ullman pioneered the use of “visual routines” to compute visual relations such as “X lying inside versus outside of Object Y” and “X on Contour A but not B” [3, 9]. This work posited program-like visual operations such as marking regions, spreading regions, and boundary tracing to act as a bridge between low-level and high-level perceptual properties. Whitman Richards, Professor of Cognitive Science at MIT, notes that “these ideas continue to influence visual psychophysics and models for object recognition, saliency and attention.” Related to this work, Dr. Ullman created models of contour integration that demonstrated how people can find informative edges of objects despite noise and occlusions [9]. This work has provided the basis for a model of functional recovery following retinal lesions, linking observations on remapping of cortical topography to expected perceptual changes in subjects suffering from adult macular degeneration.
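The inside/outside relation illustrates how such a routine might operate: spread activation (region coloring) outward from the queried location; if the spread reaches the image border without crossing the contour, the point lies outside. This flood-fill sketch is an illustrative assumption, not Ullman's exact formulation:

```python
from collections import deque

def is_inside(grid, start):
    """Inside/outside visual routine sketch. `grid` is a 2D array where 1
    marks contour pixels and 0 marks empty space; activation spreads from
    `start` through empty space in four directions."""
    h, w = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if r in (0, h - 1) or c in (0, w - 1):
            return False  # the spread escaped to the border: outside the contour
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (nr, nc) not in seen and grid[nr][nc] == 0:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return True  # the spread was contained: inside the contour
```

Note that the routine is serial and composable, in the spirit of the proposal: the same spreading operation can be chained with marking and boundary tracing to compute other spatial relations.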

Together with Christof Koch, Dr. Ullman developed the notion of “saliency maps” that underlie the detection of conspicuous image locations and are employed in conjunction with attentional mechanisms [4]. These structures serve perceptual segmentation processes and draw on both low-level visual properties and high-level task demands. This work, while grounded in the behavior and neurobiology of human vision, has proven useful in computer and robotic vision applications in which a system must quickly and adaptively interact with a changing environment. It also led to the development of his “counter streams” model of the bi-directional flow of information in visual cortex. This model is consistent with the massive recurrent feedback found in visual cortex, and gives rise to mutually reinforcing bottom-up and top-down influences on perception [8].
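A toy version of the idea: compute a center-surround conspicuity value at every location, then let a winner-take-all step pick the most salient one. Using a single intensity feature and a 3×3 surround are simplifying assumptions relative to the original multi-feature-map model [4]:

```python
def saliency(image):
    """Toy saliency map: centre-surround intensity contrast at each interior
    pixel, followed by a winner-take-all selection of the most salient
    location (the next target for covert attention)."""
    h, w = len(image), len(image[0])
    sal = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            surround = sum(image[r + dr][c + dc]
                           for dr in (-1, 0, 1) for dc in (-1, 0, 1)) - image[r][c]
            sal[r][c] = abs(image[r][c] - surround / 8.0)
    # winner-take-all over the combined map
    winner = max(((sal[r][c], (r, c)) for r in range(h) for c in range(w)))[1]
    return sal, winner
```

In the full model, separate feature maps (color, orientation, motion) are combined into one master map before the winner-take-all step, and the winning location is then inhibited so attention can shift to the next most salient point.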

Dr. Ullman has proposed that object recognition can effectively proceed by learning fragments based upon environmentally presented objects. Avoiding problems with either extreme view – either creating whole-object templates or using elemental features such as simple lines or dots – his fragment-based model acquires diagnostic intermediate representations that are weighted according to their informational content. The fragments are not predetermined, but rather are based upon training stimuli and will vary with the class of objects to be classified. Using a hierarchical representation, the fragments are assembled into larger fragments constrained by color, texture, and contour similarity [14, 15]. Firmly grounded in mathematical information theory, fragments have also received empirical support from neurophysiological investigations. Dr. Ullman’s work addresses one of the most salient puzzles regarding the neural coding of objects: why do we find few, if any, neurons that code for whole objects, but instead find visual neurons that are sensitive to seemingly random object features and parts?
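The information-theoretic selection criterion can be sketched directly: score each candidate fragment by the mutual information between its presence in an image and the image's class label, and keep the most informative fragments. Binary presence detection and two classes are simplifying assumptions for this sketch:

```python
import math

def mutual_information(presence, labels):
    """I(F; C) in bits, where F is a binary fragment-presence feature and C a
    binary class label, estimated from paired observations. Fragments of
    intermediate complexity tend to maximize this score [14]."""
    n = len(labels)
    mi = 0.0
    for f in (0, 1):
        for c in (0, 1):
            p_fc = sum(1 for p, l in zip(presence, labels) if p == f and l == c) / n
            p_f = sum(1 for p in presence if p == f) / n
            p_c = sum(1 for l in labels if l == c) / n
            if p_fc > 0:
                mi += p_fc * math.log2(p_fc / (p_f * p_c))
    return mi
```

A whole-object template fires too rarely to be informative and a single line fires too indiscriminately; intermediate fragments strike the balance, which is why the measured score selects them.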

Dr. Ullman has also played an important role in training computer, cognitive, and vision scientists. Several of his students have gone on to become leading researchers themselves, including Moshe Bar, Ronen Basri, Shimon Edelman, Kalanit Grill-Spector, Avraham Guissin, Ellen Hildreth, Dan Huttenlocher, Brian Subirana, and Demetri Terzopoulos.

Over his career, Dr. Ullman has consistently developed elegant models that are appropriately constrained by psychological and neurophysiological evidence. His 1997 book “High-level Vision: Object recognition and visual cognition” provides a unified and singularly coherent formal approach to vision. His contributions reach far beyond the computer-vision community to researchers whose scientific passions center on the development of computational accounts of animal, and especially human, intelligence; his outstanding contributions have been highly influential in shaping the research direction of a whole generation of cognitive scientists and neuroscientists interested in vision. Through his involvement with the company Orbotech, he has also participated in the development of real-world applications of his theories to the automated inspection of circuit boards and displays. Just as his models have integrated line segments to create contours and contours to create objects, his research has integrated low-level and high-level perception, neuroscience and functional descriptions, human and machine vision, as well as theory and application.

Selected Publications

[1] Ullman, S. (1979). The interpretation of visual motion. Cambridge, MA: MIT Press.
[2] Ullman, S. (1980). Against direct perception. The Behavioral and Brain Sciences, 3, 373-415.
[3] Ullman, S. (1984). Visual routines. Cognition, 18, 97-159.
[4] Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219-227.
[5] Ullman, S. (1986). Artificial intelligence and the brain: Computational studies of the visual system. Annual Review of Neuroscience, 9, 1-26.
[6] Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193-254.
[7] Huttenlocher, D. P., & Ullman, S. (1990). Recognizing solid objects by alignment with an image. International Journal of Computer Vision, 5, 195-212.
[8] Ullman, S. (1995). Sequence-seeking and counter streams: A computational model for bi-directional information flow in the visual cortex. Cerebral Cortex, 5(1), 1-11.
[9] Ullman, S. (1996). High-level vision: Object recognition and visual cognition. Cambridge, MA: MIT Press.
[10] Ullman, S., & Basri, R. (1991). Recognition by linear combinations of models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 992-1006.
[11] Adini, Y., Moses, Y., & Ullman, S. (1997). Face recognition: The problem of compensating for changes in illumination direction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 721-732.
[12] Moses, Y., & Ullman, S. (1998). Generalization to novel views: Universal, class-based and model-based processing. International Journal of Computer Vision, 29(3), 233-253.
[13] Ullman, S., & Soloviev, S. (1999). Computation of pattern invariance in brain-like structures. Neural Networks, 12, 1021-1036.
[14] Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7), 1-6.
[15] Ullman, S. (2007). Object recognition and segmentation by a fragment-based hierarchy. Trends in Cognitive Sciences, 11, 58-64.