FIGURE 2.8 (a) A pig rump.
(b) A bear.
The problem is even more acute for faces. Figure 2.9a is a cartoon face. The mere presence of horizontal and vertical dashes can substitute for nose, eyes, and mouth, but only if the relationship between them is correct. The face in Figure 2.9b has the same exact features as the one in Figure 2.9a, but they’re scrambled. No face is seen—unless you happen to be Picasso. Their correct arrangement is crucial.
But surely there is more to it. As Steven Kosslyn of Harvard University has pointed out, the relationship between features (such as nose, eyes, mouth in the right relative positions) tells you only that it’s a face and not, say, a pig or a donkey; it doesn’t tell you whose face it is. For recognizing individual faces you have to switch to measuring the relative sizes and distances between features. It’s as if your brain has created a generic template of the human face by averaging together the thousands of faces it has encountered. Then, when you encounter a novel face, you compare the new face with the template; that is, your neurons mathematically subtract the average face from the new one. The pattern of deviation from the average face becomes your specific template for the new face. For example, compared to the average face, Richard Nixon’s face would have a bulbous nose and shaggy eyebrows. In fact, you can deliberately exaggerate these deviations and produce a caricature, a face that can be said to look more like Nixon than the original. Again, we will see later how this has relevance to some types of art.
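For readers who like to see the arithmetic, the averaging, subtracting, and exaggerating just described can be sketched in a few lines of Python. This is only an illustration of the idea, not a claim about how neurons actually compute it. The representation of a face as a short list of measurements, the function names, and the `gain` parameter are all assumptions made for the sketch.

```python
from typing import List

# Assumption: a face is a list of measurements (for example nose width,
# eyebrow thickness), always in the same order. The numbers are hypothetical.
Face = List[float]


def average_face(faces: List[Face]) -> Face:
    """Build the generic template by averaging each measurement across faces."""
    n = len(faces)
    return [sum(column) / n for column in zip(*faces)]


def deviation(face: Face, template: Face) -> Face:
    """Subtract the average face from a new face, measurement by measurement."""
    return [f - t for f, t in zip(face, template)]


def caricature(face: Face, template: Face, gain: float = 1.5) -> Face:
    """Exaggerate the deviations from the template by a factor of `gain`.

    A gain of 1.0 returns the original face; larger values push every
    measurement further from the average in the direction it already differs.
    """
    return [t + gain * d for t, d in zip(template, deviation(face, template))]
```

With a gain above 1, a nose that is already wider than average gets wider still, while a measurement that matches the average is left unchanged, which is the sense in which a caricature amplifies only what is distinctive about a face.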
FIGURE 2.9 (a) A cartoon face.
(b) A scrambled face.
We have to bear in mind, though, that words such as “exaggeration,” “template,” and “relationships” can lull us into a false sense of having explained much more than we really have. They conceal depths of ignorance. We don’t know how neurons in the brain perform any of these operations. Nonetheless, the scheme I have outlined might provide a useful place to start future research on these questions. For example, over twenty years ago neuroscientists discovered neurons in the temporal lobes of monkeys that respond to faces, with each set of neurons firing when the monkey looks at a specific familiar face, such as Joe the alpha male or Lana the pride of his harem. In an essay on art that I published in 1998, I predicted that such neurons might, paradoxically, fire even more vigorously in response to an exaggerated caricature of the face in question than to the original. Intriguingly, this prediction has now been confirmed in an elegant series of experiments performed at Harvard. Such experiments are important because they will help us translate purely theoretical speculations on vision and art into more precise, testable models of visual function.
Object recognition is a difficult problem, and I have offered some speculations on what the steps involved are. The word “recognition,” however, doesn’t tell us anything much unless we can explain how the object or face in question evokes meaning—based on the memory associations of the face. The question of how neurons encode meaning and evoke all the semantic associations of an object is the holy grail of neuroscience, whether you are studying memory, perception, art, or consciousness.
AGAIN, WE DON’T really know why we higher primates have such a large number of distinct visual areas, but it seems that they are all specialized for different aspects of vision, such as color vision, seeing movement, seeing shapes, recognizing faces, and so on. The computational strategies for each of these might be sufficiently different that evolution developed the neural hardware separately.
A good example of this is the middle temporal (MT) area, a small patch of cortical tissue found in each hemisphere, that appears to be mainly concerned with seeing movement. In the late 1970s a woman in Zurich, whom I’ll call Ingrid, suffered a stroke that damaged the MT areas on both sides of her brain but left the rest of her brain intact. Ingrid’s vision was normal in most respects: She could read newspapers and recognize objects and people. But she had great difficulty seeing movement. When she looked at a moving car, it appeared like a long succession of static snapshots, as if seen under a strobe. She could read the number plate and tell you what color it was, but there was no impression of motion. She was terrified of crossing the street because she didn’t know how fast the cars were approaching. When she poured water into a glass, the stream of water looked like a static icicle. She didn’t know when to stop pouring because she couldn’t see the rate at which the water level was rising, so it always overflowed. Even talking to people was like “talking on a phone,” she said, because she couldn’t see the lips moving. Life became a strange ordeal for her. So it would seem that the MT areas are concerned mainly with seeing motion but not with other aspects of vision. There are four other bits of evidence supporting this view.
First, you can record from single nerve cells in a monkey’s MT areas. The cells signal the direction of moving objects but don’t seem that interested in color or shape. Second, you can use microelectrodes to stimulate tiny clusters of cells in a monkey’s MT area. This causes the cells to fire, and the monkey starts hallucinating motion when the current is applied. We know this because the monkey starts moving its eyes around, tracking imaginary moving objects in its visual field. Third, in human volunteers, you can watch MT activity with functional brain imaging such as fMRI (functional MRI). In fMRI, magnetic fields in the brain produced by changes in blood flow are measured while the subject is doing or looking at something. In this case, the MT areas light up while you are looking at moving objects, but not when you are shown static pictures, colors, or printed words. And fourth, you can use a device called a transcranial magnetic stimulator to briefly stun the neurons of volunteers’ MT areas, in effect creating a temporary brain lesion. Lo and behold, the subjects become briefly motion blind like Ingrid while the rest of their visual abilities remain, to all appearances, intact. All this might seem like overkill to prove the single point that MT is the motion area of the brain, but in science it never hurts to have converging lines of evidence that prove the same thing.
Likewise, there is an area called V4 in the temporal lobe that appears to be specialized for processing color. When this area is damaged on both sides of the brain, the entire world becomes drained of color and looks like a black-and-white motion picture. But the patient’s other visual functions seem to remain perfectly intact: She can still perceive motion, recognize faces, read, and so on. And just as with the MT areas, you can get converging lines of evidence through single-neuron studies, functional imaging, and direct electrical stimulation to show that V4 is the brain’s “color center.”
Unfortunately, unlike MT and V4, most of the rest of the thirty or so visual areas of the primate brain do not reveal their functions so cleanly when they are lesioned, imaged, or zapped. This may be because they are not as narrowly specialized, or their functions are more easily compensated for by other regions (like water flowing around an obstacle), or perhaps our definition of what constitutes a single function is murky (“ill posed,” as computer scientists say). But in any case, beneath all the bewildering anatomical complexity there is a simple organizational pattern that is very helpful in the study of vision. This pattern is a division of the flow of visual information along (semi)separate, parallel pathways (Figure 2.10).