How does your auditory system serve as a movement-tracking system? In addition to sensing whether a mover is to your left or right, in front or behind, and above or below (a skill that depends on the shape, position, and number of ears you have), you possess specialized auditory software that interprets the sounds of movers and generates a good guess as to the nature of the mover’s movement through space. Your software has evolved to give you four kinds of information about a mover: (i) his distance from you, (ii) his directedness toward (or away from, or at an angle to) you, (iii) his speed, and (iv) his behavior or gait. How, then, does your auditory system infer these four kinds of information? As we will see in this and the following chapters, (i) distance is gleaned from loudness, (ii) directedness toward you is cued by pitch, (iii) speed is inferred from the number of footsteps per second, and (iv) behavior and gait are read from the pattern and emphasis of footsteps. That gives four fundamental parameters of human movement and four kinds of auditory cues: (i) loudness, (ii) sound frequency, (iii) step rate, and (iv) temporal pattern and emphasis. (See Figure 13.) Your auditory system has evolved to track these cues because of the supreme value of knowing what everyone is doing nearby, and where.
This is where things get interesting. Even though joggers without headphones are not listening to music, their auditory systems are listening to fundamentally music-like constituents. Consider the four auditory movement cues mentioned just above (and shown on the right of Figure 13). Loudness? That’s just pianissimo versus piano versus forte and so on. (This is called “dynamics” in music, a term I will avoid because it brings confusion in the context of a movement theory of music.) Sound frequency? That’s roughly pitch. Step rate? That’s tempo. And the gait pattern? That’s akin to rhythm and beat. The four fundamental auditory cues for movement are, then, mighty similar to (i) loudness, (ii) pitch, (iii) tempo, and (iv) rhythm. (See Figure 14.) These are the most fundamental ingredients of music, and yet, there they are in the sounds of human movers. The most informative sounds of human movers are the fundamental building blocks of music!
Figure 13. The four properties of human movers (left) are inferred from the four respective auditory stimuli (right).
Figure 14. Central to music are the four musical properties in the center column, which map directly onto the auditory cues for sensing human movement.
The importance of loudness, pitch, tempo, and rhythm to both music and movement is, as we will see, more than a coincidence. The similarity runs deep, something speculated on ever since the Greeks[1]. Music is not just built with the building blocks of movement, but is actually organized like movement, thereby harnessing our movement-recognition auditory mechanisms. Headphoned joggers, then, don’t just miss out on the real movement around them; they pipe fictional movement into their ears, making them even more hazardous than joggers wearing earplugs.
Much of the rest of this book is about how music came into the lives of us humans, how it gets into our brains, and why it affects us as it does. In short, we will see that music moves us because it literally sounds like moving.
The Secret Ingredient
When I was a teenager, my mother began listening to French instructional programs in order to brush up. She was proud of me when I began sitting and listening with her. “Perhaps my son isn’t a square physics kid after all,” she thought. And, in fact, I found the experience utterly enthralling. After many months, however, my mother’s pride turned to worry, because whenever she attempted to banter in even the most elementary French with me, I would stare back, dumbfounded. “Why isn’t this kid learning French?” she fretted.
What I didn’t tell my mother was that I wasn’t trying to learn French. Why was I bothering to listen to a program I could not comprehend? I will let you in on my secret in a moment, but in the meantime I can tell you what I was not listening to it for: the speech sounds. No one would set aside a half hour each day for months in order to listen to unintelligible speech. Foreign speech sounds can pique our curiosity, but we don’t go out of our way to hear them. If people loved foreign speech sounds, there would be a market for them; we would set our alarm clocks to blare German at 5:30 a.m., listen to Navajo on the way to work in the car, and put on Bushmen clicks as background for our dinner parties. No. I was not listening to the French program for the speech sounds. Speech doesn’t enthrall us—not even in French.
Whereas foreign speech sounds don’t make it as a form of entertainment, music is quintessentially entertaining. Music does get piped into our alarm clocks, car radios, and dinner parties. Music has its own vibrant industry, whereas no one is foolish enough to see a business opportunity in easy-listening foreign speech sounds. And this motivates the following question. Why is music so evocative? Why doesn’t music feel like listening to speech sounds, or animal calls, or garbage disposal rumbles? Put simply: why is music nice to listen to?
In an effort to answer, let’s go back to the French instructional program and my proud, and then concerned, mother. Why was I joining my mom each day for a lesson I couldn’t comprehend, and had no intention of comprehending? Truth be told, it wasn’t an audiotape we were listening to, but a television show. And it wasn’t the meaningless-to-me speech sounds that lured me in, but one of the actors. A young French actress, in particular. Her hair, her smile, her mannerisms, her pout . . . but I digress. I wasn’t watching for the French language so much as for the French people, one in particular. Sorry, Mom!
What was evocative about the show and kept me wanting more was the human element. The most important thing in the lives of our ancestors was the other people around them, and it is on the faces and bodies of other people that we find the most emotionally evocative stimuli. So when one finds a human artifact that is capable of evoking strong feelings, my hunch is that it looks or sounds human in some way. This is, I suggest, an important clue to the nature of music.
Let’s take a step back from speech and music, and look for a moment at evocative and nonevocative visual stimuli in order to see whether evocativeness springs from people. In particular, consider two kinds of visual stimuli, writing and color—each an area of my research covered in my previous book, The Vision Revolution.
Writing, I have argued, has culturally evolved over centuries to look like natural objects, and to have the contour structures found in three-dimensional scenes of opaque objects. The nature that underlies writing is, then, “opaque objects in 3-D,” and that is not a specifically human thing. Writing looks like objects, not humans, and thus only has the evocative power expected of opaque objects: little or none. That’s why most writing, like the letters and words on this page, is not emotionally evocative to look at. (See top left of Figure 15.) Colors, on the other hand, are notoriously evocative: people have strong preferences regarding the colors of their clothes, cars, and houses, and we sense strong associations between color and emotions. I have argued in my research and in The Vision Revolution that color vision in us primates (our new-to-primates red-green sensitivity in particular) evolved to detect the blood physiology modulations occurring in the skin, which allow us to see color signals indicating emotional state and mood. Color vision in us primates is primarily about the emotions of others. Color is about humans, and it is this human connection to color that is the source of color’s evocativeness. And although, unlike color, writing is not generally evocative, not all writing is sterile. For example, the “V” shape has long been recognized as one of the most evocative geometrical shapes for warning symbols. But notice that “V” stimuli are reminiscent of (exaggerations of) “angry eyebrows” on angry faces. Color is “about” human skin and emotion, and “V” stimuli may be about angry eyebrows, so the emotionality in each one springs from a human source. (See top right of Figure 15.) We see, then, that the nonevocative visual signs look like opaque, not-necessarily-human objects, and the evocative visual signs look like human expressions. I have summarized this in the top row of the table in Figure 15.