The difference, then, between a live show seen up close and a 3-D movie of the same show is that the former pulls just one or several audience members into the thick of the story, whereas 3-D movies have this effect on everyone. So the fun of 3-D movies is not that they are 3-D at all. We can have the same fun when we happen to be the target in a real live show. The fun is in being targeted. When the show doesn’t merely leap off the screen, but leaps at you, it fundamentally alters the emotional experience. It no longer feels like a story about others, but becomes a story that invades your space, perhaps threateningly, perhaps provocatively, perhaps joyously. You are immersed in the story, not an audience member at all.

What does all this have to do with music and the auditory sense? Imagine yourself again at a live show. You hear the performers’ rhythmic banging ganglies as they carry out behaviors onstage. And as they move onstage and vary their direction, the sounds they make will change pitch due to the Doppler effect. Sitting there in the audience, watching from a vantage point outside of the story, you get the rhythm and pitch modulations of human movers. You get the attitude (rhythm) and action (pitch). But you are not immersed in the story. You can more easily remain detached.

Now imagine that the performers suddenly begin to target you. Several just jumped off the stage, headed directly toward you. A minute later, there you are, grinning and red-faced, with tousled hair and the bright red lipstick mark of a mistress’s kiss on your forehead . . . and, for good measure, a pirate is in your face calling you “salty.” During all this targeting you hear the gait sounds and pitch modulations of the performers, but you also heard these sounds when you were still in detached, untargeted audience-member mode. The big auditory consequence of being targeted by the actors is not in the rhythm or pitch, but in the loudness. When the performers were onstage, most of the time they were more or less equidistant, and fairly far away—and so there was little loudness modulation as they carried on. But when the performers broke through the “screen,” they ramped up the volume. It is these high-loudness parts of music—the fortissimos, or ffs—that are often highly evocative and thrilling, as when the dinosaur reaches out of the 3-D theater’s screen to get you.

And that’s the final topic of this chapter: loudness, and its musical meaning. I will try to convince you that loudness modulations are used in music in the 3-D, invade-the-listener’s-space fashion I just described. In particular, this means that the loudness modulations in music tend to mimic loudness modulations due to changes in the proximity of a mover. Before getting into the evidence for this, let’s discuss why I don’t think loudness mimics something else.

Nearness versus Stompiness

I will be suggesting that loudness in music is primarily driven by spatial proximity. Rather than musical pitch being a spatial indicator, as is commonly suggested (see the earlier section “Why Pitch Seems Spatial”), it is loudness in music that has the spatial meaning. As was the case with pitch, here, too, there are several stumbling blocks preventing us from seeing the spatial meaning of loudness. The first is the bias for pitch: if one mistakenly believes that pitch codes for space, then loudness must code for something else. A second stumbling block to interpreting loudness as spatial concerns musical notation, which codes loudness primarily via letters (pp, p, mf, f, ff, and so on), rather than as a spatial code (which is, confusingly, how it codes pitch, as we’ve seen). Musical instruments throw a third smokescreen over the spatial meaning of loudness, because most instruments modulate loudness not by spatial modulations of one’s body, but by hitting, bowing, plucking, or blowing harder.

Therefore, several factors are conspiring to obfuscate the spatial meaning of loudness. But, in addition, the third source of confusion I just mentioned suggests an alternative interpretation: that loudness concerns the energy level of the sound maker. A musician must use more energy to play more loudly, and this can’t help but suggest that louder music might be “trying” to sound like a more energetic mover. The energy with which a behavior is carried out is an obvious real-world source of loudness modulations. These energy modulations are, in addition, highly informative about the behavior and expressions of the mover. A stomper walking nearby means something different than a tiptoer walking nearby. So energy or “stompiness” is a potential candidate for what loudness might mean in music.

Loudness in the real world can, then, come both from the energy of a mover and from the spatial proximity of the mover. And each seems to be the right sort of thing to potentially explain why the loudest parts of music are often so thrilling and evocative: stompiness, because the mover is energized (maybe angry); proximity, because the mover is very close by. Which of these ecological meanings is more likely to drive musical loudness, supposing that music mimics movement? Although I suspect music uses high loudness for both purposes—sometimes to describe a stompy mover, and sometimes to describe a nearby mover—I’m putting my theoretical money on spatial proximity.

One reason to go with the spatial-proximity interpretation of loudness, at the expense of the stompiness interpretation, is pragmatic: the theory is easier! Spatial proximity is simply distance from the listener, and so changes in loudness are due to changes in distance. That’s something I can wrap my theoretical head around. But I don’t know how to make predictions about how walkers vary in their stompiness. Stompers vary their stompiness when they want to, not in the way physics dictates. That is, if musical loudness is stompiness, then what exactly does this predict? It depends on the psychological dynamics of stompiness, and I don’t know that. So, like any good theorist, I befriend spatial proximity and ignore stompiness.

But there is a second reason, this one substantive, for latching onto spatial proximity as the meaning of musical loudness. Between proximity and stompiness, proximity can better explain the large range of loudness that is possible in music. Sound intensity falls off as the inverse square of the distance to the source, and so it rises dramatically as a mover nears the listener. Spatial proximity can therefore bring huge swings in loudness, far greater than the loudness changes that can be obtained by stomping softly and then loudly at a constant distance from a listener. That’s why I suspect proximity is the larger driver of loudness modulations in music. And as we will see, the totality of loudness phenomena in music is consistent with proximity, and less plausible for stompiness (including the phenomenon discussed in Encore 5, that note density rises with greater loudness).
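The size of the swings the inverse-square law allows can be made concrete with a small sketch (the function name and the distances are my own illustrative choices, not the author’s):

```python
import math

def approach_gain_db(far_m, near_m):
    """Change in sound level (dB) when a source moves from far_m
    to near_m meters away, assuming intensity falls off as 1/r**2
    (free-field conditions, no reflections)."""
    return 10 * math.log10((far_m / near_m) ** 2)

# A performer walking from the back of a hall (20 m) right up to
# your seat (1 m) gains about 26 dB purely from proximity:
print(round(approach_gain_db(20, 1), 1))  # ≈ 26.0

# Even just halving the distance buys about 6 dB:
print(round(approach_gain_db(2, 1), 1))   # ≈ 6.0
```

A 26-decibel swing from movement alone dwarfs what a walker at fixed distance can plausibly add by stomping harder, which is the substantive point above.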

Thus, to the question “Is it nearness or stompiness that drives musical loudness modulations?” the answer, for both pragmatic and substantive reasons, is nearness, or proximity. Nearness can modulate loudness much more than stompiness can, and nearness is theoretically tractable in a way that stompiness is not. Let’s see if proximity can make sense of the behavior of loudness in music.

Slow Loudness, Fast Pitch

Have you ever wondered why our musical notation system is as it is? In particular, why does our Western music notation system indicate pitch by shifting the notes up and down on the staff, while it indicates loudness symbolically by letters (e.g., pp, f ) along the bottom? Figure 37 shows a typical piece of music. Even if you don’t read music—and thus don’t know exactly which pitch each note is on—you can instantly interpret how the pitch varies in the melody. In this piece of music, pitch rises, wiggles, falls, falls, falls yet again, only to rise and tumble down. You can see what pitch does because the notation system creates what is roughly a plot of pitch versus time. Loudness, on the other hand, must be read off the letters along the bottom, and their meaning unpacked from your mental dictionary: p for “quiet,” f for “loud,” and so on. Why does pitch get a nice mapping onto spatial position, whereas loudness only gets a lookup table, or glossary?