Figure 7. Illustration that voiced plosives are like rigid, elastic hits, and unvoiced plosives like nonrigid, inelastic hits. These plots show the amplitude of the sound on the y-axis, and time on the x-axis. (a) The sound made by a stiff hardcover book landing on my wooden desk on the left, followed by the sound of that same book landing on my desk, but where a wrinkly piece of paper cushioned the landing (making it less rigid and less elastic). (b) Me saying “bee” and “pee.” Notice that in the inelastic book-drop and the unvoiced plosive cases—i.e., the right in (a) and (b)—there is a delay after the initial collision before the ringing begins.
Not only do languages utilize a wide variety of voice onset times—hit-to-ring gaps—for plosive phonemes, but one does not find plosive phonemes that don’t care about the length of the gap. One could imagine that, just as the intensity of a spoken plosive doesn’t change the identity of the plosive, the voice onset time after a plosive might not matter to the identity of a plosive. But what we find is that it always does matter. And that’s because the intensity of a hit in nature is not informative about the objects involved, but the gap from hit to ring is informative (as is the timbre). That’s why the gap from hit to ring is harnessed in language. And that’s why, as we saw earlier, the distinct plosive sounds at the start and end of words are treated as the same, despite being acoustically more different than are voiced and unvoiced plosives (like “b” and “p”).
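To make the idea of a hit-to-ring gap concrete, here is a minimal sketch in Python that builds a toy waveform (a short burst, a stretch of silence, then a decaying ring) and measures the silence between the burst and the ring. Everything in it is an assumption made for illustration: the sample rate, the burst and ring shapes, the threshold, and the function names `synth_hit` and `hit_to_ring_gap_ms` are hypothetical and are not taken from the recordings in Figure 7. It also assumes a clean, fully silent gap; a real recording would need its amplitude envelope smoothed before a simple threshold like this one could work.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed value for this toy example


def synth_hit(gap_ms, burst_ms=5.0, ring_ms=200.0, ring_hz=200.0):
    """Build a toy waveform: a burst (the hit), silence (the gap), then a decaying ring."""
    burst_n = int(SAMPLE_RATE * burst_ms / 1000)
    gap_n = int(SAMPLE_RATE * gap_ms / 1000)
    ring_n = int(SAMPLE_RATE * ring_ms / 1000)
    # An alternating-sign burst stands in for the sharp crack of a collision.
    burst = [0.8 if i % 2 == 0 else -0.8 for i in range(burst_n)]
    silence = [0.0] * gap_n
    # A cosine under an exponential envelope stands in for the ring.
    ring = [
        math.exp(-3.0 * i / ring_n) * math.cos(2 * math.pi * ring_hz * i / SAMPLE_RATE)
        for i in range(ring_n)
    ]
    return burst + silence + ring


def hit_to_ring_gap_ms(signal, threshold=0.1):
    """Return the length, in milliseconds, of the first quiet stretch after the burst."""
    loud = [abs(sample) >= threshold for sample in signal]
    hit_start = loud.index(True)                 # first loud sample: the hit
    quiet_start = loud.index(False, hit_start)   # the burst has ended
    ring_start = loud.index(True, quiet_start)   # the ring begins
    return 1000.0 * (ring_start - quiet_start) / SAMPLE_RATE


# Illustrative gap values only; they are not measurements of real speech.
short_gap = hit_to_ring_gap_ms(synth_hit(gap_ms=10.0))  # stands in for a rigid hit
long_gap = hit_to_ring_gap_ms(synth_hit(gap_ms=60.0))   # stands in for a nonrigid hit
```

In this sketch the shorter gap plays the part of the rigid, "voiced" hit and the longer gap the part of the nonrigid, "unvoiced" one, which is the contrast the right and left halves of Figure 7 are meant to show.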
In light of the ecological meaning of voiced versus unvoiced plosives, consider the following two letters from a mystery language: ◆ and ✴. Each stands for a plosive, but one is voiced and the other unvoiced. Which is which? Most people guess that ◆ is voiced, and that ✴ is unvoiced. Why? My speculation is that it is because ◆ looks rigid, and would tend to be involved in hits that are voiced (i.e., a short gap from hit to ring), whereas ✴ looks more kinked, and thus would be likely to have a more complex collision, one that is unvoiced (i.e., a long gap between hit and ring). My “mystery language” is fictional, but could it be that more rigid-looking letters across real human writing systems have a tendency to be voiced, and more kinked-looking letters have a tendency to be unvoiced? It is typically assumed that the shapes of letters are completely arbitrary, and have no connection to the sounds of speech they stand for, but could it be that there are connections because objects with certain shapes tend to make certain sounds? This is the question Kyle McDonald—a graduate student at Rensselaer Polytechnic Institute (RPI) working with me—raised and set out to investigate. He found that letters having junctions with more contours emanating from them—i.e., the more kinked letters—have a greater probability of being unvoiced. For example, in English the three voiced plosives are “b,” “d,” and “g,” and their unvoiced counterparts are “p,” “t,” and “k.” Notice how the unvoiced letters—the “t” and “k,” in particular—have more complex structures than the voiced ones. Kyle McDonald’s data—currently unpublished—show that this is a weak but significant tendency across writing systems generally.
Rigid Muffler
As I walk along my upstairs hallway, I accidentally bump the hammer I’m carrying into the antique gong we have, for some inexplicable reason, hung outside the bedroom of our sleeping infant. I need to muffle it, quickly! One of my hands is bare, and the other is wielding the guilty hammer; what do I do? It’s obvious. I should use my bare hand, not the hammer, to muffle the gong. Whereas my hand will damp out the gong’s ring quickly, the hammer couldn’t be worse as a damper. My hand serves as a good gong-muffler because it is fleshy and nonrigid. My hand muffles the gong faster than the rigid hammer would, yet recall from the previous section that nonrigid objects cause explosive hits with long hit-to-ring gaps. Nonrigid hits create rings with a delay, and yet diminish rings without delay. And, similarly, rigid hits create rings without delay, but are slow to damp rings.
These gong observations are crucial for understanding what happens to voiced and unvoiced plosives when they are not released (i.e., when the air in the mouth and lungs is not allowed to burst out, creating the explosive hit sound), which often occurs at word endings (as discussed in the section titled “Two-Hit Wonder”). When a plosive is not released, there clearly cannot be a hit-to-ring gap—because it never rings. So how do voiced and unvoiced plosives retain their voiced-versus-unvoiced distinction at word endings? For example, consider the word “bad.” How do we know it is a “d” and not a “t” at the end, given that it is unreleased, and thus there is no hit-to-ring delay characterizing it as a “d” and not a “t”?
My gong story makes a prediction in this regard. If voiced plosives really have their foundation in rigid objects (mimicking rigidity’s imperceptibly tiny hit-to-ring gap at a word’s beginning), then, because rigid objects are poor mufflers, the sonorant preceding an unreleased voiced plosive at a word ending should last longer than the sonorant preceding an unreleased unvoiced plosive at a word ending. For example, the vowel sound in “bad” should last longer than the vowel sound in “bat.” The nonrigid “t” at the end of “bat” should muffle its vowel quickly. Are words like “bad” spoken with vowels that ring longer than the vowels in words like “bat”?
Yes. Say “bad” and “bat.” The main difference is not whether the final plosive is voiced—neither is, because neither is ever released, and thus neither ever gets to ring. Notice how when you say “bad,” the “a” gets more drawn out, lasting longer, than the “a” sound in “bat.” Most nonlinguist readers may never have noticed that the principal distinguishing feature of voiced and unvoiced plosives at word endings is not whether they are voiced at all. It is a seemingly unrelated feature: how long the preceding vowel lasts. But, as we see from the physics of events, a longer-lasting ring before a dampening hit is the signature of a rigid object’s bouncy hit, and so there is a fundamental ecological order to the seemingly arbitrary linguistic phonological regularity. (See Figure 8.)
Figure 8. Matrix illustrating the tight match between the qualities of hits (not in parentheses) and plosives (within parentheses). For hits, the columns distinguish between rigid and nonrigid hits, and the rows distinguish between hits that initiate rings and hits that muffle rings. Inside the matrix are short descriptions of the auditory signature of the four kinds of hits. For plosives, the columns distinguish the analogs of rigid and nonrigid hits, which are, respectively, voiced and unvoiced plosives; the rows distinguish the analogs of ring-initiating and ring-muffling hits, which are, respectively, released and unreleased plosives. Together, this gives four kinds of hits and four expected kinds of plosives, each matching the signature features of its respective hit. If the meaning of voiced versus unvoiced concerns rigid versus nonrigid objects, then we expect plosives at word starts to have a short voice onset time when voiced and a long one when unvoiced. And we expect that, for plosives at word endings, the voiced ones should reveal themselves via a longer preceding sonorant (slow to damp), whereas the unvoiced ones should reveal themselves via a shorter preceding sonorant (fast to damp). Plosives do, in fact, vary across this matrix as predicted from the ecological regularities of rigid and nonrigid hits at ring-inceptions and ring-dampenings.
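For readers who find a lookup table easier to scan than a caption, the two-by-two matrix described above can be sketched in Python. This is only an illustrative encoding: the cell wording paraphrases the caption rather than quoting the figure, and the names `Cell`, `MATRIX`, and `expected_cue` are hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    hit_signature: str  # what the physical hit sounds like
    plosive_cue: str    # the matching cue expected in speech


# Keys are (column, row). The wording paraphrases the Figure 8 caption.
MATRIX = {
    ("rigid", "initiates_ring"): Cell(
        "ring begins with almost no delay", "short voice onset time"),
    ("nonrigid", "initiates_ring"): Cell(
        "ring begins after a delay", "long voice onset time"),
    ("rigid", "muffles_ring"): Cell(
        "ring is slow to damp", "longer preceding sonorant"),
    ("nonrigid", "muffles_ring"): Cell(
        "ring is fast to damp", "shorter preceding sonorant"),
}

# The plosive analogs of the hit categories, as stated in the caption.
VOICING_TO_RIGIDITY = {"voiced": "rigid", "unvoiced": "nonrigid"}
RELEASE_TO_ROLE = {"released": "initiates_ring", "unreleased": "muffles_ring"}


def expected_cue(voicing, release):
    """Look up the cue predicted for a plosive, e.g. ("voiced", "unreleased")."""
    key = (VOICING_TO_RIGIDITY[voicing], RELEASE_TO_ROLE[release])
    return MATRIX[key].plosive_cue
```

Calling `expected_cue("voiced", "unreleased")` returns the cue for the “d” in “bad,” and `expected_cue("unvoiced", "released")` returns the cue for the “p” in “pee.”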