The negative emotional response, I reckon, comes from the fact that we all secretly think that we understand our experimental system well enough to know exactly how it will turn out. We are, from an emotional point of view, just running through the experiment to show other people what we’ve already figured out in our heads.[38] By the time we’ve made our prediction, a process that amounts to running our own internal model of the experiment, we have committed emotionally to a particular outcome. We aim to “prove” our point through demonstration.
Although disappointment and disillusionment in the face of unexpected results may be natural emotional responses—and ones that I share with my students—they run counter to the way that many but not all of us reason scientifically. Strictly speaking, we demonstrate that some testable concept is true through our repeated failure to show it to be false.[39] Although we can certainly demonstrate that something predictable happens every time we release our coffee mug from a height of two meters, no one has seen gravity.[40] Gravity is a concept for a kind of energy related to the masses of objects. The relationships that we see between objects on planets and between planets and stars in space are observable and consistent with our concept of gravity; hence, having failed repeatedly to disprove those consistent relationships between objects, most of us think that gravity is a fact. Because the failure to reject gives us confidence in the truth that we infer about gravity, we employ gravity, in turn, as an inferential tool to create new scenarios. We use the fact of gravity to infer the presence of the universe’s unseen dark matter. If dark matter were shown to be nonexistent, that observation would indirectly refute the gravitational paradigm. In sum, to prove, we attempt to refute.[41]
Following our therapy session, which included a harnessed descent into the cave of refutational negativism, our Tadro team decided to look deeper into the data. We needed to figure out if, in fact, our data sucked (which is always a real possibility) or if the results were screaming in our faces about something really interesting that we just hadn’t anticipated. I’ll spare you the tedium involved in figuring out if your data suck: it’s all the usual kinds of things about checking your transcriptions, lab notebooks, mathematical formulae, control experiments, and calibrations of instruments.[42] We came to the conclusion that we couldn’t explain our results away with the reflexive “bad data by bad scientists.” Something much more interesting was afoot.
If you look back at Figure 4.1, you’ll notice that from generations 1 to 2, when selection was present, we had a big increase in the feeding behavior score that was correlated with a jump in structural stiffness. This pattern was as we predicted, and it allowed us to make the nifty overlay in Figure 4.3, in which the Tadro3 with the stiffest tail has the best feeding behavior and the Tadro3 with the most flexible tail has the worst. From generations 2 to 3, when selection wasn’t present, we still see that feeding behavior and structural stiffness are correlated, changing together. All is well, even though stiffness is dropping a bit. No big surprise, given that random genetic changes can be dominant in the absence of selection, as we talked about earlier.
After generation 2 the system appears to run amok: from generations 3 to 4 and beyond, really, the one-to-one connection between behavior and stiffness disappears. Behavior drops or stays constant while structural stiffness increases. Or the behavior score increases while structural stiffness drops, as in generations 4 to 5. In this case, in the absence of selection we know that we have only random genetic processes driving the evolutionary changes. Randomness challenges our assumptions. Because of the one-to-one relationship between behavior and stiffness from generations 1 to 3, we assumed that the structural stiffness of the axial skeleton was causally related to the feeding behavior. But faced with the evidence from later evolutionary changes, that relationship is either not true or it’s only true some of the time. How can we tell?
FIGURE 4.3. Tadro3s compete for food. In the top image the three Tadro3s jockey for position as they navigate to the light target, which serves as food. Note that the Tadro3 up top is taking a slightly different path than the other two. In the bottom diagram the paths of three Tadro3s competing in generation 1 are overlaid to show their differences in behavior. The first-place Tadro3 moves quickly to the light target, earning top scores for speed as well as for the tight orbit it holds around the target. In contrast, the third-place Tadro3 meanders toward the target from its starting position and holds a much larger orbit around the target. For all three Tadro3s, the structural stiffnesses of their notochords were positively correlated with their behavioral performance. This relationship between stiffness and behavior in generation 1 was reflected in the population’s evolutionary change only from generations 1 to 2 and 2 to 3 (see Figure 4.1).
When complexity dims the light of interpretation, one way to navigate is to stop and examine your assumptions. In our case, the first assumption we tested was a fundamental one: when we selected for enhanced feeding behavior, the population responded by evolving enhanced feeding behavior. This was what happened. Thank goodness! Selection was present in four generations: 1, 5, 6, and 9; in three of those cases (generations 1, 6, and 9), the ensuing generations showed higher average feeding scores than their parental generation (see also Figure 4.4). We took the data from each individual, not just the averages, from all four of those generations with selection and statistically tested the mean response to selection. The statistical test confirmed what we see by eye: on average, selection evolves enhanced feeding behavior. Keep in mind that even when selection is acting, randomness will almost always deflect, to a lesser or greater degree, the population’s evolutionary trajectory that selection proposes (see Figure 4.2).
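The particular statistical test isn’t spelled out in this passage, so what follows is only a minimal sketch of one way to test a mean response to selection, using made-up feeding scores: pool each selected generation’s offspring scores, expressed as deviations from the parental generation’s mean, and ask whether the pooled mean is greater than zero.

```python
# A minimal sketch, NOT the authors' actual analysis: the feeding scores below are
# hypothetical placeholders, and the choice of a one-sample t-test is an assumption.
import numpy as np
from scipy import stats

# Per-individual feeding scores for each generation in which selection was applied
# (parents) and for the generation that followed it (offspring).
selection_events = {
    "gen1_to_gen2": ([0.42, 0.35, 0.51, 0.39, 0.44, 0.47],   # parents (selected)
                     [0.55, 0.60, 0.52, 0.58, 0.49, 0.63]),  # offspring
    "gen5_to_gen6": ([0.61, 0.58, 0.66, 0.57, 0.60, 0.64],
                     [0.59, 0.55, 0.62, 0.57, 0.54, 0.60]),
    # ... generations 6 to 7 and 9 to 10 would be added the same way
}

# Response to selection for each offspring individual:
# its score minus the mean score of its parental generation.
responses = []
for parents, offspring in selection_events.values():
    responses.extend(np.asarray(offspring) - np.mean(parents))

# One-sample t-test of whether the mean response to selection exceeds zero.
t_stat, p_two_sided = stats.ttest_1samp(responses, popmean=0.0)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
print(f"mean response = {np.mean(responses):.3f}, "
      f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.3f}")
```

With real data all four selection events would go into the pool, and a mixed model treating generation as a grouping factor might be preferable; the one-sample t-test is simply the plainest version of the idea.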
We can learn something new (for us) and important by examining the only time, from generation 5 to 6, when selection didn’t evolve improved feeding behavior. If you look at the change in genes that results from selection (bottom panel, Figure 4.1), you can see a drop in tail length, L, accompanied by a jump in material stiffness, E. Going back to our handy-dandy equation (I just knew it would be useful) for structural stiffness, k, we know that because of the magnifying effects of the L³ term in the denominator, the population’s average k must be higher in generation 6 than in generation 5. And, indeed, we see that jump in average k in the structural stiffness plot just above the genes plot. This connection between genes and increased structural stiffness rules out, in this case, random genetic deflection as the primary cause of the evolution of the feeding behavior. To explain this fully, though, we have to go on a little journey. Fasten your safety harness, please.
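For reference, the standard cantilever-beam form of flexural stiffness is consistent with what is said here about E in the numerator and L cubed in the denominator; the constant 3 and the second moment of area, I, are assumptions from beam theory, since the book’s exact expression isn’t reproduced in this passage:

\[
k = \frac{3\,E\,I}{L^{3}}
\]

Because L is cubed, halving the tail length alone would raise k eightfold, so even a modest shortening of the tail, combined with the jump in E, is enough to push the population’s average structural stiffness up from generation 5 to 6.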
What the disconnect between selection in generation 5 and feeding behavior in generation 6 suggests is that we need to test this assumption: feeding behavior is causally connected to the structural stiffness of the notochord. If this assumption were always true, then we’d expect to see behavior improve or decline in concert with increases or decreases, respectively, in the notochord’s structural stiffness. From our previous discussion, we know that behavior and stiffness don’t show any regular pattern that might lead us to believe that they are causally connected. However, several other patterns are possible. First, it could be that changes in the notochords’ structural stiffness are correlated not with the overall feeding behavior but rather with some of feeding’s sub-behaviors: swimming speed, body wobble, average distance from the food, and time to find the food. Second, those sub-behaviors might not be correlated with structural stiffness but rather with the subcharacters of structural stiffness: material stiffness, E, and tail length, L.
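In its simplest form, that check reduces to a table of correlations. Here is a minimal sketch, assuming hypothetical measurements and column names rather than the team’s actual records, of correlating each sub-behavior with k and with its subcharacters E and L:

```python
# A minimal sketch with hypothetical data and column layout, not the authors' records:
# does each feeding sub-behavior track structural stiffness, k, or its
# subcharacters, material stiffness E and tail length L?
import numpy as np
from scipy import stats

# One row per Tadro3 trial:
# [swim_speed, wobble, mean_dist_to_food, time_to_food, E, L, k]
data = np.array([
    [0.12, 4.1, 0.35, 210.0, 1.2e6, 0.040, 310.0],
    [0.15, 3.6, 0.28, 180.0, 1.5e6, 0.038, 430.0],
    [0.10, 4.8, 0.41, 260.0, 0.9e6, 0.045, 190.0],
    [0.14, 3.9, 0.30, 190.0, 1.4e6, 0.039, 400.0],
    [0.11, 4.5, 0.38, 240.0, 1.0e6, 0.043, 230.0],
])
behaviors = {"swim_speed": 0, "wobble": 1, "mean_dist": 2, "time_to_food": 3}
predictors = {"E": 4, "L": 5, "k": 6}

# Pearson correlation of every sub-behavior with every candidate predictor.
for b_name, b_col in behaviors.items():
    for p_name, p_col in predictors.items():
        r, p = stats.pearsonr(data[:, b_col], data[:, p_col])
        print(f"{b_name:>12} vs {p_name}: r = {r:+.2f}, p = {p:.3f}")
```

In practice you would want far more than five trials and a correction for running twelve tests at once, but the structure of the question is the same: which predictor, if any, tracks each sub-behavior?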
38. You can find a great introduction to the evidence for our mental modeling in the following book: Read Montague,
39. The philosopher of science Karl Popper formalized the “hypothetico-deductive” methodology in order to avoid what other philosophers have called the “problem of induction,” or generalizing from a few observations to the world in general. Most of our statistical hypothesis testing in science is structured around the idea of falsification or rejection of the “null” hypothesis. The danger with this approach is that if you reject the null hypothesis, you are tempted to treat the alternative as “true,” when in fact it becomes the new null to be tested. An excellent place to start with this kind of careful inference is with Popper himself: Karl R. Popper,
40. For that matter (ahem …), no one has seen energy. In fact, physicists don’t even know what energy is. Richard Feynman, the Nobel laureate in physics, and his coauthors Robert Leighton and Matthew Sands point this out eloquently in
41. I’ve given short shrift here to an interesting philosophical debate: logical positivism versus Popper’s hypothetico-deductivism. One distillation of the difference is
42. In case you are interested in what we did to try to find our flaws, here’s an example. We were very concerned that our initial measurements of structural stiffness were somehow flawed. We had created a standard curve that gave us a value of material stiffness,