RISE was a step toward the Master Algorithm because it combined symbolic and analogical learning. It was only a small step, however, because it doesn’t have the full power of either of those paradigms, and it’s still missing the other three. RISE’s rules can’t be chained together in different ways; each rule just predicts the class of an example directly from its attributes. Also, the rules can’t talk about more than one entity at a time; for example, RISE can’t express a rule like If A has the flu and B was in contact with A, B may have the flu as well. On the analogical side, RISE just generalizes the simple nearest-neighbor algorithm; it can’t learn across domains using structure mapping or some such strategy. At the time I finished my PhD, I didn’t see a way to bring together in one algorithm the full power of all the five paradigms, and I set the problem aside for a while. But as I applied machine learning to problems like word-of-mouth marketing, data integration, programming by example, and website personalization, I kept seeing how each of the paradigms provided only part of the solution. There had to be a better way.
And so we have traveled through the territories of the five tribes, gathering their insights, negotiating the border crossings, wondering how the pieces might fit together. We know immensely more now than when we started out. But something is still missing. There’s a gaping hole in the center of the puzzle, making it hard to see the pattern. The problem is that all the learners we’ve seen so far need a teacher to tell them the right answer. They can’t learn to distinguish tumor cells from healthy ones unless someone labels them “tumor” or “healthy.” But humans can learn without a teacher; they do it from the day they’re born. Like Frodo at the gates of Mordor, our long journey will have been in vain if we don’t find a way around this barrier. But there is a path past the ramparts and the guards, and the prize is near. Follow me…
CHAPTER EIGHT: Learning Without a Teacher
If you’re a parent, the entire mystery of learning unfolds before your eyes in the first three years of your child’s life. A newborn baby can’t talk, walk, recognize objects, or even understand that an object continues to exist when the baby isn’t looking at it. But month after month, in steps large and small, by trial and error and great conceptual leaps, the child figures out how the world works, how people behave, and how to communicate. By a child’s third birthday, all this learning has coalesced into a stable self, a stream of consciousness that will continue throughout life. Older children and adults can time-travel, aka remember things past, but only so far back. If we could revisit ourselves as infants and toddlers and see the world again through those newborn eyes, much of what puzzles us about learning, even about existence itself, would suddenly seem obvious. But as it is, the greatest mystery in the universe is not how it begins or ends, or what infinitesimal threads it’s woven from, it’s what goes on in a small child’s mind: how a pound of gray jelly can grow into the seat of consciousness.
The scientific study of children’s learning is still young, having begun in earnest only a few decades ago, but it has already come remarkably far. Infants can’t answer questionnaires or follow experimental protocols, but we can infer a surprising amount about what goes on in their minds by videotaping and studying their reactions during experiments. A coherent picture emerges: an infant’s mind isn’t just the unfolding of a predefined genetic program or a biological device for recording correlations in sense data; rather, the infant’s mind actively synthesizes his or her reality, and this reality changes quite radically over time.
Increasingly, and most relevant to us, cognitive scientists express their theories of children’s learning in the form of algorithms. Many machine-learning researchers take inspiration from this. Everything we need is right there in a child’s mind, if only we can somehow capture its essence in computer code. Some researchers even argue that the way to create intelligent machines is to build a robot baby and let him experience the world as a human baby does. We, the researchers, would be his parents (perhaps even with an assist from crowdsourcing, giving a whole new meaning to the term global village). Little Robby (let’s call him that, in honor of the chubby but much taller robot in Forbidden Planet) is the only robot baby we’ll ever have to build. Once he has learned everything a three-year-old knows, the AI problem is solved. We can copy the contents of his mind into as many other robots as we like, and they’ll take it from there, the hardest part already accomplished.
The question, of course, is what algorithm should be running in Robby’s brain at birth. Researchers influenced by child psychology look askance at neural networks because the microscopic workings of a neuron seem a million miles from the sophistication of even a child’s most basic behaviors, like reaching for an object, grasping it, and inspecting it with wide, curious eyes. We need to model the child’s learning at a higher level of abstraction, lest we miss the planet for the trees. Above all, even though children certainly get plenty of help from their parents, they learn mostly on their own, without supervision, and that’s what seems most miraculous. None of the algorithms we’ve seen so far can do it, but we’re about to see several that can, bringing us one step closer to the Master Algorithm.
Putting together birds of a feather
We flip the “on” switch, and Robby’s video eyes open for the very first time. At once he’s flooded with what William James memorably called the “blooming, buzzing confusion” of the world. With new images streaming in at a rate of dozens per second, one of the first things he must do is learn to organize them into larger chunks. The real world is made up of objects that persist over time, not random pixels changing arbitrarily from one moment to the next. Mommy isn’t replaced by a smaller Mommy when she walks away. Putting a dish on the table doesn’t make a white hole in it. A young baby is not surprised if a teddy bear passes behind a screen and reemerges as an airplane, but a one-year-old is. Somehow, he’s figured out that teddy bears are different from airplanes and don’t spontaneously transmute. Soon afterward, he’ll figure out that some objects are more alike than others and start forming categories. Given a pile of toy horses and pencils to play with, a nine-month-old doesn’t think to sort them into separate piles of horses and pencils, but an eighteen-month-old does.
Organizing the world into objects and categories is second nature to an adult but not to an infant, and even less to Robby the robot. We could endow him with a visual cortex in the form of a multilayer perceptron and show him labeled examples of all the objects and categories in the world (here’s Mommy close up, here’s Mommy far away), but we’d never be done. What we need is an algorithm that will spontaneously group together similar objects, or different images of the same object. This is the problem of clustering, and it’s one of the most intensively studied in machine learning.
A cluster is a set of similar entities, or at a minimum, a set of entities that are more similar to each other than to members of other clusters. It’s human nature to cluster things, and it’s often the first step on the road to knowledge. When we look up at the night sky, we can’t help seeing clusters of stars, and then we fancifully name them after shapes they resemble. Noticing that certain sets of elements had very similar chemical properties was the first step in discovering the periodic table. Each of those sets is now a column in it. Everything we perceive is a cluster, from friends’ faces to speech sounds. Without them, we’d be lost: children can’t learn a language before they learn to identify the characteristic sounds it’s made of, which they do in their first year of life, and all the words they then learn mean nothing without the clusters of real things they refer to. Confronted with big data (a very large number of objects), our first recourse is to group them into a more manageable number of clusters. A whole market is too coarse, and individual customers are too fine, so marketers divide markets into segments, which is their word for clusters. Even objects themselves are at bottom clusters of their observations, from all the different angles light falls on Mommy’s face to all the different sound waves baby hears as the word mommy. And we can’t think without objects, which is perhaps why quantum mechanics is so unintuitive: we want to visualize the subatomic world as particles colliding, or waves interfering, but it’s not really either.
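To make the idea of a cluster concrete, here is a minimal sketch in Python of one very simple way to group similar things without any labels. It is only an illustration, not any particular algorithm from the clustering literature: each item joins the first group whose founding member is close enough, and otherwise starts a group of its own. The function names, the two-number description of each toy, and the closeness threshold are all assumptions made up for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_similar(points, threshold):
    """Greedy grouping (an illustrative assumption, not a standard method).

    Each point joins the first cluster whose first member lies within
    `threshold` of it; if no such cluster exists, it starts a new one.
    """
    clusters = []
    for p in points:
        for cluster in clusters:
            if distance(p, cluster[0]) <= threshold:
                cluster.append(p)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([p])
    return clusters


# Hypothetical toys described as (length in cm, number of legs):
# pencils are long and legless, toy horses are shorter with four legs.
toys = [(18.0, 0), (12.0, 4), (17.5, 0), (11.0, 4), (19.0, 0)]
clusters = group_similar(toys, threshold=3.0)

for i, cluster in enumerate(clusters, start=1):
    print(f"pile {i}: {cluster}")
```

Nobody tells the program which toys are pencils and which are horses; the two piles emerge from similarity alone. The result does depend on the chosen threshold and on the order in which the toys arrive, which is one reason practical clustering methods are more careful than this sketch.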