So what’s the least content a learner can have in order to be useful? How about the laws of physics? After all, everything in the world obeys them (we believe), and they gave rise to evolution and (through it) the brain. Well, perhaps the Master Algorithm is implicit in the laws of physics, but if so, then we need to make it explicit. Just throwing data at the laws of physics won’t result in any new laws. Here’s one way to think about it: perhaps some field’s master theory is just the laws of physics compiled into a more convenient form for that field, but if so, then we need an algorithm that finds a shortcut from that field’s data to its theory, and it’s not clear the laws of physics can be of any help with this. Another issue is that, if the laws of physics were different, the Master Algorithm would presumably still be able to discover them in many cases. Mathematicians like to say that God can disobey the laws of physics, but even he cannot defy the laws of logic. This may be so, but the laws of logic are for deduction; what we need is something equivalent, but for induction.

The five tribes of machine learning

Of course, we don’t have to start from scratch in our hunt for the Master Algorithm. We have a few decades of machine learning research to draw on. Some of the smartest people on the planet have devoted their lives to inventing learning algorithms, and some would even claim that they already have a universal learner in hand. We will stand on the shoulders of these giants, but take such claims with a grain of salt. Which raises the question: how will we know when we’ve found the Master Algorithm? When the same learner, with only parameter changes and minimal input aside from the data, can understand video and text as well as humans, and make significant new discoveries in biology, sociology, and other sciences. Clearly, by this standard no learner has yet been demonstrated to be the Master Algorithm, even in the unlikely case one already exists.

Crucially, the Master Algorithm is not required to start from scratch in each new problem. That bar is probably too high for any learner to meet, and it’s certainly very unlike what people do. For example, language does not exist in a vacuum; we couldn’t understand a sentence without our knowledge of the world it refers to. Thus, when learning to read, the Master Algorithm can rely on having previously learned to see, hear, and control a robot. Likewise, a scientist does not just blindly fit models to data; he can bring all his knowledge of the field to bear on the problem. Therefore, when making discoveries in biology, the Master Algorithm can first read all the biology it wants, relying on having previously learned to read. The Master Algorithm is not just a passive consumer of data; it can interact with its environment and actively seek the data it wants, like Adam, the robot scientist, or like any child exploring her world.

Our search for the Master Algorithm is complicated, but also enlivened, by the rival schools of thought that exist within machine learning. The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it.

For symbolists, all intelligence can be reduced to manipulating symbols, in the same way that a mathematician solves equations by replacing expressions by other expressions. Symbolists understand that you can’t learn from scratch: you need some initial knowledge to go with the data. They’ve figured out how to incorporate preexisting knowledge into learning, and how to combine different pieces of knowledge on the fly in order to solve new problems. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible.
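To make the idea concrete, here is a minimal sketch of inverse deduction in the simplest possible case: given the fact that Socrates is human and the observation that Socrates is mortal, it hypothesizes the missing rule that would let the deduction go through and then generalizes it. The representation and the helper function are illustrative inventions, not the symbolists' actual machinery.

```python
# A toy sketch of inverse deduction: find the knowledge that is missing for a
# deduction to go through, then make it as general as possible.

def invert_deduction(fact, conclusion):
    """fact and conclusion are (entity, property) pairs,
    e.g. ("Socrates", "human") and ("Socrates", "mortal")."""
    entity, premise = fact
    other_entity, consequence = conclusion
    if entity != other_entity:
        return None   # the fact says nothing about this entity; no rule recovered
    # The missing knowledge is "premise implies consequence", generalized by
    # replacing the specific constant with a variable X.
    return f"for all X: {premise}(X) -> {consequence}(X)"

print(invert_deduction(("Socrates", "human"), ("Socrates", "mortal")))
# for all X: human(X) -> mortal(X)
```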

For connectionists, learning is what the brain does, and so what we need to do is reverse engineer it. The brain learns by adjusting the strengths of connections between neurons, and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly. The connectionists’ master algorithm is backpropagation, which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be.
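As a concrete illustration, here is a minimal backpropagation sketch for a tiny two-layer network trained on the XOR function. The network size, learning rate, and number of steps are arbitrary illustrative choices; real implementations are considerably more refined.

```python
# A minimal sketch of backpropagation, assuming NumPy is available.
import numpy as np

rng = np.random.default_rng(0)

# The XOR problem: the output is 1 exactly when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two layers of connections with adjustable strengths (weights) and biases.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 1.0
for step in range(5000):
    # Forward pass: compute the network's current output.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: compare the output with the desired one and assign
    # blame to the connections, layer by layer (the chain rule in action).
    output_error = (output - y) * output * (1 - output)
    hidden_error = (output_error @ W2.T) * hidden * (1 - hidden)

    # Nudge every connection in the direction that shrinks the error.
    W2 -= learning_rate * hidden.T @ output_error
    b2 -= learning_rate * output_error.sum(axis=0)
    W1 -= learning_rate * X.T @ hidden_error
    b1 -= learning_rate * hidden_error.sum(axis=0)

print(np.round(output, 2))   # should end up close to [[0], [1], [1], [0]]
```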

Evolutionaries believe that the mother of all learning is natural selection. If it made us, it can make anything, and all we need to do is simulate it on the computer. The key problem that evolutionaries solve is learning structure: not just adjusting parameters, like backpropagation does, but creating the brain that those adjustments can then fine-tune. The evolutionaries’ master algorithm is genetic programming, which mates and evolves computer programs in the same way that nature mates and evolves organisms.
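The following toy sketch captures the flavor of genetic programming: candidate programs are small arithmetic expression trees, fitness measures how well they reproduce a target function, and new programs arise by crossover and mutation. The target function, population size, and mutation rates here are all made-up illustrative choices, not anything prescribed by the evolutionaries.

```python
# A toy genetic programming loop: evolve expression trees to fit x*x + x.
import random
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(depth=3):
    # A program is either a terminal or an (operator, left, right) tuple.
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if not isinstance(tree, tuple):
        return x if tree == "x" else tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Lower is better: squared error against the target x*x + x on a few points.
    return sum((evaluate(tree, float(x)) - (x * x + x)) ** 2 for x in range(-5, 6))

def random_subtree(tree):
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return tree
    return random_subtree(random.choice(tree[1:]))

def crossover(a, b):
    # "Mate" two programs: replace a random subtree of a with a random subtree of b.
    if not isinstance(a, tuple) or random.random() < 0.3:
        return random_subtree(b)
    op, left, right = a
    if random.random() < 0.5:
        return (op, crossover(left, b), right)
    return (op, left, crossover(right, b))

def mutate(tree):
    # Point mutation: occasionally replace a node with a fresh random subtree.
    if random.random() < 0.1:
        return random_tree(2)
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return (op, mutate(left), mutate(right))

random.seed(1)
population = [random_tree() for _ in range(200)]
for generation in range(30):
    population.sort(key=fitness)          # selection: the fittest programs survive
    survivors = population[:50]
    children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                for _ in range(150)]
    population = survivors + children

best = min(population, key=fitness)
print(best, fitness(best))                # the best evolved program and its error
```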

Bayesians are concerned above all with uncertainty. All learned knowledge is uncertain, and learning itself is a form of uncertain inference. The problem then becomes how to deal with noisy, incomplete, and even contradictory information without falling apart. The solution is probabilistic inference, and the master algorithm is Bayes’ theorem and its derivatives. Bayes’ theorem tells us how to incorporate new evidence into our beliefs, and probabilistic inference algorithms do that as efficiently as possible.
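Here is a small worked example of Bayes’ theorem in action: updating the probability that a patient has a disease after a positive test result. The numbers (1 percent prevalence, 90 percent sensitivity, 5 percent false positives) are invented for illustration.

```python
# Bayes' theorem: P(h | e) = P(e | h) P(h) / [P(e | h) P(h) + P(e | ~h) P(~h)].

def bayes_update(prior, likelihood, likelihood_if_not):
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

prior = 0.01              # P(disease) before seeing any test result
p_pos_if_sick = 0.90      # test sensitivity: P(positive | disease)
p_pos_if_healthy = 0.05   # false-positive rate: P(positive | no disease)

posterior = bayes_update(prior, p_pos_if_sick, p_pos_if_healthy)
print(round(posterior, 3))   # ~0.154: one positive test still leaves the disease unlikely

# Evidence accumulates: a second positive test starts from the new belief.
posterior2 = bayes_update(posterior, p_pos_if_sick, p_pos_if_healthy)
print(round(posterior2, 3))  # ~0.766
```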

For analogizers, the key to learning is recognizing similarities between situations and thereby inferring other similarities. If two patients have similar symptoms, perhaps they have the same disease. The key problem is judging how similar two things are. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions.
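As a brief illustration, the sketch below trains a support vector machine on synthetic data, assuming scikit-learn is installed. The support vectors it keeps are the “experiences to remember,” and a new case is classified by its kernel similarity to them.

```python
# A minimal support vector machine example using scikit-learn (an assumption,
# not anything prescribed by the text). The data is synthetic and illustrative.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

model = SVC(kernel="rbf")      # RBF kernel: similarity decays with distance
model.fit(X, y)

# The experiences the machine decides to remember: usually far fewer than 200.
print("support vectors kept:", len(model.support_vectors_))

# A new case is classified by weighing its similarity to the stored examples.
print("prediction for a new point:", model.predict([[0.5, 0.0]]))
```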

Each tribe’s solution to its central problem is a brilliant, hard-won advance. But the true Master Algorithm must solve all five problems, not just one. For example, to cure cancer we need to understand the metabolic networks in the cell: which genes regulate which others, which chemical reactions the resulting proteins control, and how adding a new molecule to the mix would affect the network. It would be silly to try to learn all of this from scratch, ignoring all the knowledge that biologists have painstakingly accumulated over the decades. Symbolists know how to combine this knowledge with data from DNA sequencers, gene expression microarrays, and so on, to produce results that you couldn’t get with either alone. But the knowledge we obtain by inverse deduction is purely qualitative; we need to learn not just who interacts with whom, but how much, and backpropagation can do that. Nevertheless, both inverse deduction and backpropagation would be lost in space without some basic structure on which to hang the interactions and parameters they find, and genetic programming can discover it. At this point, if we had complete knowledge of the metabolism and all the data relevant to a given patient, we could figure out a treatment for her. But in reality the information we have is always very incomplete, and even incorrect in places; we need to make headway despite that, and that’s what probabilistic inference is for. In the hardest cases, the patient’s cancer looks very different from previous ones, and all our learned knowledge fails. Similarity-based algorithms can save the day by seeing analogies between superficially very different situations, zeroing in on their essential similarities and ignoring the rest.

In this book we will synthesize a single algorithm with all these capabilities:

Our quest will take us across the territory of each of the five tribes. The border crossings, where they meet, negotiate and skirmish, will be the trickiest part of the journey. Each tribe has a different piece of the puzzle, which we must gather. Machine learners, like all scientists, resemble the blind men and the elephant: one feels the trunk and thinks it’s a snake, another leans against the leg and thinks it’s a tree, yet another touches the tusk and thinks it’s a bull. Our aim is to touch each part without jumping to conclusions; and once we’ve touched all of them, we will try to picture the whole elephant. It’s far from obvious how to combine all the pieces into one solution (impossible, according to some), but this is what we will do.