Remarkably, that was the common view until American mathematician Claude Shannon (1916–2001) came along and demonstrated how we can create arbitrarily accurate communication using even the most unreliable communication channels. What Shannon stated in his landmark paper “A Mathematical Theory of Communication,” published in the Bell System Technical Journal in July and October 1948, and in particular in his noisy channel-coding theorem, was that if you have available a channel with any error rate (except for exactly 50 percent per bit, which would mean that the channel was just transmitting pure noise), you are able to transmit a message in which the error rate is as accurate as you desire. In other words, the error rate of the transmission can be one bit out of n bits, where n can be as large as you define. So, for example, in the extreme, if you have a channel that correctly transmits bits of information only 51 percent of the time (that is, it transmits the correct bit just slightly more often than the wrong bit), you can nonetheless transmit messages such that only one bit out of a million is incorrect, or one bit out of a trillion or a trillion trillion.
How is this possible? The answer is through redundancy. That may seem obvious now, but it was not at the time. As a simple example, if I transmit each bit three times and take the majority vote, I will have substantially increased the reliability of the result. If that is not good enough, simply increase the redundancy until you get the reliability you need. Simply repeating information is the easiest way to achieve arbitrarily high accuracy rates from low-accuracy channels, but it is not the most efficient approach. Shannon’s paper, which established the field of information theory, presented optimal methods of error detection and correction codes that can achieve any target accuracy through any nonrandom channel.
Older readers will recall telephone modems, which transmitted information through noisy analog phone lines. These lines featured audibly obvious hisses and pops and many other forms of distortion, but nonetheless were able to transmit digital data with very high accuracy rates, thanks to Shannon’s noisy channel theorem. The same issue and the same solution exist for digital memory. Ever wonder how CDs, DVDs, and program disks continue to provide reliable results even after the disk has been dropped on the floor and scratched? Again, we can thank Shannon.
Computation consists of three elements: communication—which, as I mentioned, is pervasive both within and between computers—memory, and logic gates (which perform the arithmetic and logical functions). The accuracy of logic gates can also be made arbitrarily high by similarly using error detection and correction codes. It is due to Shannon’s theorem and theory that we can handle arbitrarily large and complex digital data and algorithms without the processes being disturbed or destroyed by errors. It is important to point out that the brain uses Shannon’s principle as well, although the evolution of the human brain clearly predates Shannon’s own! Most of the patterns or ideas (and an idea is also a pattern), as we have seen, are stored in the brain with a substantial amount of redundancy. A primary reason for the redundancy in the brain is the inherent unreliability of neural circuits.
The second important idea on which the information age relies is the one I mentioned earlier: the universality of computation. In 1936 Alan Turing described his “Turing machine,” which was not an actual machine but another thought experiment. His theoretical computer consists of an infinitely long memory tape with a 1 or a 0 in each square. Input to the machine is presented on this tape, which the machine can read one square at a time. The machine also contains a table of rules—essentially a stored program—that consist of numbered states. Each rule specifies one action if the square currently being read is a 0, and a different action if the current square is a 1. Possible actions include writing a 0 or 1 on the tape, moving the tape one square to the right or left, or halting. Each state will then specify the number of the next state that the machine should be in.
The input to the Turing machine is presented on the tape. The program runs, and when the machine halts, it has completed its algorithm, and the output of the process is left on the tape. Note that even though the tape is theoretically infinite in length, any actual program that does not get into an infinite loop will use only a finite portion of the tape, so if we limit ourselves to a finite tape, the machine will still solve a useful set of problems.
If the Turing machine sounds simple, it is because that was its inventor’s objective. Turing wanted his machine to be as simple as possible (but no simpler, to paraphrase Einstein). Turing and Alonzo Church (1903–1995), his former professor, went on to develop the Church-Turing thesis, which states that if a problem that can be presented to a Turing machine is not solvable by it, it is also not solvable by any machine, following natural law. Even though the Turing machine has only a handful of commands and processes only one bit at a time, it can compute anything that any computer can compute. Another way to say this is that any machine that is “Turing complete” (that is, that has equivalent capabilities to a Turing machine) can compute any algorithm (any procedure that we can define).
A block diagram of a Turing machine with a head that reads and writes the tape and an internal program consisting of state transitions.
“Strong” interpretations of the Church-Turing thesis propose an essential equivalence between what a human can think or know and what is computable by a machine. The basic idea is that the human brain is likewise subject to natural law, and thus its information-processing ability cannot exceed that of a machine (and therefore of a Turing machine).
We can properly credit Turing with establishing the theoretical foundation of computation with his 1936 paper, but it is important to note that he was deeply influenced by a lecture that Hungarian American mathematician John von Neumann (1903–1957) gave in Cambridge in 1935 on his stored program concept, a concept enshrined in the Turing machine.1 In turn, von Neumann was influenced by Turing’s 1936 paper, which elegantly laid out the principles of computation, and made it required reading for his colleagues in the late 1930s and early 1940s.2
In the same paper Turing reports another unexpected discovery: that of unsolvable problems. These are problems that are well defined with unique answers that can be shown to exist, but that we can also prove can never be computed by any Turing machine—that is to say, by any machine, a reversal of what had been a nineteenth-century dogma that problems that could be defined would ultimately be solved. Turing showed that there are as many unsolvable problems as solvable ones. Austrian American mathematician and philosopher Kurt Gödel reached a similar conclusion in his 1931 “incompleteness theorem.” We are thus left with the perplexing situation of being able to define a problem, to prove that a unique answer exists, and yet know that the answer can never be found.
Turing had shown that at its essence, computation is based on a very simple mechanism. Because the Turing machine (and therefore any computer) is capable of basing its future course of action on results it has already computed, it is capable of making decisions and modeling arbitrarily complex hierarchies of information.
In 1939 Turing designed an electronic calculator called Bombe that helped decode messages that had been encrypted by the Nazi Enigma coding machine. By 1943, an engineering team influenced by Turing completed what is arguably the first computer, the Colossus, that enabled the Allies to continue decoding messages from more sophisticated versions of Enigma. The Bombe and Colossus were designed for a single task and could not be reprogrammed for a different one. But they performed this task brilliantly and are credited with having enabled the Allies to overcome the three-to-one advantage that the German Luftwaffe enjoyed over the British Royal Air Force and win the crucial Battle of Britain, as well as to continue anticipating Nazi tactics throughout the war.