
Until it became possible to sequence tiny amounts of DNA from ancient bones, the standard view of human evolution was that a species called Homo erectus evolved in Africa before dispersing into Europe and Asia 2 million years ago. The species was astonishingly successful, evolving into Neanderthals and various other species including the diminutive Homo floresiensis, the hobbit-man of the Indonesian island of Flores. In Africa, descendants of Homo erectus evolved into modern-day humans about 300,000 years ago. The first Homo sapiens dispersed out of Africa 125,000 years later, but they only got as far as the Arabian Peninsula, Syria and Turkey. They failed to establish permanent populations outside Africa and subsequently died out. But 60,000 years ago a second out-of-Africa migration occurred. These Homo sapiens were more successful, reaching South Asia between 50,000 and 60,000 years ago before making it to Australia a few thousand years later. Europe was successfully colonized around 40,000 years ago. America was colonized after Europe, sometime between 25,000 and 16,500 years ago. As Homo sapiens spread across the globe, they outcompeted the more primitive hominin species that had evolved from Homo erectus, with Neanderthals being the last species to go extinct, about 30,000 years ago.

By the mid-1990s, speculation on whether humans dispersing from Africa had bred with Neanderthals had become rife, but in the absence of fossils of copulating pairs of Neanderthals and humans the debate was evidence-free. I remember that at some point in 1994 a colleague of mine was asked during a TV interview whether our ancestors mated with Neanderthals. He explained that there was no evidence but thought probably not. A few days later he received a letter from a viewer who wrote, ‘Neanderthals and modern humans have reproduced, and I have evidence. Please find enclosed a photo of my partner.’ An amusing anecdote, but given that Neanderthals died out many millennia ago, it seemed back in the 1990s that we might never know whether they interbred with humans. It is remarkable how genetics has advanced the field of human evolution in the last three decades.

The discovery of woman X’s finger bone, along with finds of well-preserved Neanderthal remains, coupled with significant technological advances in extracting DNA from ancient fossils and assembling genome sequences from the fragments, allowed biologists to address the question of interbreeding between Neanderthals, Homo sapiens and Denisovans in a scientifically rigorous way. But the work had to be conducted extraordinarily carefully.

Over time, DNA in dead bodies degrades as the chemical bonds between the atoms from which it is made break. The rate at which it breaks down depends upon the environment. The average temperature in the Denisova cave since the death of woman X has been about freezing, and the relatively dry air in the cave meant some DNA from woman X had survived and scientists were able to extract tiny amounts of DNA from the finger bone. Having done this, they had to be careful that the phalanx (the scientific name for a bone from a finger or toe) had not been contaminated with DNA from other people, including Dionisij and the scientists who discovered it. They also had to be certain the DNA they extracted belonged to something closely related to humans, and not to any other animals or bacteria that may have used or lived in the cave. By following extremely careful procedures to ensure the DNA was not contaminated with DNA from anywhere else, a group of biologists based in Germany was able to read, or sequence, the DNA of woman X. They could then compare her DNA with that of lots of modern humans living in different parts of the globe, and with DNA obtained from Neanderthal bones and teeth.

Although the material used in the initial analysis of woman X came from a single finger bone, it generated a lot of data. DNA consists of strands of molecules called nucleobases, which come in four varieties called adenine, guanine, cytosine and thymine, or A, G, C and T. The order of bases in a strand of DNA is known as a DNA sequence, with AACACTGT being a different sequence from ATTAGAGC. Your genome consists of about 3 billion nucleobases linked together in 46 different strands, with a strand referred to as a chromosome. The genome of woman X was initially sequenced to a depth of 1.9 times, which means that, on average, each nucleobase on each chromosome was recorded just under two times. Any single A, G, C or T within a strand was recorded by a machine designed to read strings of nucleobases. A given nucleobase may never have been read, or may have been read a few times, but the average number of times each nucleobase was read was 1.9. But why read each nucleobase more than once?
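To make the idea of sequencing depth concrete, here is a minimal sketch in Python. It uses a made-up "genome" of just ten positions and five made-up reads; the `(start, length)` read format and the function names are assumptions chosen for illustration, not anything used in the actual study. The toy reads were picked so that the average depth comes out at 1.9, even though one position is never read at all.

```python
def coverage(genome_length, reads):
    """Count how many times each position in a toy genome is read.

    `reads` is a list of (start, length) pairs, a hypothetical format
    used only for this illustration.
    """
    counts = [0] * genome_length
    for start, length in reads:
        for pos in range(start, min(start + length, genome_length)):
            counts[pos] += 1
    return counts


def mean_depth(counts):
    """Average number of times each position was read."""
    return sum(counts) / len(counts)


# Five invented reads scattered over a ten-letter toy genome.
toy_reads = [(0, 5), (1, 4), (2, 3), (6, 4), (7, 3)]
counts = coverage(10, toy_reads)
never_read = [pos for pos, n in enumerate(counts) if n == 0]

print(counts)              # per-position read counts
print(mean_depth(counts))  # average depth across the toy genome
print(never_read)          # positions no read happened to cover
```

The average hides a lot of unevenness: some positions are read three times and one is missed entirely, which is the situation the 1.9 figure describes.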

When genomes are sequenced, it is not currently possible to read the order of nucleobases across an entire chromosome in one go. The reason for this is that sequencing involves taking a specimen’s DNA from multiple cells, and results in each chromosome being chopped into fragments. Many of these fragments are short and might be only a few tens of nucleobases long. Genetic sequencing involves reading the order of nucleobases in very many fragments of DNA before using computer algorithms to stitch them together. The larger the number of times each nucleobase is read along with its neighbours on different fragments of DNA, the greater the confidence that the sequence of a chromosome can be correctly assembled from all the different fragments.
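The stitching step can be sketched with a simple greedy approach: repeatedly find the two fragments whose ends overlap the most and merge them. This is only an illustration under simplifying assumptions (error-free reads, all fragments already facing the same way, an invented ten-letter sequence GATTACAGGC); it is not the software used on ancient genomes, and the function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest end of `a` that matches the start of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps, so nothing more can be joined
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Overlapping reads share letters with their neighbours and can be joined.
overlapping = greedy_assemble(["GATTAC", "TTACAGG", "CAGGC"])
# Reads that merely sit side by side give the algorithm nothing to work with.
adjacent = greedy_assemble(["GATTA", "CAGGC"])

print(overlapping)
print(adjacent)
```

With overlapping reads the three fragments collapse back into a single sequence; with fragments that only abut, they stay separate. That is why reading each nucleobase several times, on different fragments, makes assembly more reliable.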

You might wonder why this is so complicated. Imagine a very short sequence consisting of ACAGTCAGA which is split into two fragments, ACAG and TCAGA. How should these be joined? ACAG first and then TCAGA, or TCAGA and then ACAG? Things are even more complicated because we don’t know whether each fragment represents DNA going from left to right or from right to left in the genetic code. Should it be ACAG or GACA? With a depth of 1.9 times, it is hard to stitch together a whole genome with great confidence. These days it is not unusual to read of genomes being sequenced to a depth of 30 times, and indeed subsequent genomic work on woman X five years after the first study was published achieved close to this.
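The ambiguity in the two-fragment example can be counted directly. The sketch below lists every way of ordering ACAG and TCAGA and of reading each one forwards or backwards. It follows the simplification used in the text, where a fragment read the other way round is just the same letters reversed; the function name is hypothetical.

```python
from itertools import permutations, product


def candidate_joins(fragments):
    """Every sequence obtainable by ordering the fragments and
    optionally reversing each one."""
    candidates = set()
    for order in permutations(fragments):
        for flips in product([False, True], repeat=len(order)):
            pieces = [frag[::-1] if flip else frag
                      for frag, flip in zip(order, flips)]
            candidates.add("".join(pieces))
    return sorted(candidates)


candidates = candidate_joins(["ACAG", "TCAGA"])
for seq in candidates:
    print(seq)
print(len(candidates), "possible reconstructions")
```

Two fragments already give eight candidate sequences, only one of which is the original ACAGTCAGA. Without overlapping reads to act as tie-breakers, nothing in the fragments themselves says which candidate is right.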

A useful analogy to help understand genome sequencing is to imagine taking multiple copies of this book, shredding each one, such that you have many fragments of a few words each, then trying to reassemble the book. Except it is much harder with genetic sequences than with a book because you only have four letters, rather than the twenty-six of the English alphabet, no spaces, and, unlike with words, you do not know whether to read each fragment from left to right, or vice versa. Furthermore, there are just over three quarters of a million characters in this book, while genomes of humans and our ancestors are much longer. Constructing genetic sequences is hard work.

The genomes of Neanderthals, humans and Denisovans are each about 3 billion base pairs long. The scientists who sequenced woman X compared genome sequences from the Denisovan, a Neanderthal, twelve humans from different parts of the world and a chimpanzee. That is a huge amount of data: fifteen genomes, or about 45 billion nucleobases. These comparisons revealed sections of Denisovan DNA in modern-day humans living in parts of South Asia and Oceania, while Neanderthal DNA snippets were found in modern-day Eurasian populations.
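The 45 billion figure is simple multiplication, spelled out in the short sketch below. It assumes, as the total in the text implies, that every genome in the comparison, including the chimpanzee's, is counted at roughly 3 billion nucleobases; the variable names are made up for illustration.

```python
GENOME_LENGTH = 3_000_000_000  # approximate nucleobases per genome

# Genomes compared in the study, as listed in the text.
genomes = {
    "Denisovan": 1,
    "Neanderthal": 1,
    "modern human": 12,
    "chimpanzee": 1,  # assumed to be about the same length as the others
}

total_genomes = sum(genomes.values())
total_bases = total_genomes * GENOME_LENGTH

print(total_genomes, "genomes")
print(f"{total_bases:,} nucleobases")
```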

What are you most likely to trust when considering whether species similar to humans exist or once existed? My word for the existence of leprechauns, a grainy out-of-focus photo and the testimony of a man who used to be a pigeon for the Orang pendek, or 45 billion nucleobases and some extremely careful genome sequencing and formal statistical analyses for woman X? Science is evidence-based, and the evidence that our ancestors mated with Neanderthals and Denisovans is extremely compelling. Despite this, there is still the odd scientist who is attempting to rule out other processes, such as biased patterns of genetic replication errors (mutations in DNA), that could potentially generate the same patterns in data, but few people find these explanations compelling. Nevertheless, posing alternative hypotheses is important. Science is all about posing and testing hypotheses to identify those that can produce an observed pattern.