Monday, October 27, 2014

Why the surprising pattern of no genetic continuity between people living in the same area but from different periods of time? Think the flu virus!

I used the three slides shown below to illustrate the idea of informative DNAs in my talk at last month’s workshop on genome and evolution in Naples, Italy.

The antigenic sites in human influenza A virus mutate and turn over quickly, which is critical for the virus to survive by escaping human neutralizing antibodies and is hence responsible for flu epidemics. As shown in Figure 1, two amino acid positions in hemagglutinin (156 and 145, panels a and b) turned over several times within a 30-year period, while two others (138 and 194, panels c and d) stayed largely unchanged (figure from Shih et al., 2007).

The flu results illustrate two important points about the evolutionary dynamics of a genome that have so far been grossly overlooked by the evolution and popgen field. First, fast-evolving or less conserved DNAs are functional rather than neutral, as they are essential for quick adaptation to fast-changing environments. Second, fast-evolving DNAs turn over quickly and can be shown to violate the infinite sites model, so they cannot be used for phylogenetic inference. If one used the fast-changing sites in a flu virus to infer the phylogenetic relationships among the isolates responsible for different epidemics over a period of, say, 10 years, one would reach the absurd conclusion that each epidemic was caused by a distinct type of flu virus with no genetic continuity among them, rather than by minor variations of the same type.
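The saturation behind the second point can be sketched with a textbook Jukes-Cantor-style calculation. This is only an illustration of why repeated changes at the same site erase information about elapsed time; it is not the analysis behind the figures. The function name, the two rates, and the four-state assumption are all hypothetical choices made for the example.

```python
import math


def expected_difference(rate_per_year, years, n_states=4):
    """Expected fraction of sites that differ after `years` of change.

    Jukes-Cantor-style model: every site mutates at the same rate and all
    `n_states` states are equally likely targets, so repeated hits at a site
    overwrite one another and the difference saturates at (n - 1) / n.
    """
    plateau = (n_states - 1) / n_states
    expected_hits = rate_per_year * years
    return plateau * (1.0 - math.exp(-expected_hits / plateau))


# Hypothetical rates (changes per site per year), chosen only for illustration.
FAST_RATE = 0.5
SLOW_RATE = 0.001

for years in (1, 5, 10, 20):
    fast = expected_difference(FAST_RATE, years)
    slow = expected_difference(SLOW_RATE, years)
    print(f"{years:>2} years: fast sites {fast:.3f}, slow sites {slow:.4f}")
```

With these assumed rates, the fast sites sit near the plateau after about a decade, so isolates 10 and 20 years removed from a common starting point look equally different from it. The slow sites still diverge roughly in proportion to time.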

Mutation rates in humans are of course much lower than those in a flu virus. But just as in a flu virus, there are both fast- and slow-changing sites (Figure 2). The time scales are different but the principle is the same. The fast-changing sites may turn over every few thousand years and, when properly examined as we have done, in fact make up the majority of the observed variant sites in humans (Figure 3). This is why the field of ancient DNA keeps producing the absurd pattern of no genetic continuity between people living in the same area but in different periods of time. All of the published analyses have simply used the wrong sites, ones equivalent to the fast-changing antigenic sites in a flu virus. What one should use instead are sites with very slow mutation rates, such as 1 mutation every 50,000 years. We have been busy reinterpreting the published DNAs for several years now and hope to submit our work soon.

Figure 1. (a and b) Frequency changes at residue sites 156 (a) and 145 (b) were highly dynamic. (c and d) Sites 138 (c) and 194 (d) did not undergo major frequency change over time.

Figure 2. A priori model of evolutionary dynamics of human genomic DNAs.

Figure 3. Difference between slow- and fast-evolving sites. Shown is a piece of homologous DNA in three different individuals or species. In the fast-evolving DNAs, which make up the vast majority of the human genome, there is obvious and verifiable violation of the infinite sites model. These DNAs have abundant overlapping mutant sites, where independent mutations have occurred at the same site in different individuals or species.


Shih, C-C., Hsiao, T-C., Ho, M-S., and Li, W-H. (2007) Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. Proc Natl Acad Sci U S A. 104:6283-6288.

Thursday, October 23, 2014

Surprises from the 45,000-year-old Siberian Ust'-Ishim: why is he not closer to Africans than East Asians are?

The genome of the 45,000-year-old Siberian Ust'-Ishim, published yesterday in Nature (see John Hawks' blog), again shows the same absurd pattern of no genetic continuity between local people living in different periods of time. The Ust'-Ishim genome is no more related to the 24,000-year-old Siberian MA1 than to living East Asians. But this kind of surprise is getting boring for me to mention on this blog. (John Hawks said this on his blog: "This is not an isolated case, it is another example of what we see throughout the world: Ancient people represented by DNA that seem to have very little to do with the people who live in the same areas today. We're not finding the ancestors of living populations so much as we are finding branches of populations we did not know existed.")

A new kind of surprise is the failure to do all the necessary analyses or to present all the relevant results. One would expect the Ust'-Ishim genome to be almost 2-fold less distant to living Africans than East Asians are, because he had 45,000 fewer years to accumulate distance, as shown in Figure 1A. But the paper made no mention of this key expectation from the Out of Africa model.

It also makes no sense for Ust'-Ishim to be an outlier to living East Asians on a PCA plot (Figure 2), since the distance between Ust'-Ishim and East Asians should be almost 2-fold smaller than that between certain pairs of East Asians, again because Ust'-Ishim had 45,000 fewer years to accumulate mutations/distance (Figure 1A).
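The clock-based expectation used in the two paragraphs above can be written out as simple arithmetic. This is a minimal sketch assuming a strict molecular clock and a single clean split between two lineages. The function names are made up for the example, and the split time is a hypothetical placeholder, since only the 45,000-year sample age is given here.

```python
def living_pair_distance(rate, split_years_ago):
    """Distance between two living samples: both lineages run the full time."""
    return 2.0 * rate * split_years_ago


def ancient_to_living_distance(rate, split_years_ago, sample_age_years):
    """Distance between an ancient and a living sample under a strict clock.

    The ancient lineage stopped accumulating changes `sample_age_years` ago.
    """
    if not 0 <= sample_age_years <= split_years_ago:
        raise ValueError("sample age must lie between 0 and the split time")
    return rate * (2.0 * split_years_ago - sample_age_years)


def fold_reduction(split_years_ago, sample_age_years):
    """How many times smaller the ancient-to-living distance is.

    The mutation rate cancels in the ratio, so any positive value works.
    """
    rate = 1.0
    living = living_pair_distance(rate, split_years_ago)
    ancient = ancient_to_living_distance(rate, split_years_ago, sample_age_years)
    return living / ancient


SAMPLE_AGE = 45_000     # age of Ust'-Ishim given in the text
ASSUMED_SPLIT = 50_000  # hypothetical split time, NOT taken from the text

print(round(fold_reduction(ASSUMED_SPLIT, SAMPLE_AGE), 2))
```

With the assumed 50,000-year split, the reduction comes out at about 1.8-fold. The figure depends entirely on the split time plugged in, so `ASSUMED_SPLIT` should be replaced with whatever value the model under test specifies.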

Our results with the 1000 Genomes data showed that East Asians (CHS) and Europeans (GBR) are equidistant to Africans (LWK or YRI) in fast-evolving SNPs, which represent the genome average (Figure 1B). This of course has nothing to do with mutation rate and time but reflects maximum genetic distance and natural selection. We will soon analyze the Ust'-Ishim genome in the same way, and we fully expect Ust'-Ishim to be equidistant or more distant to Africans than East Asians are, which would be the same pattern that our first blog post here in 2007 showed for the Neanderthals. Such a result would be truly inconvenient for the Out of Africa model, which is probably why it was left out of the paper.

Figure 1

Figure 2