Monday, October 27, 2014

Why the surprising pattern of no genetic continuity between people living in the same area but from different periods of time? Think the flu virus!

I used the three slides shown below to illustrate the idea of informative DNAs in my talk at last month’s workshop on genome and evolution in Naples, Italy.

The antigenic sites in human influenza A virus mutate and turn over quickly, which is critical for the virus's survival and escape from human neutralizing antibodies and is hence responsible for flu epidemics. As shown in Figure 1, two amino acid positions in hemagglutinin (156 and 145, panels a and b) turned over several times within a 30-year period, while two others (138 and 194, panels c and d) stayed largely unchanged (figure from Shih et al., 2007).

The flu results illustrate two important points about the evolutionary dynamics of a genome that have so far been grossly overlooked by the evolution and popgen field. First, fast evolving or less conserved DNAs are also functional rather than neutral, as they are essential for quick adaptation in response to fast changing environments. Second, fast evolving DNAs turn over quickly and can be shown to violate the infinite sites model, which assumes that each new mutation hits a site that has never mutated before. Hence, they cannot be used for phylogenetic inference. If one used the fast changing sites in a flu virus to infer the phylogenetic relationship of the virus isolates responsible for different epidemics over a past period of, say, 10 years, one would reach the absurd conclusion that each epidemic was caused by a distinct type of flu virus with no genetic continuity among them, rather than by minor variations of the same type.

Mutation rates in humans are of course much lower than those in a flu virus. But just as in a flu virus, there are fast and slow changing sites (Figure 2). The time scales are different but the principle is the same. The fast changing sites may turn over every few thousand years and, when properly examined as we have done, in fact make up the majority of the observed variant sites in humans (Figure 3). This is why the field of ancient DNA has kept producing the absurd pattern of no genetic continuity between people living in the same area but in different periods of time. All of the published analyses have simply used the wrong sites, those equivalent to the fast changing antigenic sites in a flu virus. What one should be using are sites with very slow mutation rates, like 1 mutation every 50,000 years. We have been busy reinterpreting the published DNAs for several years now and hope to submit our work soon.
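To make the fast versus slow contrast concrete, here is a minimal sketch of how often the same site would be hit independently in two separate lineages, which is exactly the situation the infinite sites model rules out. It treats mutation at a site as a Poisson process and reads the time scales in the text as per-site rates; both are assumptions made for illustration, and the "fast" value of one change per 3,000 years is a hypothetical stand-in for "every few thousand years".

```python
import math


def prob_site_hit(rate_per_year: float, years: float) -> float:
    """Probability that a site mutates at least once within `years` (Poisson process)."""
    return 1.0 - math.exp(-rate_per_year * years)


def prob_independent_hits(rate_per_year: float, years: float) -> float:
    """Probability that the same site is hit independently in two separate lineages."""
    p = prob_site_hit(rate_per_year, years)
    return p * p


# Hypothetical per-site rates, chosen only to mirror the time scales in the text.
FAST_RATE = 1 / 3_000    # assumed value for "every few thousand years"
SLOW_RATE = 1 / 50_000   # "1 mutation every 50,000 years"

if __name__ == "__main__":
    for years in (10_000, 45_000):
        fast = prob_independent_hits(FAST_RATE, years)
        slow = prob_independent_hits(SLOW_RATE, years)
        print(f"{years:>6} years: fast={fast:.3f}  slow={slow:.3f}")
```

Under these assumptions the fast sites accumulate overlapping hits far more readily than the slow ones over the same span of time, which is the point of the flu analogy.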

Figure 1. (a and b) Frequency changes at residue sites 156 (a) and 145 (b) were highly dynamic. (c and d) Sites 138 (c) and 194 (d) did not undergo major frequency change over time.

Figure 2. A priori model of evolutionary dynamics of human genomic DNAs.

Figure 3. Difference between slow and fast evolving sites. Shown is a piece of homologous DNA in three different individuals or species. In the fast evolving DNAs making up the vast majority of the human genome, there is obvious and verifiable violation of the infinite sites model. These DNAs have abundant overlapping mutant sites, where independent mutations have occurred at the same site in different individuals or species.


Shih, C-C., Hsiao, T-C., Ho, M-S., and Li, W-H. (2007) Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. Proc Natl Acad Sci U S A. 104:6283-6288.

Thursday, October 23, 2014

Surprises from the 45,000 year old Siberian Ust'-Ishim: why is he not closer to Africans than East Asians are?

The genome of the 45,000 year old Siberian Ust'-Ishim published yesterday in Nature (see John Hawks's blog) again repeated the same absurd pattern of no genetic continuity between local people living in different periods of time. The Ust'-Ishim genome is no more related to the 24,000 year old Siberian MA1 than to living East Asians. But this kind of surprise is getting boring for me to mention in this blog. (John Hawks said this in his blog: "This is not an isolated case, it is another example of what we see throughout the world: Ancient people represented by DNA that seem to have very little to do with the people who live in the same areas today. We're not finding the ancestors of living populations so much as we are finding branches of populations we did not know existed.")

A new kind of surprise is the failure to do all necessary studies or to present all relevant studies. One would expect the Ust'-Ishim genome to be almost 2-fold less distant from living Africans than East Asians are, because he had 45,000 years less time to accumulate distance, as shown in Figure 1A. But the paper made no mention of this key expectation from the Out of Africa model.
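The arithmetic behind this expectation can be sketched as follows. This is only an illustration under a strict clock in which distance is proportional to the total time separating two samples and diversity already present in the ancestral population is ignored. The split time is not given in this post, so `split_years` is a hypothetical input.

```python
def expected_distance_ratio(split_years: float, sample_age_years: float) -> float:
    """Ratio of (ancient sample vs. population A) distance to
    (living population B vs. population A) distance under a strict clock,
    with the ancient sample sitting on population B's lineage.
    """
    if not 0 <= sample_age_years <= split_years:
        raise ValueError("sample age must lie between 0 and the split time")
    modern_path = 2 * split_years                       # A -> common ancestor -> living B
    ancient_path = 2 * split_years - sample_age_years   # B's branch stops at the sample's age
    return ancient_path / modern_path


if __name__ == "__main__":
    # Hypothetical split times, for illustration only.
    for split in (50_000, 100_000):
        ratio = expected_distance_ratio(split, 45_000)
        print(f"split {split} years ago: ratio = {ratio}")
```

With a hypothetical split 50,000 years ago the ratio is 0.55, close to the almost 2-fold reduction described above; with a split 100,000 years ago it is 0.775, so the size of the expected reduction depends strongly on the split time one assumes.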

It also makes no sense for Ust'-Ishim to be an outlier to living East Asians on a PCA plot (Figure 2), since the distance between Ust'-Ishim and East Asians should be almost 2-fold smaller than that between certain pairs of East Asians, again because Ust'-Ishim had 45,000 years less time to accumulate mutations/distance (Figure 1A).

Our results with the 1000 genomes data showed that East Asians CHS and Europeans GBR are equidistant to Africans LWK or YRI in fast evolving SNPs representing the genome average (Figure 1B). This of course has nothing to do with mutation rate and time but reflects maximum genetic distance and natural selection. We are soon going to analyse the Ust'-Ishim genome in the same way, and we fully expect Ust'-Ishim to be as distant from Africans as East Asians are, or more so, which would be the same pattern that our first blog post here in 2007 showed for the Neanderthals. Such a result would be truly inconvenient for the Out of Africa model, which is probably why it was left out of the paper.

Figure 1

Figure 2

Thursday, September 11, 2014

More ancient DNA surprises from ASHG 2014 abstracts

Two interesting ancient DNA abstracts from the ASHG 2014 meeting.  Just like my last post here, the surprise is again (and again and again.....again....) that there is no genetic continuity between local people living today and those locals in the past, or between local people living in different periods in the past. 

Capture of 390,000 SNPs in dozens of ancient central Europeans reveals a population turnover in Europe thousands of years after the advent of farming. I. Lazaridis, W. Haak, N. Patterson, N. Rohland, S. Mallick, B. Llamas, S. Nordenfelt, E. Harney, A. Cooper, K. W. Alt, D. Reich.
   To understand the population transformations that took place in Europe since the early Neolithic, we used a DNA capture technique to obtain reads covering ~390 thousand single nucleotide polymorphisms (SNPs) from a number of different archaeological cultures of central Europe (Germany and Hungary). The samples spanned the time period from 7,500 BP to 3,500 BP (Early Neolithic to Early Bronze Age periods) and most of them were previously studied using mtDNA (Brandt, Haak et al., Science, 2013). The captured SNPs include about 360,000 SNPs from the Affymetrix Human Origins Array that were discovered in African individuals, as well as about 30,000 SNPs chosen for other reasons (that are thought to have been affected by natural selection, or to have phenotypic effects, or are useful in determining Y-chromosome haplogroups). By analyzing this data together with a dataset of 2,345 present-day humans and other published ancient genomes, we show that late Neolithic inhabitants of central Europe belonging to the Corded Ware culture were not a continuation of the earlier occupants of the region. Our results highlight the importance of migration and major population turnover in Europe long after the arrival of farming. * Contributed equally to this work.

Insights into British and European population history from ancient DNA sequencing of Iron Age and Anglo-Saxon samples from Hinxton, England. S. Schiffels, W. Haak, B. Llamas, E. Popescu, L. Loe, R. Clarke, A. Lyons, P. Paajanen, D. Sayer, R. Mortimer, C. Tyler-Smith, A. Cooper, R. Durbin.
   British population history is shaped by a complex series of repeated immigration periods and associated changes in population structure. It is an open question however, to what extent each of these changes is reflected in the genetic ancestry of the current British population. Here we use ancient DNA sequencing to help address that question. We present whole genome sequences generated from five individuals that were found in archaeological excavations at the Wellcome Trust Genome Campus near Cambridge (UK), two of which are dated to around 2,000 years before present (Iron Age), and three to around 1,300 years before present (Anglo-Saxon period). Good preservation status allowed us to generate one high coverage sequence (12x) from an Iron Age individual, and four low coverage sequences (1x-4x) from the other samples. By providing the first ancient whole genome sequences from Britain, we get a unique picture of the ancestral populations in Britain before and after the Anglo-Saxon immigrations. We use modern genetic reference panels such as the 1000 Genomes Project to examine the relationship of these ancient samples with present day population genetic data. Results from principal component analysis suggest that all samples fall consistently within the broader Northern European context, which is also consistent with mtDNA haplogroups. In addition, we obtain a finer structural genetic classification from rare genetic variants and haplotype based methods such as FineStructure. Reflecting more recent genetic ancestry, results from these methods suggest significant differences between the Iron Age and the Anglo-Saxon period samples when compared to other European samples. We find in particular that while the Anglo-Saxon samples resemble more closely the modern British population than the earlier samples, the Iron Age samples share more low frequency variation than the later ones with present day samples from southern Europe, in particular Spain (1000GP IBS). In addition the Anglo-Saxon period samples appear to share a stronger older component with Finnish (1000GP FIN) individuals. Our findings help characterize the ancestral European populations involved in major European migration movements into Britain in the last 2,000 years and thus provide more insights into the genetic history of people in northern Europe.

Friday, August 29, 2014

Another ancient DNA surprise: history of the New World Arctic people

It has been widely and repeatedly noticed that every ancient DNA research result has been a great surprise, starting from our 2008 paper or the first post on this blog back in 2007. The latest is a Science paper today, The genetic prehistory of the New World Arctic. The surprises here are (1) again (and again and again.....again....) that there is no genetic continuity between local people living today and those locals in the past (>2000 years old); again and again ... replacement rather than regional continuity, following exactly the footsteps of the Out of Africa model superseding the Multiregional model; (2) no sex between people who lived side by side: "Elsewhere, as soon as people meet each other, they have sex," says Willerslev. "Even potentially different species like Neanderthals [and modern humans] had sex, so this finding is extremely surprising."; (3) extremely low genetic diversity in mtDNA in ancient Paleo-Eskimos: "I can't remember any other group having such low diversity," says Willerslev. For the quotes by Willerslev, see this news piece.

Well, just as we said in our post on the 400K year old Heidelbergensis DNA, it would be a complete surprise if the field of ancient DNA as it is presently practiced could produce any sensible and non-surprising result consistent with common sense and with fossil and cultural records. When you use noninformative DNAs to do your analytic work, what can you expect other than meaningless trash?

Of course we are working hard to reinterpret these newly published DNA sequences and we should soon publish our results (constantly delayed by newly released DNAs needing reinterpretations) that should be a very pleasant and intellectually satisfying surprise to all. For example, as our new analysis shows, the iceman Otzi was indeed most closely related to the local living Italians as common sense would expect, rather than to remote island people the Sardinians as is now mistakenly concluded by the literature. 

Monday, August 18, 2014

Secrets of the creative brain

There was a good recent article on creativity, Secrets of the creative brain. A related blog post is The Psychopathology of Genius.

We are actively working on the genetic basis of complex traits, and the most complex are obviously creativity and intelligence. According to the threshold theory, creativity is not a matter of IQ alone, and an IQ score of 120 is the threshold. A score lower or much higher than that may hurt creativity.

Some quotes from Secrets of the creative brain:

One possible contributory factor is a personality style shared by many of my creative subjects. These subjects are adventuresome and exploratory. They take risks. Particularly in science, the best work tends to occur in new frontiers. (As a popular saying among scientists goes: “When you work at the cutting edge, you are likely to bleed.”) 

I’ve been struck by how many of these people refer to their most creative ideas as “obvious.” Since these ideas are almost always the opposite of obvious to other people, creative luminaries can face doubt and resistance when advocating for them. As one artist told me, “The funny thing about [one’s own] talent is that you are blind to it. You just can’t see what it is when you have it … When you have talent and see things in a particular way, you are amazed that other people can’t see it.” Persisting in the face of doubt or rejection, for artists or for scientists, can be a lonely path—one that may also partially explain why some of these people experience mental illness.

One interesting paradox that has emerged during conversations with subjects about their creative processes is that, though many of them suffer from mood and anxiety disorders, they associate their gifts with strong feelings of joy and excitement. “Doing good science is simply the most pleasurable thing anyone can do,” one scientist told me. “It is like having good sex. It excites you all over and makes you feel as if you are all-powerful and complete.” This is reminiscent of what creative geniuses throughout history have said. 

Many creative people are autodidacts. 

Many creative people are polymaths, as historic geniuses including Michelangelo and Leonardo da Vinci were.

Creative people tend to be very persistent, even when confronted with skepticism or rejection. 

Some people see things others cannot, and they are right, and we call them creative geniuses. Some people see things others cannot, and they are wrong, and we call them mentally ill. And some people, like John Nash, are both.

Sunday, August 17, 2014

Testing the infinite sites assumption

We presented the following poster at the "1000 genomes and beyond" meeting, Cambridge, UK, 24-26 June 2014. We are also going to present it at the ASHG 2014 meeting in October 2014 in San Diego.

The bottom line is that there are very few neutral or junk DNAs in the human genome, at least when one examines the genome by using experimental approaches. (A new paper of ours on disproving the neutral theory by using an experimental approach has just been published here, titled Scoring the collective effects of SNPs: associations of minor alleles with complex traits in model organisms.) All previous studies used only bioinformatics approaches. Their conclusions that less than 10% of the genome is functional are based on UNCERTAIN assumptions and therefore are mostly meaningless. The field must realize that it is time to stop such senseless research based on senseless assumptions. We should either do experiments without any prior assumptions or, if we have to make assumptions, use only a priori sound intuitions.

The abstract, introduction and discussion of our poster are posted here. The poster can be downloaded from my lab website.

  Abstract The infinite sites model of the neutral theory is a fundamental assumption underlying nearly all population genetic and phylogenetic studies today but has yet to be properly tested. We here tested it from two novel perspectives using the 1000 genomes dataset. First, we examined the genetic diversity patterns of different human populations using a variety of different types of SNPs, such as a random set of SNPs representing genome average, stop codon, nonsyn, syn, etc. Patterns shown by a random set of SNPs are expected to be similar to those shown by known functional stop codon SNPs, if most SNPs are not neutral. In contrast, neutral SNPs should show a most different pattern from stop codon SNPs. Second, it has long been well known that most genetic variations are shared among different human groups, which has been interpreted from the infinite sites perspective to mean few genetic differences among the ethnic groups (Lewontin, 1972). But the possibility of saturation or independent mutations to account for this phenomenon has yet to be examined and excluded. We compared the number of shared SNPs in DNAs of different evolutionary rates among different human populations to see if shared SNPs are in fact a result of independent mutations or saturation and hence more common in fast evolving DNAs relative to slow ones. We found that a random set of SNPs are just like the stop codon SNPs in showing Africans to have the largest genetic diversity. Shared SNPs are enriched in fast evolving DNAs. These results suggest that the vast majority of the human genome does not follow the infinite sites model.
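The second test in the abstract, comparing how many SNPs are shared between populations in fast versus slow evolving DNAs, can be sketched as below. This is a toy illustration and not the poster's actual pipeline: the SNP identifiers, the two population sets and the fast/slow labels are all hypothetical, and the sharing measure (shared SNPs divided by SNPs seen in either population) is one simple choice among several.

```python
from typing import Dict, Set


def shared_fraction(pop_a: Set[str], pop_b: Set[str]) -> float:
    """Fraction of SNPs seen in either population that are seen in both."""
    union = pop_a | pop_b
    if not union:
        return 0.0
    return len(pop_a & pop_b) / len(union)


def shared_by_rate_class(
    pop_a: Set[str], pop_b: Set[str], rate_class: Dict[str, str]
) -> Dict[str, float]:
    """Shared fraction computed separately for each evolutionary-rate class."""
    result: Dict[str, float] = {}
    for label in sorted(set(rate_class.values())):
        in_class = {snp for snp, cls in rate_class.items() if cls == label}
        result[label] = shared_fraction(pop_a & in_class, pop_b & in_class)
    return result


if __name__ == "__main__":
    # Hypothetical toy data: which SNPs are polymorphic in each population.
    rate_class = {
        "snp1": "fast", "snp2": "fast", "snp3": "fast",
        "snp4": "slow", "snp5": "slow", "snp6": "slow",
    }
    pop_a = {"snp1", "snp2", "snp3", "snp4"}
    pop_b = {"snp1", "snp2", "snp5", "snp6"}
    print(shared_by_rate_class(pop_a, pop_b, rate_class))
```

The sketch only computes the quantity being compared; interpreting a higher shared fraction in the fast class as saturation or independent mutation is the argument made in the abstract.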

     Introduction  Molecular studies have so far relied on the Neutral theory and its infinite sites assumption. The Neutral theory was originally inspired by the so called molecular clock which was in turn inspired by the first and most remarkable result in molecular evolution, the genetic equidistance result that sister species are approximately equidistant to a simpler outgroup. In recent papers, we have shown that the equidistance result has been incorrectly interpreted by the molecular clock with grave consequences on phylogenetic studies: nearly all past studies have used non-informative DNAs assumed to be neutral but have now been shown by us to be under selection (Hu et al, 2013, Huang, 2010). The neutral theory was mistaken right from its inception. We have developed the maximum genetic diversity (MGD) hypothesis to absorb and supersede the neutral theory (Hu et al, 2013). From this more correct/complete theoretical perspective, we here tested whether the infinite sites model holds for the majority of the human genome as is commonly assumed.  

       Discussion: mutation rate, sequence conservation, and neutrality
       The results suggest that the vast majority of the human genome does not follow the infinite sites model and is not neutral. Only a very limited set of sites, the non-syn slow evolving SNPs as defined here, behaved uniquely among all the SNPs examined and appear to be neutral or to follow the infinite sites model. They are not deleterious, as they are different from stop codon SNPs. They are also not under positive selection, as positively selected genes tend to be fast evolving. For the dramatic difference between slow and fast evolving DNAs shown here, we cannot come up with a meaningful explanation using any known scheme other than the recently proposed idea of maximum genetic diversity.
       Variation in mutation rate in different regions of the human nuclear genome may exceed 1000 fold. That a gene is slow evolving could be due to at least two reasons. One is being located in a region of the genome with slow mutation rates. This however may not apply to the difference in mutation rates between non-synonymous and synonymous sites of the same gene as found here. Alternatively, most mutations may hit functional sites and be negatively selected by the need to maintain the internal integrity/order of a biological system. It would take many mutations and hence a long time before a neutral site is hit, thus giving the appearance of a slow mutation rate. Since changes in such neutral sites take a long time, they may be too slow to meet adaptive needs and so to be under positive selection. Given the apparent slow rate and absence of positive selection, they are also unlikely to reach excess levels that cause harm or to be under negative selection.
        Hence, sequence conservation per se may not automatically indicate functionality of variants within such sequences as is commonly assumed. Less conserved sequences are more important for adaptation to the external environment, while the more conserved ones are important for the internal integrity of a system. To a virus or bacterium facing elimination by human medicines, the fast evolving parts of its genome are far more critical/functional to its survival than the more conserved parts. The popular assumption of neutrality/non-functionality for the less conserved parts of the genome overlooks their fundamental function in quick adaptation.
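The second explanation for slow evolution given in the discussion, that most mutations land on functional sites and are removed so that one must wait many mutations before a neutral site is hit, amounts to a simple waiting-time calculation. The sketch below assumes each mutation independently lands on a neutral site with a fixed probability `neutral_fraction`; that parameter and the example values are hypothetical and not taken from the poster.

```python
def expected_mutations_until_neutral_hit(neutral_fraction: float) -> float:
    """Mean number of mutations needed before one lands on a neutral site
    (mean of a geometric distribution with success probability `neutral_fraction`).
    """
    if not 0 < neutral_fraction <= 1:
        raise ValueError("neutral_fraction must be in (0, 1]")
    return 1.0 / neutral_fraction


def apparent_rate(raw_rate: float, neutral_fraction: float) -> float:
    """Observed rate of change when only mutations at neutral sites persist."""
    if not 0 <= neutral_fraction <= 1:
        raise ValueError("neutral_fraction must be in [0, 1]")
    return raw_rate * neutral_fraction


if __name__ == "__main__":
    raw = 1e-8  # hypothetical raw mutation rate per site per generation
    for fraction in (0.5, 0.01):
        print(
            f"neutral fraction {fraction}: "
            f"mutations needed on average = {expected_mutations_until_neutral_hit(fraction)}, "
            f"apparent rate = {apparent_rate(raw, fraction):.2e}"
        )
```

Under this assumption a gene in which only 1% of sites are neutral looks 100 times slower than its raw mutation rate, without the raw rate itself being any different.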


Wednesday, July 9, 2014

Upcoming workshop, The Genome: Structure, Expression, and Evolution

I am going to be an invited speaker, on a topic of my choice, at the workshop "The Genome: structure, expression and evolution", Sept 22-24, 2014, in Naples, Italy. See the announcement here.

Monday, June 9, 2014

How the MGD hypothesis has been received by the literature up to May 2014

A single spark can start a prairie fire (星星之火可以燎原)
-       Chairman Mao in a letter on Jan 5, 1930 in criticism of certain pessimistic views then existing in the party.

It is super hard to achieve a revolution in science, definitely harder than overthrowing a government of a country. We are working hard to use our theory to solve real world problems that challenge the existing framework. I promise that our best experimental work has yet to come. If our theory cannot do better in solving real world puzzles than the existing paradigm, we would consider it useless and non-scientific, period. 
Unsurprisingly to anyone experienced in science, we still face considerable difficulty in publishing our work in Western mainstream journals. Perhaps also unsurprisingly, we have no problems publishing in flagship journals in China, which makes one acutely appreciate the value of a multi-polar world. But real positive changes have been happening consistently. Here is a brief summary of how the printed literature has so far viewed our Maximum Genetic Diversity (MGD) hypothesis since its publication in 2008. Other than a minor criticism by the 2014 Prigogine Award recipient Dr. Mae Wan Ho, who has given a very detailed description and a glowing view of the hypothesis in her 2009 book chapter, there has not been a single formal criticism of the main themes of the hypothesis. All five citations so far have been positive, as shown in the following slides. I prepared these slides recently for an academic review committee made up mostly of non-specialists in the field, who probably have to rely on others' views to appreciate our work.
Dr. Ho made this criticism: “I believe Huang’s hypothesis that epigenetically complex organisms are less tolerant of genetic or germ line diversity is incomplete, because the level of germ line diversity is actively maintained.” I however believe this is a minor misunderstanding. She was talking about things like RNA editing, splicing, somatic mutations, etc., in generating sequence diversity in complex organisms. But most of these are either ordered rather than random processes, or somatic events that cannot be passed on to the next generation. The genetic diversity defined by the MGD hypothesis that must be suppressed during advances in complexity is, on the contrary, all generated by random events in the germ line, i.e., random mutations. The key concept here is that random entropy must be suppressed in order to evolve higher levels of ordered diversity/complexity. This is an a priori sound intuition that no one so far has dared to risk one’s reputation and sanity to attack. As evidenced by all proven and settled laws of the hard sciences, Nature does follow human intuitions in an uncanny and astonishing way, whether some people like it or not. The most brilliant minds of humanity, like Gödel or Einstein, all knew this. Quantum mechanics may be the only exception, but no one really understands quantum mechanics and it is far from certain that it is complete and proven. That the micro world may operate in a counter-intuitive way may eventually turn out to be merely a premature and primitive understanding of how the universe really works.

Tuesday, March 11, 2014

Origins of Australian aborigines and a unified hybridization model of human evolution: integrating informative data from autosomes, Y chr, and mtDNA

I am going to Spain in a few days for the Cell Symposium on Human Evolution on March 16-18th to present the poster shown below (my lab website has a higher resolution file for download). The introduction and main conclusions, as copied from the poster, are given below.
A common difficulty for all existing models of human evolution has been to account for  the unusual diversity and admixed features of the Australian aborigines (AUA). A list of morphological features aimed at defining modern humans would exclude both modern AUA and Neanderthals, indicating some shared traits between the two. Fossil studies lack fine resolution power and molecular studies have yet to produce even just an internally coherent model uniting autosomes, Y chr, and mtDNA, let alone consistency with fossils/phenotypes and common sense. Almost every ancient DNA finding has been a surprise to the existing models, the latest being the 400K year old Heidelbergensis.
Molecular studies have so far relied on the Neutral theory and its infinite sites assumption. The Neutral theory was originally inspired by the so called molecular clock which was in turn inspired by the first and most remarkable result in molecular evolution, the genetic equidistance result that sister species are approximately equidistant to a simpler outgroup. In recent papers, we have shown that the equidistance result has been incorrectly interpreted by the molecular clock with grave consequences on phylogenetic studies: nearly all past studies have used non-informative DNAs assumed to be neutral but have now been shown by us to be under selection. We have developed the maximum genetic diversity (MGD) hypothesis to absorb and supersede the neutral theory. From this more correct/complete theoretical perspective that has coherently accounted for all major known observations, we developed new methods and studied the origins of humans using the 1000 genomes project and other public data. 
We found direct evidence of higher MGD or lower purifying selection for African San and Pygmy specific Y chr that will dramatically rewrite Y phylogeny. Certain regions of Y with low contents of repetitive DNAs have few SNPs in all Y haplotypes except being highly variable in A and B. But in other regions, similar degrees of variations are found for all haplotypes. Thus, the apparent deep separation or distance between A and B or between A/B and other haplotypes is not related to time as is commonly assumed. San/Pygmies have more positions in their genomes that when mutated are not deleterious to their physiology, which creates higher MGD and in turn higher innate or natural adaptive capacity required for a primitive life style.
Two types of DNAs are a priori informative to phylogenetic studies: slow evolving neutral DNAs from autosomes are informative to separation time and functional DNAs from mitochondria (mt) and Y chr are informative to shared internal physiology. Non-informative to phylogeny but informative to adaptation to outside environments are the fast evolving DNAs (speed is essential for quick adaptation to fast changing environments). We verified the informative nature of different SNPs by testing whether autosomes can reveal the known admixture history of Mexicans (MXL): those with R1b3 Y haplotype should have relatively more European and less Chinese autosomes than those with Q haplotype; those with B2 mtDNA should be the closest to the autosomes of B2 Northern Chinese. The slow evolving SNPs gave the expected result while all other autosome SNPs produced only senseless results.
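The verification test described above compares how close each individual's autosomes are to different reference populations. A minimal sketch of such a comparison follows. The poster text here does not specify the distance measure, so the mean absolute difference in alternate-allele counts is an assumption, and the genotypes and panels are hypothetical toy data.

```python
from statistics import mean
from typing import Sequence

# Alternate-allele counts (0, 1 or 2) at the same ordered set of SNPs.
Genotype = Sequence[int]


def pairwise_distance(a: Genotype, b: Genotype) -> float:
    """Mean absolute difference in allele counts across SNPs."""
    if len(a) != len(b):
        raise ValueError("genotypes must cover the same SNPs")
    return float(mean(abs(x - y) for x, y in zip(a, b)))


def distance_to_panel(individual: Genotype, panel: Sequence[Genotype]) -> float:
    """Average distance from one individual to every member of a reference panel."""
    return float(mean(pairwise_distance(individual, ref) for ref in panel))


if __name__ == "__main__":
    # Hypothetical toy genotypes at four SNPs.
    individual = [0, 1, 2, 2]
    european_panel = [[0, 1, 2, 1], [0, 1, 2, 2]]
    chinese_panel = [[2, 1, 0, 0], [2, 0, 0, 0]]
    print("to European panel:", distance_to_panel(individual, european_panel))
    print("to Chinese panel: ", distance_to_panel(individual, chinese_panel))
```

Running the same comparison separately on slow evolving and on all other SNPs, and grouping individuals by Y or mtDNA haplotype, would give the kind of contrast described in the paragraph above.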
Integrating autosomes, Y and mtDNA, we found just two original branches of humans, Europeans with R1a Y chr and U2 mtDNA and Africans with B and L1c, with a conservatively estimated split time of 2.1 million years. All other Y or mt branches originated by admixture and adaptive co-evolution as evidenced by autosomes data. We found a farmers-associated SNP3010 with highest frequency in D4 Chinese and what may be gene conversions in mtDNA in admixed humans (data not shown due to space limitations). Southern Chinese (CHS) O2 Y-chr and B2 mtDNA were the oldest or least admixed Asian group and derived from admixture of R1a/U2 Europeans and B/L1c Africans before leaving Africa for Asia. Heidelbergensis, Denisovans, and Neanderthals were descendants of a more recent interbreeding event in Iberia between migrating Africans and local Europeans, who then moved across Eurasia and became direct ancestors of AUA. These results suggest a unified hybridization model consistent with physical/cultural traits and fossil records, which explains the wide diversity of Australian aborigines. Since the original branching of two groups, further diversification of humans occurred largely by repeated cycles of migration/hybridization followed by independent evolution followed by further migration/hybridization, an intuitively expected pattern if homo has been a single species since the first co-appearance of H. habilis and H. rudolfensis.