Friday, August 29, 2014

Another ancient DNA surprise: history of the New World Arctic people

It has been noted repeatedly that every ancient DNA research result has come as a great surprise, starting from our 2008 paper or the first post on this blog back in 2007. The latest is a Science paper published today, The genetic prehistory of the New World Arctic. The surprises here are: (1) again (and again and again.....again....) there is no genetic continuity between the local people living today and the locals of the past (>2000 years ago); again and again ... replacement rather than regional continuity, following exactly in the footsteps of the Out of Africa model superseding the Multiregional model; (2) no sex between people who lived side by side: "Elsewhere, as soon as people meet each other, they have sex," says Willerslev. "Even potentially different species like Neanderthals [and modern humans] had sex, so this finding is extremely surprising."; (3) extremely low genetic diversity in the mtDNA of ancient Paleo-Eskimos: "I can't remember any other group having such low diversity," says Willerslev. For the quotes by Willerslev, see this news piece.

Well, just as we said in our post on the 400K year old Heidelbergensis DNA, it would be a complete surprise if the field of ancient DNA as it is presently practiced could produce any sensible and non-surprising result consistent with common sense and with the fossil and cultural records. When you use noninformative DNAs to do your analytic work, what can you expect other than meaningless trash?

Of course we are working hard to reinterpret these newly published DNA sequences, and we should soon publish our results (constantly delayed by newly released DNAs needing reinterpretation), which should be a very pleasant and intellectually satisfying surprise to all. For example, as our new analysis shows, the iceman Otzi was indeed most closely related to the local living Italians, as common sense would expect, rather than to the Sardinians, a remote island people, as the literature now mistakenly concludes.

Monday, August 18, 2014

Secrets of the creative brain

There was a good recent article on creativity, Secrets of the creative brain. A related blog post is The Psychopathology of Genius.

We are actively working on the genetic basis of complex traits, and the most complex are obviously creativity and intelligence. According to the threshold theory, creativity is not a matter of IQ alone, and a score of 120 is the threshold. A score lower or much higher than that may hurt creativity.

Some quotes from Secrets of the creative brain:

One possible contributory factor is a personality style shared by many of my creative subjects. These subjects are adventuresome and exploratory. They take risks. Particularly in science, the best work tends to occur in new frontiers. (As a popular saying among scientists goes: “When you work at the cutting edge, you are likely to bleed.”) 

I’ve been struck by how many of these people refer to their most creative ideas as “obvious.” Since these ideas are almost always the opposite of obvious to other people, creative luminaries can face doubt and resistance when advocating for them. As one artist told me, “The funny thing about [one’s own] talent is that you are blind to it. You just can’t see what it is when you have it … When you have talent and see things in a particular way, you are amazed that other people can’t see it.” Persisting in the face of doubt or rejection, for artists or for scientists, can be a lonely path—one that may also partially explain why some of these people experience mental illness.

One interesting paradox that has emerged during conversations with subjects about their creative processes is that, though many of them suffer from mood and anxiety disorders, they associate their gifts with strong feelings of joy and excitement. “Doing good science is simply the most pleasurable thing anyone can do,” one scientist told me. “It is like having good sex. It excites you all over and makes you feel as if you are all-powerful and complete.” This is reminiscent of what creative geniuses throughout history have said. 

Many creative people are autodidacts. 

Many creative people are polymaths, as historic geniuses including Michelangelo and Leonardo da Vinci were.

Creative people tend to be very persistent, even when confronted with skepticism or rejection. 

Some people see things others cannot, and they are right, and we call them creative geniuses. Some people see things others cannot, and they are wrong, and we call them mentally ill. And some people, like John Nash, are both.

Sunday, August 17, 2014

Testing the infinite sites assumption

We presented the following poster at the "1000 genomes and beyond" meeting, Cambridge, UK, 24-26 June 2014. We are also going to present it at the ASHG 2014 meeting in October 2014 in San Diego.

The bottom line is that there are very few neutral or junk DNAs in the human genome, at least when one examines the genome by using experimental approaches. (A new paper of ours disproving the neutral theory by an experimental approach has just been published here, titled Scoring the collective effects of SNPs: associations of minor alleles with complex traits in model organisms.) All previous studies used only bioinformatics approaches. Their conclusions that less than 10% of the genome is functional are based on UNCERTAIN assumptions and are therefore mostly meaningless. The field must realize that it is time to stop such senseless research based on senseless assumptions. We should either do experiments without any prior assumptions or, if we have to make assumptions, use only a priori sound intuitions.

The abstract, introduction and discussion of our poster are posted here. The poster can be downloaded from my lab website.

  Abstract: The infinite sites model of the neutral theory is a fundamental assumption underlying nearly all population genetic and phylogenetic studies today but has yet to be properly tested. We here tested it from two novel perspectives using the 1000 genomes dataset. First, we examined the genetic diversity patterns of different human populations using a variety of different types of SNPs, such as a random set of SNPs representing the genome average, stop codon, nonsyn, syn, etc. Patterns shown by a random set of SNPs are expected to be similar to those shown by known functional stop codon SNPs if most SNPs are not neutral. In contrast, neutral SNPs should show a pattern most different from that of stop codon SNPs. Second, it has long been well known that most genetic variations are shared among different human groups, which has been interpreted from the infinite sites perspective to mean few genetic differences among the ethnic groups (Lewontin, 1972). But the possibility that saturation or independent mutations account for this phenomenon has yet to be examined and excluded. We compared the number of shared SNPs in DNAs of different evolutionary rates among different human populations to see if shared SNPs are in fact a result of independent mutations or saturation and hence more common in fast evolving DNAs relative to slow ones. We found that a random set of SNPs is just like the stop codon SNPs in showing Africans to have the largest genetic diversity. Shared SNPs are enriched in fast evolving DNAs. These results suggest that the vast majority of the human genome does not follow the infinite sites model.
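The saturation argument in the abstract can be made concrete with a small sketch. It is only an illustration under a deliberately simple assumption that is not taken from the poster: mutations land uniformly at random on a fixed number of sites. The function names and the toy numbers are hypothetical. The first function gives the expected number of sites hit two or more times, which is what the infinite sites model assumes to be negligible; the second gives the fraction of SNP positions shared between two populations.

```python
def expected_multiply_hit_sites(num_sites: int, num_mutations: int) -> float:
    """Expected number of sites hit at least twice when `num_mutations`
    mutations fall independently and uniformly on `num_sites` sites.
    (Uniform targeting is an assumption made for illustration only.)"""
    if num_mutations == 0:
        return 0.0
    p = 1.0 / num_sites
    # Probability that a given site is never hit.
    p_zero = (1.0 - p) ** num_mutations
    # Probability that a given site is hit exactly once.
    p_one = num_mutations * p * (1.0 - p) ** (num_mutations - 1)
    return num_sites * (1.0 - p_zero - p_one)


def shared_fraction(pop_a: set, pop_b: set) -> float:
    """Fraction of SNP positions seen in either population that are seen in both."""
    union = pop_a | pop_b
    return len(pop_a & pop_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical numbers: few mutations on many sites versus many mutations.
    print(expected_multiply_hit_sites(1000, 10))
    print(expected_multiply_hit_sites(1000, 2000))
    # Hypothetical SNP position sets for two populations.
    print(shared_fraction({1, 2, 3}, {2, 3, 4}))
```

When mutations are few relative to sites, repeated hits are rare and the infinite sites approximation is reasonable; once the number of mutations is comparable to the number of sites, most sites are hit repeatedly, and sharing by independent mutation becomes plausible.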

     Introduction  Molecular studies have so far relied on the Neutral theory and its infinite sites assumption. The Neutral theory was originally inspired by the so called molecular clock which was in turn inspired by the first and most remarkable result in molecular evolution, the genetic equidistance result that sister species are approximately equidistant to a simpler outgroup. In recent papers, we have shown that the equidistance result has been incorrectly interpreted by the molecular clock with grave consequences on phylogenetic studies: nearly all past studies have used non-informative DNAs assumed to be neutral but have now been shown by us to be under selection (Hu et al, 2013, Huang, 2010). The neutral theory was mistaken right from its inception. We have developed the maximum genetic diversity (MGD) hypothesis to absorb and supersede the neutral theory (Hu et al, 2013). From this more correct/complete theoretical perspective, we here tested whether the infinite sites model holds for the majority of the human genome as is commonly assumed.  

       Discussion: mutation rate, sequence conservation, and neutrality
       The results suggest that the vast majority of the human genome does not follow the infinite sites model and is not neutral. Only a very limited set of sites, the non-syn slow evolving SNPs as defined here, behaved uniquely among all the SNPs examined and appear to be neutral or to follow the infinite sites model. They are not deleterious, as they differ from stop codon SNPs.  They are also not under positive selection, as positively selected genes tend to be fast evolving.  For the dramatic difference between slow and fast evolving DNAs shown here, we cannot come up with a meaningful explanation using any known scheme other than the recently proposed idea of maximum genetic diversity. 
       Variation in mutation rate in different regions of the human nuclear genome may exceed 1000 fold.  That a gene is slow evolving could be due to at least two reasons.  One is being located in a region of the genome with slow mutation rates.  This however may not apply to the difference in mutation rates between non-synonymous and synonymous sites of the same gene as found here.  Alternatively, most mutations may hit functional sites and be negatively selected by the need to maintain the internal integrity/order of a biological system. It would take many mutations and hence a long time before a neutral site is hit, thus giving the appearance of a slow mutation rate. Since changes in such neutral sites take a long time, they may be too slow to meet adaptive needs and so to be under positive selection.  Given the apparent slow rate and absence of positive selection, they are also unlikely to reach excess levels that cause harm or to be under negative selection. 
        Hence, sequence conservation per se may not automatically indicate functionality of variants within such sequences as is commonly assumed.  Less conserved sequences are more important for adaptation to the external environment, while the more conserved ones are important for the internal integrity of a system. To a virus or bacterium facing elimination by human medicines, the fast evolving parts of its genome are far more critical/functional to its survival than the more conserved parts. The popular assumption of neutrality/non-functionality for the less conserved parts of the genome overlooks their fundamental function in quick adaptation.  
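The point that heavy constraint gives the appearance of a slow mutation rate can be sketched numerically. This is a minimal illustration using the common simplification that every mutation at a constrained site is removed by selection while mutations at neutral sites persist; the function name and the numbers are hypothetical and are not taken from the poster.

```python
def apparent_substitution_rate(mutation_rate: float, neutral_fraction: float) -> float:
    """Apparent per-site substitution rate when only a fraction of sites
    tolerates change. Assumes (for illustration) that mutations at all
    other sites are fully removed by negative selection."""
    if not 0.0 <= neutral_fraction <= 1.0:
        raise ValueError("neutral_fraction must lie between 0 and 1")
    return mutation_rate * neutral_fraction


if __name__ == "__main__":
    mu = 1e-8  # hypothetical per-site mutation rate per generation
    for fraction in (1.0, 0.5, 0.1):
        print(fraction, apparent_substitution_rate(mu, fraction))
```

With the same underlying mutation rate, a gene in which only one site in ten tolerates change appears to evolve ten times more slowly than an unconstrained one.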


Wednesday, July 9, 2014

Upcoming workshop, The Genome: Structure, Expression, and Evolution

I am going to be an invited speaker on a topic of my choice at the workshop on "The Genome: structure, expression and evolution", Sept 22-24, 2014 in Naples, Italy.  See the announcement here.

Monday, June 9, 2014

How the MGD hypothesis has been received by the literature up to May 2014

星星之火可以燎原 (A single spark can start a prairie fire)
-       Chairman Mao in a letter on Jan 5, 1930 in criticism of certain pessimistic views then existing in the party.

It is super hard to achieve a revolution in science, definitely harder than overthrowing a government of a country. We are working hard to use our theory to solve real world problems that challenge the existing framework. I promise that our best experimental work has yet to come. If our theory cannot do better in solving real world puzzles than the existing paradigm, we would consider it useless and non-scientific, period. 
Not surprisingly to anyone experienced in science, we still face considerable difficulty in publishing our work in Western mainstream journals. Perhaps also not surprisingly, we have no problems in publishing in flagship journals in China, which makes one acutely appreciate the value of a multi-polar world.  But real positive changes have been consistently happening. Here is a brief summary of how the printed literature has so far viewed our Maximum Genetic Diversity (MGD) hypothesis since its publication in 2008. Other than a minor criticism by the 2014 Prigogine Award recipient Dr. Mae Wan Ho, who has given a very detailed description and a glowing view of the hypothesis in her 2009 book chapter, there is not a single formal criticism of the main themes of the hypothesis. All five citations so far have been positive, as shown in the following slides. I prepared these slides recently for an academic review committee made up mostly of non-specialists in the field, who probably have to rely on others' views to appreciate our work.
Dr. Ho made this criticism: “I believe Huang’s hypothesis that epigenetically complex organisms are less tolerant of genetic or germ line diversity is incomplete, because the level of germ line diversity is actively maintained.” I however believe this is a minor misunderstanding. She was talking about things like RNA editing, splicing, somatic mutations, etc., in generating sequence diversity in complex organisms. But most of these are either ordered rather than random processes, or somatic events that cannot be passed on to the next generation. The genetic diversity defined by the MGD hypothesis that must be suppressed during advances in complexity is, on the contrary, all generated by random events in the germ line, i.e., random mutations. The key concept here is that random entropy must be suppressed in order to evolve higher levels of ordered diversity/complexity. This is an a priori sound intuition that no one so far has dared to risk one’s reputation and sanity to attack. As evidenced by all proven and settled laws of hard sciences, Nature does follow human intuitions in an uncanny and astonishing way, whether some people like it or not. The most brilliant minds of humanity like a Godel or Einstein all know this. Quantum mechanics may be the only exception, but no one really understands quantum mechanics and it is far from certain that it is complete and proven. That the micro world may operate in a counter-intuitive way may eventually turn out to be merely a premature and primitive understanding of how the universe really works. 

Tuesday, March 11, 2014

Origins of Australian aborigines and a unified hybridization model of human evolution: integrating informative data from autosomes, Y chr, and mtDNA

I am going to Spain in a few days for the Cell Symposium on Human Evolution on March 16-18th to present the poster shown below (my lab website has a higher resolution file for download). The introduction and main conclusions, as copied from the poster, follow.
A common difficulty for all existing models of human evolution has been to account for  the unusual diversity and admixed features of the Australian aborigines (AUA). A list of morphological features aimed at defining modern humans would exclude both modern AUA and Neanderthals, indicating some shared traits between the two. Fossil studies lack fine resolution power and molecular studies have yet to produce even just an internally coherent model uniting autosomes, Y chr, and mtDNA, let alone consistency with fossils/phenotypes and common sense. Almost every ancient DNA finding has been a surprise to the existing models, the latest being the 400K year old Heidelbergensis.
Molecular studies have so far relied on the Neutral theory and its infinite sites assumption. The Neutral theory was originally inspired by the so called molecular clock which was in turn inspired by the first and most remarkable result in molecular evolution, the genetic equidistance result that sister species are approximately equidistant to a simpler outgroup. In recent papers, we have shown that the equidistance result has been incorrectly interpreted by the molecular clock with grave consequences on phylogenetic studies: nearly all past studies have used non-informative DNAs assumed to be neutral but have now been shown by us to be under selection. We have developed the maximum genetic diversity (MGD) hypothesis to absorb and supersede the neutral theory. From this more correct/complete theoretical perspective that has coherently accounted for all major known observations, we developed new methods and studied the origins of humans using the 1000 genomes project and other public data. 
We found direct evidence of higher MGD or lower purifying selection for African San and Pygmy specific Y chr that will dramatically rewrite Y phylogeny. Certain regions of Y with low contents of repetitive DNAs have few SNPs in all Y haplotypes except being highly variable in A and B. But in other regions, similar degrees of variations are found for all haplotypes. Thus, the apparent deep separation or distance between A and B or between A/B and other haplotypes is not related to time as is commonly assumed. San/Pygmies have more positions in their genomes that when mutated are not deleterious to their physiology, which creates higher MGD and in turn higher innate or natural adaptive capacity required for a primitive life style.
Two types of DNAs are a priori informative to phylogenetic studies: slow evolving neutral DNAs from autosomes are informative to separation time and functional DNAs from mitochondria (mt) and Y chr are informative to shared internal physiology. Non-informative to phylogeny but informative to adaptation to outside environments are the fast evolving DNAs (speed is essential for quick adaptation to fast changing environments). We verified the informative nature of different SNPs by testing whether autosomes can reveal the known admixture history of Mexicans (MXL): those with R1b3 Y haplotype should have relatively more European and less Chinese autosomes than those with Q haplotype; those with B2 mtDNA should be the closest to the autosomes of B2 Northern Chinese. The slow evolving SNPs gave the expected result while all other autosome SNPs produced only senseless results.
Integrating autosomes, Y and mtDNA, we found just two original branches of humans, Europeans with R1a Y chr and U2 mtDNA and Africans with B and L1c, with a conservatively estimated split time of 2.1 million years. All other Y or mt branches originated by admixture and adaptive co-evolution as evidenced by autosomes data. We found a farmers-associated SNP3010 with highest frequency in D4 Chinese and what may be gene conversions in mtDNA in admixed humans (data not shown due to space limitations). Southern Chinese (CHS) O2 Y-chr and B2 mtDNA were the oldest or least admixed Asian group and derived from admixture of R1a/U2 Europeans and B/L1c Africans before leaving Africa for Asia. Heidelbergensis, Denisovans, and Neanderthals were descendants of a more recent interbreeding event in Iberia between migrating Africans and local Europeans, who then moved across Eurasia and became direct ancestors of AUA. These results suggest a unified hybridization model consistent with physical/cultural traits and fossil records, which explains the wide diversity of Australian aborigines. Since the original branching of two groups, further diversification of humans occurred largely by repeated cycles of migration/hybridization followed by independent evolution followed by further migration/hybridization, an intuitively expected pattern if homo has been a single species since the first co-appearance of H. habilis and H. rudolfensis.

Thursday, December 5, 2013

A mitochondrial genome sequence of a hominin from Sima de los Huesos confirms our findings

The DNA of the ~400,000 year old human from Sima de los Huesos is a big surprise to everyone except perhaps us. A look at the following abstracts of ours from several past presentations would easily show why.
(See the paper and the Nature news on the surprise here.)

My talk last October at the CAS-MPG Partner Institute for Computational Biology in Shanghai.
Here is the abstract that is still available from that website.  

Seminar Talk by Prof. Shi Huang
Author:webmaster    News time:2012-10-22
Speaker: Prof. Shi Huang, Central South University
Time: 13:00-14:45pm, Oct. 25, 2012
Place: Room 223, SIBS main building
Host: Dr. Shuhua Xu
Title: Evidence for the Multiregional hypothesis of modern human origins from phylogeny-informative sequences
Abstract: Recent studies support the Maximum Genetic Diversity hypothesis that nucleotide diversities in fast evolving sequences are mostly at optimum level and hence cannot be informative to phylogeny inference.  Consistently, here we found that fast evolving sequences are enriched with coincident or overlapping substitutions in humans and chimpanzees and also with shared SNPs among human races.  We therefore analyzed the phylogeny-informative slow evolving sequences to revisit the question of modern human origins.   We found that Europeans have greater nucleotide diversity than East Asians and Africans.  The split times between Europeans and non-Europeans and between East Asians and Africans were estimated to be ~1.98 and ~1.78 Myr ago, respectively.  East Asians are slightly closer to Europeans than Africans are, confirming genetic exchange between East Asians and Europeans as previously revealed by morphological studies.  Europeans show distinct SNP patterns in non-synonymous SNPs from non-Europeans. Denisovans may be African H. heidelbergensis who may have migrated to Europe ~0.38 Myr ago and interbred with female H. antecessor in Europe to give rise to Neanderthals who had predominantly Denisovan nuclear genomes. These molecular results are consistent with fossil records and the Multiregional hypothesis. 

This October I gave a talk on human origins at the HGV2013 in Seoul, Korea. Here below is the abstract.  (Ironically and a surprise to us, the same abstract was rejected even a chance for a poster presentation at the ASHG2013 meeting in Boston.  But we did present two other abstracts, one on Autism and the other on transgenerational inheritance in yeast).  

Understanding human evolutionary history may help solve health problems today. Using the 1000 genomes and other public sequences, we show that genetic diversity patterns of various human groups in most types of SNPs, including nonsyn sites in fast evolving proteins, syn sites in any proteins, non-coding, and mtDNAs, are all broadly similar to that of functional SNPs such as those altering/creating stop codons and splicing sites, thereby indicating purifying selection for most common SNPs and rendering them non-informative for phylogenetics. No population has uniformly lowest diversity in all types of SNPs, with East Asians lowest in non-coding, fast non-syn, slow non-syn, and syn sites, Europeans in stop codon, splicing sites, and mtDNA, and YRI in Y chromosome.  Furthermore, shared SNPs among different populations mostly cluster in fast evolving DNAs, consistent with independent mutations repeatedly hitting the same sites, invalidating the infinite sites assumption.  Only slow evolving nonsyn SNPs that are theoretically predicted to be phylogenetically informative show the expected higher diversity for admixed populations relative to populations that contributed to such admixture, e.g., African Americans relative to Africans and Europeans.  These 4923 SNPs are located in 178 proteins with much slower evolutionary rates than the average of those known to be under positive selection.  We estimated a very conservative date for the split of Africans and non-Africans to be no less than one million years ago, which could be easily extended by 2-fold pending better determination of the human mutation rate in these slow evolving sites. Our analysis also provides data for Denisovans and Neanderthals as descendants of interbreeding between San and Iberians and as ancestors of Papuan and Australian aborigines, thus genetically linking all speakers of the click language.  
This molecular study presents a unified view of the complex multiregional evolutionary history of modern humans.
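The expectation that an admixed population shows higher diversity than its sources can be illustrated with a textbook calculation. The sketch below is not the method used in the abstract; it assumes a single biallelic SNP, random mating, and hypothetical allele frequencies and mixing proportion. It computes expected heterozygosity 2p(1-p) for each source population and for a mixture whose allele frequency is the weighted average of the two.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity at a biallelic SNP with allele frequency p,
    assuming random mating."""
    return 2.0 * p * (1.0 - p)


def admixed_heterozygosity(p1: float, p2: float, alpha: float) -> float:
    """Expected heterozygosity of a population drawing a fraction `alpha`
    of its ancestry from source 1 and the rest from source 2."""
    p_mix = alpha * p1 + (1.0 - alpha) * p2
    return heterozygosity(p_mix)


if __name__ == "__main__":
    # Hypothetical allele frequencies in two source populations.
    p1, p2, alpha = 0.1, 0.9, 0.5
    print(heterozygosity(p1), heterozygosity(p2))
    print(admixed_heterozygosity(p1, p2, alpha))
```

In this simple model the mixture is never less diverse than the weighted average of its sources, and it exceeds both sources when their allele frequencies differ strongly, as in the numbers chosen here. It need not exceed the more diverse source in every case.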

Also, the following abstract has been accepted for presentation (do not know if talk or poster yet) at the Cell Symposium on Human Origins in March 2014, Spain. Given our surprisingly negative experience at ASHG2013, we must give special thanks to the organizers of the Cell Symposium for their open mindedness.   

Prior work has indicated that nucleotide diversities are mostly at optimum levels. From this perspective, we studied the origins of humans using the 1000 genomes project and other public data. We found empirical evidence of purifying selection for most common SNPs but neutrality for slow evolving nonsyn SNPs. Such slow SNPs allowed us to estimate a conservative time of no less than 2.11 million years for the split of Europeans and non-Europeans. Denisovans and Neanderthals/Altai were descendants of interbreeding between Africans and Europeans and ancestors of Australian aborigines. Analysis of mtDNA and Y chr provided independent evidence for the autosomes results. We also found a farmers-associated SNP and what may be gene conversions in mtDNA in admixed humans. These results suggest a unified human origins model consistent with physical/cultural traits and fossil records, which explains the wide diversity of Australian aborigines.  

In October this year in Beijing, researchers from China held a summit on nearly all relevant topics of anthropology.  I gave a talk right after Professor Li Jin of Fudan University, the chief proponent in China of the Out of Africa model.  Professor/Academician Xinzhi Wu of the Chinese Academy of Sciences (CAS), the chief proponent in China of the Multiregional model, gave a keynote speech at the meeting.  Below is a picture taken at the meeting.  The impressively healthy 84 year old Xinzhi Wu is in the middle of the front row.  To his left are Li Jin, Xing Gao of CAS, and me. A book in Chinese based on talks at this meeting is coming out in Spring 2014. 

Our paper on human origins should be ready for submission soon.  But every time we were getting ready, some fossil DNA or other data would turn up in the literature, such as the 24000 year old Siberian genome and now this 400000 year old Heidelbergensis, which just kept adding to the large amount of evidence for our story.  We just have to keep updating our manuscript, which is of course very pleasant to us. It is truly an exciting time to study human evolution thanks to the remarkable fossil sequencing effort of professor Paabo and others. 

Just a reminder: we hold our story, or any story of human origins, to the highest standard of science: no single contradiction allowed, period. No future findings of any kind should be a surprise to us. If one is, we would concede we had the wrong story. We can be this confident because ever since the main theme of our story emerged about 3 years ago from the analysis of low quality autosomes HapMap data, we have analyzed numerous other datasets and we have not been surprised once. The thing that does surprise us, maybe just a bit, is how lawful and coherent an evolution story can turn out to be when studied in the right way.