Wednesday, July 9, 2014

Upcoming workshop, The Genome: Structure, Expression, and Evolution

I am going to be an invited speaker on a topic of my choice at the workshop on "The Genome: structure, expression and evolution", Sept 22-24, 2014, in Naples, Italy. See the announcement here.

Monday, June 9, 2014

How the MGD hypothesis has been received by the literature up to May 2014

星星之火可以燎原 (A single spark can start a prairie fire)
- Chairman Mao, in a letter of Jan 5, 1930, criticizing certain pessimistic views then existing in the party.

It is super hard to achieve a revolution in science, definitely harder than overthrowing a government of a country. We are working hard to use our theory to solve real world problems that challenge the existing framework. I promise that our best experimental work has yet to come. If our theory cannot do better in solving real world puzzles than the existing paradigm, we would consider it useless and non-scientific, period. 
Not surprisingly to anyone experienced in science, we still face considerable difficulty in publishing our work in Western mainstream journals. Perhaps also not surprisingly, we have no problems in publishing in flagship journals in China, which makes one acutely appreciate the value of a multi-polar world. But real positive changes have been consistently happening. Here is a brief summary of how the printed literature has so far viewed our Maximum Genetic Diversity (MGD) hypothesis since its publication in 2008. Other than a minor criticism by the 2014 Prigogine Award recipient Dr. Mae Wan Ho, who has given a very detailed description and a glowing view of the hypothesis in her 2009 book chapter, there is not a single formal criticism of the main themes of the hypothesis. All five citations so far have been positive, as shown in the following slides. I prepared these slides recently for an academic review committee made up mostly of non-specialists in the field, who probably have to rely on others' views to appreciate our work.
Dr. Ho made this criticism: “I believe Huang’s hypothesis that epigenetically complex organisms are less tolerant of genetic or germ line diversity is incomplete, because the level of germ line diversity is actively maintained.” I however believe this is a minor misunderstanding. She was talking about things like RNA editing, splicing, somatic mutations, etc., in generating sequence diversity in complex organisms. But most of these are either ordered rather than random processes, or somatic events that cannot be passed on to the next generation. The genetic diversity defined by the MGD hypothesis that must be suppressed during advances in complexity is on the contrary all generated by random events in the germ line, i.e., random mutations. The key concept here is that random entropy must be suppressed in order to evolve higher levels of ordered diversity/complexity. This is an a priori sound intuition that no one so far has dared to risk one’s reputation and sanity to attack. As evidenced by all proven and settled laws of hard sciences, Nature does follow human intuitions in an uncanny and astonishing way, whether some people like it or not. The most brilliant minds of humanity, like a Gödel or an Einstein, all know this. Quantum mechanics may be the only exception, but no one really understands quantum mechanics and it is far from certain that it is complete and proven. That the micro world may operate in a counter-intuitive way may eventually turn out to be merely a premature and primitive understanding of how the universe really works.

Tuesday, March 11, 2014

Origins of Australian aborigines and a unified hybridization model of human evolution: integrating informative data from autosomes, Y chr, and mtDNA

I am going to Spain in a few days for the Cell Symposium on Human Evolution on March 16-18th to present a poster as shown below (my lab website has a better resolution file for download). The introduction and main conclusions, as copied from the poster, are as follows.
A common difficulty for all existing models of human evolution has been to account for  the unusual diversity and admixed features of the Australian aborigines (AUA). A list of morphological features aimed at defining modern humans would exclude both modern AUA and Neanderthals, indicating some shared traits between the two. Fossil studies lack fine resolution power and molecular studies have yet to produce even just an internally coherent model uniting autosomes, Y chr, and mtDNA, let alone consistency with fossils/phenotypes and common sense. Almost every ancient DNA finding has been a surprise to the existing models, the latest being the 400K year old Heidelbergensis.
Molecular studies have so far relied on the Neutral theory and its infinite sites assumption. The Neutral theory was originally inspired by the so called molecular clock which was in turn inspired by the first and most remarkable result in molecular evolution, the genetic equidistance result that sister species are approximately equidistant to a simpler outgroup. In recent papers, we have shown that the equidistance result has been incorrectly interpreted by the molecular clock with grave consequences on phylogenetic studies: nearly all past studies have used non-informative DNAs assumed to be neutral but have now been shown by us to be under selection. We have developed the maximum genetic diversity (MGD) hypothesis to absorb and supersede the neutral theory. From this more correct/complete theoretical perspective that has coherently accounted for all major known observations, we developed new methods and studied the origins of humans using the 1000 genomes project and other public data. 
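The genetic equidistance result mentioned above can be illustrated with a toy calculation. This is only a minimal sketch: the function name and the three short aligned sequences are hypothetical, made up for the example, and real analyses use properly aligned protein or DNA sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences are identical."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / len(seq_a)


# Hypothetical toy alignment: two sister species and a simpler outgroup.
outgroup = "ACGTACGTAC"
sister_1 = "ACGAACGTTC"
sister_2 = "ACCTACGTAG"

# Equidistance: both sisters show about the same identity to the outgroup,
# even though they differ from it at different positions.
identity_1 = percent_identity(sister_1, outgroup)
identity_2 = percent_identity(sister_2, outgroup)
```

In this toy case both sister sequences differ from the outgroup at two of ten positions, so the two identities are equal, which is the pattern the equidistance result describes.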
We found direct evidence of higher MGD or lower purifying selection for African San and Pygmy specific Y chr that will dramatically rewrite Y phylogeny. Certain regions of Y with low contents of repetitive DNAs have few SNPs in all Y haplotypes except being highly variable in A and B. But in other regions, similar degrees of variations are found for all haplotypes. Thus, the apparent deep separation or distance between A and B or between A/B and other haplotypes is not related to time as is commonly assumed. San/Pygmies have more positions in their genomes that when mutated are not deleterious to their physiology, which creates higher MGD and in turn higher innate or natural adaptive capacity required for a primitive life style.
Two types of DNAs are a priori informative to phylogenetic studies: slow evolving neutral DNAs from autosomes are informative to separation time and functional DNAs from mitochondria (mt) and Y chr are informative to shared internal physiology. Non-informative to phylogeny but informative to adaptation to outside environments are the fast evolving DNAs (speed is essential for quick adaptation to fast changing environments). We verified the informative nature of different SNPs by testing whether autosomes can reveal the known admixture history of Mexicans (MXL): those with R1b3 Y haplotype should have relatively more European and less Chinese autosomes than those with Q haplotype; those with B2 mtDNA should be the closest to the autosomes of B2 Northern Chinese. The slow evolving SNPs gave the expected result while all other autosome SNPs produced only senseless results.
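The verification described above can be sketched as a comparison of genotype distances. This is an illustration only, not the method used for the poster: the 0/1/2 genotype coding, the mean absolute difference as the distance, and all names and toy data are assumptions made for the example.

```python
from typing import Sequence


def genotype_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute difference between two genotype vectors.

    Genotypes are assumed to be coded as 0, 1, or 2 copies of the
    alternative allele at each SNP (an assumption of this sketch).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_distance_to_panel(individual: Sequence[int],
                           panel: Sequence[Sequence[int]]) -> float:
    """Average distance from one individual to every member of a reference panel."""
    if len(panel) == 0:
        raise ValueError("reference panel must not be empty")
    return sum(genotype_distance(individual, ref) for ref in panel) / len(panel)


# Hypothetical toy data: one admixed individual and two reference panels.
individual = [0, 1, 2, 0]
european_panel = [[0, 1, 2, 0], [0, 1, 1, 0]]
chinese_panel = [[2, 1, 0, 2], [2, 2, 0, 2]]

to_european = mean_distance_to_panel(individual, european_panel)
to_chinese = mean_distance_to_panel(individual, chinese_panel)
```

Under these assumptions, an individual carrying a European Y haplotype would be expected to show a smaller `to_european` than `to_chinese` when the SNP set is informative, and no consistent pattern when it is not.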
Integrating autosomes, Y and mtDNA, we found just two original branches of humans, Europeans with R1a Y chr and U2 mtDNA and Africans with B and L1c, with a conservatively estimated split time of 2.1 million years. All other Y or mt branches originated by admixture and adaptive co-evolution as evidenced by autosomes data. We found a farmers-associated SNP3010 with highest frequency in D4 Chinese and what may be gene conversions in mtDNA in admixed humans (data not shown due to space limitations). Southern Chinese (CHS) O2 Y-chr and B2 mtDNA were the oldest or least admixed Asian group and derived from admixture of R1a/U2 Europeans and B/L1c Africans before leaving Africa for Asia. Heidelbergensis, Denisovans, and Neanderthals were descendants of a more recent interbreeding event in Iberia between migrating Africans and local Europeans, who then moved across Eurasia and became direct ancestors of AUA. These results suggest a unified hybridization model consistent with physical/cultural traits and fossil records, which explains the wide diversity of Australian aborigines. Since the original branching of two groups, further diversification of humans occurred largely by repeated cycles of migration/hybridization followed by independent evolution followed by further migration/hybridization, an intuitively expected pattern if Homo has been a single species since the first co-appearance of H. habilis and H. rudolfensis.

Thursday, December 5, 2013

A mitochondrial genome sequence of a hominin from Sima de los Huesos confirms our findings

The DNA of the ~400,000 year old human from Sima de los Huesos is a big surprise to everyone except perhaps us. A look at the following abstracts of ours from several past presentations would easily show why.
(See the paper and the Nature news on the surprise here.)

My talk last October at the CAS-MPG Partner Institute for Computational Biology in Shanghai.
Here is the abstract that is still available from that website.  

Seminar Talk by Prof. Shi Huang
Author:webmaster    News time:2012-10-22
Speaker: Prof. Shi Huang, Central South University
Time: 13:00-14:45pm, Oct. 25, 2012
Place: Room 223, SIBS main building
Host: Dr. Shuhua Xu
Title: Evidence for the Multiregional hypothesis of modern human origins from phylogeny-informative sequences
Abstract: Recent studies support the Maximum Genetic Diversity hypothesis that nucleotide diversities in fast evolving sequences are mostly at optimum level and hence cannot be informative to phylogeny inference. Consistently, here we found that fast evolving sequences are enriched with coincident or overlapping substitutions in humans and chimpanzees and also with shared SNPs among human races. We therefore analyzed the phylogeny-informative slow evolving sequences to revisit the question of modern human origins. We found that Europeans have greater nucleotide diversity than East Asians and Africans. The split times between Europeans and non-Europeans and between East Asians and Africans were estimated to be ~1.98 and ~1.78 Myr ago, respectively. East Asians are slightly closer to Europeans than Africans are, confirming genetic exchange between East Asians and Europeans as previously revealed by morphological studies. Europeans show distinct SNP patterns in non-synonymous SNPs from non-Europeans. Denisovans may be African H. heidelbergensis who may have migrated to Europe ~0.38 Myr ago and interbred with female H. antecessor in Europe to give rise to Neanderthals who had predominantly Denisovan nuclear genomes. These molecular results are consistent with fossil records and the Multiregional hypothesis.

This October I gave a talk on human origins at the HGV2013 in Seoul, Korea. The abstract is below. (Ironically, and to our surprise, the same abstract was denied even a chance for a poster presentation at the ASHG2013 meeting in Boston. But we did present two other abstracts there, one on autism and the other on transgenerational inheritance in yeast.)

Understanding human evolutionary history may help solve health problems today. Using the 1000 genomes and other public sequences, we show that genetic diversity patterns of various human groups in most types of SNPs, including nonsyn sites in fast evolving proteins, syn sites in any proteins, non-coding, and mtDNAs, are all broadly similar to that of functional SNPs such as those altering/creating stop codons and splicing sites, thereby indicating purifying selection for most common SNPs and rendering them non-informative for phylogenetics. No population has uniformly lowest diversity in all types of SNPs, with East Asians lowest in non-coding, fast non-syn, slow non-syn, and syn sites, Europeans in stop codon, splicing sites, and mtDNA, and YRI in Y chromosome. Furthermore, shared SNPs among different populations mostly cluster in fast evolving DNAs, consistent with independent mutations repeatedly hitting the same sites, invalidating the infinite sites assumption. Only slow evolving nonsyn SNPs that are theoretically predicted to be phylogenetically informative show the expected higher diversity for admixed populations relative to populations that contributed to such admixture, e.g., African Americans relative to Africans and Europeans. These 4923 SNPs are located in 178 proteins with much slower evolutionary rates than the average of those known to be under positive selection. We estimated a very conservative date for the split of Africans and non-Africans to be no less than one million years ago, which could be easily extended by 2-fold pending better determination of the human mutation rate in these slow evolving sites. Our analysis also provides data for Denisovans and Neanderthals as descendants of interbreeding between San and Iberians and as ancestors of Papuan and Australian aborigines, thus genetically linking all speakers of the click language.
This molecular study presents a unified view of the complex multiregional evolutionary history of modern humans.
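The diversity comparisons in these abstracts rest on nucleotide diversity, which is commonly computed as the average number of pairwise differences per site among sampled sequences. Here is a minimal sketch of that standard calculation; the function name and the toy haplotypes are hypothetical, and this is not the pipeline used for the abstracts.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(haplotypes: Sequence[str]) -> float:
    """Average pairwise differences per site among aligned haplotypes."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")
    total_differences = 0
    pair_count = 0
    for a, b in combinations(haplotypes, 2):
        total_differences += sum(1 for x, y in zip(a, b) if x != y)
        pair_count += 1
    return total_differences / (pair_count * length)


# Hypothetical toy sample of three aligned haplotypes, four sites each.
toy_sample = ["AAAA", "AAAT", "AATT"]
toy_pi = nucleotide_diversity(toy_sample)
```

Restricting the input to a chosen class of sites (for example, only slow evolving nonsyn SNPs) before calling the function is how one would compare diversity across SNP types in this sketch.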

Also, the following abstract has been accepted for presentation (do not know if talk or poster yet) at the Cell Symposium on Human Origins in March 2014, Spain. Given our surprisingly negative experience at ASHG2013, we must give special thanks to the organizers of the Cell Symposium for their open mindedness.   

Prior work has indicated that nucleotide diversities are mostly at optimum levels. From this perspective, we studied the origins of humans using the 1000 genomes project and other public data. We found empirical evidence of purifying selection for most common SNPs but neutrality for slow evolving nonsyn SNPs. Such slow SNPs allowed us to estimate a conservative time of no less than 2.11 million years for the split of Europeans and non-Europeans. Denisovans and Neanderthals/Altai were descendants of interbreeding between Africans and Europeans and ancestors of Australian aborigines. Analysis of mtDNA and Y chr provided independent evidence for the autosomes results. We also found a farmers-associated SNP and what may be gene conversions in mtDNA in admixed humans. These results suggest a unified human origins model consistent with physical/cultural traits and fossil records, which explains the wide diversity of Australian aborigines.  

In October this year in Beijing, researchers from China held a summit on nearly all relevant topics of anthropology. I gave a talk right after Professor Li Jin of Fudan University, the chief proponent in China of the Out of Africa model. Professor/Academician Xinzhi Wu of the Chinese Academy of Sciences (CAS), the chief proponent in China of the Multiregional model, gave a keynote speech at the meeting. Below is a picture taken at the meeting. The impressively healthy 84 year old Xinzhi Wu is in the middle of the front row. To his left are Li Jin, Xing Gao of CAS, and me. A book in Chinese from talks at this meeting is coming out in Spring 2014.

Our paper on human origins should be ready for submission soon. But every time we were getting ready, some fossil DNA or other data would turn up in the literature, such as the 24,000 year old Siberian genome and now this 400,000 year old Heidelbergensis, which just kept adding to the large amount of evidence for our story. We just have to keep updating our manuscript, which is of course very pleasant for us. It is truly an exciting time to study human evolution thanks to the remarkable fossil sequencing effort of Professor Pääbo and others.

Just a reminder, we hold our story, or any story of human origins, to the highest standard of science: no single contradiction allowed, period. No future findings of any kind should be a surprise to us. If one is, we would concede we had the wrong story. We can be this confident because ever since the main theme of our story emerged about 3 years ago from the analysis of low quality autosomes HapMap data, we have analyzed numerous other datasets and we have not been surprised once. The thing that does surprise us, maybe just a bit, is how lawful and coherent an evolution story can turn out to be when studied in the right way.

Friday, March 1, 2013

Role of genetic polymorphisms in transgenerational inheritance of inherent as well as acquired traits in budding yeast

Zhu, Z., Lu, Q., Yuan, D., Li, Y., Man, X., Zhu, Y., and Huang, S. (2013) Role of genetic polymorphisms in transgenerational inheritance of inherent as well as acquired traits in budding yeast. arXiv:1302.7276 [q-bio.GN], submitted. pdf

Zuobin Zhu, Qing Lu, Dejian Yuan, Yanke Li, Xian Man, Yueran Zhu, and Shi Huang*

State Key Laboratory of Medical Genetics, Central South University, 110 Xiangya Road, Changsha, Hunan 410078, P.R. China


Both inherent and acquired traits can be transmitted through multiple generations with some traits more stable than others. But the relationship between the stability of such transgenerational inheritance and the genetic variations in an individual or cell has yet to be explored. We studied the effect of common genetic polymorphisms on transgenerational inheritance of yeast segregants that were derived from a cross between a laboratory strain and a wild strain of Saccharomyces cerevisiae. For each of 2835 common SNPs analyzed, the parental allele present in fewer than half of the panel of 124 segregants was called the minor allele (MA). We found a nonrandom distribution of MAs in the segregants, indicating natural selection, as segregants with high MA content or amount (MAC) were not enriched with MAs from the parental strain that contributed significantly more to the whole set of MAs. We compared segregants with high MAC relative to those with less and found a more dramatic shortening of the lag phase length for the high MAC group in response to 14 days of ethanol training. Also, the short lag phase as acquired and epigenetically memorized by ethanol training was more dramatically lost after 7 days of recovery in ethanol free medium for the high MAC group. Sodium chloride treatment produced similar observations. Using public datasets, we found MAC linkage to mRNA expression of hundreds of genes. Finally, by analyzing a recently published dataset of 1009 yeast segregants that identified numerous additive QTLs for 46 traits, we found by multivariate regression analysis a preferential effect of MAC on traits with a high number of known additive QTLs (average 16 QTLs for the 5 MAC-linked traits vs 12 for the whole set of 46 traits), consistent with an additive effect of a large number of SNPs or MAs whose individual effect would be too minor to be detectable by existing methods.
These results provide evidence for the slightly deleterious nature of most MAs and a lower capacity to maintain inheritance of traits in individuals or cells with greater MAC, which have implications for disease prevention and treatment.  Individuals with high MAC may be more susceptible to environmental pathogens, but they may also be more treatable if treatment was administered relatively early before the disease has progressed past the threshold of no return, because the acquired disease trait may be less stably maintained in these individuals.  The concept and method of MAC are broadly applicable, and may have solved a large part of the “missing heritability” problem in complex traits by simply relying on a priori truth/intuition that no mutation can be truly neutral or devoid of an entropy generating effect.
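The MAC definition in the abstract (for each SNP, the parental allele carried by fewer than half of the segregants is the minor allele, and a segregant's MAC is how many minor alleles it carries) can be sketched in a few lines. This is a minimal illustration, not the code used in the paper: the function name, the "L"/"W" labels for the laboratory and wild parents, the handling of exact ties, and the toy panel are all assumptions.

```python
from typing import Dict, List


def minor_allele_content(genotypes: Dict[str, List[str]]) -> Dict[str, int]:
    """Count the minor alleles carried by each segregant.

    genotypes maps a segregant name to its parental-allele labels, one per
    SNP: "L" (laboratory parent) or "W" (wild parent). These labels are
    hypothetical and chosen only for this sketch.
    """
    if not genotypes:
        return {}
    names = list(genotypes)
    n_snps = len(genotypes[names[0]])
    if any(len(genotypes[name]) != n_snps for name in names):
        raise ValueError("every segregant needs a call at every SNP")

    mac = {name: 0 for name in names}
    for i in range(n_snps):
        column = [genotypes[name][i] for name in names]
        lab_count = column.count("L")
        wild_count = column.count("W")
        if lab_count == wild_count:
            continue  # assumption: an exact 50/50 site has no minor allele
        minor = "L" if lab_count < wild_count else "W"
        for name in names:
            if genotypes[name][i] == minor:
                mac[name] += 1
    return mac


# Hypothetical toy panel: four segregants typed at three SNPs.
toy_panel = {
    "s1": ["L", "L", "W"],
    "s2": ["W", "L", "W"],
    "s3": ["W", "W", "W"],
    "s4": ["W", "L", "L"],
}
toy_mac = minor_allele_content(toy_panel)
```

Segregants can then be ranked by their count and split into high and low MAC groups for a comparison such as the lag phase measurements described in the abstract.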


Monday, November 12, 2012

Purifying selection and phylogenetic trees

The author of the Y chr work Dr. Melissa Wilson Sayres thought that purifying selection may only affect estimation of divergence time but not the general topology of a tree. This essay explains why this may be mistaken. There are several independent ways to explain why purifying selection will affect tree topology in addition to timing.


1) The neutral theory never predicts that sequences under purifying selection are equally suitable for building phylogenetic trees, even if just for the topology part of it. The key concept of the neutral theory is that most observed natural variants are not under purifying selection.  Most are neutral and some (very few) are beneficial. The new data on Y chr invalidate the neutral theory.  And if the neutral theory is invalid, all molecular trees today would have no sound theoretical basis.  In fact, in our view, the neutral theory was mistaken right from the start when it mistakenly interpreted the genetic equidistance result that got the field started.  That result was the best evidence for purifying selection and absence of junk/neutral DNAs.   


2) Common purifying selection would lead to common shared sequences, which would dramatically affect topology. (It may be easier for one to imagine shared sequences due to positive selection. But it really is pretty easy to think the same for purifying selection.)  Thus, the close similarity between human and chimpanzee in fast evolving sequences including the Y, which are all under purifying selection, merely indicates common purifying selection rather than common ancestry. Our recent paper in Science China shows that when using slow evolving sequences not under selection, chimp and human can be shown to belong to separate clades with all three great apes in the pongid clade.


3)  If population A has high genetic diversity while B low in most genome sequences, the typical interpretation today is that A evolved longer than B and gave rise to B.  But this topology could be completely reversed if most sequences are under purifying selection with A under more relaxed selection than B.  Here the true topology as revealed by the slow evolving sequences may show that B evolved first and has higher genetic diversity in the informative sequences.  We will soon have a paper to this effect. 


4) Nothing is truly neutral. All variants, being random and disorderly in origin, have a deleterious aspect. A major effect variant causing great harm never has a chance to behave as neutral and is negatively selected immediately within one generation after it emerges. In contrast, most minor effect variants would exist as neutral for a long time or many generations before being negatively selected when the accumulation of too many such variants exceeds the maximum level that an organism can tolerate. Simple organisms can tolerate more. Therefore, only slow evolving sequences that have variant numbers still below the maximum tolerable level are informative to tree topology as well as timing. That the more slowly evolving and hence more conserved sequences are the ones whose observed variants are apparently neutral may seem counter-intuitive but actually makes sense. Since changes in the slow evolving sequences take a long time, they may be too slow to meet adaptive needs and so are unlikely to be under positive selection. Given the apparent slow rate and absence of positive selection, they are also unlikely to reach excess levels to cause harm or be under negative selection.


5) Nearly all the ‘surprise’ results reported at the ASHG2012 meeting can be easily explained by purifying selection and the MGD. Iceman Ötzi from ~5000 years ago was found not to show similarity to Europeans today in Central Europe in most fast evolving sequences, but rather to Sardinians, which is considered surprising. Also, Iceman is related to other Central European farmers (but not hunter-gatherers) from 5000 years ago. Even more surprising is that the mtDNA of Iceman does not resemble that of any humans today. Well, all these are evidence for the MGD. The sequences under purifying selection 5000 years ago are of course expected to be very different from those of today. My graduate students are right now busy verifying that the Iceman will be inseparable from Central Europeans today in slow evolving sequences. Another surprise mentioned at the meeting was that the timings of out of Africa modern humans to arrive at Spain (very close to Africa) and at Australia (very far) are about the same, ~45K years ago. In truth, these timings are based on genetic diversity levels from fast evolving sequences.


6) The fact that the Y chr trees are in general agreement with mtDNA trees and genome average trees merely indicates that all these sequences share something in common in terms of relevancy to phylogeny. They may be equally informative or equally non-informative to phylogeny. If any one of these sequences is shown to be non-informative, it would mean the same for all these sequences. Thus the fact that the Y chr is under strong purifying selection means the same for mtDNA and genome average.