Monday, November 12, 2012

Purifying selection and phylogenetic trees

Dr. Melissa Wilson Sayres, the author of the Y chr work, thought that purifying selection may affect only the estimation of divergence time and not the general topology of a tree. This essay explains why this may be mistaken. There are several independent reasons why purifying selection will affect tree topology in addition to timing.


1) The neutral theory never predicts that sequences under purifying selection are equally suitable for building phylogenetic trees, even if just for the topology part of it. The key concept of the neutral theory is that most observed natural variants are not under purifying selection. Most are neutral and some (very few) are beneficial. The new data on Y chr invalidate the neutral theory. And if the neutral theory is invalid, all molecular trees today would have no sound theoretical basis. In fact, in our view, the neutral theory was mistaken right from the start, when it misinterpreted the genetic equidistance result that got the field started. That result was the best evidence for purifying selection and for the absence of junk/neutral DNA.


2) Common purifying selection would lead to common shared sequences, which would dramatically affect topology. (It may be easier for one to imagine shared sequences due to positive selection. But it really is pretty easy to think the same for purifying selection.)  Thus, the close similarity between human and chimpanzee in fast evolving sequences including the Y, which are all under purifying selection, merely indicates common purifying selection rather than common ancestry. Our recent paper in Science China shows that when using slow evolving sequences not under selection, chimp and human can be shown to belong to separate clades with all three great apes in the pongid clade.


3) If population A has high genetic diversity while B has low diversity in most genome sequences, the typical interpretation today is that A evolved longer than B and gave rise to B. But this topology could be completely reversed if most sequences are under purifying selection, with A under more relaxed selection than B. Here the true topology as revealed by the slow evolving sequences may show that B evolved first and has higher genetic diversity in the informative sequences. We will soon have a paper to this effect.
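For readers unfamiliar with how "genetic diversity" is usually quantified, here is a minimal sketch of nucleotide diversity, i.e. the average per-site difference over all pairs of sequences in a sample. The sequences are toy data invented purely for illustration and the function names are hypothetical. The calculation only measures diversity; it does not by itself decide between the two interpretations discussed above.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs):
    """Average per-site pairwise difference across all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


# Hypothetical toy alignments (not real data), three sequences per population.
population_a = ["ACGTACGT", "ACGTTCGA", "ACCTACGA"]
population_b = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]

pi_a = nucleotide_diversity(population_a)
pi_b = nucleotide_diversity(population_b)
print(f"diversity in A: {pi_a:.4f}, diversity in B: {pi_b:.4f}")
```

In this toy sample A is more diverse than B. Whether that difference reflects a longer history or more relaxed selection is exactly the question at issue, and the answer may differ depending on which class of sequences (fast or slow evolving) is fed into the calculation.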


4) Nothing is truly neutral. All variants, being random and disorderly in origin, have a deleterious aspect. A major effect variant causing great harm never has a chance to behave as neutral and is negatively selected immediately, within one generation after it emerges. In contrast, most minor effect variants would exist as neutral for a long time, or many generations, before being negatively selected when the accumulation of too many such variants exceeds the maximum level that an organism can tolerate. Simple organisms can tolerate more. Therefore, only slow evolving sequences whose variant numbers are still below the maximum tolerable level are informative for tree topology as well as timing. That the more slowly evolving, and hence more conserved, sequences are the ones whose observed variants are apparently neutral may seem counter-intuitive but actually makes sense. Since changes in the slow evolving sequences take a long time, they may be too slow to meet adaptive needs and so are unlikely to be under positive selection. Given the apparent slow rate and absence of positive selection, they are also unlikely to reach excess levels that cause harm or to be under negative selection.


5) Nearly all the ‘surprise’ results reported at the ASHG2012 meeting can be easily explained by purifying selection and the MGD. The Iceman Otzi, from ~5000 years ago, was found in most fast evolving sequences to be similar not to present-day Central Europeans but rather to Sardinians, which is considered surprising. Also, the Iceman is related to other Central European farmers (but not hunter-gatherers) from 5000 years ago. Even more surprising is that the mtDNA of the Iceman does not resemble that of any humans today. Well, all these are evidence for the MGD. The sequences under purifying selection 5000 years ago are of course expected to be very different from those of today. My graduate students are right now busy testing our prediction that the Iceman will be inseparable from Central Europeans today in slow evolving sequences. Another surprise mentioned at the meeting was that the arrival times of out of Africa modern humans in Spain (very close to Africa) and in Australia (very far) are about the same, ~45K years ago. In truth, these timings are based on genetic diversity levels from fast evolving sequences.


6) The fact that the Y chr trees are in general agreement with mtDNA trees and genome average trees merely indicates that all these sequences share something in common in terms of relevancy to phylogeny. They may be equally informative or equally non-informative to phylogeny. If any one of these sequences is shown to be non-informative, it would mean the same for all these sequences. Thus the fact that the Y chr is under strong purifying selection means the same for mtDNA and genome average.

Sunday, November 11, 2012

Evidence of natural selection of Y chr invalidates dating of human divergence history using Y chr markers

I attended the American Society of Human Genetics 2012 meeting last week. The following work presented by a Berkeley lab at the meeting shows that the whole Y chr is under strong purifying natural selection and shows extremely low diversity. Thus the markers on Y chr are not neutral and should not be used for phylogeny inferences. If two different human populations share some haplotypes in Y chr, it does not indicate shared ancestry but rather some sort of common selection. We have been saying in recent years that nearly all human genome variations are not neutral and are under natural selection. There is essentially no junk DNA. To date human evolutionary history, one must use slow evolving neutral sequences, as we have been advocating. All existing literature on human history using Y chr or mtDNA or any other sequences is mistaken.

There is only one scientific theory known so far that advocates no junk DNA, i.e., the MGD hypothesis. Much work in recent years has essentially killed the junk DNA concept, most recently the ENCODE finding that at least 80% of the human genome is functional. But the theory that predicts the neutral and junk DNA concept still remains to be overthrown. The MGD represents the best chance to explain a nearly 100% functional genome and to supersede the neutral theory.

Abundant selection explains low diversity on human Y chromosomes. M. Wilson Sayres (1,2), K. Lohmueller (1,2), R. Nielsen (1,2). 1) Integrative Biology, University of California, Berkeley, Berkeley, CA; 2) Statistics, University of California, Berkeley, Berkeley, CA.

The human Y chromosome exhibits levels of diversity that are significantly lower than expected under neutral population genetic theory. Variance in male reproductive success (reducing the effective population size of males relative to females) has recently been proposed as an alternative neutral model to explain reduced diversity on the Y relative to mtDNA. Generally Y chromosomes are not included in whole genome analyses, so explicit tests of this hypothesis have yet to be conducted. Here we show that neutral models with unequal male and female effective population sizes are not consistent with observed genome-wide diversity on autosomes, X, Y and mtDNA across completely sequenced males. Instead, a model including selection is needed to explain the departure of observed Y diversity from expectations. We found that models with similar estimates of the strength of background selection can explain diversity for both the Y chromosome and mitochondrial genomes. Our results suggest that strong selection is necessary for explaining the evolutionary history of the human Y chromosome, and argue against the concept of the "junk" Y chromosome.
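To make the "neutral models with unequal male and female effective population sizes" in the abstract more concrete, here is a rough sketch of the textbook neutral expectation for relative diversity on autosomes, X, Y and mtDNA. It uses the standard effective-size formulas for a population with a given number of breeding males and females and assumes equal mutation rates across compartments, which is a simplification. The function names and the example population sizes are hypothetical, and this is not the analysis performed in the abstract.

```python
def effective_sizes(n_males, n_females):
    """Standard neutral effective sizes, all on the autosomal scale."""
    nm, nf = n_males, n_females
    return {
        "autosomes": 4 * nm * nf / (nm + nf),
        "X": 9 * nm * nf / (4 * nm + 2 * nf),
        "Y": nm / 2,       # haploid, passed on by males only
        "mtDNA": nf / 2,   # haploid, passed on by females only
    }


def relative_to_autosomes(n_males, n_females):
    """Expected diversity of each compartment as a fraction of autosomal
    diversity, assuming (for illustration) equal mutation rates."""
    ne = effective_sizes(n_males, n_females)
    return {name: size / ne["autosomes"] for name, size in ne.items()}


# Hypothetical breeding population sizes, chosen only for illustration.
equal = relative_to_autosomes(1000, 1000)   # same number of males and females
skewed = relative_to_autosomes(500, 1000)   # half as many breeding males

for label, ratios in (("equal", equal), ("skewed", skewed)):
    print(label, {name: round(r, 4) for name, r in ratios.items()})
```

With equal numbers of breeding males and females the expected ratios are 1 : 3/4 : 1/4 : 1/4. Reducing the number of breeding males lowers the expected Y diversity and raises the X and mtDNA ratios relative to autosomes. The abstract reports that even such sex-biased neutral models do not fit the observed diversity across all four compartments at once.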