Monday, November 12, 2012

Purifying selection and phylogenetic trees


 
Dr. Melissa Wilson Sayres, the author of the Y chr work, thought that purifying selection may affect only the estimation of divergence time and not the general topology of a tree. This essay explains why this may be mistaken. There are several independent reasons why purifying selection would affect tree topology in addition to timing.

 

1) The neutral theory never predicts that sequences under purifying selection are equally suitable for building phylogenetic trees, even for the topology alone. The key concept of the neutral theory is that most observed natural variants are not under purifying selection: most are neutral and some (very few) are beneficial. The new data on Y chr invalidate the neutral theory. And if the neutral theory is invalid, all molecular trees today would have no sound theoretical basis. In fact, in our view, the neutral theory was wrong right from the start, when it misinterpreted the genetic equidistance result that got the field started. That result was the best evidence for purifying selection and for the absence of junk/neutral DNA.

 

2) Common purifying selection would lead to common shared sequences, which would dramatically affect topology. (Shared sequences due to positive selection may be easier to imagine, but the same reasoning applies to purifying selection.) Thus, the close similarity between human and chimpanzee in fast-evolving sequences including the Y, which are all under purifying selection, merely indicates common purifying selection rather than common ancestry. Our recent paper in Science China shows that when slow-evolving sequences not under selection are used, chimp and human can be shown to belong to separate clades, with all three great apes in the pongid clade.
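
To make concrete what it means for topology to depend on which sequences are used, here is a minimal sketch in Python. It is not the analysis from the Science China paper. The taxa X, Y, Z, the eight-site strings, and the two partitions are all made up for illustration, and the p-distance (proportion of differing sites) is simply the most basic distance one could pick.

```python
def p_distance(a, b, sites):
    """Proportion of the given site indices at which sequences a and b differ."""
    return sum(1 for i in sites if a[i] != b[i]) / len(sites)


def nearest_neighbor(name, seqs, sites):
    """Name of the taxon closest to `name` by p-distance over `sites`."""
    others = [n for n in seqs if n != name]
    return min(others, key=lambda n: p_distance(seqs[name], seqs[n], sites))


# Hypothetical toy alignment: three taxa, eight sites (not real data).
toy = {"X": "AAAACCCC", "Y": "AAAAGGGG", "Z": "TTTTCCCC"}
partition_1 = range(0, 4)  # assumed to stand for one class of sites
partition_2 = range(4, 8)  # assumed to stand for the other class

print(nearest_neighbor("X", toy, partition_1))  # Y
print(nearest_neighbor("X", toy, partition_2))  # Z
```

With these toy strings, X groups with Y on the first partition and with Z on the second, so any distance-based tree would change shape depending on the partition chosen. Which partition reflects ancestry is the question this post is about; the sketch only shows that the choice matters.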

 

3) If population A has high genetic diversity while B has low diversity in most genome sequences, the typical interpretation today is that A evolved longer than B and gave rise to B. But this topology could be completely reversed if most sequences are under purifying selection, with A under more relaxed selection than B. Here the true topology, as revealed by the slow-evolving sequences, may show that B evolved first and has higher genetic diversity in the informative sequences. We will soon have a paper to this effect.
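
The comparison in point 3 can be sketched as follows, assuming diversity is measured as the mean proportion of pairwise differences and that the sites have already been split into a "fast" and a "slow" class. The two populations, their sequences, and the site classes below are hypothetical toy values, not data from the forthcoming paper.

```python
from itertools import combinations


def pairwise_diversity(seqs, sites):
    """Mean proportion of differing sites over all sequence pairs, restricted to `sites`."""
    pairs = list(combinations(seqs, 2))
    if not pairs or not sites:
        return 0.0
    diffs = sum(1 for a, b in pairs for i in sites if a[i] != b[i])
    return diffs / (len(pairs) * len(sites))


# Hypothetical toy populations (three sequences each, eight sites).
pop_a = ["ACGTAAAA", "TCGAAAAA", "AGCTAAAA"]
pop_b = ["ACGTAAAA", "ACGTAACA", "ACGTGAAA"]
fast_sites = range(0, 4)  # assumed fast-evolving class
slow_sites = range(4, 8)  # assumed slow-evolving class

for label, sites in (("fast", fast_sites), ("slow", slow_sites)):
    print(label, pairwise_diversity(pop_a, sites), pairwise_diversity(pop_b, sites))
```

In this toy case A is the more diverse population on the fast sites and B is the more diverse one on the slow sites, so the ranking of the two populations flips with the site class. That flip is the situation point 3 describes.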

 

4) Nothing is truly neutral. All variants, being random and disorderly in origin, have a deleterious aspect. A major-effect variant causing great harm never has a chance to behave as neutral and is negatively selected within one generation of emerging. In contrast, most minor-effect variants would persist as apparently neutral for many generations before being negatively selected, once the accumulation of such variants exceeds the maximum level that an organism can tolerate. Simple organisms can tolerate more. Therefore, only slow-evolving sequences, whose variant numbers are still below the maximum tolerable level, are informative for tree topology as well as timing. That the slower-evolving, and hence more conserved, sequences are the ones whose observed variants are apparently neutral may seem counter-intuitive, but it actually makes sense. Since changes in the slow-evolving sequences take a long time, they may be too slow to meet adaptive needs and so are unlikely to be under positive selection. Given the slow rate and the absence of positive selection, they are also unlikely to reach excess levels that cause harm, and so are unlikely to be under negative selection.
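
One deliberately simplified way to picture the argument in point 4 is a distance that grows linearly with time until it hits a ceiling. The function below is our own toy reading, not a published model: the linear growth, the hard cap, and every number in it are assumptions chosen only for illustration.

```python
def observed_distance(rate, time, cap):
    """Toy model: distance grows as rate * time but cannot exceed the tolerated cap."""
    return min(rate * time, cap)


CAP = 0.05          # hypothetical maximum tolerable distance
FAST_RATE = 0.01    # hypothetical rate for fast-evolving sequences
SLOW_RATE = 0.0001  # hypothetical rate for slow-evolving sequences

for t in (10, 50):
    fast = observed_distance(FAST_RATE, t, CAP)
    slow = observed_distance(SLOW_RATE, t, CAP)
    print(t, fast, slow)
```

Under these assumptions the fast sequences sit at the cap at both time points and no longer distinguish a short split from a long one, while the slow sequences are still below the cap and still increase with time.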

 

5) Nearly all the ‘surprise’ results reported at the ASHG 2012 meeting can be easily explained by purifying selection and the maximum genetic diversity (MGD) hypothesis. The Iceman Ötzi, from ~5000 years ago, was found to resemble Sardinians rather than present-day Central Europeans in most fast-evolving sequences, which is considered surprising. Also, the Iceman is related to other Central European farmers (but not hunter-gatherers) from 5000 years ago. Even more surprising is that the mtDNA of the Iceman does not resemble that of any humans today. Well, all of this is evidence for the MGD. Sequences under purifying selection 5000 years ago are of course expected to be very different from those of today. My graduate students are right now busy testing whether the Iceman is inseparable from present-day Central Europeans in slow-evolving sequences, as we expect. Another surprise mentioned at the meeting was that modern humans leaving Africa are estimated to have arrived in Spain (very close to Africa) and in Australia (very far) at about the same time, ~45K years ago. In truth, these timings are based on genetic diversity levels from fast-evolving sequences, which under the MGD need not reflect elapsed time.

 

6) The fact that the Y chr trees are in general agreement with the mtDNA trees and the genome-average trees merely indicates that all these sequences share something in common in terms of relevance to phylogeny. They may be equally informative or equally non-informative. If any one of these sequences is shown to be non-informative, the same would hold for all of them. Thus the fact that the Y chr is under strong purifying selection implies the same for mtDNA and the genome average.
 
 

1 comment:

SeriousCat said...

"3) If population A has high genetic diversity while B low in most genome sequences, the typical interpretation today is that A evolved longer than B and gave rise to B. But this topology could be completely reversed if most sequences are under purifying selection with A under more relaxed selection than B."

What other implications would this have for the traditional Out-of-Africa hypothesis? I remember reading on Dienekes' blog that, although he was sceptical of strong interpretations of the 'multi-regional model', new data from introgression studies seem to cast doubt on the 'standard' model as well. Is there any other news about the current status of these models in population genetics?

Nice blog, by the way. I'm new to this stuff, so please forgive me for any stupid questions.