Thursday, April 30, 2009

Molecular clock at best explains half the story on ‘genetic equidistance’ and at worst explains none

The genetic equidistance result (sister species are equidistant to a simpler outgroup) has been interpreted by a tautology, the molecular clock hypothesis, which says that vastly different lineages have very similar mutation rates.  The neutral theory was invented to explain the molecular clock by postulating that the vast majority of residue differences between species are neutral mutations. 

 

On the surface, the similar mutation rate idea seems to explain the equidistance result in terms of percent identity.  But one fatal weakness of this idea, which I pointed out in my previous paper, is that there is no independent evidence for it.  In contrast, there is ample evidence against it.  That observation alone has in part led me to invent the MGD hypothesis as the correct interpretation for the equidistance result.  The MGD interpretation came to me from logical reasoning based on basic biological principles.  Thus, I had deduced an important feature of the equidistance result from an axiom before I had a full grasp of the complete story of equidistance.  That feature is: most of the residue positions differing between one sister lineage and the outgroup are also different between another sister lineage and the outgroup.  In other words, suppose that sister species A and B are equidistant to the simpler outgroup C, where A and B have been separated for a much longer time than the time of separation between C and the common ancestor of A and B.  We would observe that most of the residue positions that differ between A and C are also different between B and C.  Below, I illustrate this fundamental feature of the equidistance result using the example of cytochrome c, which was used originally in 1963 to discover the equidistance result, with baker’s yeast as the outgroup to the sister species drosophila and human. 

 


Yeast blastp against drosophila:

Identities = 67/104 (64%), Positives = 78/104 (75%), Gaps = 0/104 (0%)

 

Yeast  5    AGSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDANIKKNVLW  64

            AG  +KG  LF  RC QCHTVE GG HKVGPNLHG+ GR +GQA G++YTDAN  K + W

Droso  5    AGDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLIGRKTGQAAGFAYTDANKAKGITW  64

 

Yeast  65   DENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKATE  108

            +E+ + EYL NPKKYIPGTKM F GLKK  +R DLI YLK AT+

Droso  65   NEDTLFEYLENPKKYIPGTKMIFAGLKKPNERGDLIAYLKSATK  108

 

 

Yeast blastp against human:

Identities = 66/102 (64%), Positives = 79/102 (77%), Gaps = 0/102 (0%)

 

Yeast  6    GSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDANIKKNVLWD  65

            G  +KG  +F  +C QCHTVEKGG HK GPNLHG+FGR +GQA GYSYT AN  K ++W

Human  2    GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWG  61

 

Yeast  66   ENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKAT  107

            E+ + EYL NPKKYIPGTKM F G+KK+++R DLI YLKKAT

human  62   EDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT  103

 

 

Drosophila blastp against human:

Identities = 80/102 (78%), Positives = 87/102 (85%), Gaps = 0/102 (0%)

 

Droso  6    GDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLIGRKTGQAAGFAYTDANKAKGITWN  65

            GDVEKGKK+F+ +C+QCHTVE GGKHK GPNLHGL GRKTGQA G++YT ANK KGI W

human  2    GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWG  61

 

Droso  66   EDTLFEYLENPKKYIPGTKMIFAGLKKPNERGDLIAYLKSAT  107

            EDTL EYLENPKKYIPGTKMIF G+KK  ER DLIAYLK AT

human  62   EDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT  103

 

 

Overlap variant positions among Sc, Dm, and Hs:

 

Sc     6     GSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDANIKKNVLW  64

Sc=Dm        G  +KG  LF  RC QCHTVE GG HKVGPNLHG+ GR +GQA G++YTDAN  K + W

Dm     6     GDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLIGRKTGQAAGFAYTDANKAKGITW  64

Sc=Hs        G  +KG  +F  +C QCHTVEKGG HK GPNLHG+FGR +GQA GYSYT AN  K ++W

Hs     2     GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW  60

Overlap       xxx  xx  xx  x         x         x   xx   x        xx xxx

 

Sc     65   DENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKAT  107

Sc=Dm       +E+ + EYL NPKKYIPGTKM F GLKK  +R DLI YLK AT

Dm     65   NEDTLFEYLENPKKYIPGTKMIFAGLKKPNERGDLIAYLKSAT  107

Sc=Hs        E+ + EYL NPKKYIPGTKM F G+KK+++R DLI YLKKAT

Hs     61   GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT  103

Overlap     x xxxx   x           x x    xxx x   x

 

 

 

The above alignment data show that yeast is approximately equidistant to drosophila (67/104 identity) and to human (66/102 identity).  If one carefully compares the alignments, one finds that among the 36 residue differences between yeast and human (102 - 66 = 36), 31 are also different between yeast and drosophila. 
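For readers who want to re-derive these counts, here is a minimal Python sketch. It assumes the three gap-free 102-residue windows copied from the BLAST output above (yeast 6-107, drosophila 6-107, human 2-103) line up column by column. The function name `differing_positions` and the variable names are my own illustrative choices, not part of any published tool.

```python
# Aligned, gap-free windows copied from the BLAST output above:
# yeast residues 6-107, drosophila 6-107, human 2-103 (102 positions each).
YEAST = (
    "GSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDANIKKNVLWD"
    "ENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKAT"
)
DROSOPHILA = (
    "GDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLIGRKTGQAAGFAYTDANKAKGITWN"
    "EDTLFEYLENPKKYIPGTKMIFAGLKKPNERGDLIAYLKSAT"
)
HUMAN = (
    "GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIWG"
    "EDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT"
)


def differing_positions(a, b):
    """Return the set of 0-based alignment columns where a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return {i for i, (x, y) in enumerate(zip(a, b)) if x != y}


sc_dm = differing_positions(YEAST, DROSOPHILA)   # yeast vs drosophila
sc_hs = differing_positions(YEAST, HUMAN)        # yeast vs human
dm_hs = differing_positions(DROSOPHILA, HUMAN)   # drosophila vs human
overlap = sc_dm & sc_hs                          # columns changed in both

print("window length:", len(YEAST))
print("yeast-drosophila differences:", len(sc_dm))
print("yeast-human differences:", len(sc_hs))
print("drosophila-human differences:", len(dm_hs))
print("overlap (differ from yeast in both):", len(overlap))
```

On these windows the script counts 36 yeast-human differences, 31 of which are also yeast-drosophila differences, and 22 drosophila-human differences. The yeast-drosophila count comes out as 36 here because the 67/104 figure quoted from BLAST covers a window that is two residues longer.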

 

This nearly complete overlap in mutated residue positions in two separate sister lineages is one of the two fundamental features of the genetic equidistance phenomenon (the other is of course the equidistance in terms of percent identity).  However, this feature, which I call the overlap feature, has been completely ignored or overlooked for the past 46 years.  The molecular clock interpretation and the neutral theory were invented in complete ignorance of this feature.  They would not have been invented in the first place if people had paid attention to the overlap feature, because they are clearly contradicted by it.  It is astonishing that this obvious contradiction has never been recognized in the past 46 years in a large field of study that has produced several dozen members of the National Academy of Sciences but is nonetheless completely misled by a false paradigm.

 

The molecular clock and the neutral theory cannot predict a majority of all mutant residue positions between yeast and human to be also mutant positions between yeast and drosophila.  The predicted number is at best 20 residue positions, far short of the observed 31.  This is easily calculated as follows:

 

As the above alignment shows, drosophila and human differ at 22 of 102 positions.  Among these, 17 drosophila or human positions are also different from yeast.  So, of the total 31 mutant/changed positions between yeast and human that are also altered between yeast and drosophila, 14 (31 - 17 = 14) could be assigned to changes that occurred during the time period when the common ancestor of human and drosophila was already separate from the yeast lineage but had yet to split into human and drosophila.  After the split of human and drosophila, the chance for a residue to be different between yeast and the human lineage, or between yeast and drosophila, or between drosophila and human, is approximately 22/81 = 0.27.  (There are only 7 residues that are absolutely conserved among all life forms that contain cytochrome c.  So the positions that are changeable are 88 - 7 = 81.)  The chance for the same residue position to be altered in both the yeast-human comparison and the yeast-drosophila comparison is 0.27 x 0.27 = 0.073, which over 81 changeable positions gives about 0.073 x 81 = 6 positions.  Together with the 14 shared mutant positions accumulated in the common ancestor lineage of drosophila and human, this means that there should only be 14 + 6 = 20 residue positions that are altered in both the yeast-human comparison and the yeast-drosophila comparison.

 

To get to 31, we must assume that there are only 28 residues that are neutral (22/28 x 22/28 x 28 = 17, and 14 + 17 = 31).  This means that the observed distance between yeast and human, or between yeast and drosophila, or between human and drosophila, represents nearly the maximum possible.  But the concept of a maximum cap on genetic distance is entirely missing from the practical application of the molecular clock and the neutral theory.  That concept did not exist before the recent MGD hypothesis.
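The figure of 28 neutral residues follows from turning the same independence calculation around and solving for the number of changeable sites. The sketch below is only an illustration of that step, using the numbers quoted in the text; the function name is hypothetical.

```python
def neutral_sites_needed(observed_shared, ancestral_shared, post_split_diffs):
    """Number of changeable sites N that makes (d/N) * (d/N) * N match the data.

    Solves d**2 / N = observed_shared - ancestral_shared for N,
    where d is the number of post-split differences.
    """
    coincident_needed = observed_shared - ancestral_shared
    if coincident_needed <= 0:
        raise ValueError("observed overlap must exceed the ancestral share")
    return post_split_diffs ** 2 / coincident_needed


n_sites = neutral_sites_needed(
    observed_shared=31, ancestral_shared=14, post_split_diffs=22
)
print(f"changeable sites required: {n_sites:.1f}")

# Check against the rounded value of 28 used in the text.
check = (22 / 28) * (22 / 28) * 28
print(f"coincident changes with 28 sites: {check:.1f}")
```

The required number of sites comes out at about 28.5, and plugging 28 back in gives about 17 coincident changes, which together with the 14 ancestral changes reaches the observed 31.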

 

In short, while the molecular clock and the neutral theory may predict half of the equidistance result (equidistance in terms of percent identity), they cannot predict, and are in fact contradicted by, the other half of the result, where most of the mutant positions relative to the outgroup are shared between the two sister lineages.  Therefore, the molecular clock and the neutral theory are not at all a valid explanation for the equidistance result.  This way of invalidating the existing theory did not occur to me until recently, which is why I did not include it in my previous paper that refutes the molecular clock interpretation of the equidistance result.  Ref. Huang, S.  “The genetic equidistance result of molecular evolution is independent of mutation rates.”

 

The MGD hypothesis is the only viable and complete explanation so far for the equidistance result.  It has proven to be the correct one as it easily passes the highest standard for a scientific theory, i.e. to explain all relevant facts and to have not a single factual or logical contradiction.  The example here with cytochrome c should provide the actual data for the simplistic illustration of the MGD explanation shown in Table 1 of my MGD paper (Inverse relationship between genetic diversity and epigenetic complexity).   

 

BTW, here is an example of how the molecular clock interpretation was used in practice by the field: it produced the famous 5 million year divergence time between humans and chimpanzees, as first reported in 1969. Wilson AC, Sarich VM (1969) A molecular time scale for human evolution. Proc Natl Acad Sci U S A 63: 1088-1093.

 

Wilson and Sarich wrote in their 1969 paper:

“Table 1 shows that the four primate hemoglobins are about equally distinct in sequence from that of the horse. Therefore, the hemoglobins of monkeys on the one hand, and those of the apes and man on the other, have changed to about the same extent since these species last shared a common ancestor.  These results are neither unique nor surprising. Others have already recognized that protein molecules often appear to have evolved in a regular fashion with respect to time. The bulk of the available sequence information is consistent with the hypothesis that for any given protein, such as hemoglobin, the probability of an amino acid substitution occurring in a given interval of time is the same in every lineage.”

 

The above shows that Wilson and Sarich interpreted the equidistance result (horse is equidistant to the primates in hemoglobins) by assuming the same mutation rate in every primate lineage.  From there, they went on to calculate a 5 million year split for humans and chimpanzees.  Now, given that I have proven that the molecular clock interpretation of the equidistance result is completely false, it follows that any result based on such an interpretation is automatically false.  Indeed, my calculation based on the MGD hypothesis gave a human-pongid split time of 19.2 million years (manuscript in preparation).   

 

Acknowledgements:


I thank my college classmate Dr. Wei Shen for discussion and for correcting a mistake on the number of overlap residues in the first draft of this essay. 

 
