Showing posts with label overlap. Show all posts

Thursday, May 7, 2009

A more accurate rendering of Table 1 in the MGD paper

The following modification of Table 1 of the MGD paper takes into account positions that are shared between species because of common adaptation to a common environmental selection; these are depicted by ‘#’.   Variant positions are denoted by 'x'. This more realistic rendering explains both the equidistance and the overlap features. It also explains why not all variants between Hs and Dm are also variants between Hs and Sc or between Nc and Sc. Also see the related post of May 1, 2009 titled 'more on the MGD interpretation of certain facts.'


Table 1. Alignment of a hypothetical 20 amino acid peptide.


Nc     1 2 3 4 5 6 7 8 # 1 x 3 4 # x # x # x x     N. crassa

Sc     1 2 3 4 5 6 7 8 # 1 2 x x # x # # # x x     S. cerevisiae

Dm     1 2 3 4 5 6 7 8 # 1 2 3 4 5 6 # # x x x     D. melanogaster

Hs     1 2 3 4 5 6 7 8 # 1 2 3 4 # 6 7 # 9 9 x     H. sapiens



Equidistance feature:


Identity

Nc-Sc, 65% (13/20)

Dm-Sc, 65%

Dm-Nc, 65%

Hs-Sc, 65%

Hs-Nc, 65%

Hs-Dm, 75%

Overlap feature:

Of 5 variants between Hs-Dm, four are variants between Hs-Sc.

Of 5 variants between Hs-Dm, four are variants between Hs-Nc.

Of 5 variants between Hs-Dm, two are variants between Sc-Nc.  
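The identity and overlap figures above can be recomputed directly from Table 1. The following is only a minimal sketch: the names `TABLE`, `variants`, `identity` and `overlap` are made up for illustration, and it assumes that an 'x' never matches anything (not even another 'x'), while '#' and digits match only an identical symbol at the same position.

```python
# Rows of Table 1, copied as printed. '#' = residue shared through common
# selection, 'x' = variant residue, digits = other shared residues.
TABLE = {
    "Nc": "1 2 3 4 5 6 7 8 # 1 x 3 4 # x # x # x x".split(),
    "Sc": "1 2 3 4 5 6 7 8 # 1 2 x x # x # # # x x".split(),
    "Dm": "1 2 3 4 5 6 7 8 # 1 2 3 4 5 6 # # x x x".split(),
    "Hs": "1 2 3 4 5 6 7 8 # 1 2 3 4 # 6 7 # 9 9 x".split(),
}

def variants(a, b):
    """1-based positions where species a and b differ.

    Assumption: 'x' counts as a difference even when both rows show 'x'.
    """
    return {i + 1 for i, (p, q) in enumerate(zip(TABLE[a], TABLE[b]))
            if p != q or p == "x"}

def identity(a, b):
    """Number of identical positions out of 20."""
    return len(TABLE[a]) - len(variants(a, b))

def overlap(pair1, pair2):
    """Number of positions that are variant in both pairwise comparisons."""
    return len(variants(*pair1) & variants(*pair2))

for a, b in [("Nc", "Sc"), ("Dm", "Sc"), ("Dm", "Nc"),
             ("Hs", "Sc"), ("Hs", "Nc"), ("Hs", "Dm")]:
    print(f"{a}-{b}: {identity(a, b)}/20 identical")

for other in [("Hs", "Sc"), ("Hs", "Nc"), ("Sc", "Nc")]:
    print("Hs-Dm variants shared with", "-".join(other), ":",
          overlap(("Hs", "Dm"), other))
```

Under that reading of the symbols, the script gives 13/20 for every pair except Hs-Dm (15/20), and overlaps of four, four and two.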



Actual alignment of the above four species for cytochrome c:


Dm              -GDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLIGRKTGQAAGFAYTDANKA

Hs              -GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKN

 

Sc              -GSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDANIK

Nc              -GDSKKGANLFKTRCAQCHTLEEGGGNKIGPALHGLFGRKTGSVDGYAYTDANKQ

                 *   **   *   * **** * **  * ** ***  **  *   *  ** ** 

 

Dm              KGITWNEDTLFEYLENPKKYIPGTKMIFAGLKKPNERGDLIAYLKSAT

Hs              KGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT

 

Sc              KNVLWDENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKAT

Nc              KGITWDENTLFEYLENPKKYIPGTKMAFGGLKKDKDRNDIITFMKEAT

                *   * *    *** *********** * * **   * * *   * **  

 


The existence of shared residues due to environmental selection is key to the adaptive survival of species over long evolutionary time.  The MGD predicts that while a simple organism may have great potential for genetic diversity, only a fraction of that diversity is expressed at any one time, because natural selection retains only the most nearly optimal variants.  This view is fully consistent with one of the two most remarkable early results of molecular evolution: the levels of allozyme diversity vary by no more than a few fold across almost all species.  (The other result is of course the seemingly similar mutation rates in different species.)  This view is also fully supported by the work of Eviatar Nevo on the ‘Evolution Canyon’ in Israel.


Nevo: “Natural selection in some form proved to be the major determinant of genetic population structure and differentiation. The parallel genetic patterns obtained across all geographical scales in nature were once again inconsistent with the neutral theory of molecular evolution.  Natural selection in its various forms appears to maintain genetic polymorphism and orient molecular evolution at both the protein and DNA level, as was also supported theoretically.”


Nevo E, Beiles A, Ben-Shlomo R (1984) The evolutionary significance of genetic diversity: ecological, demographic and life history correlates. In: Mani GS, editor. Evolutionary Dynamics of Genetic Diversity. Berlin: Springer-Verlag. pp. 13–213.

Nevo E (2001) Evolution of genome-phenome diversity under environmental stress. Proc Natl Acad Sci U S A 98: 6233-6240.

Acknowledgements:

I thank my college classmate Dr. Wei Shen for providing the actual sequence alignment shown here and for the helpful discussion on Table 1 of the MGD paper.

Wednesday, May 6, 2009

The overlap feature of hemoglobin

Hemoglobin was used in 1962 by Zuckerkandl and Pauling to derive the data that led to the molecular clock idea.  Here I examine the overlap feature by comparing human hemoglobin alpha 1 (AAK61216, hba1) with horse and chicken, much as Zuckerkandl and Pauling did.

Hs, Human [Homo sapiens]

Ec, Horse [Equus caballus]

Gg, Chicken [Gallus gallus]

Hs     1    MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG  60

Hs=Ec       MVLS ADKTNVKAAW KVG HAGEYGAEALERMFL FPTTKTYFPHFDLSHGSAQVK HG

Ec     1    MVLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHG  60

Hs=Gg       MVLS ADK NVK  + K+  HA EYGAE LERMF ++P TKTYFPHFDLSHGSAQ+KGHG

Gg     1    MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHFDLSHGSAQIKGHG  60

Overlap         x          x   x               x 

Non-overlap                                                          *           


Hs     61   KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP  120

Hs=Ec       KKV DALT AV H+DD+P ALS LSDLHAHKLRVDPVNFKLLSHCLL TLA HLP +FTP

Ec     61   KKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTP  120

Hs=Gg       KKV  AL  A  H+DD+   LS LSDLHAHKLRVDPVNFKLL  C LV +A H PA  TP

Gg     61   KKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRVDPVNFKLLGQCFLVVVAIHHPAALTP  120

Overlap        x    x  x x  x x   x                            x    x

Non-overlap                                                *       *           

 

Hs     121  AVHASLDKFLASVSTVLTSKYR  142

Hs=Ec       AVHASLDKFL+SVSTVLTSKYR

Ec     121  AVHASLDKFLSSVSTVLTSKYR  142

Hs=Gg        VHASLDKFL +V TVLT+KYR

Gg     121  EVHASLDKFLCAVGTVLTAKYR  142

Overlap               x

 

Results:

Of 17 variants between human and horse, 14 are also variants between human and chicken.  So there are 14 overlaps and 3 non-overlaps. 
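These counts can be checked mechanically from the three sequences in the alignment. A minimal sketch, assuming the sequences are exactly as printed above and that the alignment is gap-free (all three are 142 residues long); `variant_positions` is a hypothetical helper name:

```python
# Hemoglobin alpha sequences copied from the alignment above.
HS = ("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG"
      "KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP"
      "AVHASLDKFLASVSTVLTSKYR")
EC = ("MVLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHG"
      "KKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTP"
      "AVHASLDKFLSSVSTVLTSKYR")
GG = ("MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHFDLSHGSAQIKGHG"
      "KKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRVDPVNFKLLGQCFLVVVAIHHPAALTP"
      "EVHASLDKFLCAVGTVLTAKYR")

def variant_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return {i + 1 for i, (p, q) in enumerate(zip(a, b)) if p != q}

hs_ec = variant_positions(HS, EC)   # human vs horse
hs_gg = variant_positions(HS, GG)   # human vs chicken

print("human-horse variants:  ", len(hs_ec))
print("human-chicken variants:", len(hs_gg))
print("overlap:               ", len(hs_ec & hs_gg))
print("non-overlap positions: ", sorted(hs_ec - hs_gg))
```

The non-overlap positions it reports (58, 108 and 116) correspond to the three '*' marks in the alignment.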

Molecular clock prediction:  17/142 x 42/142 x 142 = 5 overlap residues, accounting for 36% of the total overlap.  If we generously grant 50% of positions as absolutely non-variable, we have 17/71 x 42/71 x 71 = 10 overlap residues, accounting for only 71% of the total overlap.  To account for 14 overlaps, we would need 91 absolutely non-variable residues, that is, only 51 residues that can vary.  But we know that there are at least 71 variable positions between human and fish.  So the molecular clock simply cannot account for the 14 overlaps, whereas the existence of significant overlap is a prime prediction of the MGD hypothesis.


In fact, there are only 21 residues that are absolutely conserved among human, bony fish, lungfish, coelacanths, and sharks, which leaves 142 - 21 = 121 positions that can vary.  So the most realistic calculation of overlap is 17/121 x 42/121 x 121 = 5.9 residues, far short of 14. 
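The arithmetic in this section reduces to one expression: if the two sets of variant positions fall independently on n variable sites, the expected overlap is (a/n) x (b/n) x n = a x b / n. A small sketch of that calculation, with illustrative function names and the independence assumption used in the text:

```python
def expected_overlap(var_1, var_2, variable_sites):
    """Expected number of positions variant in both comparisons, assuming the
    two sets of variant positions are spread independently over the sites."""
    return var_1 * var_2 / variable_sites

def variable_sites_needed(var_1, var_2, observed_overlap):
    """Number of variable sites at which the expectation equals the observation."""
    return var_1 * var_2 / observed_overlap

HUMAN_HORSE, HUMAN_CHICKEN, OBSERVED, LENGTH = 17, 42, 14, 142

for sites in (142, 71, 121):
    exp = expected_overlap(HUMAN_HORSE, HUMAN_CHICKEN, sites)
    print(f"{sites} variable sites: expected overlap {exp:.1f}")

needed = variable_sites_needed(HUMAN_HORSE, HUMAN_CHICKEN, OBSERVED)
print(f"sites needed to expect {OBSERVED} overlaps: {needed:.0f} "
      f"(i.e. {LENGTH - needed:.0f} non-variable)")
```

Writing the expectation as a x b / n makes it plain that it rises only when the pool of variable sites shrinks, which is why the argument turns on how many positions can be granted as non-variable.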

BTW, chicken is equidistant (70% identity) to human and horse as shown below.  Of 17 variants between human and horse, 14 are also variant between horse and chicken, a significant overlap.

Human-Horse [Equus caballus]:

Identities = 125/142 (88%)

Hs     1    MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG  60

            MVLS ADKTNVKAAW KVG HAGEYGAEALERMFL FPTTKTYFPHFDLSHGSAQVK HG

Ec     1    MVLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHG  60

 

Hs     61   KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP  120

            KKV DALT AV H+DD+P ALS LSDLHAHKLRVDPVNFKLLSHCLL TLA HLP +FTP

Ec     61   KKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTP  120

 

Hs     121  AVHASLDKFLASVSTVLTSKYR  142

            AVHASLDKFL+SVSTVLTSKYR

Ec     121  AVHASLDKFLSSVSTVLTSKYR  142

 

Human-Chicken [Gallus gallus]

Identities = 100/142 (70%)

Hs     1    MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG  60

            MVLS ADK NVK  + K+  HA EYGAE LERMF ++P TKTYFPHFDLSHGSAQ+KGHG

Gg     1    MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHFDLSHGSAQIKGHG  60

 

Hs     61   KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP  120

            KKV  AL  A  H+DD+   LS LSDLHAHKLRVDPVNFKLL  C LV +A H PA  TP

Gg     61   KKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRVDPVNFKLLGQCFLVVVAIHHPAALTP  120

 

Hs     121  AVHASLDKFLASVSTVLTSKYR  142

             VHASLDKFL +V TVLT+KYR

Gg     121  EVHASLDKFLCAVGTVLTAKYR  142

 

Horse-Chicken [Gallus gallus]

Identities = 100/142 (70%)

Ec     1    MVLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHG  60

            MVLSAADK NVK  ++K+ GHA EYGAE LERMF  +P TKTYFPHFDLSHGSAQ+K HG

Gg     1    MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHFDLSHGSAQIKGHG  60

 

Ec     61   KKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTP  120

            KKV  AL  A  H+DD+ G LS LSDLHAHKLRVDPVNFKLL  C L  +A+H P   TP

Gg     61   KKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRVDPVNFKLLGQCFLVVVAIHHPAALTP  120

 

Ec     121  AVHASLDKFLSSVSTVLTSKYR  142

             VHASLDKFL +V TVLT+KYR

Gg     121  EVHASLDKFLCAVGTVLTAKYR  142

The molecular clock should never have been invented in the first place for macroevolution

Two kinds of sequence alignment can be made using the same set of sequence data.  The first aligns a recently evolved organism such as a mammal against those simpler or less complex species that evolved earlier such as amphibians and fishes.  The second aligns a simpler outgroup organism such as fishes against those more complex sister species that appeared later such as amphibians and mammals.  The first alignment indicates a near linear correlation between genetic distance and time of divergence, implying indirectly a constant mutation rate among different species.  The second alignment shows the genetic equidistance result where sister species are approximately equidistant to the simpler outgroup. This directly triggered the idea of constant mutation rate among different species.  Since both alignments use the same sequence data set, certain information may be revealed by either alone.  But the data that most directly and obviously support the interpretation of a constant mutation rate is the genetic equidistance result. 

The molecular clock hypothesis was first informally proposed by Zuckerkandl and Pauling in 1962 based largely on data from the first alignment [1].  Margoliash in 1963 performed both alignments and made a formal statement of the molecular clock after noticing the genetic equidistance result [2, 3].  “It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the time elapsed since the lines of evolution leading to these two species originally diverged. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds.  Since fish diverges from the main stem of vertebrate evolution earlier than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish.  Similarly, all vertebrate cytochrome c should be equally different from the yeast protein.”

The results of both alignments have two features.  One is obvious: distance in terms of percent identity, which directly provoked the clock idea.  The other is the overlap feature.  In the post of April 30th, 2009, I explained the overlap feature of the genetic equidistance result.  Here, I show that the first kind of alignment performed by Zuckerkandl and Pauling also shows the overlap feature, as would be expected since both alignments use the same sequence information and should tell the same story.  The clock idea should never have been invented in the first place if Zuckerkandl and Pauling had paid attention to this feature.

 

Again, I use cytochrome c of yeast (Sc), drosophila (Dm), and human (Hs) as an example.  What Zuckerkandl and Pauling found, when applied to our cytochrome c case here, is that human is closer to drosophila than to yeast.  Human differs from drosophila in 22 positions and from yeast in 36 positions.  The overlap feature in this case is that most of the 22 variant positions between human and drosophila are also variant between human and yeast.  This can be easily illustrated in the following alignment:

Dm              -GDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLIGRKTGQAAGFAYTDANKA

Hs              -GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKN

Sc              -GSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDANIK

                 *..:** .:*  :* ****** ** **.******::**::*** *::** ** 

 

Dm              KGITWNEDTLFEYLENPKKYIPGTKMIFAGLKKPNERGDLIAYLKSAT

Hs              KGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT

Sc              KNVLWDENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKAT

                *.: *.*:.: *** *********** * *:** ::* ***:***.**  

The result shows that 17 of the 22 are also variant between human and yeast (these 17 positions are colored in purple and green in the original alignment picture).  The overlap is not 100% because the residues conserved between human and drosophila due to common adaptation to environment are different from those conserved between human and yeast. 
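Since the coloring does not survive in plain text, the counts can be recovered from the aligned sequences themselves. A minimal sketch, assuming the three rows are exactly as printed in the alignment and that gap columns ('-') are excluded from the comparison; `variant_columns` is a hypothetical helper name:

```python
# Cytochrome c rows copied from the alignment above (leading '-' is a gap column).
DM = ("-GDVEKGKKLFVQRCAQCHTVEAGGKHKVGPNLHGLIGRKTGQAAGFAYTDANKA"
      "KGITWNEDTLFEYLENPKKYIPGTKMIFAGLKKPNERGDLIAYLKSAT")
HS = ("-GDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKN"
      "KGIIWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT")
SC = ("-GSAKKGATLFKTRCLQCHTVEKGGPHKVGPNLHGIFGRHSGQAEGYSYTDANIK"
      "KNVLWDENNMSEYLTNPKKYIPGTKMAFGGLKKEKDRNDLITYLKKAT")

def variant_columns(a, b):
    """Alignment columns where a and b differ, ignoring columns with a gap."""
    return {i for i, (p, q) in enumerate(zip(a, b))
            if p != q and "-" not in (p, q)}

# Number of columns in which none of the three rows has a gap.
aligned_sites = sum("-" not in col for col in zip(DM, HS, SC))

hs_dm = variant_columns(HS, DM)
hs_sc = variant_columns(HS, SC)

print("aligned positions:      ", aligned_sites)
print("human-drosophila diffs: ", len(hs_dm))
print("human-yeast diffs:      ", len(hs_sc))
print("overlap:                ", len(hs_dm & hs_sc))
```

The five human-drosophila variants that are not variant between human and yeast are `sorted(hs_dm - hs_sc)`, given as 0-based alignment columns.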

The molecular clock predicts: 

The chance for a position to be different between human and yeast is 36/102.

The chance for a position to be different between human and drosophila is 22/102.

The number of overlap positions: 36/102 x 22/102 x 102 = 7.76, far short of 17.



There are only 7 positions as underlined below that are absolutely conserved among bacteria, yeast, plants, nematodes, and human.

Human  1    MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW  60

       61   GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKAT  103

So, a most realistic calculation of overlap should be 36/95 x 22/95 x 95 = 8.3 residues, far short of 17.

 

Even if we generously grant that 40 residues are absolutely non-neutral or non-variable, we still only get 36/62 x 22/62 x 62 = 12.77, short of 17. 
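The three cytochrome c estimates can be put side by side. This is only an illustrative tabulation under the same independence assumption used in the text (expected overlap = 36 x 22 / number of variable sites); the scenario labels and names are mine:

```python
VAR_HS_SC = 36   # human vs yeast differences
VAR_HS_DM = 22   # human vs drosophila differences
OBSERVED = 17    # positions variant in both comparisons

def expected_overlap(var_1, var_2, variable_sites):
    """Expected shared variant positions if both sets fall independently."""
    return var_1 * var_2 / variable_sites

# Number of positions assumed free to vary under each scenario in the text.
scenarios = {
    "all 102 aligned positions variable": 102,
    "7 absolutely conserved positions excluded": 95,
    "40 positions granted as non-variable": 62,
}

for label, sites in scenarios.items():
    exp = expected_overlap(VAR_HS_SC, VAR_HS_DM, sites)
    print(f"{label}: expected {exp:.2f}, observed {OBSERVED}")
```

In all three scenarios the expectation stays below the observed 17.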


Again, Zuckerkandl, Pauling, and Margoliash all could have noticed the overlap feature.  If they had done that 46 years ago, the molecular clock (vastly different species have very similar mutation rates) would never have been invented in the first place for macroevolution.  It may have been invented for studying microevolution (identical or very similar species have very similar mutation rates) and may still apply in some cases of microevolution.  But its impact on the understanding of molecular evolution would be trivial. 

Acknowledgements:

I thank my college classmate Dr. Wei Shen for providing the alignment picture shown here, and for many valuable discussions. 

 

References:

1.         Zuckerkandl E, Pauling L: Molecular disease, evolution, and genetic heterogeneity, Horizons in Biochemistry. New York: Academic Press; 1962.

2.         Margoliash E: Primary structure and evolution of cytochrome c. Proc Natl Acad Sci 1963, 50:672-679.

3.         Kumar S: Molecular clocks: four decades of evolution. Nat Rev Genet 2005, 6:654-662.