Wednesday, October 28, 2009

The MGD hypothesis in brief

A colleague recently asked me to briefly summarize the MGD hypothesis in layman's terms and to explain how it differs from modern evolutionary theory. Here is my attempt.

1) The reality of maximum genetic distance/diversity (MGD):
For any gene with a biological function, mutations that destroy the function will not be tolerated by the organism, while neutral or beneficial ones will. Functionally important residues therefore cannot be mutated. For two closely related or identical individual organisms, the genetic distance between them will increase with time, mostly through the accumulation of neutral mutations. After a while the distance reaches a maximum; e.g., it may go from 0% at the beginning to a maximum of 60% non-identity in 60 million years. It cannot pass the 60% maximum because any further mutations would hit the key residues, abolish gene function, and thus affect organism viability. MGD is the maximum amount of mutation a gene can tolerate in a particular organism. A gene with an MGD of 60% in organism X means that at most 60% of this gene’s sequence can be mutated and still be tolerated in organism X (the 40% non-changeable residues consist mostly of key residues, plus some that are important for adaptation to an environment that may change from time to time or from environment to environment).
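The saturation described above can be illustrated with a toy simulation. This is only a minimal sketch under assumptions of my own choosing, not a method from any of the papers cited here: the sequence length, the 20-letter alphabet, the uniform choice of mutable sites, and the function name simulate_divergence are all hypothetical. Two lineages start identical, mutations land only on the changeable 60% of sites, and the pairwise distance is recorded after each step.

```python
import random

def simulate_divergence(length=500, mutable_fraction=0.6, steps=4000, seed=0):
    """Toy model: two lineages diverge from one ancestor.

    Only a fixed subset of sites (the 'changeable' residues) may mutate;
    the rest stand in for key residues that cannot change.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    alphabet = "ACDEFGHIKLMNPQRSTVWY"  # 20 amino acid letters
    ancestor = [rng.choice(alphabet) for _ in range(length)]
    n_mutable = round(length * mutable_fraction)
    mutable_sites = rng.sample(range(length), n_mutable)

    a, b = ancestor[:], ancestor[:]
    distances = []
    for _ in range(steps):
        # One random substitution per lineage per step, mutable sites only.
        for lineage in (a, b):
            site = rng.choice(mutable_sites)
            lineage[site] = rng.choice(alphabet)
        differing = sum(x != y for x, y in zip(a, b))
        distances.append(differing / length)
    return distances

distances = simulate_divergence()
for step in (10, 100, 1000, 4000):
    print(f"step {step:5d}: distance = {distances[step - 1]:.3f}")
```

In this sketch the distance rises quickly at first and then levels off, and it can never exceed the mutable fraction because the constrained sites stay identical. The plateau typically sits a little below the mutable fraction, since two lineages occasionally land on the same letter by chance.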

The more functions a gene performs, the more functional constraints there are on its mutation, and the lower the MGD for this gene. A gene can perform related but slightly different functions in different cell types. A gene performs more functions in complex organisms with more cell types than in simple ones. Thus, complex organisms impose more functional constraints on genes. Therefore, the MGD of a gene in complex organisms is lower than that of its ortholog in simple organisms. For example, the bare-bones function of a gene in a single-cell organism may require only 30% key or non-changeable residues. The same gene in a complex multicellular organism will, on top of the bare-bones function, gain extra key residues that play important roles relevant only to complex organisms. The first example of this kind of CAPS (complexity associated protein sectors) was recently described in this Cell paper (1). See also my comment on the paper at the Cell website (2).

2) The reality of fast and slow evolving genes:
Fast evolving/mutating genes reach MGD faster than slow mutating genes. Once a gene reaches MGD, genetic distance between two species as measured by this gene will no longer correlate with time. Thus, only slow evolving genes prior to reaching MGD are informative for phylogeny.
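To make the contrast concrete, here is a small numerical sketch. It assumes, purely for illustration, that distance approaches the MGD along a simple exponential saturation curve, d(t) = MGD * (1 - exp(-rate * t)); that curve, the two rates, and the function name expected_distance are my own hypothetical choices and are not taken from the cited papers.

```python
import math

def expected_distance(t_myr, rate_per_myr, mgd):
    """Assumed saturation curve: distance rises toward mgd and then flattens."""
    return mgd * (1.0 - math.exp(-rate_per_myr * t_myr))

MGD = 0.6  # illustrative maximum genetic distance (60% non-identity)
GENES = {"slow gene": 0.002, "fast gene": 0.2}  # hypothetical rates per million years

for name, rate in GENES.items():
    d30 = expected_distance(30, rate, MGD)
    d60 = expected_distance(60, rate, MGD)
    print(f"{name}: 30 Myr -> {d30:.4f}, 60 Myr -> {d60:.4f}, "
          f"difference = {d60 - d30:.4f}")
```

Under these assumptions the slow gene still shows a clear difference between a 30 and a 60 million year split, while the fast gene gives nearly the same distance for both because it has already reached its maximum.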

3) The modern evolution theory fails to recognize two key realities:
All existing methods of molecular phylogeny rely on assumptions that are not 100% true. They assume that genetic distance always correlates with time of divergence, which is not at all the case in reality. They do not take into account two key realities: the MGD and the difference between slow and fast evolving genes. Thus, their failure to take into account all major realities guarantees that they cannot produce conclusive and correct phylogenies.

4) The slow clock method, the first method that takes into account all reality:
The slow clock method based on the MGD is the first phylogeny method that is based on a 100% true description or pattern of reality. That is, genetic distance reliably correlates with time of divergence only for slow evolving genes that have not yet reached MGD. The slow clock method is described here (3).

Ref.
1. Halabi, N., Rivoire, O., Leibler, S., and Ranganathan, R. (2009). Protein sectors: evolutionary units of three-dimensional structure. Cell 138, 774-786.

2. Huang, S. (2009) Complexity associated protein sectors, comment on Cell website, http://www.cell.com/comments/S0092-8674%2809%2900963-5

3. Huang, S. (2009) Primate phylogeny: molecular evidence for a pongid clade excluding humans and a prosimian clade containing tarsiers. Available from Nature Precedings.

4. Huang, S. (2009) Inverse relationship between genetic diversity and epigenetic complexity. Preprints at Nature Precedings, http://precedings.nature.com/documents/1751/version/2

I hope you find this helpful, and you are always welcome to come to me with any questions. I may not have expressed this in the best possible way. I necessarily simplified things more than I wished, given that you only wanted a brief description. For more accurate descriptions, my papers would be a more reliable source.

An old paper and the MGD

In recent email discussions with one of the pioneers of molecular evolution, I became aware of a 1986 paper that suggested the idea of a limit on protein sequence divergence, similar to the MGD. But these authors recognized a limit only for bacteria and failed to see that a similar limit also exists for nearly all organisms regardless of time of divergence, so long as the period of evolution is long enough, i.e., more than a few million years. They failed to see the inverse relationship between the 'limit' and organismal complexity. They correctly recognized that the limit is determined by function but failed to see the key role of epigenetic complexity in adding further constraints on top of those determined by bare-bones function. They also did not explicitly point out that the idea of a limit contradicts the molecular clock paradigm, which does not recognize the concept of a limit and assumes that everything has been more or less within a linear range of divergence during the past 3 billion years of life.

Meyer, T. E., M. A. Cusanovich, and M. D. Kamen. 1986. Evidence against use of bacterial amino acid sequence data for construction of all-inclusive phylogenetic trees. Proc. Natl. Acad. Sci. USA 83:217–220.

Abstract
It has been proposed that phylogenetic trees, intended to show divergence of eukaryotic protein and nucleic acid sequences, be extended to include those from bacteria. However, we have compared the amino acid sequences of 18 of the most divergent mitochondrial cytochromes c with those of 18 bacterial cytochromes c2 and have found that the average percentage difference between these mitochondrial cytochromes c and cytochromes c2 was not significantly greater than that among the cytochromes c2 alone. The large discontinuities in physical-chemical properties recognized between the prokaryote and eukaryote cytochromes render it highly improbable that members of the two classes should be no more different from one another than members of either class alone, assuming that sequence differences can accurately reveal evolutionary divergence. Instead, we propose that divergent amino acid sequences approach a limit of change considerably less than for comparison of random sequences. This limit of change presumably is determined by the structure/function relationship. When two homologous protein sequences have reached such a limit, convergence or back-mutations and parallel mutations become as frequent as divergent mutations. As two diverging proteins approach this steady-state condition, sequence differences no longer reflect the numbers of mutations resulting in amino acid substitution and therefore species cannot be positioned on a phylogenetic tree. Insertions and deletions are less reversible than are amino acid substitutions and, provided they are well-documented, might be more reliable indicators of bacterial relationships. Nevertheless, we suggest that data available on bacterial protein sequences do not permit construction of all-inclusive phylogenetic trees. Comparisons of protein and rRNA trees suggest that similar restrictions apply to use of rRNA sequence data.