Thursday, October 14, 2010

Models of micro- and macro-evolution and the overlap feature

I hope the following figure serves as an easy-to-grasp illustration of the evolutionary process according to the MGD hypothesis.

A. Microevolution

[Panel image not preserved: rows for species B and species A.]

B. Macroevolution

[Panel image not preserved: rows for species C (more complex) and species A.]

Figure 1. Evolutionary events according to the MGD hypothesis. A 10-amino-acid peptide, with each position represented by a number, is used to illustrate the mutation events during evolution. X represents any amino acid. If two species have the same number at the same position, they share the same amino acid. If they each have X at the same position, they have different amino acids at that position. Positions 0-3 represent residues that are non-changeable in all species because of functional or epigenetic restriction. Positions 0-5 represent residues that are non-changeable only in the more complex species C in the panel B model of macroevolution.

Panel A. Microevolution. The ancestor species A’ undergoes microevolution and produces a pair of sister individuals A and B at some point during its evolution. Sister species A and B start by sharing the same sequence for an orthologous gene at the beginning of speciation. The genetic distance between A and B gradually increases until it reaches a plateau, the maximum genetic distance (6 differences in this case).

Panel B. Macroevolution. The ancestor species A’ undergoes microevolution and gradually reaches some level of genetic diversity, during which time nearly every genome variation allowed within the MGD, or compatible with the survival of A’, may have a chance to exist for a while. When one of these genome variations also happens to be compatible with a higher level of epigenetic complexity, such as the genome shown for sister individuals A and C at the beginning of speciation, a punctuational increase in epigenetic complexity takes place in one of the sisters, here C. At the beginning of speciation, sisters A and C have the same sequence for an orthologous gene, but only C has undergone an increase in epigenetic complexity, which in turn reduces the number of changeable positions from 6 to 4. The epigenetic phase at the beginning of speciation is immediately followed by the genetic microevolutionary phase, which gradually creates greater genetic distance between A and C. C can undergo substitutions in a maximum of 4 positions, while A can change in a maximum of 6. The changeable positions of C largely overlap with those of A, but not vice versa.

Based on this Figure, I can now better describe below the overlap feature and the distinction between the MGD hypothesis and the modern evolution theory, in a way that does not require readers to read my previous papers. This is part of an introduction section for a new manuscript that I am working on.

The genetic equidistance result, the most important and remarkable phenomenon of molecular evolution, shows that different species are approximately equidistant to a simpler outgroup in protein sequence similarity, as first reported by Margoliash in 1963 [1]. This result, together with those of Zuckerkandl and Pauling in 1962, inspired the molecular clock and in turn the Kimura neutral theory of macroevolution, which has been the foundation for the field of molecular evolution ever since its inception [2,3]. However, the molecular clock or constant substitution rate interpretation of the genetic equidistance result is in fact a tautology since it has not been verified by any independent observation and has on the contrary been contradicted by a large number of factual observations [4,5,6,7,8,9,10,11,12,13,14,15].

Recent work shows that the genetic equidistance result has another characteristic, the overlap feature, which has been completely overlooked for nearly half a century [16]. A position where two or more species have each had a substitution event is termed an overlap position (Figure 1A, species A and B have 6 overlap positions). The genetic equidistance phenomenon minimally requires three species for sequence alignment: two sister species and an outgroup that is not more complex. The overlap feature shows a large number of overlapped mutant positions where any pair of these three species is different in sequence. If, after speciation, two species randomly accumulate substitutions at similar rates as assumed by the neutral theory, then the chance for a substitution in one species to occur at the same position where the other species also has a substitution should largely follow probability theory. Indeed, for microevolution of similar species, such as different strains of yeast, the number of overlap positions relative to the total number of substitutions is small and consistent with probability calculations based on the neutral theory. However, for macroevolution of distinct species of different biological complexity, such as yeast versus Drosophila or orangutan versus human, the number of overlap positions is much greater than expected by chance. Thus, the overlap feature is one of the best pieces of evidence for a clear distinction between macroevolution and microevolution, where macroevolution is mostly about major changes in organismal complexity whereas microevolution is not.
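
As a rough illustration of how overlap positions could be counted in a three-species alignment, here is a minimal Python sketch. It takes the description above at face value: a position is counted as an overlap position when every pair of the three aligned sequences differs there. The function name, the "other" category, and the toy sequences are my own illustrative assumptions, not data or code from [16], and gaps are assumed to be absent.

```python
def classify_positions(seq_a, seq_b, outgroup):
    """Classify each column of a gap-free three-sequence alignment.

    Returns counts of columns that are identical in all three sequences,
    columns where every pair differs (treated here as overlap positions),
    and all remaining columns (exactly two of the three sequences agree).
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"identical": 0, "overlap": 0, "other": 0}
    for a, b, o in zip(seq_a, seq_b, outgroup):
        if a == b == o:
            counts["identical"] += 1
        elif a != b and a != o and b != o:
            # All three residues differ, so any pair differs at this position.
            counts["overlap"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical 10-residue toy alignment (not real protein data).
toy_counts = classify_positions("MKTAYIAKQR", "MKTAYLSRQW", "MKTAYVGKQH")
print(toy_counts)
```

In this toy alignment, positions 5, 6 and 9 differ in every pair, position 7 differs in only one sequence, and the remaining six positions are identical.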

The modern evolution theory consists of the Neo-Darwinian theory of natural selection and the neutral theory. The Neo-Darwinian theory is largely inadequate for understanding molecular evolution. As a result, the neutral theory, which trivializes natural selection and disconnects genotypes and phenotypes, was invented in order to at least have an ad hoc understanding of molecular evolution. While the neutral theory was originally only a population genetics theory, it was turned into a macroevolution theory by Kimura when he used it to explain the molecular clock, which treats macroevolution the same as population genetics [17]. However, as the overlap feature convincingly shows, the molecular clock and the neutral theory, while largely complete as a theory of microevolution and population genetics, are not so for macroevolution, and would never have been applied to macroevolution in the same way as to microevolution had the overlap feature not been overlooked.

Unlike the modern evolution theory, the Maximum Genetic Diversity (MGD) hypothesis tightly unites genotypes and phenotypes and explains all the major facts of evolution in a coherent fashion via a single universal theme [7,18]. It is a necessary deduction of the First Axiom of Biology, which states that there exists an inverse relationship between genetic diversity and epigenetic complexity. Genetic diversity here is defined as the percent position difference in aligned sequences of a homologous protein or DNA, which is largely contributed by point mutations. Epigenetic complexity is defined by the total number of cell types and epigenetic molecules, which is largely equivalent to our naïve notion of organismal complexity and consistent with an independent calculation of organismal complexity based on information theory [19]. It is common sense that genetic diversity cannot increase indefinitely with time and has a maximum limit, being restricted by function or epigenetic complexity. A gene may function in many different cell types or epigenetic states. The more cell types in which a gene functions, the more functions it performs and the more functional constraints there are on the genetic diversity/mutation of the gene. The maximum genetic diversity of simple organisms is greater than that of complex organisms. The idea of functional constraints on sequence variation is also a well-accepted concept within the traditional theoretical framework. What is missing there, however, and what the MGD hypothesis now provides, is the intuitive idea, supported by numerous facts and yet to be contradicted by any, that different species have different functional/epigenetic constraints, with simple species having fewer.

The maximum genetic distance concept of the MGD hypothesis appears superficially similar to the saturation idea within the traditional theoretical framework. However, there is a key difference. With the saturation idea, people can apply corrections and change a 10% non-identity into a 20% distance while still believing that such distance can go on increasing without a maximum cap; the increase is simply not linear and needs to be corrected for multiple hits. The maximum cap concept, in contrast, says that the 10% non-identity is the maximum possible and will stay unchanged once reached. In other words, the saturation idea assumes that two sequences will continue to diverge, but at some point substitutions will overwrite previous substitutions, making continuing divergence difficult to discern and requiring corrections. The MGD hypothesis indicates that there is indeed a maximum divergence that can occur, independent of saturation, because of functional/epigenetic constraints. Saturation does not take into account the well-known non-independence of base pairs in a sequence or the functional/epigenetic constraints on sequence, but the MGD hypothesis does take these into account and thus provides a stronger theoretical framework.
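
For concreteness, the uncorrected measure used throughout this discussion, the percent position difference between two aligned sequences, can be computed as in the following minimal Python sketch. The function name and the toy sequences are illustrative assumptions of mine, and the alignment is assumed to be gap-free. No multiple-hit correction is applied, since the cap described above refers to the observed non-identity itself.

```python
def percent_difference(seq1, seq2):
    """Percent of aligned positions at which two gap-free sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differing = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return 100.0 * differing / len(seq1)


# Hypothetical 10-residue toy peptides differing at 4 of 10 positions.
print(percent_difference("MKTAYIAKQR", "MKTAYLSRQW"))  # 40.0
```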

The MGD hypothesis defines macroevolution and microevolution differently from the standard definition and considers them distinctly different. Macroevolution involves major changes in epigenetic complexity or organismal complexity, while microevolution does not, although it may involve minor epigenetic changes without a major effect on complexity. Macroevolution involves a fast and punctuational epigenetic event, whereas microevolution is largely a slow process of random point mutations. Macroevolution, as shown for two splitting species A and C in Figure 1B, automatically includes microevolutionary mechanisms as part of the speciation process, since the events following the punctuational epigenetic change are largely microevolutionary. Macroevolution thus consists of two phases with distinctly different mechanisms, the epigenetic punctuational phase and the subsequent genetic microevolutionary phase (Figure 1B). Consequently, the MGD hypothesis, but not the modern evolution theory, predicts both punctuation and stasis at the level of epigenotypes and in turn at morphological levels, which is well supported by the major patterns of the fossil record [20].

While molecular changes in the epigenetic phase of macroevolution certainly involve some DNA changes, such as chromatin reorganizations and gain/loss of genes encoding epigenetic molecules, such changes are obviously also about epigenetic reorganizations. The MGD hypothesis includes the proven virtues of the modern evolution theory as a component specific for microevolution (Figure 1A) as well as for the microevolutionary phase of macroevolution over time scales that are not yet long enough for a slow evolving gene to reach maximum distance (Figure 1B). For the epigenetic phase of macroevolution towards greater complexity, however, the MGD hypothesis suggests that the genetic diversity of the more complex species would be reduced by the epigenetic change (Figure 1B). In contrast, the modern evolution theory assumes that the same mechanism applies to both macroevolution and microevolution and that there is no suppression of mutations accompanying an increase in organismal complexity. It also implicitly assumes no limit on genetic distance/diversity, no matter how long evolution has been going on or how fast the mutation rate has been for certain fast evolving genes. These unproven assumptions implicitly negate the First Axiom of Biology, are intuitively implausible, and have met with only contradicting factual observations.

The overlap feature of the genetic equidistance phenomenon is one of the best pieces of evidence for the MGD hypothesis and against the modern macroevolution theory [16]. It is a well-established observation, as well as intuitively sensible, that most of the changeable or non-constrained positions in a gene in a complex species are also changeable in a less complex species (Figure 1B). When MGD has been reached during macroevolution, most of the changeable positions in any species would have undergone substitutions, thus leading to a number of overlap positions that is close to the total number of actual substitutions and much greater than expected by chance (Figure 1B, showing 4 overlap positions out of 4 total actual substitutions in species C). Here, some of the non-changeable positions in a complex species would overlap with the changeable positions in a simple species (Figure 1B, positions 4-5). These non-changeable positions do not undergo substitution during evolution, but sequence alignments may not reveal this. In fact, the modern evolution theory treats sequence difference between two species as equally contributed by substitutions in each species, which is true only for microevolution (Figure 1A) or for the microevolutionary phase of macroevolution when the MGD of the more complex species has not yet been reached (Figure 1B). For macroevolution over long time scales, when fast evolving genes have reached MGD, the MGD hypothesis suggests that sequence difference in these fast genes between two species of different complexity is a reflection of the MGD of the simple species and is largely caused by substitutions in the simple species, since the complex species would undergo fewer substitutions (Figure 1B, the maximum distance between A and C is 6 and is equal to the MGD of the simple species A).
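
The Figure 1B scenario can be mimicked with a small simulation. The following Python sketch is only a toy model under assumptions I am adding for illustration: positions 4-9 are changeable in the simple species A and positions 6-9 in the complex species C (as in the figure legend), each step substitutes one random changeable position in each lineage, a substituted position never reverts, and two independent substitutions at the same position yield different residues. The function name and the number of steps are arbitrary choices.

```python
import random


def simulate_divergence(changeable_a, changeable_c, steps, seed=0):
    """Track distance and overlap between two lineages over time.

    Each step substitutes one randomly chosen changeable position in each
    lineage. Returns a list of (distance, overlap) tuples, one per step,
    where distance is the number of positions substituted in at least one
    lineage and overlap is the number substituted in both.
    """
    rng = random.Random(seed)
    hit_a, hit_c = set(), set()
    history = []
    for _ in range(steps):
        hit_a.add(rng.choice(changeable_a))
        hit_c.add(rng.choice(changeable_c))
        distance = len(hit_a | hit_c)  # positions where A and C now differ
        overlap = len(hit_a & hit_c)   # positions substituted in both lineages
        history.append((distance, overlap))
    return history


# Figure 1B layout: A can change at positions 4-9, C only at positions 6-9.
history = simulate_divergence([4, 5, 6, 7, 8, 9], [6, 7, 8, 9], steps=200)
print(history[0], history[-1])
```

With enough steps, the distance in this toy model plateaus at 6, the MGD of the simple species A, while the overlap plateaus at 4, the number of changeable positions in C. Neither value can grow further, because no additional positions are allowed to change.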

For microevolution of similar species over evolutionary time long enough for MGD to have been reached, one also observes a number of overlap positions that is close to the total number of actual substitutions and much greater than expected by chance (Figure 1A, showing 6 overlap positions out of 6 total actual substitutions). In contrast, for microevolution over short evolutionary time scales, or for slow evolving genes prior to reaching MGD, the number of overlap positions relative to the total number of actual substitutions would be small and consistent with calculations from probability theory within the traditional theoretical framework. For example, in Figure 1A, if only 3 substitutions have occurred in each of species A and B out of 6 changeable positions, the expected number of overlap positions can be calculated as 3/6 x 3/6 x 6 = 1.5, which is much smaller than the total number of actual substitutions.
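
The chance expectation in this example can be checked with a short calculation. The Python sketch below assumes, as a simplification I am adding, that the substituted positions in each species form a uniformly random subset of the changeable positions, with no multiple hits at the same position within a lineage. Under that assumption the number of shared positions follows a hypergeometric distribution whose mean is k_a x k_b / n, the same quantity as 3/6 x 3/6 x 6. The function and parameter names are illustrative.

```python
from fractions import Fraction
from math import comb


def expected_overlap(n_changeable, k_a, k_b):
    """Expected number of positions substituted in both species by chance."""
    return Fraction(k_a * k_b, n_changeable)


def overlap_probability(n_changeable, k_a, k_b, m):
    """Probability of exactly m shared substituted positions (hypergeometric)."""
    if m < 0 or m > min(k_a, k_b):
        return Fraction(0)
    ways = comb(k_a, m) * comb(n_changeable - k_a, k_b - m)
    return Fraction(ways, comb(n_changeable, k_b))


# Figure 1A example: 6 changeable positions, 3 substitutions in each species.
mean = expected_overlap(6, 3, 3)
distribution = [overlap_probability(6, 3, 3, m) for m in range(4)]
print(float(mean))                      # 1.5
print([str(p) for p in distribution])   # ['1/20', '9/20', '9/20', '1/20']
```

Under this random model, all 3 substitutions coincide only 1 time in 20, so a near-complete overlap would be unusual before MGD is reached.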

The MGD hypothesis is unusual because it consists of two different components: a genetic mechanism for microevolution and for the microevolutionary phase of macroevolution, and an epigenetic mechanism for the epigenetic punctuational phase of macroevolution. These two are not disconnected, however, and are in fact tightly linked by an inverse relationship, as described by the First Axiom of Biology. This relationship dictates that one must be suppressed in order for the other to advance. Thus, point mutations are good for microevolution and for leading to a genotype suitable for a higher level of epigenetic complexity, which is necessary for the epigenetic phase of macroevolution to take place, but they must be suppressed if an increase in epigenetic complexity is to be maintained during the microevolutionary phase of macroevolution (Figure 1B). Conversely, epigenetic complexity must not increase in order for point mutations to take care of adaptive changes.


1. Margoliash E (1963) Primary structure and evolution of cytochrome c. Proc Natl Acad Sci 50: 672-679.
2. Zuckerkandl E, Pauling L (1962) Molecular disease, evolution, and genetic heterogeneity, Horizons in Biochemistry; Kasha M, Pullman B, editors. New York: Academic Press.
3. Kumar S (2005) Molecular clocks: four decades of evolution. Nat Rev Genet 6: 654-662.
4. Huang S (2009) Molecular evidence for the hadrosaur B. canadensis as an outgroup to a clade containing the dinosaur T. rex and birds. Riv Biol 102: 20-22.
5. Huang S (2008) Ancient fossil specimens are genetically more distant to an outgroup than extant sister species are. Riv Biol 101: 93-108.
6. Huang S (2008) The genetic equidistance result of molecular evolution is independent of mutation rates. J Comp Sci Syst Biol 1: 092-102.
7. Huang S (2009) Inverse relationship between genetic diversity and epigenetic complexity. Preprint available at Nature Precedings.
8. Pulquerio MJ, Nichols RA (2007) Dates from the molecular clock: how wrong can we be? Trends Ecol Evol 22: 180-184.
9. Laird CD, McConaughy BL, McCarthy BJ (1969) Rate of fixation of nucleotide substitutions in evolution. Nature 224: 149-154.
10. Jukes TH, Holmquist R (1972) Evolutionary clock: nonconstancy of rate in different species. Science 177: 530-532.
11. Goodman M, Moore GW, Barnabas J, Matsuda G (1974) The phylogeny of human globin genes investigated by the maximum parsimony method. J Mol Evol 3: 1-48.
12. Langley CH, Fitch WM (1974) An examination of the constancy of the rate of molecular evolution. J Mol Evol 3: 161-177.
13. Li W-H (1997) Molecular evolution. Sunderland, MA: Sinauer Associates.
14. Nei M, Kumar S (2000) Molecular evolution and phylogenetics. New York: Oxford University Press.
15. Avise JC (1994) Molecular markers, natural history and evolution. New York, NY: Springer.
16. Huang S (2010) The overlap feature of the genetic equidistance result, a fundamental biological phenomenon overlooked for nearly half of a century. Biological Theory 5: 40-52.
17. Kimura M (1968) Evolutionary rate at the molecular level. Nature 217: 624-626.
18. Huang S (2008) Histone methylation and the initiation of cancer, Cancer Epigenetics; Tollefsbol T, editor. New York: CRC Press.
19. Jiang Y, Xu C (2010) The calculation of information and organismal complexity. Biology Direct 5: 59. doi:10.1186/1745-6150-5-59.
20. Gould SJ, Eldredge N (1993) Punctuated equilibrium comes of age. Nature 366: 223-227.