Wednesday, September 23, 2015

The Genetic Equidistance Phenomenon at the Whole Proteomic Level

We have just submitted a manuscript. The abstract is below.

The Genetic Equidistance Phenomenon at the Whole Proteomic Level

Denghui Luo and Shi Huang

Abstract

The field of molecular evolution began with the alignment of a few protein sequences in the early 1960s. Among the first results found at the time, the genetic equidistance result has turned out to be the most astonishing, and the least expected by any evolutionary theory of the time. It directly inspired the ad hoc universal molecular clock hypothesis, which in turn inspired the neutral theory. Unfortunately, and unknown to most, what is only a maximum distance phenomenon was mistakenly transformed into a mutation rate phenomenon and became known as such. Previous studies have suggested the universality of this phenomenon based on results from a small set of selected proteins. We have now confirmed this with whole-proteome studies of 7 different sets of proteomes involving a total of 15 species. All 7 sets showed that within each set of 3 species, the least complex one is approximately equidistant, in average proteome-wide identity, to the two more complex ones. Thus, the genetic equidistance result is a universal phenomenon of maximum distance. Complexity has in reality increased during evolution at a constant but stepwise rate, and this rate is what the original universal molecular clock is really about. These results provide additional lines of evidence for the recently proposed maximum genetic diversity (MGD) hypothesis.
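The comparison described in the abstract can be illustrated with a small sketch. It computes an average fraction of identical residues over ortholog pairs and checks whether one species (the least complex of a trio) is roughly equidistant to the other two. This is not the manuscript's pipeline: it assumes pre-aligned, gap-free ortholog pairs, and the function names, the 5% tolerance, and the toy sequences are all hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def fraction_identical(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions carrying the same residue.

    Simplifying assumption: the two sequences are already aligned, gap-free,
    and of equal length. Real proteome pipelines handle gaps and alignment
    length in their own ways.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty, aligned, and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def average_identity(ortholog_pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean identity over (outgroup_seq, other_seq) ortholog alignments."""
    return mean(fraction_identical(a, b) for a, b in ortholog_pairs)


def is_equidistant(identity_to_first: float, identity_to_second: float,
                   tolerance: float = 0.05) -> bool:
    """True if the two identities differ by at most `tolerance` of their mean.

    The 5% default tolerance is a hypothetical threshold chosen for illustration.
    """
    center = (identity_to_first + identity_to_second) / 2
    return abs(identity_to_first - identity_to_second) <= tolerance * center


if __name__ == "__main__":
    # Hypothetical toy alignments; not data from the manuscript.
    outgroup_vs_x = [("MKTAYIAK", "MKTAYLAK"), ("GDVEKGKK", "GDVEKGAK")]
    outgroup_vs_y = [("MKTAYIAK", "MKSAYIAK"), ("GDVEKGKK", "GDIEKGKK")]
    id_x = average_identity(outgroup_vs_x)
    id_y = average_identity(outgroup_vs_y)
    print(f"outgroup-X {id_x:.3f}, outgroup-Y {id_y:.3f}, "
          f"equidistant: {is_equidistant(id_x, id_y)}")
```

A relative tolerance is used so that the same threshold behaves sensibly whether the identities are high (closely related trios) or low (distant ones).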



Figure 3. The constant rate of complexity increase. The fraction of identical residues between human and a lower complexity species is equivalent to the fraction of non-changeable sites in the lower complexity species. The fraction of identical residues in cytochrome c (number of identical residues divided by length) between human and each species listed in the figure is plotted against the separation time between human and that species. Data were obtained by using human cytochrome c as a BLASTP query against GenBank.

Figure 4. The prime number staircase. The graph counts the cumulative number of primes up to 100.

The molecular clock interpretation of the maximum genetic equidistance result is really about the constant rate of complexity increase. Since Aristotle, people have appreciated that evolution proceeds toward higher complexity. Darwin's theory has long denied this, but only by ignoring inconvenient facts, including the genetic equidistance phenomenon. Evidence for complexity increase is commonplace and easy to notice by common sense, and the first molecular evidence for it is the maximum genetic equidistance phenomenon. Most striking is that complexity increases at a nearly constant rate as measured in years. This rate can be studied quantitatively through the fraction of non-changeable positions in a protein, or equivalently the fraction of identical residues between human and a lower complexity species (Fig. 3).
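One way to check whether the Fig. 3 points lie on a straight line (a constant rate) is an ordinary least-squares fit of fraction identical against separation time, plus an R² value. The sketch below assumes nothing beyond paired numeric points; the function names are hypothetical, and the example points are synthetic values generated from an exact line, not the cytochrome c data behind Fig. 3.

```python
from typing import Sequence, Tuple


def fit_line(times: Sequence[float], identities: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of identities ~ slope * times + intercept."""
    n = len(times)
    if n != len(identities) or n < 2:
        raise ValueError("need at least two paired (time, identity) points")
    mean_t = sum(times) / n
    mean_y = sum(identities) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("separation times must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, identities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def r_squared(times, identities, slope, intercept) -> float:
    """Coefficient of determination: how closely the points follow the fitted line."""
    mean_y = sum(identities) / len(identities)
    ss_tot = sum((y - mean_y) ** 2 for y in identities)
    if ss_tot == 0:
        raise ValueError("identities must not all be equal")
    ss_res = sum((y - (slope * t + intercept)) ** 2 for t, y in zip(times, identities))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic points from an exact line (NOT real cytochrome c data):
    # identity = 1.0 - 0.0005 * separation time in millions of years.
    times_mya = [100.0, 300.0, 500.0, 700.0]
    identities = [1.0 - 0.0005 * t for t in times_mya]
    slope, intercept = fit_line(times_mya, identities)
    print(f"slope={slope:.6f} per My, intercept={intercept:.3f}, "
          f"R^2={r_squared(times_mya, identities, slope, intercept):.3f}")
```

The magnitude of the fitted slope is the quantity the text reads as a rate; an R² close to 1 on real data would indicate that the rate is close to constant over the sampled times.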

As nature is written in the language of mathematics, it would be most unusual if the most fundamental natural phenomenon, i.e., the constant rate of evolution towards higher complexity as measured in years, had no counterpart in mathematics. The most relevant mathematics that we could find is the pattern of prime numbers (du Sautoy 2003). Along the progression of natural numbers, the cumulative count of primes grows at a nearly steady rate over the range shown (Fig. 4), although primes slowly thin out as the numbers grow. Here the progression of natural numbers is like a clock, rigid and predictable. The appearance of each prime is discontinuous, like a staircase, and unpredictable, yet the count nonetheless follows a well-defined function, Li(N), as established by the prime number theorem; the Riemann hypothesis, widely regarded as the most important unsolved problem in mathematics, concerns how closely the count tracks Li(N) (du Sautoy 2003). Each new appearance of a more complex species is like a new prime number: unpredictable, discontinuous, and yet regular. Individual species are well known to appear abruptly in the fossil record, as evidence for the punctuated equilibrium model of macroevolution has shown (Gould and Eldredge 1993). However, the discontinuous appearance of species of ever higher complexity still follows a very smooth and regular pattern, as shown by the equidistance phenomenon. We speculate that the mystery behind the constant rate of complexity increase in nature might well turn out to be the same as that behind the constant appearance of prime numbers. Indeed, the common, speculative, and unproven answer to both mysteries has long been random forces.
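The staircase in Fig. 4 and the smooth function Li(N) it is compared with can both be computed directly. The sketch below counts primes with a sieve of Eratosthenes and estimates the offset logarithmic integral Li(N), the integral of 1/ln t from 2 to N, with Simpson's rule. The function names and the default of 1000 integration intervals are illustrative choices, not anything taken from du Sautoy (2003).

```python
import math
from typing import List


def is_prime_table(limit: int) -> List[bool]:
    """Sieve of Eratosthenes: table[n] is True iff n is prime, for 0 <= n <= limit."""
    if limit < 2:
        raise ValueError("limit must be at least 2")
    table = [True] * (limit + 1)
    table[0] = table[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if table[p]:
            # Cross out multiples of p, starting at p*p (smaller ones are already done).
            for multiple in range(p * p, limit + 1, p):
                table[multiple] = False
    return table


def prime_staircase(limit: int) -> List[int]:
    """counts[n] = pi(n), the number of primes <= n (the Fig. 4 staircase)."""
    counts = []
    running = 0
    for prime in is_prime_table(limit):
        running += int(prime)
        counts.append(running)
    return counts


def offset_li(x: float, intervals: int = 1000) -> float:
    """Li(x) = integral of dt / ln t from 2 to x, by composite Simpson's rule."""
    if x < 2:
        raise ValueError("x must be at least 2")
    if intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = (x - 2) / intervals
    total = 1 / math.log(2) + 1 / math.log(x)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight / math.log(2 + i * h)
    return total * h / 3


if __name__ == "__main__":
    staircase = prime_staircase(100)
    for n in (10, 25, 50, 75, 100):
        print(f"N={n:3d}  pi(N)={staircase[n]:2d}  Li(N)={offset_li(n):6.2f}")
```

Printing pi(N) next to Li(N) shows the jagged staircase running close to, but not on, the smooth curve; the prime number theorem says their ratio tends to 1 as N grows.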


Saturday, September 5, 2015

The branch of biology with the most mathematics is also widely known as the softest! Why?

The field of population genetics and molecular evolution was largely founded by mathematicians and statisticians such as Fisher, Haldane, and Wright. Even a pure mathematician, G. H. Hardy, contributed a key equation to the field. But contrary to naive expectations, this field is more like a soft social science than like hard-core physics. In the words of Jerry Coyne (an extremely enthusiastic promoter of Darwinian evolutionary theory and a professor of evolutionary studies at the University of Chicago): "In science's pecking order, evolutionary biology lurks somewhere near the bottom, far closer to phrenology than to physics".

Why? The reason is simple. Math depends on assumptions or paradigms. The assumptions of the hard-core sciences are axioms or self-evident intuitions; Euclid's and Newton's axioms come to mind. They are all a priori or self-evidently true. In contrast, not a single assumption in the evolution field is self-evidently true or can qualify as an axiom. Nearly all assumptions in that field are in fact self-evidently false. A few examples: the infinite sites model, the neutral/junk DNA assumption, random mating, and the assumption of independent mutations.
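Hardy's equation, the Hardy-Weinberg principle, is a concrete example of math built on the random mating assumption. With allele frequencies p and q = 1 - p, random mating (plus the usual idealizations of no selection, mutation, migration, or drift) gives genotype frequencies p², 2pq, and q², reached after one generation and then held constant. The sketch below computes this; the function names are hypothetical.

```python
from typing import Tuple


def hardy_weinberg(p: float) -> Tuple[float, float, float]:
    """Expected (AA, Aa, aa) genotype frequencies for allele A at frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def next_generation(freq_hom_a: float, freq_het: float,
                    freq_hom_b: float) -> Tuple[float, float, float]:
    """Genotype frequencies (AA, Aa, aa) after one round of random mating."""
    total = freq_hom_a + freq_het + freq_hom_b
    if total <= 0:
        raise ValueError("genotype frequencies must sum to a positive number")
    # Each AA carries two A alleles and each Aa carries one.
    p = (freq_hom_a + freq_het / 2) / total
    return hardy_weinberg(p)


if __name__ == "__main__":
    gen1 = next_generation(0.7, 0.1, 0.2)
    gen2 = next_generation(*gen1)
    print("generation 1:", gen1)
    print("generation 2:", gen2)
```

Starting from a population far from equilibrium, the second generation reproduces the first, which is the sense in which the equilibrium is stable under these assumptions.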

Key figures in the field have also acknowledged this. As Ohta and Gillespie wrote in 1996: "all current theoretical models suffer either from assumptions that are not quite realistic or from an inability to account readily for all phenomena." (Theoretical Population Biology, 1996, 49: 128-142)

To give a flavor of the amount of math in the field, below are two pages from my notebook for an undergraduate evolutionary genetics course I took 32 years ago at my alma mater, Fudan University.