Friday, December 11, 2009

New manuscript on the overlap feature of the genetic equidistance result

This week I am submitting a manuscript entitled “The overlap feature of the genetic equidistance result, a fundamental biological phenomenon overlooked for nearly half a century”. The work establishes an overlooked feature of a fundamental result that directly triggered the molecular clock and the neutral theory, the dominant theory in the field of molecular evolution. If this old result had been correctly interpreted from the beginning, we would have seen a very different half-century of research on the molecular record of evolution. Epigenetics should have an important role to play in the hereditary transformation of life forms and hence in a complete theory of evolution. It is time to incorporate epigenetics into the field of evolution. Indeed, many facts of evolution simply cannot be explained without epigenetics, including the old result that directly triggered the field of molecular evolution.

The genetic equidistance result shows that different species are approximately equidistant to a simpler outgroup in sequence similarity, as first reported by Margoliash in 1963. This result, together with those of Zuckerkandl and Pauling in 1962, inspired the molecular clock and in turn the neutral theory. Here, it is shown that the clock/neutral theory had from the beginning overlooked another characteristic of the equidistance result, the overlap feature, which shows a large number of overlapped mutant positions where any pair of any three species is different provided that the species concerned differ from one another in complexity as a result of macroevolution. In contrast, when simple organisms of similar complexity and of short evolutionary divergence are compared, there are only a small number of overlaps largely consistent with chance or the neutral theory. Thus, the overlap feature is one of the best pieces of evidence for a clear distinction between macroevolution and microevolution. The full reality of the equidistance result strongly supports the Maximum Genetic Diversity Hypothesis, a more complete account of hereditary changes based on an inverse relationship between genetic diversity and epigenetic complexity.
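As a rough illustration of how overlap positions could be counted, here is a minimal Python sketch. It assumes that an overlap position is an aligned position at which every one of the three pairwise comparisons shows a difference, following the description above. The function names and the toy sequences are hypothetical; they are not taken from the manuscript or from real cytochrome c data.

```python
def pairwise_differences(a, b):
    """Return the set of aligned positions at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return {i for i, (x, y) in enumerate(zip(a, b)) if x != y}


def overlap_positions(s1, s2, s3):
    """Return positions where every pair among the three sequences differs.

    Assumption: this is one reading of the overlap feature, not the
    manuscript's exact procedure.
    """
    return (
        pairwise_differences(s1, s2)
        & pairwise_differences(s1, s3)
        & pairwise_differences(s2, s3)
    )


# Hypothetical toy alignment (10 residues), for illustration only.
outgroup = "MKTAYIAKQR"
species_a = "MKSAYLAKHR"
species_b = "MRTAYVAKNR"

overlaps = overlap_positions(outgroup, species_a, species_b)
print(sorted(overlaps))  # zero-based positions shared by all three comparisons
```

With these toy sequences the script prints `[5, 8]`: the two positions at which all three residues differ from one another. Whether a given count is larger than chance would predict is a separate statistical question that this sketch does not address.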

P.S. The manuscript defines more clearly the overlap feature and corrects some minor errors in my earlier posts on the overlap feature.

What do CNEs mean for the neutral theory?

A new paper of interest this week:
McEwen GK, Goode DK, Parker HJ, Woolfe A, Callaway H, et al. (2009) Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates. PLoS Genet 5(12): e1000762. doi:10.1371/journal.pgen.1000762

Some general features of CNEs (conserved non-coding elements) are well summarized by this paper. It says: “These elements (CNE) appear to be largely absent in invertebrates.” Another paper said: “(CNEs of invertebrates) are less frequent and are smaller in size than in vertebrates.”

Kimura, a founder of the neutral theory, said: “The neutral theory also asserts that most of the intraspecific variability at the molecular level (including protein and DNA polymorphism) is essentially neutral.” (1) It is interesting to note that this new paper managed to avoid the term ‘neutral theory’ completely, even though it is the theory most relevant to interpreting sequence similarity. This recent paper (2) is another one that managed to do that.

Most CNEs are sequences that differ between vertebrates and invertebrates and should therefore be neutral rather than functional if the neutral theory is correct. But since CNEs are functional, we have no choice here but to conclude that the neutral theory is incorrect. In fact, CNEs are just the latest evidence among many showing that the neutral theory is correct only in the domain of microevolution or population genetics dealing with identical or very similar species.

From the perspective of the neutral theory, genetic distance in a typical sequence such as cytochrome c is interpreted as follows. Human is closer to mouse than to chicken. But the closer distance between mouse and human is merely due to a shorter time of divergence. Given ~245 million more years, human and mouse would have a distance similar to what is now observed between human and chicken. This interpretation by the neutral theory is of course the basis for all molecular trees inferred from genetic distances.

But this perspective cannot explain, among many things, the case of vertebrates and invertebrates. Both groups appeared at about the same time during the Cambrian explosion. So if time is the only variable for sequence diversity, conserved sequences found in one group should be similarly conserved in the other group. But this is clearly not the case. As your work and many others (2) show, sequences in vertebrates are more conserved than in invertebrates. For example, the largest observed distance among vertebrates in cytochrome c is ~20 aa differences, between lamprey and fish. But within drosophila or insects it is ~30 aa differences.

To say that invertebrates evolve faster will not work because there are just too many pieces of evidence against that. For example, yeast is equidistant to drosophila and human in cytochrome c (36/102 aa difference). Such approximate equidistance holds for nearly all homologous genes among yeast, drosophila and humans (3).
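To make the notion of approximate equidistance concrete, here is a minimal Python sketch that counts differences between an outgroup and each of two other species and checks whether the counts agree within a tolerance. The function names, the tolerance parameter, and the toy sequences are hypothetical and chosen only for illustration; the only figure taken from the text above is the 36/102 difference in cytochrome c.

```python
def count_differences(a, b):
    """Count aligned positions at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def is_approximately_equidistant(outgroup_seq, other_seqs, tolerance=1):
    """Check whether all sequences are about equally distant from the outgroup.

    Returns a (bool, list_of_distances) pair. The tolerance is an
    illustrative assumption, not a value from the literature.
    """
    distances = [count_differences(outgroup_seq, s) for s in other_seqs]
    return max(distances) - min(distances) <= tolerance, distances


def fraction_different(differences, length):
    """Express a difference count as a fraction of the aligned length."""
    return differences / length


# Hypothetical toy alignment, not real cytochrome c data.
toy_outgroup = "MKTAYIAKQR"
toy_species_a = "MKSAYLAKHR"
toy_species_b = "MRTAYVAKNR"

equidistant, toy_distances = is_approximately_equidistant(
    toy_outgroup, [toy_species_a, toy_species_b]
)
print(equidistant, toy_distances)

# The 36/102 figure quoted above, as a fraction of positions.
print(round(fraction_different(36, 102), 3))
```

In the toy example both species differ from the outgroup at 3 positions, so the check returns true, and 36/102 works out to roughly 0.353. Real comparisons would of course need properly aligned sequences and a defensible choice of tolerance.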

The only sensible explanation to me that has no contradictions is to say that vertebrates are more complex than invertebrates and can tolerate far fewer random mutations. Many of the neutral sequences in invertebrates become non-neutral in vertebrates, and hence we have far more CNEs in vertebrates. The maximum number of neutral positions in invertebrates is higher than in vertebrates according to the Maximum Genetic Diversity hypothesis (4, 5). What is causing the conservation of CNEs in vertebrates? It is those extra functional constraints only found in vertebrates. More complexity demands more functions from a sequence. Complexity is well known to be linked not with the absolute amount of sequence but with how sequences are used (epigenetics). CNEs are epigenetic elements. There are more ways of using a given sequence in complex organisms, which puts extra constraints on the variability of the sequence.


1. Kimura M. DNA and the neutral theory. Phil Trans R Soc Lond B 1986; 312:343-54.

2. Halabi N, Rivoire O, Leibler S, Ranganathan R. Protein sectors: evolutionary units of three-dimensional structure. Cell 2009; 138:774-86.

3. Huang S. The genetic equidistance result of molecular evolution is independent of mutation rates. J Comp Sci Syst Biol 2008; 1:092-102.

4. Huang S. Inverse relationship between genetic diversity and epigenetic complexity. Preprint available at Nature Precedings 2009;

5. Huang S. Histone methylation and the initiation of cancer, Cancer Epigenetics. New York: CRC Press, 2008.

Sunday, December 6, 2009

Building Complexity and the main pattern in evolution

There may be more than one way to build a complex and robust machine. To have backups or overlapping systems is one way of doing it, the most stupid and awkward way. Instead of a single engine in a small plane, a more complex large plane may have 4 engines with 2 of them as backups. But is that nature’s way of using DNA to build complex life? Do we have an extra heart or brain or any organ as a pure backup? We don’t. If we don’t do it at the phenotype level, do we do it at the genotype level? Mostly not. Our gene numbers are much smaller than anyone had predicted because complexity is not linked to an increase in gene numbers or in backup genes. Nature is smarter and does it by combinations of genes, by inventing novel and complex ways of using the same set of genes, and by giving extra functions to an existing gene. We create music not by inventing more notes, and the ways of using the existing notes are already infinite.

Adding a backup or overlapping system is not really an increase in complexity. A four-engine plane and a one-engine plane have the same level of complexity as far as the engine is concerned. To have a backup brain, we still need to have a complex brain in the first place. For that to happen, disorder and random mutation must be suppressed. Who can imagine that a brain capable of infinite order like mathematics could tolerate a level of disorder/randomness in its building blocks like that of a flu virus or any simple virus like early life forms at the beginning of evolution?

The backup way is also not sound for DNA-based life because, in my thinking, it increases the size of the genome and hence the target size for mutations. The backup copy is not expressed or functional in normal situations and therefore not maintained by natural selection and can easily lose its function due to accumulation of mutations. Thus, it is safe to predict that most paralogs of a gene in a complex organism have unique functions and are not just backups (plenty of data for this). The claim that complex organisms have more backups is simply wishful thinking and not supported by facts. Don’t we have a lot of single mutation diseases in humans?

For both genotypes and phenotypes, nature follows the rule of use it or lose it. Backups may have been invented once but would simply be lost due to disuse. Which is more effective in advancing complexity: to decrease mutations or to use backups? All facts of nature say the first. It is simply a reality that a theory based on that notion explains all facts whereas any theory that ignores it meets with countless contradictions.

There is an extremely common mistake in the evolution field that has infected the lay public. It is to ignore the main pattern and use whatever trivial pattern/facts to suit our theory and to invalidate the main pattern when our theory does not predict it. Given the infinite amount of data/facts of nature, any stupid theory can find some factual support, if the goal is not to account for the major patterns or to explain everything without contradiction. The advance with time in complexity is the dominant pattern in evolution that is so obvious that it is hardly worth stating. (The best ancient Chinese thinkers from 5000-7000 years ago always placed man above all else in nature and as equal in status to the creative power of nature, namely yang/heaven and yin/earth as written in I-ching, which has been the foundation for the longest-lasting civilization as well as the world view of the largest population on Earth. Ancients have much better intuitive sense than moderns simply because their focus is less distracted by trivial things or man-made artifacts, and intuition is the foundation of science.) But since our theory does not predict that, we ignore it and cite trivial cases of randomness to support no direction towards complexity. Or we use trivial and much less common cases of complexity loss like loss of limbs in snakes as evidence for no direction towards higher complexity. Or we cite abundant cases of no change in complexity during microevolution. But all these merely indicate that in addition to complexity increase, there is also another trend for stability or no change. One cannot use a single mechanism to explain two opposite major trends, which is what we are doing.

We ignore the order/beauty/complexity of our big brain, and cite examples of imagined imperfections in some organs as evidence for the imperfections of nature or evolution. We ignore the general perfection of the human body and cite examples of rare diseases to fault the power of nature/evolution. In every case, the main pattern says that nature/evolution is all good, order, and beauty. Disorder, randomness, and ugliness are all trivial and minor patterns. It is simply nonsensical to focus on the minor patterns and to turn a blind eye to the main patterns. We should not explain the trivial at the expense of the main, but that has become a habit for most followers of evolution, another simple indication that we are not on the right path in understanding the main pattern.

Is Darwin's theory coherent and hypothesis generating? Can a theory be proven true?

Stanley Sessions responded to my comments at Nature Precedings:
"I think I understand at least some of what you are getting at…wonderful! I think of the schizoid pickle physics is in right now compared to biology, the latter thanks to the remarkable coherency of evolutionary theory. However, you get into some dangerous territory when you speak of scientific theories “proven to be true”. You can do that in math, but isn’t that why mathematics is usually stuck in Humanities with philosophy rather than in the Sciences? Finally, to me the Theory of Evolution (like any good theory) is a good theory not so much because of what it explains, but to the degree that it is useful in generating testable hypotheses."

My reply:
I am not surprised that you would consider the existing evolutionary theory remarkably coherent. It is common for lay people and scientists not in the field or all casual believers to take that view, and I myself was among them just a few years back. But any disinterested professional who is familiar with the details would think otherwise. The devil is in the details. For a true theory, the more details you know, the more you would marvel at its beauty. The opposite is true for a false theory. So, it does not bode well for any theory to claim a lot of lay admirers. When told that only 3 men in the world understood Relativity, Sir Eddington asked, "I wonder who is the third?" Lay people and non-specialist scientists have no business judging a scientific theory one way or the other. Evolution theory should not be simpler than relativity or easier to grasp for lay people. After all, it is evolution that created the brain for understanding relativity. A science that is so simple that a lay person could feel like he knows all about it, as is commonly seen in forums on evolution, can hardly be called science. The 3000 year old book I-ching is the most mysterious book ever written and only a handful of people the likes of Congzi can really claim some understanding of it, and yet that has not prevented it from being the very foundation for the world view of the largest population on Earth, most of whom have no first hand understanding of it.

A coherent theory should not lead 100% of lay people and 99% of biologists to have the wrong answer to a most basic question that has been extensively studied for nearly half a century. Which, frog or human, is closer to fish in sequence similarity in any given gene? And yet the other side of the same question can be answered correctly by nearly all people, thanks to 150 years of teaching Darwinism. Which, frog or fish, is closer to human in sequence similarity in any given gene? If you don’t believe me, try these two questions on your biology colleagues. Being a casual believer of Darwinism until a few years ago, I, a professional biologist with 25 years of research experience, answered incorrectly and found the right answer the hard way, by reinventing the wheel (no specialist has offered it in any books for non-specialists and what is being offered can only mislead). That was truly a shocking experience and a big reason for me to get into studying evolution and to try to solve the puzzle myself, given that few cared to do it. I have done enough to realize there is a big hole in our present state of affairs. The seeming coherency is a huge bubble. The fact that one must use Darwinism for phenotypes and the negation of it (neutral theory) for genotypes, despite the inseparable unity of genotypes and phenotypes, would easily strike any biology student as odd and incoherent.

The only reason that specialists seem to have a consensus view in public is that they have all agreed that contradictions are no big deal and that they can never be certain about anything in evolution, more for self-serving reasons than for the love of science/truth. They of course know it is a big deal in their private moments, as is apparent from the fact that you rarely see them teach the contradicting details to non-specialists in books such as Why Evolution Is True. (While another person may easily read this as a conspiracy, I would just view it as simple human nature.) One of the results of their selective teaching is that nearly all people know the answer to one side of a question while nearly no one knows the other side of the same question, as mentioned above. In contrast, physicists, being more hard-core scientists, will not easily settle for any theory with contradictions, and hence there is no consensus yet on cutting-edge physics, as you have correctly observed. But that is healthy for science and is what science is all about. Evolution research could for its own great benefit really use some challenge to the paradigm, especially a voice asking for a contradiction-free theory and for not stretching a contradiction-free theory of microevolution to where it meets contradictions and hence does not belong. No truth is afraid of a competitor because there can only be room for one truth, and there is no possibility of two theories explaining nature equally well.

My point is mostly about proving a theory wrong. It is debatable whether a theory can ever be proven true (see below). But it is often overlooked by biologists that a single contradiction/exception should prove a theory wrong. Your point about generating testable hypotheses is absolutely essential. But that is really a minimum, and meeting that standard is not enough. The theory must also be prepared to accept all the consequences of the test, one of which is to kill the theory or at least narrow its domain of relevance. If the theory stands regardless of the test results, then it is not a scientific theory. Unfortunately, the modern evolution theory is like that. I have personally found and published exceptions/contradictions to the theory and few cared. And many others have of course experienced the same. Few textbooks ever highlight those contradictions. And yet a sharp reader can still infer from what is not said. What he does not read anywhere is a statement by a reputed evolutionist that the modern evolution theory has no contradictions. But I do not waste energy complaining, and I am fully aware, just like any good scientist, that a theory is never overthrown by contradictions but must be pushed away by a better alternative theory. Scientists who overlook those contradictions do it at their own peril as they are wasting their careers by barking up the wrong tree. There will always be people, though small in number, who will pay attention to those contradictions and try to do something about them and hence have a chance to make history.

For theory testing to be meaningful, the standard of no single contradiction is essential. If you allow contradictions, why bother testing in the first place? No doubt the modern evolution theory has been tested true or generated testable hypotheses but of course only in microevolution (e.g., Lenski’s and Grants’ experiments on bacteria and finches). It is equally clear that it is largely irrelevant to macroevolution if we use the same standard as we use for claiming it true for microevolution. All the contradictions are in macroevolution and none in microevolution.

A theory that has not even explained all known relevant facts is not even worth testing because it is already false. It has already allowed exceptions, which makes testing meaningless. To explain all existing relevant facts is therefore a minimum for any theory to qualify to be tested. And if a theory has explained all relevant facts, it would be impossible for it not to be able to generate testable hypotheses that would just confirm it further. Explaining a fact is the same as predicting it when a theory is axiom based rather than induced from factual observations. If a theory is induced from 10 known facts, then none of these 10 can serve as evidence for it (to do so is a tautology) and further testing is necessary. In contrast, if a theory is deduced from an axiom, then it is self evidently true and can exist independent of any facts and can rightfully claim all facts as evidence if it deduces/predicts/explains all of them.

I here provide a simple rationale for why a theory that has explained all known key facts without a single contradiction must be true or has been proven true. We first grant that for any domain of nature there is only one unique true theory or law that explains or governs it. This is at least 100% true for all the known laws of nature. If a theory has explained all key facts without contradiction, it is effectively indistinguishable from an ideal true law. Since there cannot be two true laws or more than one true law for any single domain, any theory that explains all key facts is effectively the true law. What are key facts? A key fact is any representative of a key class of facts that have the same fundamental feature. The knowledge of today cannot possibly have all the facts of a class known, which may be infinite. But it is not unrealistic for today’s knowledge to include at least one fact out of every major key class of facts of certain domains. And explaining one such representative of a key class is of course equivalent to explaining that whole class. In most cases, to explain all the known key classes in a coherent fashion (while granting the existence of some unknown key classes) is simply impossible unless one has the true law, given that all facts are connected by a single law and cannot possibly be connected by any laws other than the single unique true law.

Have Newton’s laws of motion been tested by all possible tests? Of course not; that is impossible to do. Are they proven laws? Of course they are, because they explained every key fact known, even if we grant there are unknown key facts, and tested true in every test within their domain of relevance. It is also important that they are axiom based or self-evidently true, which means that their truthfulness is only a matter of how general or broad their domain of relevance is, rather than there being a possibility of their not being true at all in any domain. Are they also wrong? Yes, outside of their domain of relevance.

The nearly half century old genetic equidistance result of Margoliash for cytochrome c is a key fact that is representative of most genes of most species. It is the answer to the above simple question on frog/fish that almost all people fail. How this single fact is interpreted determines how modern evolution research is to proceed and how half of the record of evolution (the DNA half vs the other half, fossils/phenotypes) is to be interpreted. Its incorrect and tautological interpretation, the molecular clock and the neutral theory, has misled the field of evolution for nearly half a century with grave consequences. For one, no non-specialists know much about it, and the less they know about it, the more coherent Darwinism appears to them, which suits the Darwin specialists just fine. Here is some actual data for this. The wiki page on Motoo Kimura has a pathetic ~1500 views a month compared to Darwin with ~260,000 views a month, never mind that it is Kimura’s theory rather than Darwin’s that is most relevant to modern evolution research on DNA in the past half century. Also, the page on the neutral theory has 3000 while natural selection has 90,000 views a month.

The other grave consequence is to misidentify the chimpanzee as our closest blood relative among non-humans in total violation of the fossil record and common sense. Another is to give us endless conflicting interpretations on the phylogenetic position of some species such as tarsiers. A set of conflicting interpretations must contain a false one. Any theory that can turn perfectly solid DNA data into a false interpretation of reality can only be incorrect, regardless of whether it can also sometimes produce a correct interpretation by accident. Any theory that has turned DNA data into conflicting interpretations of the reality of phylogeny has of course proven itself false.

The fact that Darwin’s theory can coexist in peace with such a false theory of molecular evolution in a form termed the modern evolution theory does not bode well for it. All these theories are from the beginning based on population genetics and should have just stayed where they really belong, microevolution and population genetics. As far as I am concerned, no theory of any kind, except the maximum genetic diversity hypothesis, has correctly explained the equidistance result and thus has any chance of being true for the whole domain of evolution or has even qualified as worth further testing.