Monday, November 12, 2012

Purifying selection and phylogenetic trees


 
The author of the Y chr work, Dr. Melissa Wilson Sayres, thought that purifying selection may affect only the estimation of divergence time and not the general topology of a tree. This essay explains why that view may be mistaken. There are several independent reasons why purifying selection will affect tree topology in addition to timing.

 

1) The neutral theory never predicts that sequences under purifying selection are equally suitable for building phylogenetic trees, even if only for the topology. The key concept of the neutral theory is that most observed natural variants are not under purifying selection. Most are neutral and some (very few) are beneficial. The new data on Y chr invalidate the neutral theory. And if the neutral theory is invalid, all molecular trees today would have no sound theoretical basis. In fact, in our view, the neutral theory was mistaken right from the start, when it misinterpreted the genetic equidistance result that got the field started. That result was the best evidence for purifying selection and for the absence of junk/neutral DNA.

 

2) Common purifying selection would lead to commonly shared sequences, which would dramatically affect topology. (It may be easier to imagine shared sequences arising from positive selection, but it is just as easy to picture the same for purifying selection.) Thus, the close similarity between human and chimpanzee in fast-evolving sequences including the Y, which are all under purifying selection, merely indicates common purifying selection rather than common ancestry. Our recent paper in Science China shows that when slow-evolving sequences not under selection are used, chimp and human can be shown to belong to separate clades, with all three great apes in the pongid clade.

 

3) If population A has high genetic diversity and population B has low diversity in most genome sequences, the typical interpretation today is that A evolved longer than B and gave rise to B. But this topology could be completely reversed if most sequences are under purifying selection, with A under more relaxed selection than B. Here the true topology, as revealed by the slow-evolving sequences, may show that B evolved first and has higher genetic diversity in the informative sequences. We will soon have a paper to this effect.

 

4) Nothing is truly neutral. All variants, being random and disorderly in origin, have a deleterious aspect. A major-effect variant causing great harm never has a chance to behave as neutral and is negatively selected within one generation after it emerges. In contrast, most minor-effect variants would exist as neutral for a long time, or many generations, before being negatively selected when the accumulation of too many such variants exceeds the maximum level that an organism can tolerate. Simple organisms can tolerate more. Therefore, only slow-evolving sequences whose variant numbers are still below the maximum tolerable level are informative for tree topology as well as timing. That the slower-evolving, and hence more conserved, sequences are the ones whose observed variants are apparently neutral may seem counter-intuitive, but it actually makes sense. Since changes in the slow-evolving sequences take a long time, they may be too slow to meet adaptive needs and so are unlikely to be under positive selection. Given the apparent slow rate and the absence of positive selection, they are also unlikely to reach excess levels that cause harm or to be under negative selection.

 

5) Nearly all the ‘surprise’ results reported at the ASHG2012 meeting can be easily explained by purifying selection and the MGD. The Iceman Ötzi from ~5000 years ago was found, in most fast-evolving sequences, to resemble not today's Central Europeans but rather Sardinians, which is considered surprising. Also, the Iceman is related to other Central European farmers (but not hunter-gatherers) from 5000 years ago. Even more surprising is that the mtDNA of the Iceman does not resemble that of any humans today. Well, all these are evidence for the MGD. The sequences under purifying selection 5000 years ago are of course expected to be very different from those of today. My graduate students are right now busy verifying that the Iceman is inseparable from today's Central Europeans in slow-evolving sequences. Another surprise mentioned at the meeting was that the estimated times for out-of-Africa modern humans to arrive in Spain (very close to Africa) and in Australia (very far) are about the same, ~45K years ago. In truth, these timings are based on genetic diversity levels from fast-evolving sequences.

 

6) The fact that the Y chr trees are in general agreement with mtDNA trees and genome-average trees merely indicates that all these sequences share something in common in terms of relevance to phylogeny. They may be equally informative or equally non-informative for phylogeny. If any one of these sequences is shown to be non-informative, it would mean the same for all of them. Thus the fact that the Y chr is under strong purifying selection means the same for mtDNA and the genome average.
 
 

Sunday, November 11, 2012

Evidence of natural selection of Y chr invalidates dating of human divergence history using Y chr markers

I attended the American Society of Human Genetics 2012 meeting last week. The following work, presented by a Berkeley lab at the meeting, shows that the whole Y chr is under strong purifying natural selection and shows extremely low diversity. Thus the markers on Y chr are not neutral and should not be used for phylogeny inferences. If two different human populations share some haplotypes in Y chr, it does not indicate shared ancestry but rather some sort of common selection. We have been saying in recent years that nearly all human genome variations are not neutral and are under natural selection. There is essentially no junk. To date human evolutionary history, one must use slow-evolving neutral sequences, as we have been advocating. All existing literature on human history using Y chr, mtDNA, or any other sequences is mistaken.

There is only one scientific theory known so far that advocates no junk DNA, i.e., the MGD hypothesis. Much work in recent years has essentially killed the junk DNA concept, most recently the ENCODE finding that at least 80% of the human genome is functional. But the theory that predicts the neutral and junk DNA concept still remains to be overthrown. The MGD represents the best chance to explain a nearly 100% functional genome and to supersede the neutral theory.

Abundant selection explains low diversity on human Y chromosomes. M. Wilson Sayres (1,2), K. Lohmueller (1,2), R. Nielsen (1,2). 1) Integrative Biology, University of California, Berkeley, Berkeley, CA; 2) Statistics, University of California, Berkeley, Berkeley, CA.

The human Y chromosome exhibits levels of diversity that are significantly lower than expected under neutral population genetic theory. Variance in male reproductive success (reducing the effective population size of males relative to females) has recently been proposed as an alternative neutral model to explain reduced diversity on the Y relative to mtDNA. Generally Y chromosomes are not included in whole genome analyses, so explicit tests of this hypothesis have yet to be conducted. Here we show that neutral models with unequal male and female effective population sizes are not consistent with observed genome-wide diversity on autosomes, X, Y and mtDNA across completely sequenced males. Instead, a model including selection is needed to explain the departure of observed Y diversity from expectations. We found that models with similar estimates of the strength of background selection can explain diversity for both the Y chromosome and mitochondrial genomes. Our results suggest that strong selection is necessary for explaining the evolutionary history of the human Y chromosome, and argue against the concept of the "junk" Y chromosome.
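
The neutral expectation that the abstract tests against can be made concrete with the standard textbook formulas for effective population size when the numbers of breeding males and females differ. What follows is a minimal sketch of those formulas only, not the authors' model; the function names and the example population sizes are illustrative assumptions.

```python
def effective_sizes(n_males, n_females):
    """Textbook neutral effective population sizes for each genomic compartment.

    n_males and n_females are the numbers of breeding males and females
    (hypothetical inputs chosen by the caller).
    """
    nm, nf = float(n_males), float(n_females)
    return {
        "autosome": 4 * nm * nf / (nm + nf),
        "X": 9 * nm * nf / (4 * nm + 2 * nf),
        "Y": nm / 2,       # haploid, transmitted through males only
        "mtDNA": nf / 2,   # haploid, transmitted through females only
    }


def relative_to_autosomes(n_males, n_females):
    """Expected neutral diversity of each compartment relative to autosomes,
    assuming equal mutation rates across compartments (a simplification)."""
    sizes = effective_sizes(n_males, n_females)
    return {name: size / sizes["autosome"] for name, size in sizes.items()}


if __name__ == "__main__":
    # Equal numbers of breeding males and females.
    print(relative_to_autosomes(5000, 5000))
    # Fewer breeding males than females (high variance in male success).
    print(relative_to_autosomes(2000, 8000))
```

Under these formulas, lowering the number of breeding males reduces the expected Y diversity but also shifts the expected X, autosome and mtDNA values, which is why a joint comparison across all four compartments can distinguish a purely demographic explanation from one involving selection.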

 

Wednesday, October 31, 2012

Recent academic activities

The primate phylogeny paper has finally been published.

Huang, S.  (2012)  Primate phylogeny: molecular evidence for a pongid clade excluding humans and a prosimian clade containing tarsiers.  Sci. China Life Sci. 55: 709-725  http://life.scichina.com:8082/sciCe/EN/abstract/abstract508538.shtml


A preprint on the fact that nearly all sequences in the human genome are functional and on the genetic basis of complex traits and diseases. 

Yuan, D., Zhu, Z., Tan, X., Liang, J., Zeng, C., Zhang, J., Chen, J., Ma, L., Dogan, A., Brockmann, G., Goldmann, G., Medina, E., Rice, A.D., Moyer, R.W., Man, X., Yi, K., Li, Y., Lu, Q., Huang, Y., Wang, D., Yu, J., Guo, H., Xia, K., and Huang, S. (2012)  Minor alleles of common SNPs quantitatively affect traits/diseases and are under both positive and negative selection.  arXiv:1209.2911 [q-bio.GN]. (http://arxiv.org/abs/1209.2911)


We are working on a paper to support the multiregional hypothesis of modern human origins using phylogeny-informative sequences.  I recently gave a talk on this topic at the CAS-MPG Partner Institute for Computational Biology in Shanghai.
http://www.picb.ac.cn/picb-dynamic/Desktop/news/shownews.jsp?ID=1307

My lab website has PDFs of my recent papers for download.
http://www.sklmg.edu.cn/articles_98.shtml

Friday, June 1, 2012

Peer review of my primate phylogeny manuscript

I share my email exchanges with the reviewers and editors of the Journal of Theoretical Biology, to which I have submitted my manuscript entitled "Primate phylogeny: molecular evidence for a pongid clade excluding humans and a prosimian clade containing tarsiers".  For almost one year now, these reviewers and editors have failed to find a contradiction to our story but have nonetheless yet to accept our manuscript.  For revolutionary science, peer review is the worst enemy.  But it is of course perfectly consistent with human nature that people will defend their self interest whenever there is a conflict between self interest and truth.  How else could one explain what has repeatedly happened in history: that revolutionary ideas often take decades to become accepted?


From: Shi Huang
To: JTB (ELS)
Sent: Sunday, March 11, 2012 5:43 AM
Subject: Re: JTB-D-11-00402: Final Decision

Dear Janet,
Just checking that the editor has not forgotten our manuscript. If you need more time to find a good scientific reason to either accept or reject it, that is understandable.
Best,
Shi Huang

From: JTB (ELS)
To: Shi Huang
Sent: Wednesday, December 7, 2011 12:13 AM
Subject: RE: JTB-D-11-00402: Final Decision

Thank you for your message, Dr. Huang, which is being forwarded to the Editor in Chief.
Best regards,
Janet
Janet Stein Journal Manager - JTB Editorial-Production Dept. Elsevier Science 525 "B" Street, Suite 1800
San Diego, CA 92101 USA Phone: 619-699-6324 Email: jtb@elsevier.com
From: Shi Huang [mailto:shuangtheman@yahoo.com]
Sent: Tuesday, December 06, 2011 1:04 AM
To: JTB (ELS)
Subject: Re: JTB-D-11-00402: Final Decision
Dear Janet,
I believe that there has now been sufficient time for the editors to find one specific reason backed up with evidence to invalidate my manuscript. But I seriously doubt that they can find one. After all, my hypothesis includes all the proven virtues of the existing paradigm as parts of it. Thus every fact or reason that supports the existing paradigm will also support my hypothesis, and every fact that contradicts the existing paradigm will not necessarily contradict mine. In fact, I claim that every fact that contradicts the paradigm will be found to support my hypothesis. So, what could the editors do? It is simply impossible for them to find one fact that supports the paradigm while contradicting my hypothesis. The best they can do is to come up with one fact that will contradict both the paradigm and my hypothesis. This of course they will never do because it would contradict their cherished paradigm.

Therefore, it is in the best interest of science that we find another editor, one who is not in a position of conflict of interest, to re-review the materials related to this manuscript. I know this is hard to do and I can give you a few names to select from if you would find it helpful.
Best,
Shi
From: JTB (ELS)
To: Shi Huang
Sent: Monday, November 21, 2011 7:48 AM
Subject: RE: JTB-D-11-00402: Final Decision
Dear Shi,
I am forwarding your message to the editors.
Best regards,

Janet
From: Shi Huang [mailto:shuangtheman@yahoo.com]
Sent: Friday, November 18, 2011 5:31 PM
To: JTB (ELS)
Subject: Re: JTB-D-11-00402: Final Decision
Dear Janet,
Thank you for your reply. Indeed, while we can always submit to another journal, it may not be fair for a journal to take 5 months to reject a paper without giving the author any specific reasons. The only reason that may seem quite sound is repeated here - "the burden of proof for revolutionary concepts/suggestions is high. Selective reading of facts do not constitute proof or evidence." But the standard should not be set so artificially high that no existing theory of any kind has met it, which is to demand proof of the assumptions that no theory can live without. (The recent discovery of faster-than-light neutrinos is a challenge to Einstein's relativity theory, which rests on the unproven assumption that nothing travels faster than light.) I view the last sentence here as a personal attack on my integrity and competence as a professional scientist. I demand either an apology or specific details laying out which facts I have selectively ignored. I have openly made the claim that my concept/assumption has no contradictions. If the editor has evidence of my ignoring contradictions (which is what is meant by selective reading of facts), please lay it out. I have repeatedly asked the editor/reviewers to give me a single contradiction, which is enough for me to withdraw the paper. But instead, they made broad statements which implied that they have found contradictions or facts that I have selectively ignored, but they never said what those facts are. So, here I ask again, where is the beef?

Your journal can feel free to reject any paper, but as a science journal as opposed to a religious one you must give SPECIFIC REASONS BACKED UP BY EVIDENCE. While the comments I received earlier may seem detailed and specific enough, they have been soundly dealt with in my last rebuttal. In particular, I have asked the reviewers/editors to give reasons for some of the objections that they put forward without any justification. But I have received nothing in response. I have asked them to defend their field's interpretation of the genetic equidistance phenomenon, but they did not. Selective reading of facts?

As part of the paradigm, the editor and the reviewers have a conflict of interest and hence cannot serve as fair referees for the present manuscript. (It is only human nature that when there is a conflict between truth and self interest - reputation, careers, grants - self interest always takes priority.) Given the 5 months of time I have invested in your journal, I am reluctant to let you off the hook too easily without gaining anything useful scientifically. So, please give me 1) a fact that I have selectively ignored, or 2) a fact that can contradict the assumptions of my hypothesis, or 3) an explanation of how the existing theory has brilliantly accounted for the genetic equidistance phenomenon, or 4) another editor who is more objective, and if you need names I would be glad to give you a few. I would like to ask for an independent re-review of all the materials related to this paper. To be a responsible scientist, one should care about truth. In this case, it is either black or white. Let us just settle this once and for all, as in a boxing match: either my interpretation of the genetic equidistance phenomenon is flawed and the existing interpretation brilliant, or the opposite. For nearly 50 years, those two interpretations have been the only two competing parties that are scientifically based. I look forward to hearing from you.
Best,
Shi
From: JTB (ELS)
To: Shi Huang
Sent: Friday, November 18, 2011 8:52 AM
Subject: RE: JTB-D-11-00402: Final Decision
Dear Dr. Huang,
The editor responds, “If the author is correct, no doubt we will all be rewriting molecular evolution in a couple of years. Otherwise, the original criticism stands - the burden of proof for "revolutionary" concepts/suggestions is high. Selective reading of facts do not constitute proof or evidence.”
You are welcome to submit your manuscript to a journal that may view it more favorably.
Best regards,
Janet
From: Shi Huang [mailto:shuangtheman@yahoo.com]
Sent: Thursday, November 17, 2011 9:28 PM
To: JTB (ELS)
Subject: Re: JTB-D-11-00402: Final Decision
Hi, Janet,
It has been 5 weeks and I thought I should have received some feedback by now. Please ask the editors if they intend to give me a full and serious response. If not, I will ask for a different editor who has a more objective and neutral stance to evaluate the paper, the comments, and my rebuttal. Thank you.
Sincerely,
Shi Huang
From: JTB (ELS)
To: Shi Huang
Sent: Tuesday, October 11, 2011 9:48 AM
Subject: RE: JTB-D-11-00402: Final Decision
Thank you for your message, Dr. Huang, which is being forwarded to the editor.
Best regards,
Janet Stein Journal Manager - JTB Editorial-Production Dept. Elsevier Science 525 "B" Street, Suite 1800
San Diego, CA 92101 USA Phone: 619-699-6324 Email: jtb@elsevier.com
From: Shi Huang [mailto:shuangtheman@yahoo.com]
Sent: Saturday, October 08, 2011 11:38 PM
To: JTB (ELS)
Subject: Re: JTB-D-11-00402: Final Decision
Dear Editor,
Please ignore my last email reply to your decision letter. A revised reply is attached to this email and also copied below. Please forward it to the reviewers. I hope you will consider our request to accept a revised version.
Best,
Shi
Dear Editor,
Thank you for the comments. I have now read them carefully after finishing up another manuscript. It seems to me that the reviewers ignored or failed to grasp my rationale for the present work, which is that the existing paradigm of phylogenetics needs to be completely overturned because of a mistake at the beginning stage of the field. My key point is about how to interpret the genetic equidistance phenomenon. Without a valid interpretation of it, all else is meaningless. I have repeatedly stated and explained why the molecular clock, and in turn the neutral theory, completely failed to explain the equidistance phenomenon of macroevolution while being perfectly fine for that of microevolution. Yet the reviewers did not raise any objections in their comments to this most important point of mine, which could only lead me or any reader of their comments to conclude that my criticism of the clock/neutral theory, and hence of the entire paradigm based on it, is sound and irrefutable. This of course would undercut nearly all of their comments, since they are nearly all based on the existing paradigm being correct. Hence I suggest that you ask these reviewers to issue a formal opinion to explain to me as well as to the world how the clock/neutral theory has correctly/brilliantly explained the genetic equidistance phenomenon and how my MGD interpretation of it is flawed. If they cannot or refuse to do that, I would suggest that you strongly consider accepting my manuscript (of course a revised and shortened version).

Indeed, extraordinary claims require extraordinary evidence. We are now using our new hypothesis to solve major biological problems such as common diseases, and our goal is to show that ours is a much more powerful theory than the existing one because it is not merely relevant to evolutionary phylogenetic problems. In this sense, this ms is presently not a high priority for us; we have been submitting it for more than two years and we are patient. We obviously cannot put everything we have into a single manuscript. But if you keep an eye on our future papers, you will soon see how our theory will be proven and accepted in major biomedical fields not related to phylogenetics and evolution. Our model will solve major problems of everyday life about which the existing model has no clue. The existing model will be made obsolete in the next year or two by our new model. Again, we demand, as should everyone, that a correct model be contradiction-free. If you apply that standard to our model, you will find it hard to reject and better than the existing one.

Below are my detailed responses, placed within the reviewers' comments. I look forward to hearing from you. Sincerely, Shi Huang

Reviewers' comments:
Reviewer #2: Review of the manuscript entitled "Primate phylogeny: molecular evidence for a pongid clade excluding humans and a prosimian clade containing tarsiers"
Although the title is about primate phylogeny the author writes about his new hypothesis for several pages, which suggests that all the existing methodologies are inadequate to understand evolution and phylogenetics. Hence he goes back to the simplest methods (such as counting differences between two sequences) to support his hypothesis. There are several fundamental flaws in this work and everything mentioned in the manuscript is anecdotal without any scientific validity. For instance the maximum genetic diversity (MGD) hypothesis (figure 1) suggests that a complex organism (such as human) has more constrained amino acids (in a protein) compared to that of a less complex one (such as chimp). But there is no proof for this claim. To examine this, the author could have simply compared the nonsynonymous (amino acid) diversities of the orthologous genes from human and chimp. Instead he compares the amino acid identities between human, chimp and orangutan and this method does not even account for simple stochastic errors (which would be very high due to the small divergence between these species). Hence it is no wonder that the author finds some genes are more similar between chimp and orangutan than between chimp and human.


The best proof for the idea of MGD is the genetic equidistance result. The reviewer simply ignored this point. The reviewer also failed to explain our observation of the dramatic difference between slow- and fast-evolving genes. This difference is statistically highly significant and cannot be due to stochastic errors.

The study of molecular evolution started in the 50s by counting the number of changes or looking at the % similarities. After that we realized a number of factors such as stochastic error, multiple substitutions in the same amino acid position, difference in rate of evolution between lineages, difference in rate of evolution between amino acid positions (gamma) of a protein, base compositional difference etc. Now the author goes back to the dark ages and uses age-old methods and refutes the phylogenies inferred by a number of studies that are based on more sophisticated phylogenetic methods. Importantly the author did not even provide a comparison of his methods with those of others or show in what way his methods are better than the others.
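
For readers unfamiliar with the distinction the reviewer draws, the contrast between simply counting differences and correcting for multiple substitutions at the same site can be sketched as follows. This is a generic illustration using the proportion of differing sites (p-distance) and the standard Poisson correction; it is not the method of the manuscript or of any study cited here, and the function names and toy sequences are assumptions made for the example.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def poisson_corrected_distance(p):
    """Poisson correction d = -ln(1 - p), which estimates substitutions per
    site while allowing for repeated changes at the same position."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy aligned protein fragments (hypothetical, for illustration only).
    p = p_distance("MKTAYIAK", "MKTGYIAR")
    print(p, poisson_corrected_distance(p))
```

The corrected distance is always at least as large as the raw proportion, and the gap between the two widens as sequences diverge, which is the sense in which raw identity underestimates change between distant lineages.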


Our method is correct while others are incorrect for the simple reason that ours is the only method that takes MGD into account and is based on a correct interpretation of the genetic equidistance result.

The author questions the currently accepted primate phylogenies by saying that the previous studies did not account for the higher constrained amino acid sites in complex species. However the same phylogenetic relationship [e.g. ((human, chimp), orangutan)] was obtained using only the non-coding neutral sequences or synonymous positions, which are free from selective constraints (see orangutan genome paper).

All the past studies used fast-evolving sequences, regardless of whether they are coding or non-coding, synonymous or non-synonymous. These sequences have reached MGD and are non-informative for phylogenetics. It is also incorrect to consider synonymous positions to be free from selective constraints. Without the concept of MGD and selective constraints, one would have no way to properly interpret the genetic equidistance result and its overlap feature as found for macroevolution. This reviewer has yet to recognize or admit how embarrassing the existing interpretation truly is (please excuse the strong words here, which were only meant to wake people up since the polite words in the manuscript have yet to do so).

There are a number of problems in this study.
How did the author choose the comparisons shown in Table 1-6? What is the measure or index used to quantify the epigenetic complexity? For instance in table 1 what is the epigenetic complexity index for human, octopus and for the other mollusk? On what basis did the author decide that humans are more complex than chimp? Is it based on the number of cell types in the body? If so what is that index for human and other primates? Everything should be measurable in a scientific investigation.

Obviously, the reviewer knows that it is not presently possible to give a precise count of cell types or an epigenetic complexity index. But is that truly necessary for us to explain the observations in Table 1-6? No. Our explanation is the best possible, has a high chance of being true rather than false, and has no known contradictions. That humans are more complex than octopuses and other primates is supported by numerous observations and contradicted by none. In contrast, the existing paradigm has no explanations for those observations in Table 1-6 and is in fact falsified by them. Would this reviewer prefer that no one has any explanation for the observations in Table 1-6? When all else has contradictions, should one not go for the explanation that has none? Ours may not have all the proof, but it is the only one that has no contradictions.
There is also a double standard here. The existing paradigm has ignored the issue of complexity difference among species without any proof. For my part, I would not attack the paradigm by insisting on the proof issue. I would just say, fine, I would grant you that assumption and see how far you can get with it. The reality is of course that it did not go very far before it met countless contradictions. Thus, the correct attitude towards our ideas should be the same, which is to ask, given our assumptions of complexity difference, how far we can go with them. The fact is that we have gone very far and seen no end. No one has found a contradiction to our story. This is a scientifically accepted way of proving that one has a valid theory even if that theory has assumptions that have yet to be directly proven. We may never have the ultimate theory and we only go with the one that has the fewest contradictions. As things stand today, the MGD hypothesis is the only one that has yet to meet a single contradiction. The scientific way to attack or refute it is not by attacking its assumptions for being without proof. For example, the axioms of math or physics have no proof and yet all math or physics theories are based on them. So long as the theory works and has no contradictions, it is a good working theory regardless of whether its assumptions have proof or not.

What is maximum number of changeable positions? How does the author measure the level of saturation in these variable regions? How does he know in "slow genes" these sites are not saturated compared to fast genes?

We have addressed these problems with more analysis. We can now easily show that slow genes have fewer or no saturated positions, or overlap positions in our nomenclature, or coincident substitutions as some people call them. If given a chance to revise the ms, we will add those data.

Everything mentioned in the manuscript seems to be arbitrary.
I do not think the phrase "slow evolving neutral sequences" makes any sense. Only constrained sequences evolve slower than neutral sequences. Evolution in a neutral region is simply a stochastic process; there is nothing like a slow (or fast) evolving neutral sequence unless the actual mutation rate itself is slow in a region of a genome, which I do not think is the case here.

The reality is that different genes evolve at different rates. And yet the neutral theory explains all these genes using the same idea. According to this theory, the slow-evolving genes also have neutral sequences but they somehow evolve slower than those in the fast genes. Being located in a region of a genome that has a slow mutation rate could be a good reason. The reviewer may not have any proof for thinking that such is not the case. But worse than that, his thinking does not help us understand puzzles such as how to explain the genetic equidistance result.

The way orthologous sequences of primates were identified is not correct. I am sure the data contains a lot of paralogous genes. Since the complete genomes of many primates are available a reciprocal BLAST hit is necessary at the genome level (comparing all genes) to identify orthologs. Furthermore there are several repositories (like ENSEMBL) where orthologs are readily available.

This can be improved indeed but I doubt it will affect the conclusion in any way.
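
The reciprocal-best-hit criterion the reviewer asks for can be sketched in a few lines. This is only an illustration of the general idea, assuming the search results have already been reduced to (query, subject, score) tuples; it does not run BLAST or parse its output, and the gene identifiers below are made up for the example.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    hits is an iterable of (query, subject, score) tuples. On ties, the
    first hit encountered is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in forward.items()
        if backward.get(gene_b) == gene_a
    )


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    species_a_vs_b = [
        ("a_gene1", "b_gene1", 900.0),
        ("a_gene1", "b_gene2", 300.0),
        ("a_gene2", "b_gene2", 800.0),
        ("a_gene3", "b_gene2", 500.0),  # likely paralog: best hit not returned
    ]
    species_b_vs_a = [
        ("b_gene1", "a_gene1", 890.0),
        ("b_gene2", "a_gene2", 810.0),
        ("b_gene2", "a_gene3", 480.0),
    ]
    print(reciprocal_best_hits(species_a_vs_b, species_b_vs_a))
```

In the toy data, the third gene's best match points to a gene whose own best match lies elsewhere, so that pair is excluded. That exclusion is how the criterion reduces the chance of treating paralogs as orthologs.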

I can list a number of additional errors, inconsistencies and flaws in this manuscript. And the manuscript is poorly written and extremely lengthy. I do not think this work meets the standards of JTB.

Reviewer #3: The current paper elaborates on a previously published hypothesis that states that sequence divergence between any two species is affected by the 'complexity' of the species in question. Based on the assumption that 'slow evolving genes' are inherently more accurate and better suited for phylogenetic reconstruction, the author attempts to provide evidence for the grouping of a pongid clade (chimpanzee, orangutan and gorilla) sister to humans. The currently accepted phylogenetic relationships instead indicate that chimpanzees are closer to humans than they are to the other apes.

The idea that 'slow evolving genes' are inherently more accurate and better suited for phylogenetic reconstruction is not an assumption; it is a self-evident concept supported by numerous facts, the best of which is the genetic equidistance result. If one accepts the notion of MGD, then the use of slow genes necessarily follows. But can one refute the reality of MGD? One could only if one could offer an alternative interpretation of the genetic equidistance result. But for the past 50 years, there have been only the molecular clock, and in turn the neutral theory, interpretation and our MGD interpretation. The clock/neutral theory is valid for microevolution over short time scales and explains the small number of overlap positions or saturated positions. But precisely because it works for microevolution, it cannot also explain macroevolution over long time scales, where there is a large number of overlap positions. That is where the MGD works. Two opposite phenomena require two different ideas.

I would first like to mention that I like the author's approach of thinking 'out of the box', instead of accepting current practices in phylogenetics. I agree there are several shortcomings with current phylogenetic methodology, and there is a need to develop better methods. The author also provides a good review about evidence supporting the human-pongid split.

This reviewer's position is more objective. I would just like to point out that any methodology with several shortcomings should not be taken seriously. Having not a single contradiction or shortcoming should be the gold standard for a valid theory or methodology. I hold the MGD hypothesis and the methods associated with it to that standard. I welcome all attempts to invalidate it by way of contradicting it with either logic or facts. Unfortunately, I do not see a single contradiction offered by these comments here.

I do have some major concerns about the manuscript as it stands now. It is clear to me that the paper cannot be accepted for publication in its current state. Given how long the paper is, I am unable to give detailed comments. But I highlight some of my major concerns below, and have some suggestions that I think will improve the quality of the manuscript.
Firstly, the manuscript is an extremely lengthy essay, and I really think the message can be conveyed in a paper less than half the current length (it is currently 83 pages and ca 25000 words long).

We can shorten it in a revised ms. Most people would write books on a novel theory. Ours is not a typical fill-in-the-details kind of paper, and an exception to the length rule may not be asking for too much.

The term 'epigenetic complexity' is vague, and something the reader will find hard to grasp. You state 'Epigenetic complexity is defined by the total number of cell types and epigenetic molecules, which is largely consistent with our naïve notion of organismal complexity and an independent calculation of organismal complexity based on information theory'. What are 'epigenetic molecules'? Do you mean that epigenetic complexity is proportional to the number of cell types? If so, why? You need to define the term in a more understandable way, especially since it is used very often and is a central theme in the whole paper.

I have explained the concept in more detail in my first published paper on the MGD idea, and I am sure the reviewer would find it worth reading. Again, as I explained above, in the worst case scenario, the epigenetic complexity as I define it now could be just an assumption. But whether it is a proven concept is not important as long as it explains nature without contradictions. The axioms of math or physics have no proof, but that has not prevented us from using Newton's or Einstein's theory to explain nature.

The concept of 'complexity' is again quite obscure. You state 'Complexity is inferred from greater number of cell types as indicated by, e.g., advanced nervous system and hence more variety of neuron types, and time of appearance in the fossil record (complex organisms generally appeared later) (Huang, 2008b; Huang, 2009a)'. All extant organisms have had exactly the same time since the origin of life to evolve. By 'time of appearance in fossil record', you mean the time since the origin of the particular species. Let's consider a bacterium that speciated within the last 10 000 years, for instance. Would you consider this more complex than humans? If yes, why? If not, why? Take any extant species. If this species speciates into two new species, does it mean that the two new ones are more complex compared to the ancestor? My point here is that the age of a species is entirely unrelated to its phenotypic complexity as perceived by us humans.

Here we have used species to mean different things. You are talking about subspeciation within a kind whereas I am talking about speciation of different kinds. The bacteria kind or the fish kind evolved earlier than the primate kind. I would of course need to make my point more explicit.

I refer to two examples where you define a species to be more complex in relation to another one. i) 'The octopuses have complex nervous systems and are considered among the most intelligent invertebrates' ii) 'Snakes may be simple reptiles without limbs whereas birds have complex flying capacities'. Your analyses about sequence divergence in these taxa are heavily dependent on your perception of how complex they are. Why is being considered intelligent by humans 'complex' in terms of genetic interactions? Why is having wings as opposed to having evolved limbless locomotion more complex? The point I am trying to make is that it does not always make sense at the level of genes to talk about perceived complexity of species. True, the number of cell types varies between taxa (for instance unicellular organisms have fewer cell types than humans). However, genes in unicellular organisms can have highly complex epigenetic interactions with other genes. On the other end of the spectrum, a gene in humans may encode a single protein in all cells it is expressed in. Hence, I am not convinced by the argument that 'complexity' of a species is an important determinant of how sequences evolve in that particular species.

Again, the point here is that you may disagree with my assumption about which is more complex, but you cannot refute the fact that my assumption works in what counts, which is to explain sequence alignment observations without a single contradiction. You may try your favorite assumptions, such as being intelligent is the least complex, and see where and how far that would take you. I can tell you right away that you cannot go very far with any assumptions other than mine. Just try to explain the genetic equidistance result and explain to me how the MGD interpretation is not the best possible explanation. Finally, it is humans who care about a theory to explain nature. That theory must at a minimum make sense to humans. Making sense is the only way humans can judge a theory. If a theory requires humans to be less complex than bacteria, it may be a valid theory to bacteria from another planet but would be senseless to humans. If humans cannot make sense of that theory, then what practical use does it serve for humans to understand nature, and how are humans supposed to judge it correct? At this point, to consider intelligence as complex, or wings as more complex than limblessness, makes sense to us humans, and that is all that we can ask for in a theory that would only answer to us humans. If there is a more objective truth that makes no sense to us humans, we would just say, sorry, we could not care less, as it is irrelevant to our brain, and our brain has no way of making sense of it and hence of judging its validity.

'It is common sense that genetic diversity cannot increase indefinitely with time and has a maximum limit being restricted by function or epigenetic complexity'. This is definitely not 'common sense'!

Okay, I like your upfront approach. Let us see if the opposite of my position is common sense or even reasonable. Without a limit, two cytochrome c genes, A and B, could one day share no sequence similarity and yet still have similar cytochrome c function. Then gene A could go on to give rise to C and B could lead to D, and so on it goes. We could therefore one day have countless different genes for cytochrome c. This infinite number of genes would share no sequence similarity with each other and yet all have the same cytochrome c function. Is that a picture you can entertain as common sense or reasonable? To me, as well as to all sane people with whom you care to raise the issue, it is completely senseless and against all known laws of science, which is one way of proving that its opposite must be common sense. It is common sense to say that certain constructs/buildings/functions can only be built from a limited choice of materials. You cannot build a bridge with water. The opposite of this common sense is that a function can be constructed from an unlimited choice of materials. Now, I have justified my position and I would like to hear your reason for saying that my position is not common sense. I insist on hearing your reasons because I have seen other reviewers just say no to my position without a reason. It was more like an emotional response, arguing for the sake of argument.

Use of 'sister species' - The term 'sister species' loses its meaning as used in this manuscript. By definition sister/sibling species are closest relatives of each other. However, you have used the term to refer to a pair of ingroup taxa compared to an outgroup.

I do not follow this comment. Sister species is a relative term, always relative to a third group. If the third group is an outgroup, then of course the two ingroup species would be closer to each other than either is to the outgroup, and thus meet the definition of sister groups. Relative to monkey, Africans and orangutan are sisters. Relative to mouse, Africans and monkey are sisters. Relative to great apes, Africans and Europeans are sisters.

You attempt to define slow evolving genes in your own way: 'Slow evolving genes that contain slow evolving neutral sequences are defined as genes that show high identity between the simpler taxon C and a more complex taxon that is most similar to C in phenotypes'. Similarity in phenotype is extremely subjective. I would say a chimpanzee is more similar to a human compared to an orangutan based on hair colour and/or other characters. Would you agree? The point here is that if you want to provide a definition, it has to be an objective definition. Moreover the term slow evolving gene is self-descriptive, and you can't attempt to change its meaning. I suggest you use some other term to avoid confusion.

In some sense it is subjective, but that is only because for a small number of closely related species we have yet to know how to classify phenotype similarities. But in most cases we do know how, as with vertebrates versus invertebrates. Thus my definition is a workable one in most cases. A few exceptions due simply to our limited knowledge should not be reason to invalidate a workable definition that is inherently sound and logical. Being subjective is not inherently bad so long as it can offer a coherent scientific view. All scientific laws of math and physics are subjective, as they are based on axioms that are sensible to humans, and humans have no way of knowing if they are also sensible to other brains. The danger of being subjective is that one can too easily change views to accommodate objective facts. But such subjective views can be easily exposed as non-factual and nonscientific by factual contradictions or by their failure to give a coherent account of nature. This is the reason I place a high premium on the notion of no single contradiction to a theory that necessarily will have unproven assumptions and subjective notions. The scientific way of judging a theory is not by demanding that all its assumptions and subjective notions are proven, which has never been done for any theory known so far, but by demanding no single contradiction in its story of nature. The existing paradigm has numerous assumptions and subjective notions, such as a universal constant substitution rate, the neutrality of most mutations, and no complexity difference among species. I do not buy this paradigm, not because of these assumptions but because these assumptions do not lead one to a coherent story of nature that has no contradictions. It has not even explained the genetic equidistance result of macroevolution, although it did explain that of microevolution.


In your analyses where you attempt to show that humans are sister to a clade consisting of chimpanzees, orangutans and gorillas, you have used only slowly evolving genes as defined by you. Your classification into slow and fast evolving genes is arbitrary (98% similarity between 'simple' species, 85% similarity), etc. Why is this? Why don't you use a standard 80% or 50%?


The fact that I can show a group of genes being different from the rest, such as those above 98% identity from those below, is a major contradiction to the existing paradigm. There is nothing unreasonable in my definition. Percent identity is just one of the factors to consider in this definition. Another is that the slow and fast groups, as defined by a seemingly arbitrary number such as 80%, should show a clear difference in an equidistance testing analysis. If they don't show a difference, we would know that we have not defined the cutoff number correctly, and we will just seek a new number to test until we find one that meets our criterion. We now have another way to distinguish fast and slow genes. Fast genes should show a high number of overlap positions or saturated mutation sites whereas slow ones should not. This can be easily counted.
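
The two criteria named above, percent identity and the count of overlap positions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the manuscript: the function names are hypothetical, the sequences are toy data, and an overlap position is assumed here to mean an aligned site at which all three sequences differ from one another, which this response does not spell out.

```python
def percent_identity(seq_a, seq_b):
    """Percent of gap-free aligned sites at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return 100.0 * matches / compared


def count_overlap_positions(seq_a, seq_b, seq_c):
    """Count sites where all three aligned sequences differ.

    Assumed definition of an 'overlap position'; not taken from the source.
    """
    if not (len(seq_a) == len(seq_b) == len(seq_c)):
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for a, b, c in zip(seq_a, seq_b, seq_c):
        if "-" in (a, b, c):
            continue  # skip sites with a gap in any sequence
        if a != b and a != c and b != c:
            count += 1
    return count


def classify_gene(seq_a, seq_b, cutoff=98.0):
    """Label a gene 'slow' or 'fast' by a percent-identity cutoff (hypothetical helper)."""
    return "slow" if percent_identity(seq_a, seq_b) >= cutoff else "fast"


if __name__ == "__main__":
    # Toy aligned protein fragments, invented for illustration only.
    print(percent_identity("MKTAYIAKQR", "MKSAYLAKQR"))
    print(count_overlap_positions("MKTA", "MKSA", "MKGA"))
    print(classify_gene("MKTAYIAKQR", "MKSAYLAKQR"))
```

The cutoff is left as a parameter because the response treats it as a number to be adjusted until the slow and fast groups behave differently in the equidistance test.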


You choose genes that are more similar among 'simpler' species compared to humans. For instance, in the human-orangutan-chimpanzee comparison, you choose the genes that are 98% or more similar between orangutan and a 'similarly complex' species (chimpanzee), and designate these as 'slow evolving genes'. Based on similarity between such genes, you come to the conclusion that orangutans and chimpanzees are more closely related to each other than either of them is to humans. This is circular reasoning. You first choose genes on the criterion that they are very similar between two taxa, and eventually conclude that the two taxa are similar to each other based on similarity. That said, I am not trying to assert that humans and chimpanzees are more closely related. Your hypothesis may well be true, but you need proper evidence to support it. Your current reasoning is not tenable.


My way of choosing such genes is logically sound, as explained in the ms. This way may have potential pitfalls, as you have correctly noticed. But such a pitfall is not a real problem for the soundness of the method because we do have negative controls. One control is that fast evolving genes selected by the same notion of similarity between orangutan and gorilla do not show orang and gorilla to be closer to each other than orang and human. If your circular argument is truly the reason for our observation on the slow genes, you would have a hard time explaining the opposite observation with the fast evolving genes. A second control is that the slow genes selected based on orang and gorilla similarity show gorilla and chimpanzee to be equidistant to orangutan, which is not expected if your circular argument is true (you would expect to see gorilla closer to orang). Third, the same method did not fail to correctly identify other well supported clades, such as the monkey-ape clade to the exclusion of prosimians. Thus, there is a point to be learned here for evolutionary biologists in general. One should always keep an eye on the coherence of the whole picture. This means no contradictions. This also means that when all other negative controls work as expected, we should accept the positive result regardless of whether we like it or whether it fits with our old notions. Because if we don't, we allow ourselves contradictions. When all other patterns follow our method, why should human and chimpanzee be an exception to that pattern? If you can easily accept that exception without a reason, you grant too much incoherence to nature, which has of course become a habit, given that the existing paradigm is alive only because people have a habit of overlooking contradictions. It is a well known motto among evolutionary biologists that all rules/patterns/laws in biology have exceptions. I don't buy that, and I challenge everyone to contradict the MGD.
A law with contradictions simply means you have not found the complete law. Why is such a plain truth, which deems us stupid, being overlooked while the self-deceiving and self-promoting notion that all rules have exceptions is favored among us? Of course, no one should be surprised, given that evolutionary biologists are only human.
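
The negative controls described above amount to comparing the distances from one reference taxon to two others. A minimal sketch of such a comparison follows, assuming distance is measured as percent mismatch over gap-free aligned sites and that 'equidistant' means the two distances differ by no more than a chosen tolerance. The function names, the tolerance and the toy sequences are hypothetical and are not the procedure or data of the manuscript.

```python
def percent_distance(seq_a, seq_b):
    """Percent of gap-free aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return 100.0 * mismatches / compared


def compare_to_reference(reference, taxon_x, taxon_y, tolerance=1.0):
    """Say whether x or y is closer to the reference, or whether they are equidistant.

    The tolerance (in percentage points) is an assumed, adjustable threshold.
    """
    dist_x = percent_distance(reference, taxon_x)
    dist_y = percent_distance(reference, taxon_y)
    if abs(dist_x - dist_y) <= tolerance:
        verdict = "equidistant"
    elif dist_x < dist_y:
        verdict = "x closer"
    else:
        verdict = "y closer"
    return verdict, dist_x, dist_y


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    orang = "ACGTACGTAC"
    gorilla = "ACGTACGTAA"
    chimp = "ACGTACGTCC"
    human = "ACGAACCTCA"
    print(compare_to_reference(orang, gorilla, chimp))
    print(compare_to_reference(orang, gorilla, human))
```

Running the same comparison separately on the slow and the fast gene sets is what would show whether the two groups give opposite answers, which is the control invoked in this response.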


Ignoring the circularity, the 'phylogenetic analysis' of primates presented here is not a true phylogenetic reconstruction, but based on distance data i.e. a phenetic method.


As explained just above, there is no circularity here, not if one looks at all of the results presented. Whether a distance method is phenetic or whatever is irrelevant to a truthful construction of phylogeny. The inherent soundness of the logic and reasoning behind the method should be the deciding factor. It is not sensible to use genetic distance or similarity that is really a reflection of MGD to infer phylogeny, which is what is being done by all existing methods. It is not sensible to even attempt to construct a phylogeny when one has no real understanding of the reason for the genetic equidistance phenomenon.


'Although proteins have more sequences under natural selection than non-coding DNA, they still have neutral sequences so long as they show difference between species and the method here only makes use of the neutral sequences'. The relationship between showing difference between species and having neutral sequences is not clear. If two species are similar in sequence at a particular chromosomal region, does it mean that selection acts on this region? On the other side of the coin, why can't differential selection in species lead to changes in non-neutral regions of the chromosome?


I am only applying the neutral theory here, since this theory treats most of such differences as neutral. More correctly, many such differences are due to natural selection. But even if due to natural selection, the accumulation of such differences may still follow a clock-like pattern, as is demonstrated by a recent Richard Lenski Nature paper on bacterial evolution in the lab. One sound way to make sense of the similar behavior of neutral and non-neutral mutations is to notice that they share the common property of random fixation. Natural selection contrasts with artificial selection: one is random and the other is intentional and non-random. Selection as a process per se is not random, but the outcome is random because the environmental change is random for natural selection. The outcome of artificial selection is non-random because the environmental change is non-random. Thus fixation of mutations by artificial selection is not random and should not follow a clock-like pattern, but fixation of mutations by natural selection is random, in the same sense as random drift, and thus should follow the behavior of neutral mutations.


'A molecular clock calibrated from one fossil split time should produce divergence times consistent with other fossil records and other independently calibrated molecular clocks.' This is not entirely true. Fossils only provide minimum estimates of times of divergence, they can never provide absolute times. You also need to take this into account in your discussion about divergence times within the hominoid clade.


In an ideal situation, this should be true. Granted, some fossil records may not be complete, but here we are talking about those established beyond any reasonable doubt. The mammal radiation and the Cambrian explosion are such records.


Overall, I don't think the author provides convincing evidence or reasoning to support his/her argument that 'complexity of species' (however defined) affects the rate of evolution of sequences. The 'phylogenetic analysis' of primates has several problems, some of which I have discussed in this review.

In the final analysis, I suggest that you all pay attention to the genetic equidistance result. Once you fully understand it, you may see where I am coming from. I can see clearly that you have yet to understand how destructive it truly is to the existing paradigm and that you have not paid attention to what I wrote about it, since you are still thinking as if the interpretation of this result by the existing paradigm is sound rather than completely embarrassing. Thank you all again, and I would greatly appreciate your feedback. I am not biased in any way and am much more objective than most scientists. You don't have to do much talking to convince me of your position or theory; just give me a few sentences on how your position or theory could easily take care of the genetic equidistance result of both microevolution and macroevolution. To help you out, I grant you that your theory is perfect for the equidistance phenomenon of microevolution. I just have trouble seeing how it could also explain that of macroevolution, where there are a large number of overlap positions or saturated positions. I look forward to hearing your reply.