Friday, November 13, 2009

Real reason for the endless production of conflicting results on tarsiers

A couple of weeks ago, a new paper on primate phylogeny by Chatterjee et al. appeared that groups tarsiers with prosimians.

“Estimating the phylogeny and divergence times of primates using a supermatrix approach” Helen J Chatterjee, Simon YW Ho, Ian Barnes, and Colin Groves
BMC Evolutionary Biology 2009, 9:259 doi:10.1186/1471-2148-9-259

I sent the following comment, titled “Real reason for the endless production of conflicting results on tarsiers”, to the journal’s website:

On the position of tarsiers, Chatterjee et al. wrote: “The majority of molecular evidence supports the latter grouping [4,10-13], although a large number of molecular studies still provide support for the Prosimii concept [14-18].” (The “latter grouping” is the one placing tarsiers with the higher primates.)

When a method can repeatedly, and seemingly endlessly, lead to two opposite results while only one of them can be true, it is time to ask whether something fundamental is missing from our method (the popular existing methods differ slightly from one another but are fundamentally of the same kind). Let us start from the very beginning and examine the assumptions behind our method. The key assumption of all methods based on sequence similarity is that sequence dissimilarity always correlates with time of divergence. Is this true? One does not have to be a specialist to know that it is sometimes true and sometimes not. Thus, for a method to produce accurate, uncertainty-free results, it must take into account the reality that sequence dissimilarity sometimes does not correlate with time of divergence. Many sequence comparisons are uninformative and must be excluded. When they are not excluded, as is the case with all existing methods, they add noise that can sometimes overwhelm the signal.

It is by accident that these methods sometimes give correct results and sometimes wrong ones, and no one knows why they differ or when to regard a result as correct and final. We therefore have a peculiar, unscientific situation: no one takes anyone else’s results as the final say. Yet there is only one true phylogeny of life on Earth; once we know it, the job is done and no more work is needed. The existing methods are perfect for keeping some of us employed forever, but they will never give us the truth. Truth is not judged by counting how many studies support it versus how many oppose it. A correct method should produce no studies that contradict the truth; in other words, it should be immune to producing conflicting results.
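To make the key assumption concrete, here is a minimal sketch that computes the simplest dissimilarity measure, the p-distance (the fraction of aligned sites that differ), and contrasts it with the p-distance expected under the Jukes-Cantor (JC69) substitution model as divergence grows. JC69 is a standard textbook model swapped in purely for illustration; it is not the slow clock method and not the argument made in this post, and the function names are hypothetical. Under this assumption the expected p-distance levels off near 0.75, so past a certain point more elapsed time adds almost no further dissimilarity. That is one simple way in which dissimilarity can stop tracking divergence time.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites that differ; sites with a gap ('-') are skipped.

    Hypothetical helper for illustration only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped positions
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def expected_p_jc69(d: float) -> float:
    """Expected p-distance under the JC69 model for d substitutions per site.

    Assumption: JC69 is used only as a textbook example of saturation.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


if __name__ == "__main__":
    print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 2 of 10 sites differ
    # Observed dissimilarity grows quickly at first, then saturates below 0.75.
    for d in (0.1, 0.5, 1.0, 2.0, 4.0):
        print(d, round(expected_p_jc69(d), 3))
```

Doubling the divergence from 2 to 4 substitutions per site raises the expected p-distance far less than going from 0.1 to 0.5, which is the saturation effect in miniature.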

Data + method = result. In molecular phylogeny, the data are simply sequence facts and cannot be wrong as long as no sequencing errors are made. Thus the only way to produce a false result, or conflicting results, in phylogeny is through an incorrect method. Since all existing methods are perfectly capable of producing false results, and all have in practice produced false or conflicting results, this is another simple proof that the popular existing methods are incorrect.

By the admission of leading experts themselves, the popular existing methods are flawed in the sense that they can easily produce incorrect results that are entirely beyond the scientists’ control:
“Unlike the case in physics, the predictive power of a model in biology is quite low. It seems to us that if the prediction (e.g., a phylogenetic tree reconstructed) of a model is correct in 80% of the cases, it is a good model at least at the present time.” (Masatoshi Nei and Sudhir Kumar, 2000, Molecular Evolution and Phylogenetics, p. 85)

When a result is only 80% certain, it can be completely wrong. Either we know or we don’t. Knowing with 80% certainty, or with anything less than 100%, means we don’t know. We are much better off without such a result, because it often leads non-specialists to believe that we know with 100% certainty. Doesn’t everyone in academia think we are 100% certain that the chimpanzee is closest to humans, when in fact we are only 80% certain and could therefore be completely wrong? When they act and work on that belief, as they have been doing for years now, should we feel comfortable having misled them?
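The arithmetic behind an 80% figure can be sketched as follows. Purely as an illustrative assumption (the quoted passage does not say this), treat each published analysis as an independent trial that recovers the true tree with probability 0.8. The chance that at least one of n such analyses is wrong is then 1 - 0.8^n, and on average about 20% of them are expected to be wrong, which by itself is enough to keep conflicting results appearing in the literature. The function names are hypothetical.

```python
def prob_at_least_one_wrong(accuracy: float, n_studies: int) -> float:
    """P(at least one of n independent analyses is wrong) = 1 - accuracy**n.

    Assumes independence between analyses, an illustrative simplification.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a probability")
    if n_studies < 0:
        raise ValueError("n_studies must be non-negative")
    return 1.0 - accuracy ** n_studies


def expected_wrong(accuracy: float, n_studies: int) -> float:
    """Expected number of wrong analyses among n independent ones."""
    return n_studies * (1.0 - accuracy)


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, round(prob_at_least_one_wrong(0.8, n), 3), round(expected_wrong(0.8, n), 2))
```

With an accuracy of 0.8, five analyses give about a 67% chance that at least one is wrong, and ten give about 89%, with two wrong results expected among the ten.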

Of course, nothing we know says that biology has to be different from physics. The present situation merely means that we have much to learn. When we know better, we should be able to have a model or method that is correct in 100% of cases. Until then, some of what we are doing is just kidding ourselves. I have now offered the slow clock method as the best candidate for a method that takes all of reality into account and is capable of 100% certainty (1). Although the result of Chatterjee et al., like many others, supports my result on tarsiers obtained with the slow clock method (1), I do not view their result as confirming mine, because their method is flawed. Using the same kind of method, another group could easily produce the opposite result simply by picking a new set of genes (this, of course, has already been done many times). I would not view such a result as a valid contradiction of mine, any more than I view the result of Chatterjee et al. as valid support.

A flawed method automatically renders its result meaningless, regardless of whether the result happens to be consistent with reality. A flawed method is simply one that can turn a perfectly solid set of factual data into a false interpretation of reality. Any method that has produced conflicting interpretations has, of course, thereby proven itself false. The present situation with tarsiers is just one of many that say flatly it is time for a fundamental change in how we interpret sequence data.

Ref:
(1) Huang, S. (2009) Primate phylogeny: molecular evidence for a pongid clade excluding humans and a prosimian clade containing tarsiers. Available from Nature Precedings, http://hdl.handle.net/10101/npre.2009.3794.1
