Friday, July 31, 2015

The onion test or the protozoa test?

Regarding the onion test, I would say it is asking the wrong question. What one should be asking is: why do simple organisms such as protozoa have a genome size range, from smallest to largest, of ~20000 fold, and flowering plants ~2000 fold, whereas mammals have less than 10 fold? Don't even try to invoke time of evolution as the reason, as mammals and flowering plants appeared at about the same time. The numbers cited here come directly from a paper by the author who invented the onion test (see Figure 1 in the paper).

From the onion test author: “the onion test simply asks: if most eukaryotic DNA is functional at the organism level, be it for gene regulation, protection against mutations, maintenance of chromosome structure, or any other such role, then why does an onion require five times more of it than a human?”

Answer: 99.9% of the human genome is functional for internal construction purposes, while 0.1% is for normal variation among humans and for adaptation to environments. For HIV, the corresponding split is 20% vs 80%. Onions have a large fraction of their genome serving adaptive purposes, which can be freely changed without much effect on internal integrity (in this sense, junk). So, to say that most human DNA is functional for construction does not necessarily preclude one from saying that most onion DNA, or most HIV DNA, is not for construction. So the question of why an onion requires more (functional genome in terms of construction purposes) is not a valid question. No one is saying that it does, and it does not. No one, at least not I, is saying that every species has the same proportion of functional genome (in terms of internal construction not much related to adaptation). So, the onion test has a straw man premise. What an onion has more of than a human, or an HIV virus has more of than a human, is the fraction of its genome that is so-called junk (which in fact plays adaptive roles in response to environments). Once you accept, as you did, that some species can tolerate more junk, the size of the junk, whether 1x or 5x the human genome size, is irrelevant.

So, why do protozoa have the largest variation in genome size? Protozoa are all unicellular, which is the key. They are hence all simple relative to multicellular organisms. Simple systems can tolerate more random-error type variation in their building parts, including their dimensions or amounts, which just means a large standard deviation from the ideal form. So, if a part for a toy car is specified to have a length of 10000 +/- 9999, then both 19999 and 1 are allowed random errors.
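The arithmetic behind the toy car example can be sketched in a few lines. This is only an illustration of how a wide tolerance translates into a large fold range; the helper name `fold_range` is made up for this sketch and is not taken from any cited paper.

```python
def fold_range(smallest, largest):
    """Return how many times larger the largest value is than the smallest."""
    if smallest <= 0:
        raise ValueError("sizes must be positive")
    return largest / smallest

# Toy car part from the text: nominal length 10000 with tolerance +/- 9999.
nominal, tolerance = 10000, 9999
shortest = nominal - tolerance   # smallest allowed length
longest = nominal + tolerance    # largest allowed length
toy_fold = fold_range(shortest, longest)

print(f"allowed lengths: {shortest} to {longest}, fold range: {toy_fold:.0f}")
```

The same function applied to the smallest and largest genome sizes within a group would give the fold figures quoted above (~20000 for protozoa, less than 10 for mammals), but the actual genome sizes are not listed in this post, so they are not filled in here.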

All researchers in the field base their papers on the infinite sites model, which says there is an infinite number of neutral or junk sites in any genome, regardless of whether it is human or onion. They do not acknowledge that different species tolerate different amounts of junk. They don't think that a monkey, a mouse, or an onion can tolerate more junk than humans do. Or maybe they do in their hearts, but at least they disregard that in their work. I think they are going to have a hard time answering my protozoa test.

Palazzo AF, Gregory TR (2014) The Case for Junk DNA. PLoS Genet 10(5): e1004351. doi:10.1371/journal.pgen.1004351



A theory in crisis

It is really satisfying to hear some honest voices from the leaders of the field, despite the fact that most in the field would never say anything like it when challenged or when one submits a paper challenging their bread and butter theory.

"As this short history demonstrates, population genetics has made
remarkable strides in understanding both the phenomenology and the
theoretical models of molecular evolution. However, it also demonstrates
that we have yet to find a mechanistic theory of molecular evolution that
can readily account for all of the phenomenology. Thus, while the 1990s
will most likely be a decade dominated by the gathering of data, we would
like to call attention to a looming crisis as theoretical investigations lag
behind the phenomenology."

Ohta, T. and Gillespie, J.H. Development of Neutral and Nearly Neutral Theories. Theoretical Population Biology 49, 128–142 (1996)

The existing theory is at least incomplete and will forecast many false things, because it had not even accounted for all the phenomenology known as of 1996, as acknowledged by the above quote from Ohta and Gillespie. Maybe the difference between me and some people in the field is that to me a correct theory means accounting for all relevant data without a single contradiction. If anyone does not think that is possible, in biology at least, just remember that all seemingly impossible things are viewed as quite simple after they have been accomplished. Also keep in mind that a single contradiction to a theory is equivalent to an infinite number of contradictions. When a theory allows a single contradiction or refuses to be falsified by it, it no longer qualifies as testable (it would be meaningless to use the word test).

So, what one really needs is a more complete or correct theory, which is what the above quote means.



Tuesday, July 28, 2015

Begging the question, a common practice in evolutionary genetics (junks assumed, then deduced)

The molecular evolution and popgen field is known to have the most mathematics among all branches of biology. But precisely because of that, it needs many simplifying assumptions or premises, which often lead to the fallacy of begging the question. Here I give a few examples regarding the concept of the mutation or genetic load and the genetic load argument for junk DNA, based on reading a paper on the genetic load (Lesecque et al 2012). Some have used the genetic load as the best argument for the junk DNA notion (Palazzo and Gregory, 2014; Graur, 2015). I also discuss the assumption that non-conservation equals non-function and the assumption of infinite sites.

Lesecque et al say: “The mutation load was more formally defined as the proportional reduction in mean fitness of a population relative to that of a mutation-free genotype, brought about by deleterious mutations (Crow 1970):

L = (Wmax - Wmean)/Wmax

where Wmean is the mean fitness of the population at equilibrium and Wmax is the mean fitness of a deleterious mutation-free individual.”
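The definition quoted above is simple enough to write down directly. The following is a minimal sketch with made-up fitness values, meant only to show what the ratio measures; the function name `mutation_load` and the example numbers are assumptions for illustration, not values from Lesecque et al.

```python
def mutation_load(w_max, w_mean):
    """Mutation load L = (Wmax - Wmean) / Wmax, as quoted from Crow (1970)."""
    if w_max <= 0:
        raise ValueError("w_max must be positive")
    if w_mean > w_max:
        raise ValueError("w_mean cannot exceed w_max under this definition")
    return (w_max - w_mean) / w_max

# Hypothetical values: the mutation-free genotype has fitness 1.0 and the
# population mean fitness at equilibrium is 0.75.
example_load = mutation_load(w_max=1.0, w_mean=0.75)
print(f"L = {example_load}")
```

Note that the whole calculation hinges on Wmax, the fitness of an individual free of deleterious mutations, which is exactly the quantity questioned in the next paragraph.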

Is there a deleterious mutation-free individual in the real world or even in an imagined world? All mutations, as random mistakes, have a deleterious aspect to an ordered system, if not individually, then collectively. Many mutations could be both deleterious and beneficial. For example, they could be beneficial to adaptive immunity, which requires genome variation for producing diverse antibody responses, but deleterious to innate immunity, which requires conserved proteins to recognize conserved sequences shared by a certain class of microorganisms. By failing to recognize the both deleterious and beneficial nature of most mutations, and by classifying mutations into two kinds (deleterious and non-deleterious, with the latter consisting mostly of neutral ones), the assumption on the concept of deleterious and non-deleterious mutations eventually led to the genetic load argument for the conclusion that most mutations must be neutral. Here, one sees that the neutral conclusion is already embedded in the premise that led to it. The premise does not recognize the fact that most mutations appear neutral or nearly neutral as a result of balancing selection, and the fact that all mutations have a deleterious aspect as noise to a finely tuned system. Of course, that premise works for a junkyard-like system.

Lesecque et al say: “If the fitness effects of deleterious mutations are independent from one another, the mutation load across all loci subject to recurrent mutation is approximately

L = 1 - e^{-U}

(Kimura and Maruyama 1966), where U is the overall rate of deleterious mutation per diploid genome per generation. This simple formula is a classic result of evolutionary genetics.”
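To make the role of the independence assumption concrete, here is a small sketch that evaluates the quoted formula and checks it against a multiplicative (independent effects) fitness model. The model details are assumptions made for this illustration: each individual carries a Poisson-distributed number of deleterious mutations with mean U/s, and each mutation multiplies fitness by (1 - s). The function names and the values of U and s are hypothetical.

```python
import math

def load_independent(U):
    """Load under independent effects: L = 1 - e^{-U}."""
    return -math.expm1(-U)

def mean_fitness_multiplicative(U, s, n_max=500):
    """Mean fitness when mutation number is Poisson(U/s) and each mutation
    multiplies fitness by (1 - s). The sum is truncated at n_max terms."""
    lam = U / s
    pmf = math.exp(-lam)          # Poisson probability of zero mutations
    total = 0.0
    for n in range(n_max + 1):
        total += pmf * (1.0 - s) ** n
        pmf *= lam / (n + 1)      # step to the probability of n + 1 mutations
    return total

for U in (0.1, 1.0, 2.0):
    print(f"U = {U}: L = {load_independent(U):.4f}")

# With independent (multiplicative) effects the two routes agree.
check = 1.0 - mean_fitness_multiplicative(U=1.0, s=0.1)
print(f"multiplicative model, U = 1.0, s = 0.1: L = {check:.4f}")
```

The agreement holds only because each mutation's effect is multiplied in without regard to the others. If the effects interact, the product inside the sum no longer describes fitness and the simple formula need not apply.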

So, a classic formula for the genetic load argument is based on the assumption that the fitness effects of deleterious mutations are independent from one another. For a junk yard, yes, the consequences of errors in the building parts are independent from one another. However, for a system that is ordered and built by network-like interactions among the building parts, no, the consequences of errors in the building parts are NOT independent from one another. In fact, recent studies in genomics are constantly discovering epistatic interactions among mutations. So, here one sees clearly again, the neutral or junk DNA conclusion is already embedded in the premise that treats an organism more as a junkyard than a highly ordered system with components organized in a network fashion. When you have already assumed an organism to be junk like, why bother showing us the math formula and deduction leading to the junk DNA conclusion? You should just say that most DNAs are junks because I said so.

Finally, none of the premises related to the genetic load concept recognized the fact that a large collection of otherwise harmless mutations within an individual could be deleterious, as our recent papers have shown. Well, again, such a fact certainly does not exist for a junkyard-like system. By not recognizing that fact or being too naïve to see it, the practitioners in the popgen field have again and again assumed biological systems to be junk like before setting out to prove/deduce that they are made of largely junks.

I also briefly comment on a paper by the Ponting group concluding that the human genome is only about 8% functional (Rands et al, 2014). The premise for that deduction is that non-conservation means non-function. Again, building parts for different junk yards are not conserved and are nonfunctional. So, non-conservation means non-function holds for junk yards. But for organisms relying on mutations to adapt to fast changing environments, recurrent or repeated mutations at the same sites at different time points in their life history are absolutely essential for their survival. Less conserved sequences are more important for adaptation to the external environment, while the more conserved ones are important for the internal integrity of a system. For bacteria or flu viruses to escape human immunity or medicines, the fast changing or non-conserved parts of their genome are absolutely essential. So, here again, by assuming non-function for the non-conserved parts of the genome, one is assuming an organism to be like a junk yard.

Other key assumptions like the infinite sites model (means neutral sites) are critical for phylogenetics as it is practiced today and for the absurd Out of Africa model of human origin that uses imagined bottlenecks to explain away the extremely low genetic diversity of humans. Well, a junk yard can certainly have an infinite number of parts and tolerate an infinite number of errors. An organism’s genome is finite in size and essentially nothing compared to infinite size. Within such finite size genomes, the proportion that can be free to change without consequences is even more limited or finite.

A paradigm shift (or revolutionary science) is, according to Thomas Kuhn, a change in the basic assumptions, or paradigms, within the ruling theory of science. The above analyses show that the assumptions for the popgen and molecular evolution field are largely out of touch with reality as more reality becomes known, and must be changed quickly if the field wants to avoid fading into oblivion and stay relevant to mainstream bench biology, genomic medicine, archeology, and paleontology. Those assumptions have produced few useful and definitive deductions that can be independently verified and avoid the fate of constant and endless revisions, like we have seen from 1987 to now for the Out of Africa model or the Neanderthals.

Lesecque Y, Keightley PD, Eyre-Walker A (2012) A resolution of the mutation load paradox in humans. Genetics 191: 1321–1330.

Palazzo AF, Gregory TR (2014) The Case for Junk DNA. PLoS Genet 10(5): e1004351. doi:10.1371/journal.pgen.1004351

Dan Graur (2015) If @ENCODE_NIH is right each of us should have on average from 3 × 10^19 to 5 × 10^35 children. https://www.dropbox.com/s/4bj3andtlu3y9hk/Genetic%20mutational%20load.docx?dl=0 …


Rands CM, Meader S, Ponting CP, Lunter G (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genet 10(7): e1004525.


Sunday, July 26, 2015

Some quotations from our Parkinson's disease paper just published in PLoS One

Recent studies have begun to show that a much larger than expected portion of the human genome may be functional [24–29].

An organism can certainly accommodate some limited amounts of random variations within its building parts or DNAs, but too much random errors or mutations may exceed an organism's maximum level of tolerable disorder or entropy. Thus overall level of randomness or minor allele amounts may be expected to be higher in complex diseases relative to controls.

In fact, while most bench biologists have thought otherwise, nearly all in the population genetics field still believe that most SNPs are neutral or that most minor alleles are minor because of random drift rather than because of disease-association.

The findings of higher MAC in PD cases is consistent with our intuitive hypothesis that a highly complex and ordered system such as the human brain must have an optimum limit on the level of randomness or entropy in its building parts or DNAs. Too much randomness over a critical threshold may trigger complex diseases. There may be only one unique and optimum way to build a complex system but there could be numerous ways to break it. While it may only take one single major effect error in a major pathway to cause diseases, it would require the collective effects of a large number of minor effect errors in many different pathways to achieve a similar outcome.


Saturday, July 25, 2015

One of the most astonishing findings of modern science: the genetic equidistance result

The genetic equidistance result was called, rightly in my opinion, “one of the most astonishing findings of modern science” by the biologist Mike Denton in his 1986 book “Evolution, A Theory in Crisis”. No one had expected the result or could have guessed it, and all would be shocked by it. Nearly all scientists today either don’t know it or have no idea what it means. In fact, it has been misinterpreted ever since its discovery, which has unfortunately led the field of molecular evolution and population genetics onto the wrong path. It was the reason for the universal molecular clock idea and the junk or neutral DNA idea.

The genetic equidistance result was originally discovered by Margoliash in 1963, who states:

“It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the time elapsed since the lines of evolution leading to these two species originally diverged. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish diverges from the main stem of vertebrate evolution earlier than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein.”

Margoliash E (1963) Primary structure and evolution of cytochrome c. Proceedings of the National Academy of Sciences of the USA 50: 672–679.

Half a century later, with numerous genomes sequenced and compared, we all know that Margoliash was correct in noticing the equidistance result. Indeed, all vertebrate cytochrome c are approximately equally different from the yeast protein, or the bacterial protein for that matter. However, one could have just as easily used common sense to interpret the equidistance result in the following alternative way, by changing a few words in the above Margoliash version:

“It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the species with lower organismal complexity. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish has lower complexity than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein.”

Typical textbooks mention nothing about the original equidistance result and only present the Margoliash interpretation, known as the molecular clock. For example, Dan Graur and Wen-Hsiung Li in their “Fundamentals of Molecular Evolution” (2000) said this:

“In their comparative studies of hemoglobin and cytochrome c protein sequences from different species, Zuckerkandl and Pauling (1962, 1965) and Margoliash (1963) first noticed that the rates of amino acid replacement were approximately the same among various mammalian lineages.”

In other words, these scientists noticed that the equidistance result could be interpreted to mean a universal molecular clock that all mammalian species, or all species for that matter, have approximately the same substitution rate for any given protein. However, another person could have noticed the alternative that the equidistance is a result of lower complexity species having more tolerable sequence variations. This alternative is the maximum genetic diversity (MGD) hypothesis.
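The equidistance pattern itself is easy to state as a computation. The sketch below uses short made-up sequences, not real cytochrome c data, and a plain count of mismatched positions in an alignment. It only shows what "equally different from the outgroup" means numerically. It cannot by itself tell the molecular clock reading from the MGD reading, since both are interpretations of the same pattern.

```python
def percent_difference(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)

# Hypothetical aligned toy sequences (NOT real protein data).
toy = {
    "outgroup":  "AAAAAAAAAA",
    "species_1": "AAAAAACCCT",
    "species_2": "AAAAAACCGG",
}

to_outgroup = {
    name: percent_difference(seq, toy["outgroup"])
    for name, seq in toy.items()
    if name != "outgroup"
}
between_species = percent_difference(toy["species_1"], toy["species_2"])

print(to_outgroup)       # both species sit at the same distance from the outgroup
print(between_species)   # while being closer to each other
```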


So, which is right? The universal molecular clock has now been proven invalid, as acknowledged by nearly all in the field. The only other alternative is the more intuitive MGD interpretation, which has yet to encounter a single piece of contradicting data. The molecular clock has led to nonsensical ideas such as neutral or junk DNAs as if an organism is like a junk yard or a dead body, but the MGD theory has led to the exact opposite. 


Thursday, July 2, 2015

Application of the MGD theory on complex diseases, first success Parkinson's disease

We have a new research paper on Parkinson's disease in press in PLoS One

It is merely the first success of the MGD theory in solving complex disease problems.

Enrichment of Minor Alleles of Common SNPs and Improved Risk Prediction for Parkinson's Disease

Zuobin Zhu, Dejian Yuan, Denghui Luo, Xitong Lu and Shi Huang*
State Key Laboratory of Medical Genetics, Central South University, Changsha, Hunan, China
Abstract

Parkinson disease (PD) is the second most common neurodegenerative disorder in the aged population and thought to involve many genetic loci. While a number of individual single nucleotide polymorphisms (SNPs) have been linked with PD, many remain to be found and no known markers or combinations of them have a useful predictive value for sporadic PD cases. The collective effects of genome wide minor alleles of common SNPs, or the minor allele content (MAC) in an individual, have recently been shown to be linked with quantitative variations of numerous complex traits in model organisms with higher MAC more likely linked with lower fitness. Here we found that PD cases had higher MAC than matched controls. A set of 37564 SNPs with MA (MAF < 0.4) more common in cases (P < 0.05) was found to have the best predictive accuracy. A weighted risk score calculated by using this set can predict 2% of PD cases (100% specificity), which is comparable to using familial PD genes to identify familial PD cases. These results suggest a novel genetic component in PD and provide a useful genetic method to identify a small fraction of PD cases.
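For readers who want a feel for the two quantities named in the abstract, here is a minimal sketch. It is not the paper's pipeline. It assumes genotypes are coded as the number of minor alleles at each SNP (0, 1 or 2, with None for a missing call), takes MAC to be the fraction of called alleles that are minor, and uses arbitrary per-SNP weights. All function names, genotypes and weights below are hypothetical.

```python
def minor_allele_content(genotypes):
    """Fraction of called alleles that are minor alleles in one individual."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))

def weighted_risk_score(genotypes, weights):
    """Sum of minor allele counts times per-SNP weights, skipping missing calls."""
    if len(genotypes) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(w * g for g, w in zip(genotypes, weights) if g is not None)

# Hypothetical individuals typed at five SNPs.
case = [2, 1, 1, 0, 2]
control = [0, 1, 0, 0, 1]
weights = [0.5, 1.0, 0.25, 2.0, 1.0]   # illustrative only

print(minor_allele_content(case), minor_allele_content(control))
print(weighted_risk_score(case, weights), weighted_risk_score(control, weights))
```

In this made-up example the case carries more minor alleles than the control and therefore also gets the higher score. Whether that difference is meaningful in real data depends on the SNP selection and weighting described in the paper itself.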


Friday, May 1, 2015

Genomics call for papers: Special issue on the functionality of genomic DNAs

I am coediting a special issue on the functionality of genomic DNAs.

Guest Editors:
Prof. Shi Huang
State Key Laboratory of Medical Genetics
Central South University, China
huangshi@sklmg.edu.cn
Prof. James Shapiro
Department of Biochemistry and Molecular Biology
University of Chicago
jsha@uchicago.edu
The field of genome evolution and population genetics has for the past half of a century assumed that genomic DNA can be divided into functional and non-functional (“junk”) regions. Experimental molecular science has found little evidence for this assumption. A majority of the noncoding parts of the human genome are transcribed, and numerous experimental researchers have now recognized an important functional role in the so called junk DNA regions, such as syn sites, lncRNA, pseudogene transcripts, antisense transcripts, microRNA, and mobile elements. In fact, evidence for functional constraints on noncoding genome regions has long been recognized. New theoretical frameworks based on less arbitrary foundations have also appeared in recent years that can coherently account for the reality of far more functional DNAs, as well as all other major known facts of evolution and population genetics. Nonetheless, there still remains a large gap in opinions between bench scientists in experimental biology and those on the theory side in bioinformatics and population genetics. This special issue will aim to close that gap and provide a view of evidence from a perspective that all genome regions have (or can easily acquire) functionality.
The special issue on the functionality of the genome will focus on the following tentative topics:
  1. Theoretical foundation for all genome regions to be functional. It will cover both the theory and all major features of genome evolution.
  2. Functional studies on junk DNA regions, including lncRNA sequences, viral DNAs and mobile elements
  3. Functionalities associated with genome spatial organization in the nucleus
  4. Isochores and compositional constraints on genomes
  5. Genetic basis of complex traits and diseases focusing on the collective effects of normal genetic variations
  6. Cancer genomics
  7. Roles of repetitive DNA elements in major evolutionary transitions
  8. Correlations of genome composition and organismal complexity
  9. Epigenetics
10. Evo Devo and extended synthesis
Important dates:
  • First submission date: July 1, 2015
  • Deadline for paper submissions: October 1, 2015
  • Deadline for final revised version: December 1, 2015
  • Expected publication: February 2016
Submission Guidelines
All manuscripts and any supplementary material should be submitted through Elsevier Editorial System located at: http://ees.elsevier.com/geno
Authors must select “Functionality of genomic DNAs” at the first step, “Select Article Type”, during submission to ensure that the manuscript is correctly identified for inclusion in this special issue. The Guide for Authors and other instructions can also be found on the website.