Tuesday, July 28, 2015

Begging the question, a common practice in evolutionary genetics (junk assumed, then deduced)

Molecular evolution and population genetics (popgen) is known as the most mathematical of all branches of biology. Precisely because of that, it needs many simplifying assumptions or premises, which often lead to the fallacy of begging the question. Here I give a few examples regarding the concept of the mutation or genetic load and the genetic load argument for junk DNA, based on reading a paper on the genetic load (Lesecque et al, 2012). Some have used the genetic load as the best argument for the junk DNA notion (Palazzo and Gregory, 2014; Graur, 2015). I also discuss the assumption that non-conservation equals non-function and the assumption of infinite sites.

Lesecque et al say: “The mutation load was more formally defined as the proportional reduction in mean fitness of a population relative to that of a mutation-free genotype, brought about by deleterious mutations (Crow 1970):

L = (Wmax - Wmean)/Wmax

where Wmean is the mean fitness of the population at equilibrium and Wmax is the mean fitness of a deleterious mutation-free individual.”
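To make the definition concrete, here is a minimal sketch of the quoted formula. The function name and the fitness values are my own illustrative assumptions and are not taken from Lesecque et al.

```python
def mutation_load(w_max: float, w_mean: float) -> float:
    """Proportional reduction in mean fitness: L = (Wmax - Wmean) / Wmax."""
    if w_max <= 0:
        raise ValueError("w_max must be positive")
    return (w_max - w_mean) / w_max


# Hypothetical fitness values, chosen only for illustration.
print(mutation_load(w_max=1.0, w_mean=0.8))
```

Note that the whole calculation hinges on Wmax, the fitness of the mutation-free individual, which is exactly the quantity questioned in the next paragraph.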

Is there a deleterious mutation-free individual in the real world, or even in an imagined one? All mutations, as random mistakes, have a deleterious aspect for an ordered system, if not individually, then collectively. Many mutations could be both deleterious and beneficial. For example, they could be beneficial to adaptive immunity, which requires genome variation to produce diverse antibody responses, but deleterious to innate immunity, which requires conserved proteins to recognize conserved sequences shared by a certain class of microorganisms. By failing to recognize that most mutations are both deleterious and beneficial, and by sorting mutations into two kinds (deleterious and non-deleterious, with the latter consisting mostly of neutral ones), the premise behind the load concept eventually leads, via the genetic load argument, to the conclusion that most mutations must be neutral. Here, one sees that the neutral conclusion is already embedded in the premise that led to it. The premise does not recognize the fact that most mutations appear neutral or nearly neutral as a result of balancing selection, or the fact that all mutations have a deleterious aspect as noise in a finely tuned system. Of course, that premise works for a junkyard-like system.

Lesecque et al say: “If the fitness effects of deleterious mutations are independent from one another, the mutation load across all loci subject to recurrent mutation is approximately

L = 1 - e^(-U)

(Kimura and Maruyama 1966), where U is the overall rate of deleterious mutation per diploid genome per generation. This simple formula is a classic result of evolutionary genetics.”
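A small numerical sketch shows how fast this formula saturates as U grows. The U values below are illustrative only, not estimates. The "offspring factor" of 1 / (1 - L) = e^U is my assumption about how the load argument is usually read (the reproductive excess, relative to replacement level, needed to offset the load); it is not a formula quoted from Lesecque et al.

```python
import math


def load_independent(u: float) -> float:
    """Mutation load under independent fitness effects: L = 1 - e^(-U)."""
    return 1.0 - math.exp(-u)


def offspring_factor(u: float) -> float:
    """Assumed reading of the load argument: 1 / (1 - L), which equals e^U."""
    return math.exp(u)


# Illustrative values of U (deleterious mutations per diploid genome per generation).
for u in (0.1, 1.0, 2.0, 10.0):
    print(f"U={u:5.1f}  L={load_independent(u):.4f}  factor={offspring_factor(u):.1f}")
```

Everything in this calculation follows from the independence assumption in the quoted sentence; change that assumption and the exponential form no longer holds.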

So, a classic formula for the genetic load argument is based on the assumption that the fitness effects of deleterious mutations are independent from one another. For a junkyard, yes, the consequences of errors in the building parts are independent from one another. However, for a system that is ordered and built by network-like interactions among the building parts, no, the consequences of errors in the building parts are NOT independent from one another. In fact, recent studies in genomics are constantly discovering epistatic interactions among mutations. So, here one sees clearly again that the neutral or junk DNA conclusion is already embedded in the premise, which treats an organism more as a junkyard than as a highly ordered system with components organized in a network fashion. When you have already assumed an organism to be junk-like, why bother showing us the math formula and deduction leading to the junk DNA conclusion? You should just say that most DNA is junk because I said so.

Finally, none of the premises related to the genetic load concept recognizes the fact that a large collection of otherwise harmless mutations within an individual could be deleterious, as our recent papers have shown. Well, again, such a fact certainly does not exist for a junkyard-like system. By not recognizing that fact, or being too naïve to see it, the practitioners in the popgen field have again and again assumed biological systems to be junk-like before setting out to prove or deduce that they are made largely of junk.

I also briefly comment on a paper by the Ponting group concluding that the human genome is only about 8% functional (Rands et al, 2014). The premise for that deduction is that non-conservation means non-function. Again, building parts for different junkyards are neither conserved nor functional. So, "non-conservation means non-function" holds for junkyards. But for organisms relying on mutations to adapt to fast-changing environments, recurrent or repeated mutations at the same sites at different time points in their life history are absolutely essential for their survival. Less conserved sequences are more important for adaptation to the external environment, while the more conserved ones are important for the internal integrity of a system. For bacteria or flu viruses to escape human immunity or medicines, the fast-changing or non-conserved parts of their genomes are absolutely essential. So, here again, by assuming non-function for the non-conserved parts of the genome, one is assuming an organism to be like a junkyard.

Other key assumptions, like the infinite sites model (which implies neutral sites), are critical for phylogenetics as it is practiced today and for the absurd Out of Africa model of human origin, which uses imagined bottlenecks to explain away the extremely low genetic diversity of humans. Well, a junkyard can certainly have an infinite number of parts and tolerate an infinite number of errors. An organism’s genome is finite in size and essentially nothing compared to an infinite one. Within such finite genomes, the proportion that is free to change without consequences is even more limited.

A paradigm shift (or revolutionary science) is, according to Thomas Kuhn, a change in the basic assumptions, or paradigms, within the ruling theory of science. The above analyses show that the assumptions for the popgen and molecular evolution field are largely out of touch with reality as more reality becomes known, and must be changed quickly if the field wants to avoid fading into oblivion and stay relevant to mainstream bench biology, genomic medicine, archeology, and paleontology. Those assumptions have produced few useful and definitive deductions that can be independently verified and avoid the fate of constant and endless revisions, like we have seen from 1987 to now for the Out of Africa model or the Neanderthals.

Lesecque Y, Keightley PD, Eyre-Walker A (2012) A resolution of the mutation load paradox in humans. Genetics 191: 1321–1330.

Palazzo AF, Gregory TR (2014) The Case for Junk DNA. PLoS Genet 10(5): e1004351. doi:10.1371/journal.pgen.1004351

Graur D (2015) If @ENCODE_NIH is right each of us should have on average from 3 × 10^19 to 5 × 10^35 children. https://www.dropbox.com/s/4bj3andtlu3y9hk/Genetic%20mutational%20load.docx?dl=0 …

Rands CM, Meader S, Ponting CP, Lunter G (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genet 10(7): e1004525.

Sunday, July 26, 2015

Some quotations from our Parkinson's disease paper just published in PLoS One

Recent studies have begun to show that a much larger than expected portion of the human genome may be functional [24–29].

An organism can certainly accommodate some limited amounts of random variations within its building parts or DNAs, but too much random errors or mutations may exceed an organism's maximum level of tolerable disorder or entropy. Thus overall level of randomness or minor allele amounts may be expected to be higher in complex diseases relative to controls.

In fact, while most bench biologists have thought otherwise, nearly all in the population genetics field still believe that most SNPs are neutral or that most minor alleles are minor because of random drift rather than because of disease-association.

The findings of higher MAC in PD cases is consistent with our intuitive hypothesis that a highly complex and ordered system such as the human brain must have an optimum limit on the level of randomness or entropy in its building parts or DNAs. Too much randomness over a critical threshold may trigger complex diseases. There may be only one unique and optimum way to build a complex system but there could be numerous ways to break it. While it may only take one single major effect error in a major pathway to cause diseases, it would require the collective effects of a large number of minor effect errors in many different pathways to achieve a similar outcome.

Saturday, July 25, 2015

One of the most astonishing findings of modern science: the genetic equidistance result

The biologist Mike Denton, rightly in my opinion, called the genetic equidistance result “one of the most astonishing findings of modern science” in his 1986 book “Evolution, A Theory in Crisis”. No one had expected the result or could have guessed it, and all would be shocked by it. Nearly all scientists today either don’t know it or have no idea what it means. In fact, it has been misinterpreted ever since its discovery, which has unfortunately led the field of molecular evolution and population genetics down the wrong path. It was the reason for the universal molecular clock idea and the junk or neutral DNA idea.

The genetic equidistance result was originally discovered by Margoliash in 1963, who states:

“It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the time elapsed since the lines of evolution leading to these two species originally diverged. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish diverges from the main stem of vertebrate evolution earlier than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein.”

Margoliash E (1963) Primary structure and evolution of cytochrome c. Proceedings of the National Academy of Sciences of the USA 50: 672–679.

Half a century later, with numerous genomes sequenced and compared, we all know that Margoliash was correct in noticing the equidistance result. Indeed, all vertebrate cytochrome c sequences are approximately equally different from the yeast protein, or the bacterial protein for that matter. However, one could just as easily have used common sense to interpret the equidistance result in the following alternative way, by changing a few words in the above Margoliash version:

“It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the species with lower organismal complexity. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish has lower complexity than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein.”
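Both versions describe the same observable pattern, and it is easy to state as a calculation. The sketch below counts residue differences between pre-aligned sequences and checks whether a group is equidistant from an outgroup. The sequences are made-up toy strings that I constructed to show the pattern; they are NOT real cytochrome c data, and the function names are hypothetical.

```python
def residue_differences(a: str, b: str) -> int:
    """Count positions at which two pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distances_to(outgroup: str, names: list, seqs: dict) -> dict:
    """Residue differences between each named sequence and the outgroup."""
    return {n: residue_differences(seqs[n], seqs[outgroup]) for n in names}


# Made-up toy sequences (assumption), constructed to display equidistance.
toy = {
    "mammal": "GDVEKGKKIF",
    "bird":   "GDIEKGKKIF",
    "fish":   "GDLAKGKKTF",
    "yeast":  "GSAKKGATLF",
}

to_yeast = distances_to("yeast", ["mammal", "bird", "fish"], toy)
to_fish = distances_to("fish", ["mammal", "bird"], toy)
print("differences from yeast:", to_yeast)
print("differences from fish: ", to_fish)
```

The counts themselves cannot tell the two interpretations apart; they only show the pattern that both the clock reading and the complexity reading set out to explain.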

Typical textbooks mention nothing about the original equidistance result and only present the Margoliash interpretation, known as the molecular clock. For example, Dan Graur and Wen-Hsiung Li in their “Fundamentals of Molecular Evolution” (2000) said this:

“In their comparative studies of hemoglobin and cytochrome c protein sequences from different species, Zuckerkandl and Pauling (1962, 1965) and Margoliash (1963) first noticed that the rates of amino acid replacement were approximately the same among various mammalian lineages.”

In other words, these scientists noticed that the equidistance result could be interpreted to mean a universal molecular clock, in which all mammalian species, or all species for that matter, have approximately the same substitution rate for any given protein. However, another person could have noticed the alternative: that the equidistance is a result of lower complexity species tolerating more sequence variation. This alternative is the maximum genetic diversity (MGD) hypothesis.

So, which is right? The universal molecular clock has now been proven invalid, as acknowledged by nearly all in the field. The only other alternative is the more intuitive MGD interpretation, which has yet to encounter a single piece of contradicting data. The molecular clock has led to nonsensical ideas such as neutral or junk DNA, as if an organism were like a junkyard or a dead body, but the MGD theory has led to the exact opposite.

Thursday, July 2, 2015

Application of the MGD theory on complex diseases, first success Parkinson's disease

We have a new research paper on Parkinson's disease in press in PLoS One.

It is merely the first success of the MGD theory in solving complex disease problems.

Enrichment of Minor Alleles of Common SNPs and Improved Risk Prediction for Parkinson's Disease

Zuobin Zhu, Dejian Yuan, Denghui Luo, Xitong Lu and Shi Huang*
State Key Laboratory of Medical Genetics, Central South University, Changsha, Hunan, China

Parkinson disease (PD) is the second most common neurodegenerative disorder in the aged population and thought to involve many genetic loci. While a number of individual single nucleotide polymorphisms (SNPs) have been linked with PD, many remain to be found and no known markers or combinations of them have a useful predictive value for sporadic PD cases. The collective effects of genome wide minor alleles of common SNPs, or the minor allele content (MAC) in an individual, have recently been shown to be linked with quantitative variations of numerous complex traits in model organisms with higher MAC more likely linked with lower fitness. Here we found that PD cases had higher MAC than matched controls. A set of 37564 SNPs with MA (MAF < 0.4) more common in cases (P < 0.05) was found to have the best predictive accuracy. A weighted risk score calculated by using this set can predict 2% of PD cases (100% specificity), which is comparable to using familial PD genes to identify familial PD cases. These results suggest a novel genetic component in PD and provide a useful genetic method to identify a small fraction of PD cases.
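For readers unfamiliar with the terms in the abstract, here is a minimal sketch of what a minor allele content and a weighted score could look like computationally. It assumes each genotype is coded as the number of minor alleles (0, 1 or 2) at each SNP and that per-SNP weights are given. The coding, the function names and the numbers are illustrative assumptions; this is not the exact procedure used in the paper.

```python
from typing import Sequence


def minor_allele_content(genotype: Sequence[int]) -> float:
    """Fraction of an individual's alleles that are minor alleles.

    genotype[i] is the number of minor alleles (0, 1 or 2) at SNP i.
    """
    if not genotype:
        raise ValueError("genotype must contain at least one SNP")
    if any(g not in (0, 1, 2) for g in genotype):
        raise ValueError("each genotype value must be 0, 1 or 2")
    return sum(genotype) / (2 * len(genotype))


def weighted_risk_score(genotype: Sequence[int], weights: Sequence[float]) -> float:
    """Sum of minor allele counts, each multiplied by a per-SNP weight."""
    if len(genotype) != len(weights):
        raise ValueError("genotype and weights must have the same length")
    return sum(w * g for g, w in zip(genotype, weights))


# Hypothetical individual typed at four SNPs, with hypothetical weights.
person = [0, 1, 2, 1]
weights = [0.5, 1.0, 0.25, 0.0]
print(minor_allele_content(person))
print(weighted_risk_score(person, weights))
```

Comparing such per-individual values between cases and matched controls is the kind of analysis the abstract describes.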

Friday, May 1, 2015

Genomics call for papers: Special issue on the functionality of genomic DNAs

I am coediting a special issue on the functionality of genomic DNAs.

Guest Editors:
Prof. Shi Huang
State Key Laboratory of Medical Genetics
Central South University, China
Prof James Shapiro
Department of Biochemistry and Molecular Biology
University of Chicago
The field of genome evolution and population genetics has for the past half century assumed that genomic DNA can be divided into functional and non-functional (“junk”) regions. Experimental molecular science has found little evidence for this assumption. A majority of the noncoding parts of the human genome are transcribed, and numerous experimental researchers have now recognized an important functional role in the so-called junk DNA regions, such as synonymous sites, lncRNA, pseudogene transcripts, antisense transcripts, microRNA, and mobile elements. In fact, evidence for functional constraints on noncoding genome regions has long been recognized. New theoretical frameworks based on less arbitrary foundations have also appeared in recent years that can coherently account for the reality of far more functional DNAs, as well as all other major known facts of evolution and population genetics. Nonetheless, there still remains a large gap in opinions between bench scientists in experimental biology and those on the theory side in bioinformatics and population genetics. This special issue will aim to close that gap and provide a view of evidence from a perspective that all genome regions have (or can easily acquire) functionality.
The special issue on the functionality of genome will focus on the following tentative topics:
  1. Theoretical foundation for all genome regions to be functional. It will cover both the theory and all major features of genome evolution.
  2. Functional studies on junk DNA regions, including lncRNA sequences, viral DNAs and mobile elements
  3. Functionalities associated with genome spatial organization in the nucleus
  4. Isochores and compositional constraints on genomes
  5. Genetic basis of complex traits and diseases focusing on the collective effects of normal genetic variations
  6. Cancer genomics
  7. Roles of repetitive DNA elements in major evolutionary transitions
  8. Correlations of genome composition and organismal complexity
  9. Epigenetics
10. Evo Devo and extended synthesis
Important dates:
  • First submission date: July 1, 2015
  • Deadline for paper submissions: October 1, 2015
  • Deadline for final revised version: December 1, 2015
  • Expected publication: February 2016
Submission Guidelines
All manuscripts and any supplementary material should be submitted through Elsevier Editorial System located at: http://ees.elsevier.com/geno
Authors must select << Functionality of genomic DNAs >> at the first step of "Select Article Type" during submission to ensure that the manuscript is correctly identified for inclusion in this special issue. The Guide for Authors and other instructions can also be found on the website.

Wednesday, April 22, 2015

Ominous news for the neutral theory nearly every week now

Ominous news for the neutral theory nearly every week now: a Nature paper yesterday found an endogenous retrovirus (ERV) to be functional. Last week we had a paper in Genomics providing experimental evidence for essentially no neutral SNPs, "Collective effects of SNPs on transgenerational inheritance in Caenorhabditis elegans and budding yeast", which provides more evidence for the conclusion we published last year in "Scoring the collective effects of SNPs: associations of minor alleles with complex traits in model organisms".

Human endogenous retrovirus (HERV) proviruses comprise a significant part of the human genome, with approximately 98,000 ERV elements and fragments making up nearly 8%. One family, termed HERV-K (HML2), makes up less than 1% of HERV elements but is one of the most studied. 

The paper found HERV-K to be fully functional. By inference via good common sense, the whole ERV class should also be functional; finding those functions just needs time and effort. This inference for ERV-type sequences is exactly like the way we consider proteins to be all functional. The functions of probably ~80% of human proteins remain unknown, but no one doubts that they have a function, because we do know that some proteins have functions. So, if one type of ERV, which happens to be the most studied, has functions, should it not be the null hypothesis that all ERVs have functions?

The popgen and molecular evolution field today, mostly made up of people who rarely do any bench work on DNA functions, still considers ~90% of the human genome to be neutral junk. But how interesting and dramatic: a big chunk of this junk was turned into gold overnight by one paper!! More interesting and dramatic findings of the same kind are sure to come over and over again within the next two years, until all popgen researchers abandon their neutral bandwagon and join their bench colleagues, who have nearly all been on the functional train for a long time.

Abstract of the paper:

Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells
Edward J. Grow, et al
Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections, and comprise nearly 8% of the human genome. The most recently acquired human ERV is HERVK(HML-2), which repeatedly infected the primate lineage both before and after the divergence of the human and chimpanzee common ancestor. Unlike most other human ERVs, HERVK retained multiple copies of intact open reading frames encoding retroviral proteins. However, HERVK is transcriptionally silenced by the host, with the exception of in certain pathological contexts such as germ-cell tumours, melanoma or human immunodeficiency virus (HIV) infection. Here we demonstrate that DNA hypomethylation at long terminal repeat elements representing the most recent genomic integrations, together with transactivation by OCT4 (also known as POU5F1), synergistically facilitate HERVK expression. Consequently, HERVK is transcribed during normal human embryogenesis, beginning with embryonic genome activation at the eight-cell stage, continuing through the emergence of epiblast cells in preimplantation blastocysts, and ceasing during human embryonic stem cell derivation from blastocyst outgrowths. Remarkably, we detected HERVK viral-like particles and Gag proteins in human blastocysts, indicating that early human development proceeds in the presence of retroviral products. We further show that overexpression of one such product, the HERVK accessory protein Rec, in a pluripotent cell line is sufficient to increase IFITM1 levels on the cell surface and inhibit viral infection, suggesting at least one mechanism through which HERVK can induce viral restriction pathways in early embryonic cells. Moreover, Rec directly binds a subset of cellular RNAs and modulates their ribosome occupancy, indicating that complex interactions between retroviral proteins and host factors can fine-tune pathways of early human development.

Thursday, March 12, 2015

DNA mutation clock proves tough to set, of course fully expected by us

As reported in the latest issue of Nature (DNA mutation clock proves tough to set), the dates calculated so far for the Out of Africa model are really a joke. As David Reich, a key player in the field, says: “The fact that the clock is so uncertain is very problematic for us. It means that the dates we get out of genetics are really quite embarrassingly bad and uncertain.”

The author says: "A slower molecular clock worked well to harmonize genetic and archaeological estimates for dates of key events in human evolution, such as migrations out of Africa and around the rest of the world. But calculations using the slow clock gave nonsensical results when extended further back in time — positing, for example, that the most recent common ancestor of apes and monkeys could have encountered dinosaurs."

Of course, we have said repeatedly in numerous papers since 2008 that the mutation rate should not be calculated by using genetic distances that are really maximum distances.

Again, without a real understanding, or with a mistaken understanding, of the first result in molecular evolution, the genetic equidistance result, the field really has no clue about what it is doing.