Friday, May 1, 2015

Genomics call for papers: Special issue on the functionality of genomic DNAs

I am coediting a special issue of Genomics on the functionality of genomic DNAs.

Guest Editors:
Prof. Shi Huang
State Key Laboratory of Medical Genetics
Central South University, China
Prof. James Shapiro
Department of Biochemistry and Molecular Biology
University of Chicago
The field of genome evolution and population genetics has for the past half century assumed that genomic DNA can be divided into functional and non-functional (“junk”) regions. Experimental molecular science has found little evidence for this assumption. A majority of the noncoding parts of the human genome are transcribed, and numerous experimental researchers have now recognized important functional roles for the so-called junk DNA regions, such as synonymous sites, lncRNA, pseudogene transcripts, antisense transcripts, microRNA, and mobile elements. In fact, evidence for functional constraints on noncoding genome regions has long been recognized. New theoretical frameworks based on less arbitrary foundations have also appeared in recent years that can coherently account for the reality of far more functional DNAs, as well as all other major known facts of evolution and population genetics. Nonetheless, there still remains a large gap in opinions between bench scientists in experimental biology and those on the theory side in bioinformatics and population genetics. This special issue will aim to close that gap and provide a view of the evidence from the perspective that all genome regions have (or can easily acquire) functionality.
The special issue on the functionality of the genome will focus on the following tentative topics:
  1. Theoretical foundation for all genome regions to be functional. It will cover both the theory and all major features of genome evolution.
  2. Functional studies on junk DNA regions, including lncRNA sequences, viral DNAs and mobile elements
  3. Functionalities associated with genome spatial organization in the nucleus
  4. Isochores and compositional constraints on genomes
  5. Genetic basis of complex traits and diseases focusing on the collective effects of normal genetic variations
  6. Cancer genomics
  7. Roles of repetitive DNA elements in major evolutionary transitions
  8. Correlations of genome composition and organismal complexity
  9. Epigenetics
10. Evo Devo and extended synthesis
Important dates:
  • First submission date: July 1, 2015
  • Deadline for paper submissions: October 1, 2015
  • Deadline for final revised version: December 1, 2015
  • Expected publication: February 2016
Submission Guidelines
All manuscripts and any supplementary material should be submitted through Elsevier Editorial System located at:
Authors must select <<Functionality of genomic DNAs>> at the first step, “Select ArticleType”, during submission to ensure that the manuscript is correctly identified for inclusion in this special issue. The Guide for Authors and other instructions can also be found on the website.

Wednesday, April 22, 2015

Ominous news for the neutral theory nearly every week now

Ominous news for the neutral theory nearly every week now: a Nature paper yesterday found an endogenous retrovirus (ERV) to be functional. We had a paper last week in Genomics providing experimental evidence for essentially no neutral SNPs, "Collective effects of SNPs on transgenerational inheritance in Caenorhabditis elegans and budding yeast", which provides more evidence for the conclusion we published last year in "Scoring the collective effects of SNPs: associations of minor alleles with complex traits in model organisms".

Human endogenous retrovirus (HERV) proviruses comprise a significant part of the human genome, with approximately 98,000 ERV elements and fragments making up nearly 8%. One family, termed HERV-K (HML2), makes up less than 1% of HERV elements but is one of the most studied. 

The paper found HERV-K to be fully functional. By inference via good common sense, the whole ERV class should also be functional, which just needs time and effort to be found out. This inference for the ERV kind of sequence is exactly like the way we consider the protein kind to be all functional: the functions of probably ~80% of human proteins remain unknown, but no one doubts that they have a function, because we do know that some proteins have functions. So, if one type of ERV has functions, which happens to be the most studied, should it not be the null hypothesis that all ERVs have functions?

The popgen and molecular evolution field today, mostly made up of people who rarely do any bench work on DNA functions, still considers ~90% of the human genome to be neutral junk. But how interesting and dramatic: a big chunk of this junk was turned into gold overnight by one paper!! More interesting and dramatic findings of the same kind are sure to come over and over again within the next two years until all popgen researchers abandon their neutral bandwagon and join their bench colleagues, nearly all of whom have long been on the functional train.

Abstract of the paper:

Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells
• Edward J. Grow, et al
Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections, and comprise nearly 8% of the human genome. The most recently acquired human ERV is HERVK(HML-2), which repeatedly infected the primate lineage both before and after the divergence of the human and chimpanzee common ancestor. Unlike most other human ERVs, HERVK retained multiple copies of intact open reading frames encoding retroviral proteins. However, HERVK is transcriptionally silenced by the host, with the exception of in certain pathological contexts such as germ-cell tumours, melanoma or human immunodeficiency virus (HIV) infection. Here we demonstrate that DNA hypomethylation at long terminal repeat elements representing the most recent genomic integrations, together with transactivation by OCT4 (also known as POU5F1), synergistically facilitate HERVK expression. Consequently, HERVK is transcribed during normal human embryogenesis, beginning with embryonic genome activation at the eight-cell stage, continuing through the emergence of epiblast cells in preimplantation blastocysts, and ceasing during human embryonic stem cell derivation from blastocyst outgrowths. Remarkably, we detected HERVK viral-like particles and Gag proteins in human blastocysts, indicating that early human development proceeds in the presence of retroviral products. We further show that overexpression of one such product, the HERVK accessory protein Rec, in a pluripotent cell line is sufficient to increase IFITM1 levels on the cell surface and inhibit viral infection, suggesting at least one mechanism through which HERVK can induce viral restriction pathways in early embryonic cells. Moreover, Rec directly binds a subset of cellular RNAs and modulates their ribosome occupancy, indicating that complex interactions between retroviral proteins and host factors can fine-tune pathways of early human development.

Thursday, March 12, 2015

DNA mutation clock proves tough to set, of course fully expected by us

As reported in the latest issue of Nature (DNA mutation clock proves tough to set), the dates calculated so far for the Out of Africa model are really a joke. As David Reich, a key player in the field, says: “The fact that the clock is so uncertain is very problematic for us. It means that the dates we get out of genetics are really quite embarrassingly bad and uncertain.”

The author says: "A slower molecular clock worked well to harmonize genetic and archaeological estimates for dates of key events in human evolution, such as migrations out of Africa and around the rest of the world. But calculations using the slow clock gave nonsensical results when extended further back in time — positing, for example, that the most recent common ancestor of apes and monkeys could have encountered dinosaurs."

Of course, we have said repeatedly in numerous papers since 2008 that the mutation rate should not be calculated by using genetic distances that are really maximum distances.

Again, without a real understanding, or with a mistaken understanding, of the first result in molecular evolution, the genetic equidistance result, the field really has no clue about what it is doing.

Monday, October 27, 2014

Why the surprising pattern of no genetic continuity between people living in the same area but from different periods of time? Think the flu virus!

I used the three slides shown below to illustrate the idea of informative DNAs in my talk at last month’s workshop on genome and evolution in Naples, Italy.

The antigenic sites in human influenza A virus mutate and turn over quickly, which is critical for the virus's survival or escape from human neutralizing antibodies and hence responsible for flu epidemics. As shown in Figure 1, two amino acid positions in hemagglutinin (156 and 145, panels a and b) turned over several times within a 30-year period, while two others (138 and 194, panels c and d) stayed largely unchanged (figure from Shih et al., 2007).

The flu results illustrate two important points with regard to evolutionary dynamics of a genome that have so far been grossly overlooked by the evolution and popgen field. First, fast evolving or less conserved DNAs are also functional rather than neutral as they are essential for quick adaptive needs in response to fast changing environments. Second, fast evolving DNAs turn over quickly and can be shown to violate the infinite sites model.  Hence, they cannot be used for phylogenetic inference. If one uses the fast changing sites in a flu virus to infer the phylogenetic relationship of the virus isolates responsible for different epidemics in a past period of say 10 years, one would reach the absurd conclusion that each epidemic was caused by a distinct type of flu virus with no genetic continuity among them rather than just minor variations of the same type.
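The turnover argument above can be made concrete with a toy simulation. The following is only a minimal sketch under stated assumptions, not an analysis of real flu or human data: the site counts, the per-step mutation rates, the number of steps, and the function names (`evolve`, `fraction_different`, `simulate`) are all hypothetical choices made for illustration. It tracks how different a lineage looks from its own ancestor when measured on fast sites versus slow sites.

```python
import random

NUCLEOTIDES = "ACGT"


def evolve(seq, rates, rng):
    """Return a copy of seq after one time step; each site mutates with its own rate."""
    out = []
    for base, rate in zip(seq, rates):
        if rng.random() < rate:
            # A mutation always changes the base to one of the other three.
            base = rng.choice([n for n in NUCLEOTIDES if n != base])
        out.append(base)
    return out


def fraction_different(a, b, sites):
    """Fraction of the given site indices at which sequences a and b differ."""
    return sum(a[i] != b[i] for i in sites) / len(sites)


def simulate(n_fast=500, n_slow=500, fast_rate=0.2, slow_rate=0.0005,
             steps=100, seed=1):
    """Evolve one lineage and record its distance to the ancestor at every step.

    All parameter values are illustrative assumptions, not estimates.
    Returns a list of (step, fast_site_distance, slow_site_distance).
    """
    rng = random.Random(seed)
    n = n_fast + n_slow
    rates = [fast_rate] * n_fast + [slow_rate] * n_slow
    fast_sites = range(n_fast)
    slow_sites = range(n_fast, n)
    ancestor = [rng.choice(NUCLEOTIDES) for _ in range(n)]
    current = ancestor
    history = []
    for step in range(1, steps + 1):
        current = evolve(current, rates, rng)
        history.append((step,
                        fraction_different(ancestor, current, fast_sites),
                        fraction_different(ancestor, current, slow_sites)))
    return history


if __name__ == "__main__":
    for step, fast_d, slow_d in simulate()[9::10]:
        print(f"step {step:3d}  fast sites: {fast_d:.2f}  slow sites: {slow_d:.3f}")
```

In this sketch the fast sites quickly reach a plateau (a maximum distance) and stop recording elapsed time, so a direct descendant looks about as unrelated to its ancestor as an arbitrary sequence would, while the slow sites still show a small distance that keeps growing with time.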

Mutation rates in humans are of course much lower than those in a flu virus. But just like a flu virus, there are also fast and slow changing sites (Figure 2). The time scales are different but the principle is the same. The fast changing sites may turn over every few thousand years and in fact make up the majority of the observed variant sites in humans when properly examined by us (Figure 3). This is why the field of ancient DNA keeps producing the absurd pattern of no genetic continuity between people living in the same area but from different periods of time. All of the published analyses have simply used the wrong sites that are equivalent to the fast changing antigenic sites in a flu virus. What one should be using are sites with very slow mutation rates, like 1 mutation every 50,000 years. We have been busy reinterpreting the published DNAs for several years now and hope to submit our work soon.

Figure 1. (a and b) Frequency changes at residue sites 156 (a) and 145 (b) were highly dynamic. (c and d) Sites 138 (c) and 194 (d) did not undergo major frequency change over time.

Figure 2. A priori model of evolutionary dynamics of human genomic DNAs.

Figure 3. Difference between slow and fast evolving sites. Shown is a piece of homologous DNA in three different individuals or species. In the fast evolving DNAs making up the vast majority of the human genome, there is obvious and verifiable violation of the infinite sites model. These DNAs have abundant overlapping mutant sites where independent mutations have occurred on the same site in different individuals or species.
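The overlap check described in Figure 3 can be sketched in a few lines. This is a minimal illustration only, not the analysis behind the figure; the function name `classify_sites` and the three short example sequences are made up. The logic rests on one fact: under the infinite sites model each site mutates at most once, so an aligned column can carry at most two alleles, and any column with three or more alleles must have been hit by more than one mutation.

```python
def classify_sites(sequences):
    """Classify each column of equal-length aligned sequences.

    Returns one label per column:
      "invariant"  - all sequences share the same base
      "compatible" - exactly two alleles, consistent with a single mutation
      "overlap"    - three or more alleles, so at least two mutations hit
                     the same site (a violation of the infinite sites model)
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for column in zip(*sequences):
        n_alleles = len(set(column))
        if n_alleles == 1:
            labels.append("invariant")
        elif n_alleles == 2:
            labels.append("compatible")
        else:
            labels.append("overlap")
    return labels


# Hypothetical example: one homologous stretch in three individuals.
example = ["ACGTA",
           "ATGTC",
           "AGGTC"]
labels = classify_sites(example)
overlap_positions = [i for i, label in enumerate(labels) if label == "overlap"]
```

This count is only a lower bound on overlap: a column with two alleles can also hide recurrent or back mutations, which this simple check cannot detect.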


Shih, C-C., Hsiao, T-C., Ho, M-S., and Li, W-H. (2007) Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. Proc Natl Acad Sci U S A. 104:6283-6288.

Thursday, October 23, 2014

Surprises from the 45,000 year old Siberian Ust'-Ishim: why is he not closer to Africans than East Asians are?

The genome of the 45,000 year old Siberian Ust'-Ishim published yesterday in Nature (see John Hawks' blog) again repeated the same absurd pattern of no genetic continuity between local people living in different periods of time. The Ust'-Ishim genome is no more related to the 24,000 year old Siberian MA1 than to living East Asians. But this kind of surprise is getting boring for me to mention in this blog. (John Hawks said this in his blog: "This is not an isolated case, it is another example of what we see throughout the world: Ancient people represented by DNA that seem to have very little to do with the people who live in the same areas today. We're not finding the ancestors of living populations so much as we are finding branches of populations we did not know existed.")

A new kind of surprise is the failure to do all necessary studies or to present all relevant studies. One would expect the Ust'-Ishim genome to be almost 2-fold less distant to living Africans than East Asians are, because he had 45,000 years less time to accumulate distance, as shown in Figure 1A. But the paper made no mention of this key expectation from the Out of Africa model.

It also makes no sense for Ust'-Ishim to be an outlier to living East Asians on a PCA plot (Figure 2), since the distance between Ust'-Ishim and East Asians should be almost 2-fold smaller than that between certain pairs of East Asians, again because Ust'-Ishim had 45,000 years less time to accumulate mutations/distance (Figure 1A).

Our results with the 1000 genomes data showed that East Asians CHS and Europeans GBR are equidistant to Africans LWK or YRI in fast evolving SNPs representing genome average (Figure 1B). This of course has nothing to do with mutation rate and time but represents maximum genetic distance and natural selection. We are going to soon analyse the Ust'-Ishim genome in the same way and we fully expect Ust'-Ishim to be equidistant or more distant to Africans than East Asians are, which would be the same pattern as our first blog post here in 2007 had shown for the Neanderthals. Now such a result would be truly inconvenient for the Out of Africa model, which is probably why it was left out in the paper. 

Figure 1

Figure 2

Thursday, September 11, 2014

More ancient DNA surprises from ASHG 2014 abstracts

Two interesting ancient DNA abstracts from the ASHG 2014 meeting.  Just like my last post here, the surprise is again (and again and again.....again....) that there is no genetic continuity between local people living today and those locals in the past, or between local people living in different periods in the past. 

Capture of 390,000 SNPs in dozens of ancient central Europeans reveals a population turnover in Europe thousands of years after the advent of farming. I. Lazaridis, W. Haak, N. Patterson, N. Rohland, S. Mallick, B. Llamas, S. Nordenfelt, E. Harney, A. Cooper, K. W. Alt, D. Reich.
   To understand the population transformations that took place in Europe since the early Neolithic, we used a DNA capture technique to obtain reads covering ~390 thousand single nucleotide polymorphisms (SNPs) from a number of different archaeological cultures of central Europe (Germany and Hungary). The samples spanned the time period from 7,500 BP to 3,500 BP (Early Neolithic to Early Bronze Age periods) and most of them were previously studied using mtDNA (Brandt, Haak et al., Science, 2013). The captured SNPs include about 360,000 SNPs from the Affymetrix Human Origins Array that were discovered in African individuals, as well as about 30,000 SNPs chosen for other reasons (that are thought to have been affected by natural selection, or to have phenotypic effects, or are useful in determining Y-chromosome haplogroups). By analyzing this data together with a dataset of 2,345 present-day humans and other published ancient genomes, we show that late Neolithic inhabitants of central Europe belonging to the Corded Ware culture were not a continuation of the earlier occupants of the region. Our results highlight the importance of migration and major population turnover in Europe long after the arrival of farming. * Contributed equally to this work.

Insights into British and European population history from ancient DNA sequencing of Iron Age and Anglo-Saxon samples from Hinxton, England. S. Schiffels, W. Haak, B. Llamas, E. Popescu, L. Loe, R. Clarke, A. Lyons, P. Paajanen, D. Sayer, R. Mortimer, C. Tyler-Smith, A. Cooper, R. Durbin.
   British population history is shaped by a complex series of repeated immigration periods and associated changes in population structure. It is an open question however, to what extent each of these changes is reflected in the genetic ancestry of the current British population. Here we use ancient DNA sequencing to help address that question. We present whole genome sequences generated from five individuals that were found in archaeological excavations at the Wellcome Trust Genome Campus near Cambridge (UK), two of which are dated to around 2,000 years before present (Iron Age), and three to around 1,300 years before present (Anglo-Saxon period). Good preservation status allowed us to generate one high coverage sequence (12x) from an Iron Age individual, and four low coverage sequences (1x-4x) from the other samples.   By providing the first ancient whole genome sequences from Britain, we get a unique picture of the ancestral populations in Britain before and after the Anglo-Saxon immigrations. We use modern genetic reference panels such as the 1000 Genomes Project to examine the relationship of these ancient samples with present day population genetic data. Results from principal component analysis suggest that all samples fall consistently within the broader Northern European context, which is also consistent with mtDNA haplogroups. In addition, we obtain a finer structural genetic classification from rare genetic variants and haplotype based methods such as FineStructure. Reflecting more recent genetic ancestry, results from these methods suggest significant differences between the Iron Age and the Anglo-Saxon period samples when compared to other European samples. We find in particular that while the Anglo-Saxon samples resemble more closely the modern British population than the earlier samples, the Iron Age samples share more low frequency variation than the later ones with present day samples from southern Europe, in particular Spain (1000GP IBS). 
In addition the Anglo-Saxon period samples appear to share a stronger older component with Finnish (1000GP FIN) individuals. Our findings help characterize the ancestral European populations involved in major European migration movements into Britain in the last 2,000 years and thus provide more insights into the genetic history of people in northern Europe.

Friday, August 29, 2014

Another ancient DNA surprise: history of the New World Arctic people

It has been widely noticed that every ancient DNA research result has been a great surprise, starting from our 2008 paper or the first post on this blog back in 2007. The latest is a Science paper today, The genetic prehistory of the New World Arctic. The surprises here are (1) again (and again and again.....again....) that there is no genetic continuity between local people living today and those locals in the past (>2000 years old); again and again ... replacement rather than regional continuity, following exactly the footsteps of the Out of Africa model superseding the Multiregional model; (2) no sex between people who lived side by side: "Elsewhere, as soon as people meet each other, they have sex," says Willerslev. "Even potentially different species like Neanderthals [and modern humans] had sex, so this finding is extremely surprising."; (3) extremely low genetic diversity in mtDNA in ancient Paleo-Eskimos: "I can't remember any other group having such low diversity," says Willerslev. For the quotes by Willerslev, see this news piece.

Well, just like we said in our post on the 400K year old Heidelbergensis DNA, it would be a complete surprise if the field of ancient DNA as it is presently practiced could produce any sensible and non-surprising result consistent with common sense and fossil and cultural records. When you use noninformative DNAs to do your analytic work, what can you expect other than meaningless trash?

Of course we are working hard to reinterpret these newly published DNA sequences and we should soon publish our results (constantly delayed by newly released DNAs needing reinterpretation) that should be a very pleasant and intellectually satisfying surprise to all. For example, as our new analysis shows, the iceman Otzi was indeed most closely related to the local living Italians, as common sense would expect, rather than to the remote island people, the Sardinians, as is now mistakenly concluded in the literature.