We presented the following poster at the "1000 genomes and beyond" meeting in Cambridge, UK, 24-26 June 2014. We will also present it at the ASHG 2014 meeting in San Diego in October 2014.
The bottom line is that there is very little neutral or junk DNA in the human genome, at least when the genome is examined with experimental approaches. (A new paper of ours that uses an experimental approach to disprove the neutral theory, titled "Scoring the collective effects of SNPs: associations of minor alleles with complex traits in model organisms", has just been published here.) All previous studies used only bioinformatic approaches. Their conclusion that less than 10% of the genome is functional rests on UNCERTAIN assumptions and is therefore mostly meaningless. The field must realize that it is time to stop such senseless research based on senseless assumptions. We should either do experiments without any prior assumptions or, if assumptions are unavoidable, use only a priori sound intuitions as our assumptions.
The abstract, introduction and discussion of our poster are posted here. The poster can be downloaded from my lab website.
Abstract:
The infinite sites model of the neutral theory is a fundamental assumption underlying nearly all population genetic and phylogenetic studies today, but it has yet to be properly tested. Here we tested it from two novel perspectives using the 1000 Genomes dataset. First, we examined the genetic diversity patterns of different human populations using several types of SNPs, such as a random set of SNPs representing the genome average, stop codon SNPs, nonsynonymous (nonsyn) SNPs, synonymous (syn) SNPs, etc. If most SNPs are not neutral, the pattern shown by a random set of SNPs is expected to resemble that shown by the known functional stop codon SNPs. In contrast, neutral SNPs should show the pattern most different from that of stop codon SNPs. Second, it has long been known that most genetic variants are shared among different human groups, which has been interpreted from the infinite sites perspective to mean that there are few genetic differences among ethnic groups (Lewontin, 1972). But the possibility that saturation or independent mutations account for this phenomenon has yet to be examined and excluded. We compared the number of shared SNPs among different human populations in DNAs of different evolutionary rates, to see whether shared SNPs are in fact a result of independent mutations or saturation and are hence more common in fast evolving DNAs than in slow ones. We found that a random set of SNPs behaves just like stop codon SNPs in showing Africans to have the largest genetic diversity. Shared SNPs are enriched in fast evolving DNAs. These results suggest that the vast majority of the human genome does not follow the infinite sites model.
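The two comparisons described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the poster: it assumes each SNP has already been annotated with a hypothetical category label ("random", "stop", "nonsyn", "syn"), a hypothetical fast/slow evolutionary-rate class, and an alternate-allele frequency in each population. Genetic diversity is summarized here as mean expected heterozygosity, 2p(1 - p), and a SNP counts as "shared" if it is polymorphic in every population; both choices, and the names `Snp`, `mean_heterozygosity` and `shared_fraction`, are illustrative.

```python
"""Toy sketch of the two SNP comparisons (hypothetical data layout)."""
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snp:
    category: str            # hypothetical label: "random", "stop", "nonsyn", "syn"
    rate_class: str          # hypothetical label: "fast" or "slow" evolving DNA
    freqs: Dict[str, float]  # population name -> alternate-allele frequency


def mean_heterozygosity(snps: List[Snp]) -> Dict[str, Dict[str, float]]:
    """Mean expected heterozygosity 2p(1-p) per SNP category and population.

    Assumes every SNP reports a frequency for the same set of populations.
    """
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for snp in snps:
        counts[snp.category] += 1
        for pop, p in snp.freqs.items():
            sums[snp.category][pop] += 2 * p * (1 - p)
    return {cat: {pop: total / counts[cat] for pop, total in pops.items()}
            for cat, pops in sums.items()}


def shared_fraction(snps: List[Snp], rate_class: str) -> float:
    """Fraction of SNPs in one rate class that are polymorphic in every population."""
    subset = [s for s in snps if s.rate_class == rate_class]
    if not subset:
        return 0.0
    shared = sum(all(0 < p < 1 for p in s.freqs.values()) for s in subset)
    return shared / len(subset)


if __name__ == "__main__":
    # Made-up toy records, for illustration only (not 1000 Genomes data).
    toy = [
        Snp("stop", "slow", {"AFR": 0.10, "EUR": 0.02}),
        Snp("random", "fast", {"AFR": 0.50, "EUR": 0.30}),
        Snp("random", "slow", {"AFR": 0.20, "EUR": 0.00}),
    ]
    print(mean_heterozygosity(toy))
    print(shared_fraction(toy, "fast"), shared_fraction(toy, "slow"))
```

Comparing the per-population values returned by `mean_heterozygosity` for the "random" and "stop" categories mirrors the first test, and comparing `shared_fraction(snps, "fast")` with `shared_fraction(snps, "slow")` mirrors the second. In real data a SNP can belong to the random set and to a functional class at once; the single label here is a simplification.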
Introduction:
Molecular studies have so far relied on the neutral theory and its infinite sites assumption. The neutral theory was originally inspired by the so-called molecular clock, which was in turn inspired by the first and most remarkable result in molecular evolution: the genetic equidistance result, that sister species are approximately equidistant to a simpler outgroup. In recent papers, we have shown that the equidistance result has been incorrectly interpreted in terms of the molecular clock, with grave consequences for phylogenetic studies: nearly all past studies have used non-informative DNAs assumed to be neutral, which we have now shown to be under selection (Hu et al., 2013; Huang, 2010). The neutral theory was mistaken right from its inception. We have developed the maximum genetic diversity (MGD) hypothesis to absorb and supersede the neutral theory (Hu et al., 2013). From this more correct and complete theoretical perspective, we here tested whether the infinite sites model holds for the majority of the human genome, as is commonly assumed.
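Why saturation matters for the infinite sites model can be illustrated with a toy simulation. The sketch below is an illustration only, with hypothetical function names: two populations that share no ancestral variation each receive mutations placed uniformly at random on the same stretch of DNA. Under the infinite sites model such populations could never share a SNP, so every shared site here arises from independent (recurrent) mutation. Fast evolving DNA is modeled simply as more mutations per site; selection, drift and rate variation among sites are ignored.

```python
import random


def hit_sites(n_sites: int, n_mutations: int, rng: random.Random) -> set:
    """Sites hit at least once when n_mutations land uniformly at random."""
    return {rng.randrange(n_sites) for _ in range(n_mutations)}


def simulated_shared_fraction(n_sites: int, n_mutations: int, seed: int = 0) -> float:
    """Shared / total polymorphic sites for two populations with no common
    ancestral variation, so any shared site is a recurrent, independent hit."""
    rng = random.Random(seed)
    pop_a = hit_sites(n_sites, n_mutations, rng)
    pop_b = hit_sites(n_sites, n_mutations, rng)
    union = pop_a | pop_b
    return len(pop_a & pop_b) / len(union) if union else 0.0


def expected_shared_fraction(n_sites: int, n_mutations: int) -> float:
    """Approximate expectation: with per-site hit probability q, the expected
    shared count is L*q^2 and the expected union is L*(2q - q^2), giving q / (2 - q)."""
    q = 1 - (1 - 1 / n_sites) ** n_mutations
    return q / (2 - q)


if __name__ == "__main__":
    n_sites = 100_000
    # "Slow" DNA: few mutations per site; "fast" DNA: many (toy values).
    for label, n_mut in [("slow", 1_000), ("fast", 50_000)]:
        sim = simulated_shared_fraction(n_sites, n_mut)
        exp = expected_shared_fraction(n_sites, n_mut)
        print(f"{label}: simulated={sim:.4f} expected={exp:.4f}")
```

Under these assumptions the shared fraction rises with the number of mutations per site, which is the pattern expected if shared SNPs reflect saturation rather than common ancestry.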
Discussion: mutation rate, sequence conservation, and neutrality
Our results suggest that the vast majority of the human genome does not follow the infinite sites model and is not neutral. Only a very limited set of sites, the nonsynonymous slow evolving SNPs as defined here, behaved uniquely among all the SNPs examined and appear to be neutral or to follow the infinite sites model. They are not deleterious, since their pattern differs from that of stop codon SNPs. Nor are they under positive selection, since positively selected genes tend to be fast evolving. We cannot come up with a meaningful explanation for the dramatic difference between slow and fast evolving DNAs shown here using any known scheme other than the recently proposed maximum genetic diversity hypothesis.
Mutation rates in different regions of the human nuclear genome may vary by more than 1000-fold. A gene may be slow evolving for at least two reasons. One is that it is located in a region of the genome with a low mutation rate. This, however, may not explain the difference in rates between non-synonymous and synonymous sites of the same gene found here. Alternatively, most mutations may hit functional sites and be negatively selected by the need to maintain the internal integrity and order of a biological system. It would then take many mutations, and hence a long time, before a neutral site is hit, giving the appearance of a slow mutation rate. Because changes at such neutral sites take a long time to accumulate, they may be too slow to meet adaptive needs and are therefore unlikely to be under positive selection. Given the apparently slow rate and the absence of positive selection, they are also unlikely to reach excessive levels that cause harm, and so are unlikely to be under negative selection.
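The argument that a low neutral fraction gives the appearance of a slow mutation rate can be written as a simple calculation. In this sketch, which is a simplification under our own assumptions and uses hypothetical names, each new mutation lands on a neutral (tolerated) site with probability f and is otherwise removed by negative selection. Tolerated changes then accumulate at rate mu * f, and the number of mutations needed before a neutral site is hit follows a geometric distribution with mean 1/f.

```python
import random


def apparent_rate(mutation_rate: float, neutral_fraction: float) -> float:
    """Rate at which tolerated changes accumulate when only a fraction f of
    mutations escapes negative selection: k = mu * f."""
    return mutation_rate * neutral_fraction


def mean_mutations_until_neutral_hit(neutral_fraction: float,
                                     trials: int = 20_000,
                                     seed: int = 0) -> float:
    """Monte Carlo estimate of how many mutations occur before one lands on a
    neutral site; the exact geometric expectation is 1 / f."""
    if not 0 < neutral_fraction <= 1:
        raise ValueError("neutral_fraction must be in (0, 1]")
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        n = 1
        # Each mutation is neutral with probability f; otherwise it is purged.
        while rng.random() >= neutral_fraction:
            n += 1
        total += n
    return total / trials


if __name__ == "__main__":
    mu = 1.0  # same per-site mutation rate for both site classes (toy units)
    for f in (0.5, 0.05):
        print(f, apparent_rate(mu, f), mean_mutations_until_neutral_hit(f))
```

Under this model, two classes of sites with the same per-site mutation rate but different neutral fractions (for example, non-synonymous versus synonymous sites of one gene) would show different apparent rates, without invoking regional differences in mutation rate.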
Hence, sequence conservation per se may not automatically indicate that variants within conserved sequences are functional, as is commonly assumed. Less conserved sequences are more important for adaptation to the external environment, while more conserved ones are important for the internal integrity of a system. For a virus or bacterium facing elimination by human medicines, the fast evolving parts of its genome are far more critical to its survival than its more conserved parts. The popular assumption that the less conserved parts of the genome are neutral or non-functional overlooks their fundamental role in rapid adaptation.