Sunday, August 17, 2014

Testing the infinite sites assumption

We presented the following poster at the "1000 genomes and beyond" meeting in Cambridge, UK, 24-26 June 2014. We will also present it at the ASHG 2014 meeting in San Diego in October 2014.

The bottom line is that very little of the human genome is neutral or junk DNA, at least when one examines the genome using experimental approaches. (A new paper of ours that disproves the neutral theory using an experimental approach has just been published, titled "Scoring the collective effects of SNPs: associations of minor alleles with complex traits in model organisms".) All previous studies used only bioinformatics approaches. Their conclusion that less than 10% of the genome is functional rests on UNCERTAIN assumptions and is therefore mostly meaningless. The field must realize that it is time to stop such senseless research based on senseless assumptions. We should either do experiments without any prior assumptions or, if we must make assumptions, use only intuitions that are sound a priori.

The abstract, introduction and discussion of our poster are posted below. The poster can be downloaded from my lab website.

Abstract
The infinite sites model of the neutral theory is a fundamental assumption underlying nearly all population genetic and phylogenetic studies today, but it has yet to be properly tested. Here we tested it from two novel perspectives using the 1000 Genomes dataset. First, we examined the genetic diversity patterns of different human populations using several types of SNPs: a random set of SNPs representing the genome average, stop codon SNPs, nonsynonymous SNPs, synonymous SNPs, etc. If most SNPs are not neutral, the patterns shown by a random set of SNPs are expected to be similar to those shown by stop codon SNPs, which are known to be functional. In contrast, neutral SNPs should show a very different pattern from stop codon SNPs. Second, it has long been known that most genetic variations are shared among different human groups, which has been interpreted from the infinite sites perspective to mean that there are few genetic differences among ethnic groups (Lewontin, 1972). But the possibility that saturation or independent mutations account for this phenomenon has yet to be examined and excluded. We compared the number of SNPs shared among different human populations in DNAs of different evolutionary rates, to see whether shared SNPs are in fact a result of independent mutations or saturation and hence more common in fast-evolving DNAs than in slow-evolving ones. We found that a random set of SNPs behaves just like the stop codon SNPs in showing Africans to have the largest genetic diversity. Shared SNPs are enriched in fast-evolving DNAs. These results suggest that the vast majority of the human genome does not follow the infinite sites model.
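The shared-SNP comparison described in the abstract can be illustrated with a minimal sketch. This is not the pipeline used for the poster: the function name, the population labels, the site positions and the definition of "shared" (polymorphic in at least two populations) are all assumptions made for illustration, and the toy numbers are invented rather than taken from the 1000 Genomes data.

```python
def shared_fraction(snps_by_pop):
    """Fraction of SNP sites that are polymorphic in two or more populations.

    snps_by_pop maps a population label to the set of site positions
    that are polymorphic in that population (hypothetical input format).
    """
    all_sites = set().union(*snps_by_pop.values())
    if not all_sites:
        return 0.0
    shared = {
        site
        for site in all_sites
        if sum(site in sites for sites in snps_by_pop.values()) >= 2
    }
    return len(shared) / len(all_sites)


# Invented toy data: SNP positions in fast- and slow-evolving DNA.
fast_evolving = {"AFR": {1, 2, 3, 4}, "EUR": {2, 3, 4, 5}, "ASN": {3, 4, 6}}
slow_evolving = {"AFR": {10, 11, 12}, "EUR": {12, 13}, "ASN": {14}}

fast_share = shared_fraction(fast_evolving)
slow_share = shared_fraction(slow_evolving)
print(f"fast-evolving shared fraction: {fast_share:.2f}")
print(f"slow-evolving shared fraction: {slow_share:.2f}")
```

In this toy case the fast-evolving class has the higher shared fraction, which is the direction of enrichment the abstract reports. A real analysis would also need to control for the number of SNPs and the sample size in each class.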

Introduction
Molecular studies have so far relied on the neutral theory and its infinite sites assumption. The neutral theory was originally inspired by the so-called molecular clock, which was in turn inspired by the first and most remarkable result in molecular evolution: the genetic equidistance result, in which sister species are approximately equidistant to a simpler outgroup. In recent papers, we have shown that the molecular clock interpretation of the equidistance result is incorrect, with grave consequences for phylogenetic studies: nearly all past studies have used non-informative DNAs that were assumed to be neutral but that we have now shown to be under selection (Hu et al., 2013; Huang, 2010). The neutral theory was mistaken right from its inception. We have developed the maximum genetic diversity (MGD) hypothesis to absorb and supersede the neutral theory (Hu et al., 2013). From this more correct and complete theoretical perspective, we tested here whether the infinite sites model holds for the majority of the human genome, as is commonly assumed.

       Discussion: mutation rate, sequence conservation, and neutrality
       The results suggest that the vast majority of the human genome does not follow the infinite sites model and is not neutral. Only a very limited set of sites, the nonsynonymous slow-evolving SNPs as defined here, behaved uniquely among all the SNPs examined and appear to be neutral or to follow the infinite sites model. They are not deleterious, as they behave differently from stop codon SNPs. They are also not under positive selection, as positively selected genes tend to be fast evolving. We cannot come up with a meaningful explanation for the dramatic difference between slow- and fast-evolving DNAs shown here using any known scheme other than the recently proposed maximum genetic diversity hypothesis.
       Variation in mutation rate among different regions of the human nuclear genome may exceed 1000-fold. A gene could be slow evolving for at least two reasons. One is that it is located in a region of the genome with a slow mutation rate. This, however, may not explain the difference in mutation rates between nonsynonymous and synonymous sites of the same gene, as found here. Alternatively, most mutations may hit functional sites and be negatively selected by the need to maintain the internal integrity and order of a biological system. It would then take many mutations, and hence a long time, before a neutral site is hit, giving the appearance of a slow mutation rate. Since changes at such neutral sites take a long time, they may be too slow to meet adaptive needs and hence to be under positive selection. Given the apparent slow rate and the absence of positive selection, they are also unlikely to reach levels excessive enough to cause harm or to come under negative selection.
       Hence, sequence conservation per se may not automatically indicate functionality of the variants within such sequences, as is commonly assumed. Less conserved sequences are more important for adaptation to the external environment, while more conserved ones are important for the internal integrity of a system. To a virus or bacterium facing elimination by human medicines, the fast-evolving parts of its genome are far more critical and functional to its survival than the more conserved parts. The popular assumption of neutrality or non-functionality for the less conserved parts of the genome overlooks their fundamental function in quick adaptation.
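
The waiting-time argument in the discussion (many mutations must occur before a neutral site is hit) can be made concrete with a small sketch. It rests on simplifying assumptions that are not stated in the poster: mutations hit sites uniformly at random, a fixed fraction of sites is neutral, and every mutation at a non-neutral site is removed by selection. Under those assumptions the number of mutations until the first neutral hit is geometrically distributed with mean 1/f, where f is the neutral fraction. The function names and the numerical values are hypothetical.

```python
import random


def expected_hits_until_neutral(neutral_fraction):
    """Mean number of mutations until one lands on a neutral site (1/f)."""
    if not 0 < neutral_fraction <= 1:
        raise ValueError("neutral_fraction must be in (0, 1]")
    return 1.0 / neutral_fraction


def apparent_rate(mutation_rate, neutral_fraction):
    """Observed substitution rate if only neutral-site mutations persist."""
    return mutation_rate * neutral_fraction


def simulate_mean_hits(neutral_fraction, trials=20000, seed=0):
    """Monte Carlo check of the 1/f expectation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hits = 1
        # Keep drawing mutations until one falls on a neutral site.
        while rng.random() >= neutral_fraction:
            hits += 1
        total += hits
    return total / trials


# Hypothetical values: 1% of sites neutral, raw mutation rate 1e-8 per site.
print(expected_hits_until_neutral(0.01))
print(apparent_rate(1e-8, 0.01))
print(simulate_mean_hits(0.2))
```

With 1% of sites neutral, about 100 mutations are needed on average before one persists, so the apparent rate is roughly 100 times slower than the underlying mutation rate.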



  

