
How much variation in mutation rate is there in the human genome?


In humans, the average mutation rate is estimated to be around $2.5 \cdot 10^{-8}$ per nucleotide site per generation (Nachman and Crowell, 2000). Of course, this rate varies from sequence to sequence.

Can you please give some ideas of how much variation in mutation rate there is in the human genome?

There are many ways to express such variation. One could give the lowest and highest mutation rates, or show a mutation rate map over either the whole genome or a randomly chosen sequence.
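To make the two suggested summaries concrete, here is a minimal sketch of a windowed mutation rate map with its lowest and highest values. The function name, the window size, and the toy mutation positions are all invented for illustration; real maps are built from large sets of observed de novo mutations.

```python
from collections import Counter

def windowed_mutation_rates(positions, seq_len, window, n_transmissions):
    """Return per-window rates in mutations per bp per generation.

    positions: 0-based coordinates of observed mutations (hypothetical input)
    seq_len: length of the sequence in bp
    window: window size in bp
    n_transmissions: number of parent-to-child transmissions surveyed
    """
    n_windows = -(-seq_len // window)  # ceiling division
    counts = Counter(pos // window for pos in positions)
    rates = []
    for w in range(n_windows):
        start = w * window
        width = min(window, seq_len - start)  # the last window may be shorter
        rates.append(counts.get(w, 0) / (width * n_transmissions))
    return rates

# Toy data, invented for illustration only.
positions = [120, 450, 470, 490, 2900]
rates = windowed_mutation_rates(positions, seq_len=3000, window=1000,
                                n_transmissions=100)
print("rate map:", rates)
print("lowest:", min(rates), "highest:", max(rates))
```

With so few mutations per window, the lowest and highest values mostly reflect sampling noise, which is one reason published maps use large windows or pooled data.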


Cool question. Unfortunately, I don't have a complete answer; the best I can do is summarize some literature. I'd be happy to look more if you want to discuss though. Also, sorry the answer is a little scattered and I apologize if you were already aware of all this!

One thing I will suggest though is that this is hard to do for the human genome in isolation; hence why much of the literature draws on other animals as well.

In any case, the literature on this topic is quite expansive, so I am afraid this is merely the tip of the iceberg. Hopefully it's helpful though.


Firstly, I know of two genomic mutation rate maps that might interest you (as you suggest in your question). One in Genome-wide patterns and properties of de novo mutations in humans by Francioli et al, looking at de novo mutations in humans (obviously more concerned with germline mutations), and another in Mutational heterogeneity in cancer and the search for new cancer genes by Lawrence et al, which (unsurprisingly) is concerned with somatic mutation rates and the resulting oncogenics.

A great review, although not entirely human-specific, is Variation in the mutation rate across mammalian genomes by Hodgkinson and Eyre-Walker. They note that:

  • Small-scale effects: G & C mutate more than A & T, CG dinucleotides are highly correlated to higher mutation in non-coding DNA, and there appear to be hyper-mutable sites (conserved across species) that are susceptible to SNPs. These effects vary across the genome as well. Indeed, certain regions of mitochondrial DNA also have highly variable mutation rates. Oddly, GC-rich islands may be less mutable (for certain mutation types), and may affect nearby nucleotides, particularly methylated ones.

  • Transcription sites: in somatic tissue, transcribed regions have lower mutation rates than non-transcribed ones. In germ tissue, there is an interesting effect in which certain types of mutations, but not others, increase as one moves away from the transcription start site.

  • Other mutations: indel mutations tend to co-occur with point mutations

  • Mutation rates are spatially correlated at multiple scales.

  • Chromosomes: for humans, it seems the Y chromosome is the most mutable, followed by the X, and then the autosomes. There is also a difference in the mutation rate between autosomes, but it appears this is overshadowed by the intra-autosomal mutation rates.

Not explicitly noted by the paper, but intuitive and sometimes implied, is the lower mutation rate in coding and regulatory regions (e.g. enhancers, promoters, …).

Another paper is Evolution of Local Mutation Rate and Its Determinants by Terekhanova et al. They note local mutation rates in primates (obviously including humans) are highly correlated across the genome. This correlation weakens as evolutionary distance grows (e.g. humans to mice). They also look at the influence of several genomic features on local mutation rates.

Not in humans, but Evidence of non-random mutation rates suggests an evolutionary risk management strategy by Martincorena et al looks at mutation rates across E. coli genomes, finding enormous variability, particularly lower rates in highly expressed genes.

Since we have not yet seen chromatin structure pop up much, I'd like to point out the cool paper The effects of chromatin organization on variation in mutation rates in the genome by Makova and Hardison. Therein, they find that mutations of different types are affected differently by open vs closed chromatin states; roughly, closed chromatin encourages base substitutions, whereas indels occur more often in areas with open chromatin.

Finally, some of the difficulties in assessing mutation variability are explained by Eyre-Walker and Eyre-Walker, in How Much of the Variation in the Mutation Rate Along the Human Genome Can Be Explained?


Some related work

For mutation rates in germ-line cells specifically, see:

  • Properties and rates of germline mutations in humans by Campbell and Eichler

  • Determinants of Mutation Rate Variation in the Human Germline by Segurel et al

  • Variation in genome-wide mutation rates within and between human families by Conrad et al

which describe some patterns in mutation variability across the genome, but are often more concerned with inter-individual factors.

Also here are two early papers using comparative genomics to look at mutation rates:

  • Mutation rates in mammalian genomes by Kumar and Subramanian

  • Deterministic Mutation Rate Variation in the Human Genome by Smith et al

which focus on inter-species comparison, but also discuss compositional differences (e.g. GC richness).


Mutations: Driver Versus Passenger

Scott W. Piraino, Simon J. Furney, in Encyclopedia of Cancer (Third Edition), 2019

Factors Impacting Genomic Mutation Rates

Somatic mutation rates vary considerably across regions of the genome. Understanding the factors that contribute to this variation is essential for methods that detect driver genes, and is also of independent interest for understanding cancer biology (Fig. 3). One of the first genomic features to be associated with genomic mutation rate is gene expression level. Highly transcribed genes generally show lower mutation rates than weakly expressed genes. It has also been noted that the transcribed strand of genes shows lower mutation rates than the nontranscribed strand, which has been interpreted as suggesting that the impact of gene expression on mutation rate may be due to transcription-coupled nucleotide excision repair.

Fig. 3. Many factors influence mutation rates both within individual cancer genomes and across individuals. Here we give several examples of factors that influence mutation rates from three categories: environmental exposures, factors that impact DNA replication and repair, and features that vary across the genome.

From Piraino, S. W., Furney, S. J. (2016). Beyond the exome: The role of non-coding somatic mutations in cancer. Annals of Oncology 27(2), 240–248.

DNA repair has emerged as a central factor in mutation rate variation in cancer genomes. Early efforts to characterize the variability of mutation rate across the genome identified chromatin markers associated with chromatin state as being particularly associated with cancer genome mutation rate. Markers of closed chromatin have generally been associated with higher mutation rate, while open chromatin states have been associated with lower mutation rate. This observed association was given a mechanistic interpretation when chromatin state was linked to access of DNA repair machinery to the underlying DNA in the context of mutation rates. It has been observed that while mismatch repair proficient tumors demonstrate substantial mutation rate variability across the genome, this variability is considerably reduced in mismatch repair deficient tumors. Combined with the observed association of mutation rate with chromatin state, these results suggest a model in which open chromatin state makes DNA more available for repair, resulting in lower mutation rates in these regions when DNA repair is proficient.

Other studies have suggested a plausible role for the interaction of DNA repair and chromatin state in more specific chromatin interactions. Reduced mutation rates have been observed in DNase I hypersensitive sites, while this hypomutation is absent in tumors with deficient nucleotide excision repair, again suggesting easier access of repair machinery in open chromatin as a factor influencing mutation rates. While DNase I hypersensitivity sites show reduced mutation rates, increased mutation rates have been noted at the actual site of protein binding. Transcription factor binding sites coincide with decreased excision repair activity as measured by excision repair sequencing, and the excess mutation rate at these sites relative to flanking regions is disrupted in XPC-deficient skin cancers, suggesting that this mutation rate variability is again caused by access to DNA repair factors. In colorectal cancer, CTCF binding sites have an excess of mutations, while colorectal tumors harboring mutations that disrupt the proofreading activity of POLE actually have fewer mutations at these sites compared to background mutations, again implicating an interaction between protein binding and DNA repair.

Understanding mutation rate variability both within and between cancer genomes has important implications for the detection of driver genes. The number of mutations available for driver gene discovery influences the power to detect driver genes. Mutation rate variability can also lead to “false positives” in analyses: genes that have high mutation rates but are not driver genes. Accounting for factors that influence mutation rate can help improve the performance of statistical methods and decrease the likelihood of false positives. Recently, ratiometric methods, which use the ratio of different types of mutations as opposed to raw rates, have been suggested as a way to improve performance in the presence of unmodeled variation in mutation rates.
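As a rough illustration of why a ratio can be more robust than a raw count, the sketch below computes a nonsynonymous-to-synonymous rate ratio per gene. This is only one simple ratio-based statistic, not necessarily the ratiometric methods the authors refer to, and the gene names, counts, and site numbers are invented.

```python
def ns_ratio(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Nonsynonymous rate divided by synonymous rate.

    Returns None when there are no synonymous mutations to normalize by.
    """
    if n_syn == 0:
        return None
    return (n_nonsyn / nonsyn_sites) / (n_syn / syn_sites)

# Hypothetical genes: geneA sits in a highly mutable region, geneB does not.
genes = {
    "geneA": dict(n_nonsyn=30, n_syn=10, nonsyn_sites=3000, syn_sites=1000),
    "geneB": dict(n_nonsyn=12, n_syn=1, nonsyn_sites=3000, syn_sites=1000),
}

for name, g in genes.items():
    raw = g["n_nonsyn"] + g["n_syn"]
    print(name, "raw count:", raw, "ratio:", ns_ratio(**g))
```

In this toy data geneA has the higher raw count, but its ratio is close to 1 because synonymous mutations are elevated to the same degree, whereas geneB shows an excess of nonsynonymous mutations relative to its own background.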

In addition to varying across the genome, mutation rates also vary substantially across individuals. Environmental exposures such as tobacco smoke, UV light, and aristolochic acid can result in increased mutation rates in cancer genomes. Mutation rates across individuals are also impacted by variability in the activity of certain cellular processes. In the next section we discuss some of the environmental and endogenous-driven mutational processes that contribute to the heterogeneity of mutation rates.


Human mutation rate revealed

Next-generation sequencing provides the most accurate estimate to date.

Every time human DNA is passed from one generation to the next it accumulates 100–200 new mutations, according to a DNA-sequencing analysis of the Y chromosome.

This number — the first direct measurement of the human mutation rate — is equivalent to one mutation in every 30 million base pairs, and matches previous estimates from species comparisons and rare disease screens.

The British-Chinese research team that came up with the estimate sequenced ten million base pairs on the Y chromosome from two men living in rural China who were distant relatives. These men had inherited the same ancestral male-only chromosome from a common relative who was born more than 200 years ago. Over the subsequent 13 generations, this Y chromosome was passed faithfully from father to son, albeit with rare DNA copying mistakes.

The researchers cultured cells taken from the two men, and using next-generation sequencing technologies found 23 candidate mutations. Then they validated twelve of these mutations using traditional sequencing techniques. Eight of these mutations, however, had arisen in their cell-culturing process, which left just four genuine, heritable mutations. Extrapolating that result to the whole genome gives a mutation rate of around one in 30 million base pairs.
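The extrapolation can be checked with a few lines of arithmetic. The mutation count, sequence length, and generation count below come from the article; the diploid genome size of about 6.4 billion base pairs is an assumption added here for the final step.

```python
# Figures reported in the article.
mutations = 4             # genuine heritable mutations
bases_sequenced = 10e6    # about ten million base pairs of Y chromosome
generations = 13          # generations over which the chromosome was passed on

# Per-base, per-generation mutation rate.
rate = mutations / (bases_sequenced * generations)
bp_per_mutation = 1 / rate

# Assumption (not from the article): diploid genome of roughly 6.4e9 bp.
DIPLOID_GENOME_BP = 6.4e9
new_mutations_per_generation = rate * DIPLOID_GENOME_BP

print(f"rate: {rate:.2e} per bp per generation")
print(f"one mutation per {bp_per_mutation / 1e6:.1f} million bp")
print(f"new mutations per generation: {new_mutations_per_generation:.0f}")
```

Under these assumptions the result lands near one mutation per 30 million base pairs and inside the 100–200 range quoted above, although with only four mutations the uncertainty is large.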

"It was very reassuring that our application of the new sequencing technologies seems to give a reliable result and that the number we've been using for the mutation rate is pretty much the right one," says Chris Tyler-Smith of the Wellcome Trust Sanger Institute in Hinxton, UK, who led the study, published today in Current Biology.

Tyler-Smith says that direct measurement of the mutation rate can be used to infer events in our evolutionary past, such as when humans first migrated out of Africa, more accurately than previous methods. But before that's possible, researchers will need a more precise estimate, notes Laurent Duret, an evolutionary biologist at the University of Lyon in France. "The confidence interval for the mutation rate is still quite wide," he says. Sequencing more pairs of Y chromosomes from distant male cousins in other families should provide a more robust measurement and reveal how mutation rates vary between individuals, Duret adds.

Most of the Y chromosome doesn't mix with any other chromosomes, which makes estimating its mutation rate easier. But the mutation rate might be somewhat different on other chromosomes, points out Adam Eyre-Walker, an evolutionary biologist at the University of Sussex in Brighton, UK. Other projects that involve sequencing parents and their offspring, such as the 1000 Genomes Project, should start to illuminate how DNA changes across the rest of the genome.

"I'm sure this is just the first of many papers that will be doing the same sort of thing," says Tyler-Smith.


Persistence of Deleterious Mutations


Despite their harmful nature, why do deleterious genes persist in the genome of organisms? This could be due to a number of reasons, such as the rate of elimination of these mutations may be low compared to the rate at which they appear. Other probable reasons are as follows.

Heterozygote Advantage

This is the condition in which possessing two different copies of a gene (wild-type and mutant) is beneficial to the organism rather than detrimental. An example is the mutation in the hemoglobin gene that causes sickle cell anemia (SCA). Here, an individual homozygous for the mutant allele shows the deleterious effect, i.e., suffers from SCA (the RBCs become sickle-shaped). A heterozygote, however, is a carrier, because the condition is recessive, and shows only partial sickling of RBCs. This is beneficial because the malarial parasite P. falciparum, which infects red blood cells, is less able to establish an infection in carriers. In other words, the partial sickling of a carrier's RBCs makes that person more resistant to malaria. A wild-type homozygote, on the other hand, remains fully susceptible to malarial infection.
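The textbook one-locus selection model shows how heterozygote advantage keeps a harmful allele in the population. In the sketch below, the fitness values are hypothetical: wild-type homozygotes have fitness 1 - s (malaria risk), carriers have fitness 1, and mutant homozygotes have fitness 1 - t (disease). The predicted equilibrium frequency of the mutant allele is s / (s + t).

```python
def next_freq(q, s, t):
    """Mutant allele frequency after one generation of selection.

    Fitnesses: wild-type homozygote 1 - s, heterozygote 1, mutant homozygote 1 - t.
    """
    p = 1.0 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * q + q * q * (1 - t)) / mean_fitness

def equilibrium(s, t):
    """Analytical equilibrium frequency of the mutant allele."""
    return s / (s + t)

# Hypothetical selection coefficients, chosen only for illustration.
s, t = 0.1, 0.8
q = 0.01  # start with a rare mutant allele
for _ in range(500):
    q = next_freq(q, s, t)

print("simulated:", round(q, 4), "predicted:", round(equilibrium(s, t), 4))
```

The allele settles at an intermediate frequency instead of being eliminated, even though mutant homozygotes are strongly selected against.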

No Effect on Reproductive Fitness

In some cases, the deleterious effect of the mutation appears late in life, after the organism's reproductive stage has passed. Such mutations are passed on despite their harmful nature, because the effect does not show itself during the reproductive stage. An example is the trinucleotide repeat mutation in the HD gene that causes Huntington’s disease. The effects of the disease are typically seen after the age of 40, by which time the individual has often already reproduced and passed the mutation on to offspring. Although it affects the fitness of the individual, the mutation persists, since it does not affect reproductive fitness but merely shortens the lifespan.

Maintained by Mutations

Some mutations may keep arising in certain genes despite the elimination efforts of the organism's genome. This may be due to the hyper-mutable nature of the gene, and also because the gene may be too vital to tamper with (to prevent the induction of other accidental errors). An example is the NF gene, which, when mutated, gives rise to a condition called neurofibromatosis that causes tumors of the nervous system. Here, it may be difficult to remove the mutation, since any unwanted disruption in the gene sequence will only cause further damage. Also, even if this mutation is eliminated, the gene has a high tendency to mutate: almost 1 in every 4,000 gametes carries a new mutation of this gene.
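The balance between new mutations and their removal can be sketched with the standard mutation-selection approximations. The mutation rate of 1 in 4,000 gametes is taken from the text; the selection coefficients are hypothetical, and the formulas (mu / s for a dominant allele, sqrt(mu / s) for a fully recessive one) are textbook approximations rather than measurements for this gene.

```python
import math

def dominant_equilibrium(mu, s):
    """Approximate equilibrium frequency of a dominant deleterious allele."""
    return mu / s

def recessive_equilibrium(mu, s):
    """Approximate equilibrium frequency of a fully recessive deleterious allele."""
    return math.sqrt(mu / s)

mu = 1 / 4000  # new mutations per gamete, as quoted in the text

# Hypothetical selection coefficients, for illustration only.
for s in (0.1, 0.5, 1.0):
    print(f"s={s}: dominant q ~ {dominant_equilibrium(mu, s):.5f}, "
          f"recessive q ~ {recessive_equilibrium(mu, s):.5f}")
```

Even with strong selection the allele frequency does not reach zero, because each generation supplies new copies at rate mu.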

Maintained by Gene Flow

This refers to cases where the prevalence of a mutated gene copy in a population is due to its introduction by another population that has migrated to the same location. As mentioned above, the SCA mutation is beneficial in areas with rampant malaria, as is the case in regions of the African continent. However, when carriers residing in these areas migrated to countries with a low incidence of malaria, the SCA mutation was introduced into the populations of those countries. Human migration therefore brought about the flow of genetic material from Africa to other countries, where this mutation, in the absence of malaria, was purely detrimental.

Polyploidy of Genome

Deleterious mutations are usually recessive in nature. If a haploid organism possesses a deleterious mutation, the effect can be readily observed, crippling the organism's fitness and resulting in its demise. However, if the organism is diploid or polyploid, with multiple alleles of a gene, the detrimental effect can be silenced or overridden by the presence of a fully functional wild-type allele. While this prevents the expression of the mutated allele, it does not eliminate it, so the allele persists in the population until two individuals carrying it reproduce and give rise to offspring that suffer the deleterious effects of the mutation.

Although the cellular repair machinery, along with the proofreading mechanisms, tries to eliminate mutations, certain mutations are not rectified or are actively conserved (as explained above). The accumulation of mutations in this way over the course of several generations leads to an effect called Muller’s ratchet, which may lead to the extinction of the species of that organism. This effect is a principle studied in reference to the extinction of species, and the effort to conserve those on the brink of extinction.
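How well a recessive allele hides in a diploid population can be illustrated with Hardy-Weinberg proportions. This sketch assumes random mating and uses an invented allele frequency; it is not data for any particular disease.

```python
def genotype_frequencies(q):
    """Hardy-Weinberg genotype frequencies for a recessive allele at frequency q.

    Assumes random mating; returns (unaffected non-carriers, carriers, affected).
    """
    p = 1.0 - q
    return p * p, 2 * p * q, q * q

q = 0.01  # hypothetical frequency of the recessive deleterious allele
non_carriers, carriers, affected = genotype_frequencies(q)

print(f"carriers: {carriers:.4f}")
print(f"affected: {affected:.6f}")
print(f"carriers per affected individual: {carriers / affected:.0f}")
```

When the allele is rare, almost all copies sit in unaffected carriers, so selection against affected homozygotes removes only a small fraction of them each generation.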



Unusual DNA folding increases the rates of mutations

New research shows that DNA that folds into conformations other than the classic double helix (non-B DNA), which includes as much as 13% of the human genome, leads to elevated nucleotide substitution rates in both the non-B motifs themselves and their flanking regions. These elevated mutation rates are a major contributor to the regional variation in mutation rates across the genome. Credit: Wilfried Guiblet and Dani Zemba, Penn State

DNA sequences that can fold into shapes other than the classic double helix tend to have higher mutation rates than other regions in the human genome. New research by a team of Penn State scientists shows that the elevated mutation rate in these sequences plays a major role in determining regional variation in mutation rates across the genome. Deciphering the patterns and causes of regional variation in mutation rates is important both for understanding evolution and for predicting sites of new mutations that could lead to disease.

A paper describing the research is available online in the journal Nucleic Acids Research.

"Most of the time we think about DNA as the classic double helix; this basic form is referred to as 'B-DNA,'" said Wilfried Guiblet, co-first author of the paper, a graduate student at Penn State at the time of research and now a postdoctoral scholar at the National Cancer Institute. "But, as much as 13% of the human genome can fold into different conformations called 'non-B DNA.' We wanted to explore what role, if any, this non-B DNA played in variation that we see in mutation rates among different regions of the genome."

Non-B DNA can fold into a number of different conformations depending on the underlying DNA sequence. Examples include G-quadruplexes, Z-DNA, H-DNA, slipped strands, and various other conformations. Recent research has revealed that non-B DNA plays critical roles in cellular processes, including the replication of the genome and the transcription of DNA into RNA, and that mutations in non-B sequences are associated with genetic diseases.

"In a previous study, we showed that in the artificial system of a DNA sequencing instrument, which uses similar DNA copying processes as in the cell, error rates were higher in non-B DNA during polymerization," said Kateryna Makova, Verne M. Willaman Chair of Life Sciences at Penn State and one of the leaders of the research team. "We think that this is because the enzyme that copies DNA during sequencing has a harder time reading through non-B DNA. Here we wanted to see if a similar phenomenon exists in living cells."

The team compared mutation rates between B- and non-B DNA at two different timescales. To look at relatively recent changes, they used an existing database of human DNA sequences to identify individual nucleotides—letters in the DNA alphabet—that varied among humans. These "single nucleotide polymorphisms" (SNPs) represent places in the human genome where at some point in the past a mutation occurred in at least one individual. To look at more ancient changes, the team also compared the human genome sequence to the genome of the orangutan.

They also investigated multiple spatial scales along the human genome, to test whether non-B DNA influenced mutation rates at nucleotides adjacent to it and further away.

"To identify differences in mutation rates between B- and non-B DNA we used statistical tools from 'functional data analysis' in which we compare the data as curves rather than looking at individual data points," said Marzia A. Cremona, co-first author of the paper, a postdoctoral researcher at Penn State at the time of the research and now an assistant professor at Université Laval in Quebec, Canada. "These methods give us the statistical power to contrast mutation rates for the various types of non-B DNA against B-DNA controls."

For most types of non-B DNA, the team found increased mutation rates. The differences were enough that non-B DNA mutation rates impacted regional variation in their immediate surroundings. These differences also helped explain a large portion of the variation that can be seen along the genome at the scale of millions of nucleotides.

"When we look at all the known factors that influence regional variation in mutation rates across the genome, non-B DNA is the largest contributor," said Francesca Chiaromonte, Huck Chair in Statistics for the Life Sciences at Penn State and one of the leaders of the research team. "We've been studying regional variation in mutation rates for a long time from a lot of different angles. The fact that non-B DNA is such a major contributor to this variation is an important discovery."

"Our results have critical medical implications," said Kristin Eckert, professor of pathology and biochemistry and molecular biology at Penn State College of Medicine, Penn State Cancer Institute researcher, an author on the paper, and the team's long-time collaborator. "For example, human geneticists should consider the potential of a locus to form non-B DNA when evaluating candidate genetic variants for human genetic diseases. Our current and future research is focused on unraveling the mechanistic basis behind the elevated mutation rates at non-B DNA."

The results also have evolutionary implications.

"We know that natural selection can impact variation in the genome, so for this study we only looked at regions of the genome that we think are not under the influence of selection," said Yi-Fei Huang, assistant professor of biology at Penn State and one of the leaders of the research team. "This allows us to establish a baseline mutation rate for each type of non-B DNA that in the future we could potentially use to help identify signatures of natural selection in these sequences."

Because of their increased mutation rates, non-B DNA sequences could be an important source of genetic variation, which is the ultimate source of evolutionary change.

"Mutations are usually thought to be so rare, that when we see the same mutation in different individuals, the assumption is that those individuals shared an ancestor who passed the mutation to them both," said Makova, a Penn State Cancer Institute researcher. "But it's possible that the mutation rate is so high in some of these non-B DNA regions that the same mutation could occur independently in several different individuals. If this is true, it would change how we think about evolution."


Genes vs Genome Size

In eukaryotic organisms, there is a paradox observed, namely that the number of genes that make up the genome does not correlate with genome size. In other words, the genome size is much larger than would be expected given the total number of protein coding genes. Genome size can increase by duplication, insertion, or polyploidization and the process of recombination can lead to both DNA loss or gain. It is also possible that genomes can shrink due to deletions.

Figure 1: Gene variation in the Genome: This figure represents the human genome, categorized by function of each gene product, given both as number of genes and as percentage of all genes. Importantly, genome size does not necessarily correlate with complexity.

A famous example of such gene decay is the genome of Mycobacterium leprae, the causative agent of leprosy. M. leprae has lost many once-functional genes over time through the formation of pseudogenes. This is evident when its genome is compared with that of its close relative Mycobacterium tuberculosis. M. leprae lives and replicates inside a host, and because of this arrangement it no longer needs many of the genes it once carried, which allowed it to live and prosper outside the host. Over time these genes have lost their function through mechanisms such as mutation, causing them to become pseudogenes. It is beneficial to an organism to rid itself of non-essential genes because doing so makes replicating its DNA much faster and more energy-efficient.

An example of increasing genome size over time is seen in filamentous plant pathogens. These plant pathogen genomes have been growing larger over time due to repeat-driven expansion. The repeat-rich regions contain genes coding for host interaction proteins. With the addition of more and more repeats to these regions, the pathogens increase the possibility of developing new virulence factors through mutation and other forms of genetic recombination. In this way it is beneficial for these plant pathogens to have larger genomes.


Somatic Mutation

The preceding discussion focused on heritable germline mutations, the cumulative phenotypic effects of which are expressed only in the following generations. The situation is dramatically different for somatic mutations influencing our daily well-being, both because of the larger numbers of cells involved and the higher underlying mutation rates. Although somatic mutations are nonheritable, there is a potentially significant evolutionary link with the germline mutation rate because the DNA replication and repair machinery is shared between both types of cells. Substantial theory on the evolution of mutation rates focuses on the indirect consequences of mutant alleles remaining transiently associated with mutator alleles until disassociated by recombination and segregation (Kimura 1967; Dawson 1999; Lynch 2010), but this yields relatively weak selection on the mutation rate. For large multicellular species, the direct effects of somatic mutations may be the primary source of selection on the mutation rate (Crow 1986; Lynch 2008; Erickson 2010).

One of the many consequences of somatic mutations is cancer, although such effects almost certainly extend to other physical and psychological disorders. Observing that the majority of the variance in lifetime risk of cancer among different tissues is associated with variation in the number of cell divisions in self-renewing lineages, Tomasetti and Vogelstein (2015) argued that the majority of cancers are unavoidable consequences of the stochastic arrival of background replication errors in normal, otherwise healthy cells (rather than responses to exogenous and avoidable carcinogenic factors). This idea that mutation is associated with DNA replication has precedent in work suggesting that variation in germline mutation rates among species, between males and females of the same species, and among males of different ages is in part due to variation in germline cell-division number (Drost and Lee 1995; Crow 2000; Wilson Sayres et al. 2011). Nevertheless, Tomasetti and Vogelstein’s conclusion that most cancers are unpredictable (and therefore unpreventable) elicited considerable controversy (e.g., Albini et al. 2015; Weinberg and Zaykin 2015).

Such engaged discussion makes clear the need for quantitative information on background rates of somatic mutation, which is difficult to achieve owing to the mosaic nature of somatic mutations within multicellular tissues. Early indirect estimates based on marker loci for phenotypes in four human tissue types suggested an increase in the base-substitution mutation rate per cell division relative to that in the germline (Lynch 2009, 2010). More recent results based on whole-exome sequencing imply and inflations for brain, lymphocyte, colon epithelium, and skin cells (Tomasetti et al. 2013; Lodato et al. 2015; Martincorena et al. 2015). Although the mechanisms generating elevated somatic mutation rates remain unclear (possible explanations include elevated numbers of cell divisions, altered expression of components of the repair machinery, and elevated levels of mutagenic by-products of metabolism), it is clear that humans are not exceptional in this regard. In all other species for which data are available, somatic-mutation rates are substantially greater than those at the germline level (Lynch 2009, 2010).

Assuming a inflation of the somatic mutation rate (the average of the above estimates), an average adult cell will contain de novo mutations. Although these will not all be independent, with cells in the human body, the total number of mutations carried by an adult will then be of order with every nucleotide site having been mutated in thousands of cells. A large fraction of such mutations may be completely innocuous, but even if the fraction of the human genome with fitness consequences is as small as 1% (Lindblad-Toh et al. 2011; Keightley 2012; Rands et al. 2014), the unavoidable conclusion is that there is no way to avoid the accumulation of somatic mutations with undesirable effects in an aging human. Thus, at least insofar as eliminating the source, the war on cancer appears to be unwinnable.

This is not to say that we should abandon goals toward reducing the incidence of environmental mutagens. Indeed, the possibility that the baseline human mutation rate will elevate over time (for reasons discussed below) motivates a strong argument to the contrary: the need to minimize all extraneous factors that might further exacerbate an already precarious situation. It should be of particular concern that procedures commonly employed in medical screening and intervention have the side effect of increasing our exposure to key mutagens. For example, the use of computed tomography (CAT scans), which involves X-ray irradiation, has increased dramatically in the past two decades, with of patients being of reproductive age or earlier (Kocher et al. 2011; Berdahl et al. 2013) and the administered radiation being well above levels known to affect somatic mutation rates (Leuraud et al. 2015). A second potential concern involves the extremely widespread application of antibiotics. It is now known that sublethal levels of antibiotics indirectly increase the mutation rate in target bacteria by inducing the stress response (Kohanski et al. 2010; Andersson and Hughes 2014), but for these and most other commonly applied medicines, we know little to nothing about the effects on DNA stability at the nucleotide level in eukaryotic host cells.


Abstract

Events in primate evolution are often dated by assuming a constant rate of substitution per unit time, but the validity of this assumption remains unclear. Among mammals, it is well known that there exists substantial variation in yearly substitution rates. Such variation is to be expected from differences in life history traits, suggesting it should also be found among primates. Motivated by these considerations, we analyze whole genomes from 10 primate species, including Old World Monkeys (OWMs), New World Monkeys (NWMs), and apes, focusing on putatively neutral autosomal sites and controlling for possible effects of biased gene conversion and methylation at CpG sites. We find that substitution rates are up to 64% higher in lineages leading from the hominoid–NWM ancestor to NWMs than to apes. Within apes, rates are ∼2% higher in chimpanzees and ∼7% higher in the gorilla than in humans. Substitution types subject to biased gene conversion show no more variation among species than those not subject to it. Not all mutation types behave similarly, however; in particular, transitions at CpG sites exhibit a more clocklike behavior than do other types, presumably because of their nonreplicative origin. Thus, not only the total rate, but also the mutational spectrum, varies among primates. This finding suggests that events in primate evolution are most reliably dated using CpG transitions. Taking this approach, we estimate the human and chimpanzee divergence time is 12.1 million years, and the human and gorilla divergence time is 15.1 million years.

Germline mutations are the ultimate source of genetic differences among individuals and species. They are thought to arise from a combination of errors in DNA replication (e.g., the chance misincorporation of a base pair) or damage that is unrepaired by the time of replication (e.g., the spontaneous deamination of methylated CpG sites) (1). If mutations are neutral (i.e., do not affect fitness), then the rate at which they arise will be equal to the substitution rate (2). A key consequence is that if mutation rates remain constant over time, substitution rates should likewise be constant.
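The equality between the neutral mutation rate and the substitution rate follows from a standard population-genetic argument, sketched below for a diploid population. The symbols $N$ (population size), $\mu$ (mutation rate per site per generation) and $k$ (substitution rate per site per generation) are introduced here for illustration and are not the excerpt's own notation.

```latex
k \;=\; \underbrace{2N\mu}_{\text{new mutations per site per generation}}
\;\times\;
\underbrace{\frac{1}{2N}}_{\text{fixation probability of a neutral mutation}}
\;=\; \mu
```

The population size cancels, which is why a constant mutation rate would imply a constant neutral substitution rate regardless of demography.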

This assumption of constancy of substitution rates plays a fundamental role in evolutionary genetics by providing a molecular clock with which to date events inferred from genetic data (3). Notably, important events in human evolution for which there is no fossil record (e.g., when humans and chimpanzees split, or when anatomically modern humans left Africa) are dated using a mutation rate obtained from contemporary pedigrees or phylogenetic analysis, assuming the per year rate has remained unchanged for millions of years (4).
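As a minimal sketch of how such a clock is applied, the divergence between two lineages can be divided by twice the yearly rate, since substitutions accumulate on both branches. The function name and both input values are hypothetical, chosen only to show the arithmetic, and the calculation assumes the constant yearly rate that the rest of this excerpt calls into question.

```python
def divergence_time_years(divergence_per_site, rate_per_site_per_year):
    """Time since two lineages split, assuming a constant yearly rate.

    Substitutions accumulate independently on both lineages,
    hence the factor of 2 in the denominator.
    """
    return divergence_per_site / (2.0 * rate_per_site_per_year)


# Hypothetical inputs, not values taken from the text.
d = 0.012     # assumed pairwise divergence (substitutions per site)
mu = 0.5e-9   # assumed rate (substitutions per site per year)

t = divergence_time_years(d, mu)
print(f"{t / 1e6:.1f} million years")

# Halving the assumed rate doubles the inferred date.
t_slow = divergence_time_years(d, mu / 2)
print(f"{t_slow / 1e6:.1f} million years")
```

Because the inferred date is inversely proportional to the assumed rate, any uncertainty in the rate carries straight through to the date.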

However, we know from studies of mammalian phylogenies, as well as of other taxa, that there can be substantial variation in substitution rates per unit time (5–7). In particular, there is the well-known hypothesis of a “generation time effect” on substitution rates, based on the observation that species with shorter generation time (i.e., mean age of reproduction) have higher mutation rates (8). For instance, mice have a generation time on the order of months (∼10–12 mo) compared with ∼29 y in humans (9), and a two- to threefold higher substitution rate per year (8). More generally, a survey of 32 mammalian species found reproductive span to be the strongest predictor of substitution rate variation (5).

A generation time effect has also been suggested in humans, motivated by the observation that the yearly mutation rate estimated by sequencing human and chimpanzee pedigrees [∼0.4 × 10⁻⁹ per base pair per year (10, 11)] is approximately twofold lower than the mutation rate inferred from the number of substitutions observed between primates (1). Substitution-derived estimates of mutation rates are highly dependent on dating evolutionary lineages from the fossil record, and so are subject to considerable uncertainty. Nonetheless, one way to reconcile pedigree and substitution-derived estimates of the mutation rate would be to postulate that the generation time has increased toward the present, and led to a decrease in the yearly mutation rate (12).
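The link between per-year and per-generation rates is a simple unit conversion, sketched below. The pedigree figure (∼0.4 × 10⁻⁹ per base pair per year) and the ∼29 y human generation time are the values quoted in this excerpt; the function names and the shorter generation times are hypothetical. The sketch also holds the per-generation rate fixed across generation times, which is a deliberate simplification.

```python
def per_generation_rate(rate_per_year, generation_time_years):
    """Convert a yearly rate into a per-generation rate."""
    return rate_per_year * generation_time_years


def per_year_rate(rate_per_generation, generation_time_years):
    """Convert a per-generation rate into a yearly rate."""
    return rate_per_generation / generation_time_years


yearly_pedigree = 0.4e-9   # per bp per year, figure quoted in the text
g_human = 29.0             # years, human generation time quoted in the text

per_gen = per_generation_rate(yearly_pedigree, g_human)
print(f"per generation: {per_gen:.3g} per bp")

# Simplifying assumption: the same per-generation rate at shorter,
# hypothetical generation times implies a higher yearly rate.
for g in (29.0, 20.0, 14.5):
    print(f"generation time {g:>4} y -> {per_year_rate(per_gen, g):.3g} per bp per year")
```

Under this simplification, halving the generation time doubles the yearly rate, which is the direction of the reconciliation proposed above.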

Whether the association between generation time and substitution rates is causal remains unclear, however; correlated traits such as metabolic rate (13), body size (14), and sperm competition (15) may also affect substitution rates. For instance, the metabolic rate hypothesis posits that species with higher basal metabolic rates are subject to higher rates of oxidative stress, and hence have a higher mutation rate (13). Body mass has been shown to be negatively correlated to substitution rates, such that smaller animals tend to have higher substitution rates (13). Sexual selection on mating systems may also affect substitution rates, as more intense sperm competition leads to selection for higher sperm counts, leading to more cell divisions per unit time during spermatogenesis and a higher male mutation rate (15).

That said, an effect of life history traits such as generation time on the yearly mutation rate is expected from first principles, given our understanding of oogenesis and spermatogenesis (16, 17). In mammals, oogonial divisions are completed by the birth of the future mother, whereas the spermatogonial stem cells continue to divide postpuberty (16). Thus, the total number of replication-driven mutations inherited by a diploid offspring accrues in a piecewise linear manner with parental age, with the number depending on the number of cell divisions in each developmental stage, as well as the per cell division mutation rates (1, 17). These considerations indicate that changes in generation time, onset of puberty, and rate of spermatogenesis should all influence yearly mutation rates (1, 17).
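The piecewise linear dependence on parental age can be illustrated with a toy model: a fixed number of germline divisions in the mother, and in the father a fixed number up to puberty plus a constant number per year afterwards. Every parameter value below is a hypothetical placeholder rather than an estimate from the excerpt, and a single per-division rate is assumed for all stages for simplicity.

```python
from dataclasses import dataclass


@dataclass
class GermlineModel:
    # All defaults are hypothetical placeholders, not published estimates.
    female_divisions: float = 30.0           # oogonial divisions, complete by birth
    male_divisions_to_puberty: float = 35.0  # divisions before puberty
    male_divisions_per_year: float = 20.0    # stem cell divisions per year after puberty
    puberty_age: float = 13.0                # years
    mu_per_division: float = 1e-10           # mutations per site per cell division


def maternal_rate(model):
    """Replicative mutations per site from the mother (no age term)."""
    return model.female_divisions * model.mu_per_division


def paternal_rate(model, father_age):
    """Replicative mutations per site from the father: flat, then linear in age."""
    years_dividing = max(0.0, father_age - model.puberty_age)
    divisions = (model.male_divisions_to_puberty
                 + model.male_divisions_per_year * years_dividing)
    return divisions * model.mu_per_division


def sex_averaged_rate(model, father_age):
    """Average over the two inherited haploid genomes."""
    return 0.5 * (maternal_rate(model) + paternal_rate(model, father_age))


model = GermlineModel()
for age in (20, 30, 40):
    print(f"father aged {age}: {sex_averaged_rate(model, age):.3g} per site per generation")
```

In this sketch, changing `puberty_age` or `male_divisions_per_year` shifts the per-generation rate, and dividing by the generation time then gives the yearly rate, which is how life history traits feed into it.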

Importantly, then, primates are well known to differ with regard to most of these traits. In addition to huge variation in body size and metabolic rates, generation time varies almost 10-fold, with the shortest generation time observed in prosimians [∼3 y in galago and mouse lemurs (18)] and the longest generation time observed in humans (∼29 y). Species also differ in the strength of sperm competition and rates of spermatogenesis: monkeys have a shorter spermatogenetic division, and thus consequently produce more sperm per unit time than do apes (19). Thus, even if the per cell division mutation rate remained constant, we should expect differences in yearly mutation rates among species.

Although the factors discussed thus far apply to all sites, variation in substitution rates among species also depends on the type of mutation and the genomic context (i.e., flanking sequence) in which it occurs (6). For example, in mammals, CpG transitions show the least amount of variation in substitution rates among species (6). A plausible explanation is the source of mutations, as transitions at methylated CpG sites are thought to occur primarily through spontaneous deamination; if they arise at a constant rate and their repair is inefficient relative to the cell cycle length, as is thought to be the case, then their mutation rate should depend largely on absolute time, rather than the number of cell divisions (20–22).

In addition, even substitutions that have no effect on fitness may vary in their rate of accumulation among lineages because of biased gene conversion (BGC), the bias toward strong (S: G or C) rather than weak (W: A or T) bases that occurs in the repair of double-strand breaks (23). This phenomenon leads to the increased probability of fixation of S alleles (and loss of W alleles) in regions of higher recombination, and can therefore change substitution rates relative to mutation rates (23, 24). The strength of BGC is a function of the degree of bias, the local recombination rate, and the effective population size of the species (23). The latter varies by three- to fourfold among primates (25), and the fine-scale recombination landscape is also likely to differ substantially across species (26).
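One common way to quantify this effect, offered here as a sketch and not as the method used in the excerpt, treats BGC like weak genic selection with a population-scaled coefficient B = 4·Ne·b. Here b is a hypothetical per-site conversion bias (which would itself scale with the local recombination rate) and Ne is the effective population size. Under that approximation the substitution rate relative to the neutral expectation is B / (1 − e^(−B)), with positive B for W→S changes and negative B for S→W changes.

```python
import math


def relative_substitution_rate(B):
    """Substitution rate relative to neutrality for scaled bias B = 4*Ne*b.

    Uses the weak-selection approximation B / (1 - exp(-B));
    the limit as B -> 0 is 1 (the neutral case).
    """
    if B == 0:
        return 1.0
    return B / -math.expm1(-B)


def scaled_bias(effective_population_size, conversion_bias):
    """Population-scaled BGC coefficient B = 4 * Ne * b."""
    return 4.0 * effective_population_size * conversion_bias


# Hypothetical values: same bias b, effective population sizes differing fourfold.
b = 1e-5
for ne in (10_000, 40_000):
    B = scaled_bias(ne, b)
    ws = relative_substitution_rate(B)    # weak-to-strong, favored
    sw = relative_substitution_rate(-B)   # strong-to-weak, disfavored
    print(f"Ne={ne}: B={B:.2f}, W->S x{ws:.2f}, S->W x{sw:.2f}")
```

The same bias has a larger effect in the larger population, which is why differences in effective population size among primates could in principle produce lineage-specific substitution rates even with identical mutation rates.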

Empirically, the extent to which substitution rates vary among primate lineages remains unclear. Kim et al. (27) compared two hominoids (human and chimpanzee) and two Old World Monkeys (OWMs; baboon and rhesus macaque). Assuming that the average divergence time of the two pairs of species is identical, they reported that substitution rates at transitions at non-CpG sites differ by ∼31% between hominoids and OWMs, whereas rates of CpG transitions are almost identical (27). In turn, Elango et al. (28) found that the human branch is ∼2% shorter than that in chimpanzee (considering the rates from the human–chimpanzee ancestor), and ∼11% shorter than in gorilla (considering rates from the human–gorilla ancestor). Although these comparisons suggest that substitution rates are evolving across primates, they are based on limited data, make strong assumptions about divergence times, and rely on parsimony-based approaches that may underestimate substitution rates for divergent species, notably at CpG sites (29). We therefore revisit these questions using whole-genome sequence alignments of 10 primates, allowing for variable substitution rates along different lineages and explicitly modeling the context dependency of CpG substitutions.


Unusual DNA folding increases the rates of mutations

DNA sequences that can fold into shapes other than the classic double helix tend to have higher mutation rates than other regions in the human genome. New research shows that the elevated mutation rate in these sequences plays a major role in determining regional variation in mutation rates across the genome. Deciphering the patterns and causes of regional variation in mutation rates is important both for understanding evolution and for predicting sites of new mutations that could lead to disease.

A paper describing the research by a team of Penn State scientists is available online in the journal Nucleic Acids Research.

"Most of the time we think about DNA as the classic double helix; this basic form is referred to as 'B-DNA,'" said Wilfried Guiblet, co-first author of the paper, a graduate student at Penn State at the time of the research and now a postdoctoral scholar at the National Cancer Institute. "But, as much as 13% of the human genome can fold into different conformations called 'non-B DNA.' We wanted to explore what role, if any, this non-B DNA played in variation that we see in mutation rates among different regions of the genome."

Non-B DNA can fold into a number of different conformations depending on the underlying DNA sequence. Examples include G-quadruplexes, Z-DNA, H-DNA, slipped strands, and various other conformations. Recent research has revealed that non-B DNA plays critical roles in cellular processes, including the replication of the genome and the transcription of DNA into RNA, and that mutations in non-B sequences are associated with genetic diseases.

"In a previous study, we showed that in the artificial system of a DNA sequencing instrument, which uses similar DNA copying processes as in the cell, error rates were higher in non-B DNA during polymerization," said Kateryna Makova, Verne M. Willaman Chair of Life Sciences at Penn State and one of the leaders of the research team. "We think that this is because the enzyme that copies DNA during sequencing has a harder time reading through non-B DNA. Here we wanted to see if a similar phenomenon exists in living cells."

The team compared mutation rates between B- and non-B DNA at two different timescales. To look at relatively recent changes, they used an existing database of human DNA sequences to identify individual nucleotides -- letters in the DNA alphabet -- that varied among humans. These 'single nucleotide polymorphisms' (SNPs) represent places in the human genome where at some point in the past a mutation occurred in at least one individual. To look at more ancient changes, the team also compared the human genome sequence to the genome of the orangutan.

They also investigated multiple spatial scales along the human genome, to test whether non-B DNA influenced mutation rates at nucleotides adjacent to it and further away.

"To identify differences in mutation rates between B- and non-B DNA we used statistical tools from 'functional data analysis' in which we compare the data as curves rather than looking at individual data points," said Marzia A. Cremona, co-first author of the paper, a postdoctoral researcher at Penn State at the time of the research and now an assistant professor at Université Laval in Quebec, Canada. "These methods give us the statistical power to contrast mutation rates for the various types of non-B DNA against B-DNA controls."

For most types of non-B DNA, the team found increased mutation rates. The differences were enough that non-B DNA mutation rates impacted regional variation in their immediate surroundings. These differences also helped explain a large portion of the variation that can be seen along the genome at the scale of millions of nucleotides.

"When we look at all the known factors that influence regional variation in mutation rates across the genome, non-B DNA is the largest contributor," said Francesca Chiaromonte, Huck Chair in Statistics for the Life Sciences at Penn State and one of the leaders of the research team. "We've been studying regional variation in mutation rates for a long time from a lot of different angles. The fact that non-B DNA is such a major contributor to this variation is an important discovery."

"Our results have critical medical implications," said Kristin Eckert, professor of pathology and biochemistry and molecular biology at Penn State College of Medicine, Penn State Cancer Institute Researcher, an author on the paper, and the team's long-time collaborator. "For example, human geneticists should consider the potential of a locus to form non-B DNA when evaluating candidate genetic variants for human genetic diseases. Our current and future research is focused on unraveling the mechanistic basis behind the elevated mutation rates at non-B DNA."

The results also have evolutionary implications.

"We know that natural selection can impact variation in the genome, so for this study we only looked at regions of the genome that we think are not under the influence of selection," said Yi-Fei Huang, assistant professor of biology at Penn State and one of the leaders of the research team. "This allows us to establish a baseline mutation rate for each type of non-B DNA that in the future we could potentially use to help identify signatures of natural selection in these sequences."

Because of their increased mutation rates, non-B DNA sequences could be an important source of genetic variation, which is the ultimate source of evolutionary change.

"Mutations are usually thought to be so rare, that when we see the same mutation in different individuals, the assumption is that those individuals shared an ancestor who passed the mutation to them both," said Makova, a Penn State Cancer Institute researcher. "But it's possible that the mutation rate is so high in some of these non-B DNA regions that the same mutation could occur independently in several different individuals. If this is true, it would change how we think about evolution."


The Human Genome

De Novo Mutations

While much emphasis is placed on inherited genome variation, all such variation had to originate as de novo, or new, changes occurring in germ cells. At that point, such a variant would be quite rare in the population (occurring just once), and its ultimate frequency in the population over time depends on chance and on the principles of Mendelian inheritance and population genetics (see Chapter 3). While there have been many efforts to estimate the human mutation rate, the ability to sequence genomes directly provides a robust method for measuring such rates genome-wide, by, for example, comparing the sequence of an offspring’s genome (or a portion of that genome) with that of his or her parents (Conrad et al., 2011; Roach et al., 2010).

Such studies have determined the germline base substitution mutation rate to be on the order of 10⁻⁸/bp/generation. Thus, any individual carries an estimated 30–70 new mutations per genome that were not present in the genomes of his or her parents. This rate, however, varies from gene to gene around the genome, and perhaps from population to population or even individual to individual (Conrad et al., 2011). Overall, the rate, combined with considerations of population growth and dynamics, predicts that there must be an enormous number of relatively new (and thus rare) mutations in the current worldwide population of 7 billion individuals, a prediction confirmed by resequencing of genes from some 13,000 individuals (Coventry et al., 2010).
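A back-of-the-envelope check connects the per-base-pair rate to the per-genome count. The haploid genome size of about 3.1 × 10⁹ bp is an assumption brought in here and does not come from the excerpt; the rate is the order-of-magnitude figure quoted above, and the calculation ignores the regional variation in rate that the passage mentions.

```python
HAPLOID_GENOME_BP = 3.1e9     # assumption: approximate haploid human genome size
RATE_PER_BP_PER_GEN = 1e-8    # order-of-magnitude rate quoted in the text
WORLD_POPULATION = 7e9        # population figure quoted in the text


def new_mutations_per_individual(rate, haploid_bp):
    """Expected de novo mutations across the two inherited haploid genomes."""
    return rate * 2 * haploid_bp


def new_mutations_per_site_in_population(rate, population_size):
    """Expected new mutations at one autosomal site across all newborn genomes.

    Each person carries two copies of the site.
    """
    return rate * 2 * population_size


per_person = new_mutations_per_individual(RATE_PER_BP_PER_GEN, HAPLOID_GENOME_BP)
per_site = new_mutations_per_site_in_population(RATE_PER_BP_PER_GEN, WORLD_POPULATION)

print(f"about {per_person:.0f} new mutations per individual")
print(f"about {per_site:.0f} new mutations per site per generation worldwide")
```

Under these assumptions the per-individual figure falls inside the 30–70 range quoted above, and a population of this size is expected to hit any given site many times over in a single generation.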

Conceptually similar studies have explored de novo mutations in CNVs, where the generation of a new length variant depends on recombination, rather than on errors in DNA synthesis to generate a new basepair (see Chapter 10). Indeed, the measured rate of formation of new CNVs (1.2 × 10⁻²/genome/transmission) is orders of magnitude higher than that of base substitutions (Itsara et al., 2011).

Mutations also occur in somatic cells, and one would predict that, in fact, every cell in an individual has a slightly different version of his or her genome, depending on the number of cell divisions that have occurred since conception to the time of sample acquisition. In highly proliferative tissues, such as intestinal epithelial cells or hematopoietic cells, such genomic heterogeneity is particularly likely to be apparent. However, most such mutations are not typically detected, since one usually sequences DNA from collections of many millions of cells; in such a collection, the most prevalent base at any position in the genome will be the one present at conception, and rare somatic mutations will be largely invisible and unascertained. New methods to sequence DNA from single cells are under development, which will provide an opportunity to explore the nature of genomic sequence heterogeneity in different cell and tissue types and across the lifespan (Kalisky et al., 2011).
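The reason rare somatic mutations vanish in bulk sequencing can be shown with a small calculation of the expected number of reads carrying a heterozygous variant. The function name, the sequencing depth and the cell count are hypothetical values chosen for illustration, and sequencing error is ignored.

```python
def expected_variant_reads(cells_with_mutation, total_cells, depth):
    """Expected number of reads showing a heterozygous somatic variant.

    Each diploid cell contributes two copies of the site; in mutated
    cells one of those two copies carries the variant.
    """
    allele_fraction = cells_with_mutation / (2.0 * total_cells)
    return allele_fraction * depth


depth = 30               # hypothetical read depth at the site
bulk_cells = 1_000_000   # hypothetical number of cells in the sample

rare = expected_variant_reads(1, bulk_cells, depth)             # private to one cell
clonal = expected_variant_reads(bulk_cells, bulk_cells, depth)  # present in every cell

print(f"one cell in {bulk_cells:,}: {rare:.6f} expected variant reads")
print(f"clonal (all cells): {clonal:.1f} expected variant reads")
```

A mutation confined to one cell contributes far less than one read on average, whereas a change shared by every cell in the sample appears in about half the reads.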

An exception to the expectation that de novo somatic mutations will be typically undetectable within any multi-cell DNA sample is in cancer, in which the mutational basis for the origins of cancer and the clonal nature of tumor evolution drives certain somatic changes to be present in essentially all the cells of a tumor. Indeed, 1000 to 10,000 somatic mutations (and sometimes many more) are readily found in the genomes of most adult cancers, with mutation frequencies and patterns specific to different cancer types (Meyerson et al., 2010; Stratton, 2011). In principle, such studies can point to critical genes (“driver” mutations) and/or novel gene fusion products that will inform understanding of a given cancer’s molecular, genetic, and genomic etiology. To date, this sequence-based approach has identified over 200,000 mutations in hundreds of whole cancer genomes, pointing to nearly 500 somatically mutated cancer genes that contribute to neoplastic change in one or more types of cancer. Impressively, this suggests that as many as 2% of the protein-coding genes in the human genome can, when mutated in particular ways, predispose a cell to become cancerous (Meyerson et al., 2010). An online catalog of somatic mutations in cancer is maintained by the Sanger Institute in the UK (Forbes et al., 2011) (Table 1.2).