Infer gene frequency within a species over time


I was reading Karlsson et al. (2014) and came across this:

A selected variant that increases rapidly in frequency in the past ~250,000 years can be detected as an unusual reduction in genetic diversity.

I realised that I do not know how to infer a specific allele frequency over time within a given species.

I tried to google some keywords but was flooded with other concepts. Could you please direct me to some appropriate documentation/keywords?

There are two parts to your post that I want to address: the first is the quote (because I want to make sure you understand it well), and the second is general inference methods for estimating the genetic composition of ancestral populations.

The Quote: Selective Sweeps

A selected variant that increases rapidly in frequency in the past ~250,000 years can be detected as an unusual reduction in genetic diversity.

When an allele is selected for, it will spread relatively quickly through the population compared with, for example, neutral alleles. If an allele becomes fixed in the population, the diversity in that gene is zero; there is no standing genetic variation in the gene. Selection will often reduce genetic variation, but see this post too.
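To put a number on "relatively quickly", here is a minimal sketch assuming the simplest deterministic model of selection at a haploid locus: an allele with relative fitness 1 + s changes frequency from p to p(1 + s)/(1 + ps) each generation. The model, the function names and the parameter values are illustrative assumptions rather than anything from Karlsson et al. (2014); drift, dominance and linkage are ignored.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of deterministic haploid selection: p' = p(1 + s) / (1 + p*s)."""
    return p * (1 + s) / (1 + p * s)


def generations_to_reach(p0: float, s: float, target: float, max_gen: int = 1_000_000) -> int:
    """Generations until an allele starting at frequency p0 first reaches `target`."""
    p = p0
    for gen in range(max_gen):
        if p >= target:
            return gen
        p = next_frequency(p, s)
    raise ValueError("target frequency not reached within max_gen generations")


if __name__ == "__main__":
    p0 = 1 / 10_000  # assumed: one new copy among 10,000 genomes
    for s in (0.01, 0.05, 0.1):
        gens = generations_to_reach(p0, s, 0.99)
        print(f"s = {s}: about {gens} generations to go from {p0} to 0.99")
```

With s = 0 the frequency never changes in this model, so a neutral allele can only move by drift. Because the odds p/(1 − p) are multiplied by exactly (1 + s) every generation, the time to sweep is roughly ln(final odds / initial odds) / ln(1 + s) generations, which is why stronger selection produces much faster sweeps.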

However, selection doesn't just reduce genetic diversity at the selected locus; it also reduces it at loci near the gene, those that are linked to it. The loss of genetic diversity at linked loci occurs by a process called a selective sweep. This is defined (somewhat poorly) in the web version of your linked paper:

Selective sweeps; Reductions in genetic variation caused by positive selection at particular loci.

Basically, a selective sweep occurs when strong selection causes one allele, together with the alleles at loci tightly linked to it, to spread through a population. Genetic diversity will be lost from the linked loci at a rate determined by the strength of selection and the degree of linkage (tighter linkage and stronger selection both increase the rate of loss). This paper (see section 7 for selective sweeps) provides a good discussion of the factors affecting genetic variation in natural populations, and draws on the example of the Y chromosome:

One or more selective sweeps will have left the Y chromosome with little or no variability
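How strongly a sweep depletes diversity at a linked site can be sketched with a commonly quoted approximation for a "hard" sweep of a single new mutation (an assumption here, not something taken from the paper linked above): at a neutral site separated from the selected site by recombination fraction r, in a population of N diploids, roughly H/H₀ ≈ 1 − (2N)^(−2r/s) of the original heterozygosity remains once the selected allele has fixed. It assumes strong selection (2Ns ≫ 1); the function name and the values of N, r and s are illustrative.

```python
def remaining_heterozygosity(r: float, s: float, n_diploid: int) -> float:
    """Approximate fraction of neutral heterozygosity left right after a hard sweep.

    Uses H/H0 ~= 1 - (2N)^(-2r/s); only meaningful for strong selection (2Ns >> 1).
    """
    if r < 0 or s <= 0:
        raise ValueError("need r >= 0 and s > 0")
    return 1.0 - (2 * n_diploid) ** (-2.0 * r / s)


if __name__ == "__main__":
    N = 10_000  # assumed population size
    for s in (0.01, 0.1):
        for r in (0.0, 1e-4, 1e-3, 1e-2):
            print(f"s = {s:<5} r = {r:<7} H/H0 ~ {remaining_heterozygosity(r, s, N):.3f}")
```

Sites with r close to 0 lose almost all their variation, while sites whose recombination fraction is large relative to s are barely affected. Because only the ratio r/s enters the exponent, tighter linkage and stronger selection have the same qualitative effect.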

Inference of ancestral populations

You could sample DNA from the population at the time you are interested in. However, doing so is very tricky. DNA degrades over time, so you need to understand how it degrades if you want to make inferences about the population. (A while back I saw a talk by a researcher, whose name I can't remember, who took samples from graves. If I remember correctly, much of the DNA they collected was bacterial: only a tiny fraction of the DNA they got from human bones in graves just a few hundred years old was actually human. This is another issue with studying old DNA.) It's often difficult to find sources of DNA samples that are a) of high enough quality and b) from enough individuals to allow good inference about the ancestral population; a small sample will be prone to sampling bias.*

There is another approach, and it's commonly used now. If you are interested in finding out more, you should search for coalescent theory and coalescent methods. These work backwards from the current (or relatively more recent) genetic composition of populations using population genetic theory. They are not really used to estimate specific allele frequencies, but rather to infer population size, migration rates, and recombination rates, and to infer when the most recent common ancestor (MRCA) lived. This paper reviews coalescent methods for phylogenetic trees and this is an introductory lecture on coalescence. Coalescent theory has many considerations that need to be taken into account: Are some mutations more common than others (see Molecular Clock)? How does selection affect genetic variation (e.g. selective sweeps)? Does linkage vary across the genome? Do rates of mutation, drift, and selection vary across the genome? Are different parts of the genome differently affected by migration?
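As a minimal illustration of reasoning backwards in time, the sketch below simulates the standard (Kingman) coalescent for a sample of n gene copies from a constant-size, randomly mating population of N diploids, with no selection, migration or recombination. These simplifying assumptions are exactly the kind questioned in the list above. With k lineages, the waiting time to the next coalescence is exponential with rate k(k − 1)/2 in units of 2N generations, which gives an expected time to the MRCA of 4N(1 − 1/n) generations. The function names are hypothetical.

```python
import random
import statistics


def simulate_tmrca(n_samples: int, n_diploid: int, rng: random.Random) -> float:
    """Time to the most recent common ancestor, in generations, under the Kingman coalescent."""
    time_units = 0.0  # time measured in units of 2N generations
    k = n_samples
    while k > 1:
        pairs = k * (k - 1) / 2              # number of lineage pairs that could coalesce
        time_units += rng.expovariate(pairs)  # exponential waiting time with that rate
        k -= 1                                # two lineages merge into one
    return time_units * 2 * n_diploid


def expected_tmrca(n_samples: int, n_diploid: int) -> float:
    """Analytical expectation, 4N(1 - 1/n) generations."""
    return 4 * n_diploid * (1 - 1 / n_samples)


if __name__ == "__main__":
    rng = random.Random(42)
    n, N = 10, 1_000
    sims = [simulate_tmrca(n, N, rng) for _ in range(20_000)]
    print(f"simulated mean TMRCA: {statistics.mean(sims):.0f} generations")
    print(f"expected TMRCA:       {expected_tmrca(n, N):.0f} generations")
```

Only the population size enters this expectation, which illustrates why coalescent methods lend themselves naturally to demographic parameters rather than to the frequency of one particular allele at a given past time.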

Both approaches raise serious considerations which are at the forefront of evolutionary genetics right now. Research groups all over the world are developing lab methods, statistical methods, and mathematical models in an attempt to make inference more accurate. Right now the only way, in my opinion, to infer the frequency of specific alleles in ancestral populations is to use ancient DNA methods. Coalescent methods can be used to infer population genetic parameters, but for specific alleles there are so many factors to consider that we just don't have a thorough understanding of them (right now, though certainly less so in the future). In other words, the assumptions that need to be made for coalescence to be able to estimate specific ancestral allele frequencies are rarely going to be satisfied. However, as long as this is properly discussed, there is no problem with using the method, and I am certain that the future is bright for such methods.

*On sampling bias: Imagine you want to work out the frequency of the number two on a six-sided die (we know the true frequency is 1/6 ≈ 0.167). You roll the die four times, and it shows the side with two dots once. Your population frequency estimate is 0.25. Your friend rolls the same die 4,000 times and sees the two dots 652 times, a frequency of 0.163, which is much closer to the true population frequency. The moral of the story: small samples can give misleading estimates of the truth.
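The die example can be quantified with the standard binomial standard error of a proportion, sqrt(p(1 − p)/n). The short sketch below (the function name is hypothetical) compares the typical sampling error for 4 and 4,000 rolls.

```python
import math

TRUE_P = 1 / 6  # probability of rolling a two on a fair six-sided die


def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n, twos in ((4, 1), (4000, 652)):
        estimate = twos / n
        print(f"n = {n:>4}: estimate {estimate:.3f}, "
              f"typical sampling error {binomial_se(TRUE_P, n):.3f}")
```

With four rolls the typical error (about 0.19) is larger than the true frequency itself, whereas with 4,000 rolls it shrinks to about 0.006, so the friend's estimate of 0.163 lies well within one standard error of 1/6.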

There are really two ways to infer past genetics.

  1. Sample the past.

This only really works if you have well-preserved, uncontaminated archaeological samples, but it works surprisingly well, considering. The accuracy and completeness go down fairly quickly as you go back in time, but thousands to hundreds of thousands of years is roughly possible. You won't have many samples, so it's hard to say anything about population genetics. For that purpose I wouldn't recommend it: it doesn't work very well and it's very hard.

  2. Molecular clocks.

Same-sense (synonymous) mutations, which don't change the encoded protein, accumulate in genes over time as they're passed down. Gene copies that share a distant common ancestor differ more than copies that are just a generation or two apart. This is essentially how phylogenetics works as a whole, but you can also use age differences between nearby genes to detect selection. If gene A is nearly identical in every individual in a population, but gene B has thousands of fairly distinct variants, that strongly suggests gene A does something important and has been powerfully selected for. Don't just count the number of distinct alleles of gene A; calculate the average number of differences between all pairs of alleles you find. If this average difference for gene A is much lower than for other genes (that is, its alleles are much more closely related), gene A is likely being selected on. This is what the text means by 'reduction in genetic diversity': the alleles become, on average, more closely related.
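The "average difference between all pairs of alleles" described above is usually called nucleotide diversity (π). A minimal sketch, assuming the sequences are already aligned and of equal length; the function name and the toy sequences are invented for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences per site across all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


if __name__ == "__main__":
    # Hypothetical alleles sampled from four individuals.
    gene_a = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]  # nearly identical
    gene_b = ["ACGTACGTAC", "TCGAACGTAC", "ACCTAGGTAC", "ACGTTCGAAC"]  # more divergent
    print(f"pi(gene A) = {nucleotide_diversity(gene_a):.3f}")
    print(f"pi(gene B) = {nucleotide_diversity(gene_b):.3f}")
```

In a real analysis, gene A's π would be compared against a genome-wide distribution rather than a single other gene, since mutation rates also vary between genes.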

You can't work out allele frequencies at arbitrary times in the past without a time machine (even with ancient sampling, you can't really be sure you're getting an accurate sample of the historical population), but phylogenetics can hint at what happened by detecting the signatures of evolution.

Genome-wide analysis of a long-term evolution experiment with Drosophila

Experimental evolution systems allow the genomic study of adaptation, and so far this has been done primarily in asexual systems with small genomes, such as bacteria and yeast. Here we present whole-genome resequencing data from Drosophila melanogaster populations that have experienced over 600 generations of laboratory selection for accelerated development. Flies in these selected populations develop from egg to adult ∼20% faster than flies of ancestral control populations, and have evolved a number of other correlated phenotypes. On the basis of 688,520 intermediate-frequency, high-quality single nucleotide polymorphisms, we identify several dozen genomic regions that show strong allele frequency differentiation between a pooled sample of five replicate populations selected for accelerated development and pooled controls. On the basis of resequencing data from a single replicate population with accelerated development, as well as single nucleotide polymorphism data from individual flies from each replicate population, we infer little allele frequency differentiation between replicate populations within a selection treatment. Signatures of selection are qualitatively different from what has been observed in asexual species: in our sexual populations, adaptation is not associated with 'classic' sweeps whereby newly arising, unconditionally advantageous mutations become fixed. More parsimonious explanations include 'incomplete' sweep models, in which mutations have not had enough time to fix, and 'soft' sweep models, in which selection acts on pre-existing, common genetic variants. We conclude that, at least for life history characters such as development time, unconditionally advantageous alleles rarely arise, are associated with small net fitness gains or cannot fix because selection coefficients change over time.

The main drivers of global biodiversity change have been identified (Millennium Ecosystem Assessment 2005), but their impacts vary spatially, temporally and taxonomically. Drivers may also interact to produce synergistic or opposing effects (Travis 2003; Brook, Sodhi & Bradshaw 2008; Schweiger et al. 2010), but there are few empirical examples, particularly for insects, which comprise the majority of terrestrial biodiversity (Collen et al. 2012). Unquantified change and a resultant lack of evidence-based conservation present pressing biological and strategic management challenges.

Here, we utilize a substantial data set of species occurrence records to examine long-term changes in a species-rich insect taxon (Lepidoptera: macro-moths) in Great Britain (GB). Large-scale, comprehensive assessments of biodiversity changes in speciose insect taxa are rare (Thomas 2005; Mattila et al. 2008, 2009; Jeppsson et al. 2010). Moths constitute one of the largest groups of herbivorous insects, forming key links in food webs, damaging (as well as pollinating) their plant hosts and providing a major food source for insectivorous animals in many ecosystems (Strong, Lawton & Southwood 1984).

We calculate long-term changes in frequency of occurrence of 673 lepidopteran species in GB and evaluate the trends in relation to species’ predicted sensitivities to recent climatic and habitat changes. Habitat modification, particularly agricultural intensification, is considered the pre-eminent cause of recent species declines in GB and other western European countries (Warren & Key 1991; Robinson & Sutherland 2002; Kleijn et al. 2009). In parallel, climate change is eliciting changes in the geographical range, abundance, phenology and biotic interactions of Lepidoptera species (Parmesan 2006). Climate change provides a shifting context for the impacts of habitat modification, either amplifying or ameliorating species’ responses depending upon ecological traits and biogeographical situation.

Gradients of land use, climate and species’ distributions combine conveniently to provide distinct (often opposite) predictions of changes to species’ occurrence in GB. Northern GB retains a higher proportion of semi-natural habitats than southern GB, where levels of land conversion to intensive agriculture and urbanisation have been greater (Morton et al. 2011). Therefore, moth species that are not strongly constrained by climate and occur widely in GB might be expected to decline in the south while remaining relatively stable in the north, in response to land-use changes. On the other hand, many insect species (including many macro-moths) reach the north-western climatic limit of their European range within southern GB. These species should benefit from climate change, leading to the opposite prediction – they should potentially increase as the climate has warmed (Hickling et al. 2006). In contrast, arctic–alpine species that are restricted to northern and montane areas in GB might be expected to decline in response to regional warming. By considering warm-adapted, cold-adapted and relatively climate-insensitive (within GB) species across a broad gradient of land-use intensity, we attempt to tease apart the effects of change in land use and climate on GB moths.

Land-use changes involve altered management (e.g. increased fertilizer input) as well as conversion from one land-use type to another. We considered these effects by analysing the occurrence changes in moths that are monophagous on larval host plants that possess different environmental requirements. Trait-based analyses of plant trends have been linked to drivers of change (Carey et al. 2008), utilizing Ellenberg indicator values to characterize the realized niches of plants along environmental gradients, such as those relating to soil chemistry and light availability (Ellenberg 1979). Thus, by considering the Ellenberg indicator values of moth larval hosts, we can examine links between drivers of botanical change and changes to the frequency of occurrence of moths.

Here, we test three hypotheses: (i) macro-moth species will show a wide diversity of changes as they respond to diverse drivers, but will have declined overall, mirroring wider biodiversity trends; (ii) the responses of species with different geographical distributions (southern, northern, widespread) are expected to differ, because the effects of climate and land use may differ between these species categories; and (iii) moth occurrence trends will be associated with host plant attributes (Ellenberg indicator values); specifically, moths that use types of plant that are in decline, such as those associated with low-nitrogen soil conditions, will also be in decline.

We found support for each hypothesis, enabling us to assess long-term moth biodiversity change. These results will guide future research into drivers of biodiversity change and inform ecological management to buffer species from negative impacts.

Alpha-1-Antitrypsin (α1AT) Deficiency


Allelic frequencies of α1AT gene mutations vary considerably around the world and between study populations. Utilizing global population studies to map the putative α1AT genes, investigators have suggested that the Z mutation originated in Scandinavia approximately 2000 years ago, eventually spreading from Northern Europe when people migrated from the area to other parts of the world. Therefore, the highest frequencies of the mutant gene are seen in Northern Europeans or in individuals of European ancestry. In Sweden, the frequency of the Z allele (PI*Z) is estimated to be 0.026, with 4–5% of the population carrying the mutation and 1:1600 live births being homozygous for this deficiency. The PI*Z gene frequency is somewhat lower in the United States among individuals of European descent, estimated to be between 0.01 and 0.02, with a carrier rate of approximately 3%. Some 80,000 to 100,000 people in the United States are PI ZZ homozygotes with considerable potential to develop disease. The Z allele is confined predominantly to Caucasians and is uncommon among individuals of Asian or African ancestry unless there is ethnic intermixing in the population. PI*S is a more common deficiency mutation than PI*Z, having a gene frequency between 0.02 and 0.03 in U.S. Caucasians. However, this variant is not associated with as severe a reduction in circulating levels of α1AT as the PI*Z allele and hence does not leave the individual susceptible to disease unless it is co-inherited with a Z allele. Of greater concern is the risk of disease in the 1:1000 to 1:1500 Caucasian individuals expressing a PI SZ phenotype. Although the matter is controversial, anyone carrying a single Z allele may be at some, as yet undefined, risk of developing liver disease regardless of the circulating levels of α1AT because of intracellular accumulation of the abnormally folded protein.
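The quoted figures can be cross-checked against the Hardy-Weinberg proportions p², 2pq and q², assuming random mating (an idealisation) and considering one deficiency allele at a time. For the SZ compound heterozygote the expected frequency is 2·q_S·q_Z; using the midpoints of the U.S. ranges (q_S = 0.025, q_Z = 0.015) is our own assumption, and the function name is hypothetical.

```python
def hardy_weinberg(q: float) -> dict[str, float]:
    """Expected genotype frequencies at a biallelic locus under Hardy-Weinberg equilibrium."""
    p = 1 - q
    return {"normal_homozygote": p * p, "carrier": 2 * p * q, "variant_homozygote": q * q}


if __name__ == "__main__":
    sweden = hardy_weinberg(0.026)  # PI*Z frequency quoted for Sweden
    print(f"Sweden: {sweden['carrier']:.1%} carriers, "
          f"ZZ about 1 in {1 / sweden['variant_homozygote']:.0f}")

    us_z = hardy_weinberg(0.015)  # assumed midpoint of the 0.01-0.02 U.S. range
    print(f"U.S.: {us_z['carrier']:.1%} Z carriers")

    q_s, q_z = 0.025, 0.015  # assumed midpoints of the U.S. PI*S and PI*Z ranges
    print(f"U.S. SZ compound heterozygotes: about 1 in {1 / (2 * q_s * q_z):.0f}")
```

The predictions (about 5% Swedish carriers, ZZ about 1 in 1,480, about 3% U.S. Z carriers, SZ about 1 in 1,330) are close to the reported 4–5%, 1:1600, ~3% and 1:1000 to 1:1500, respectively; small gaps are expected because real populations deviate from Hardy-Weinberg conditions.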

Using the Hardy-Weinberg Equation

The Hardy-Weinberg equation can be used to calculate gene frequencies (under Hardy-Weinberg conditions) from any one of its components.

Example 1:

In one hypothetical Zebra Mussel (Dreissena polymorpha) population, most of the individuals have dark, zebra-striped shells. However, solid light-colored shells (caused by the homozygous recessive genotype, aa) occur in 1 of every 10,000 individuals.


Calculate gene frequencies and numbers of dominant homozygotes (AA) and recessive homozygotes (aa) in a population of 10,000 individuals.


  1. frequency of aa = q² = 1/10,000 = 0.0001, so q = 0.01
  2. number of aa = 0.0001 x 10,000 = 1 individual
  3. p + q = 1, so p = 0.99
  4. frequency of AA = p² = 0.9801
  5. number of AA = 0.9801 x 10,000 = 9,801 individuals
  6. For extra credit: frequency of Aa = 2pq = 2 x 0.99 x 0.01 = 0.0198, or 198 individuals

Example 2:

The Coquina Clam (Donax variabilis) is highly **polychromic** (with shells of many different colors). In a population of 2,000 clams, 1,920 are solid colored, whereas the remainder have radiating color bands. Solid color occurs in homozygous dominant (BB) and heterozygous (Bb) individuals; color banding occurs only in homozygous recessive individuals (bb).


Calculate gene frequencies and numbers of BB and Bb.


  1. 1,920 are solid (BB and Bb), so 80 banded are recessive (bb)
  2. frequency of bb = q² = 80/2000 = 0.04, so q = 0.20
  3. p + q = 1, so p = 0.80
  4. number of BB: p² = 0.64, so BB in population of 2,000 = 0.64 x 2,000 = 1,280 individuals
  5. number of Bb: 2pq [frequency of Bb] = 2 x 0.2 x 0.8 = 0.32, so Bb = 0.32 x 2,000 = 640 individuals
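Both worked examples follow the same recipe: take q as the square root of the recessive-homozygote frequency, set p = 1 − q, and multiply p², 2pq and q² by the population size. A minimal Python version of that recipe (the function name is hypothetical), assuming Hardy-Weinberg equilibrium and complete dominance as the examples do:

```python
import math


def hw_from_recessive_count(n_recessive: int, n_total: int) -> dict[str, float]:
    """Allele frequencies and expected genotype counts from the recessive homozygote count.

    Assumes Hardy-Weinberg equilibrium and complete dominance, as in the examples.
    """
    q = math.sqrt(n_recessive / n_total)  # q^2 is the recessive homozygote frequency
    p = 1 - q
    return {
        "p": p,
        "q": q,
        "dominant_homozygotes": p * p * n_total,
        "heterozygotes": 2 * p * q * n_total,
        "recessive_homozygotes": q * q * n_total,
    }


if __name__ == "__main__":
    for name, n_rec, n_tot in (("Zebra mussels (aa)", 1, 10_000),
                               ("Coquina clams (bb)", 80, 2_000)):
        r = hw_from_recessive_count(n_rec, n_tot)
        print(f"{name}: p = {r['p']:.2f}, q = {r['q']:.2f}, "
              f"homozygous dominant = {r['dominant_homozygotes']:.0f}, "
              f"heterozygous = {r['heterozygotes']:.0f}")
```

Running it reproduces the hand calculations: 9,801 AA and 198 Aa mussels, and 1,280 BB and 640 Bb clams.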


We first analyzed the dataset of Barghi et al. (33), an evolve-and-resequence study with 10 replicate populations exposed to a high-temperature laboratory environment, evolved for 60 generations, and sequenced every 10 generations. Using the seven time points and 10 replicate populations, we estimated the genome-wide 6 × 6 temporal covariance matrix Q for each of the 10 replicates. Each row of these matrices represents the temporal covariance $\mathrm{Cov}(\Delta_{10} p_s, \Delta_{10} p_t)$ between the allele frequency change (in 10-generation intervals, denoted $\Delta_{10} p_t$) of some initial reference generation s (the row of the matrix) and some later time point t (the column of the matrix). We corrected these matrices for biases due to sampling noise and normalized the entries for heterozygosity (SI Appendix, sections S1.2 and S1.4). These covariances are expected to be zero when only drift is acting, as only heritable variation for fitness can create covariance between allele frequency changes in a closed population (37). Averaging across the 10 replicate temporal covariance matrices, we find temporal covariances that are statistically significant (95% block bootstrap CIs do not contain zero), consistent with linked selection perturbing genome-wide allele frequency changes over very short time periods. The covariances between all adjacent time intervals are positive and then decay toward zero as we look at more distant time intervals (Fig. 1A), as expected when directional selection affects linked variants’ frequency trajectories until ultimately linkage disequilibrium (LD) and the associated additive genetic variance for fitness decays (which could occur as a population reaches a new optimum and directional selection weakens) (37). The temporal covariances per replicate are noisier, but this general pattern holds (SI Appendix, Fig. S23).
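The core quantity, a matrix of covariances (taken across loci) between allele frequency changes in different sampling intervals, can be sketched in a few lines of NumPy. This minimal version, with hypothetical function names, omits the sampling-noise correction and heterozygosity normalisation that the study applies. The usage example simulates neutral Wright-Fisher drift (an assumed population of 1,000 diploids, sampled every 10 generations for 60 generations), under which the off-diagonal entries should be close to zero.

```python
import numpy as np


def temporal_covariance_matrix(freqs: np.ndarray) -> np.ndarray:
    """Covariances between allele frequency changes in different sampling intervals.

    freqs has shape (n_loci, n_timepoints) for one replicate population. Entry
    [s, t] of the result is Cov(delta p_s, delta p_t), computed across loci.
    No correction for sampling noise or heterozygosity is applied.
    """
    deltas = np.diff(freqs, axis=1)                      # shape (n_loci, n_intervals)
    return np.atleast_2d(np.cov(deltas, rowvar=False))   # intervals are the variables


def simulate_neutral_drift(n_loci, n_diploid, generations, every, rng):
    """Wright-Fisher drift at independent loci, recording frequencies every `every` generations."""
    p = rng.uniform(0.1, 0.9, n_loci)
    samples = [p]
    for gen in range(1, generations + 1):
        p = rng.binomial(2 * n_diploid, p) / (2 * n_diploid)
        if gen % every == 0:
            samples.append(p)
    return np.column_stack(samples)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs = simulate_neutral_drift(20_000, 1_000, 60, 10, rng)  # 7 time points
    Q = temporal_covariance_matrix(freqs)                      # 6 x 6 matrix
    np.set_printoptions(precision=6, suppress=True)
    print(Q)
```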

(A) Temporal covariance, averaged across all 10 replicate populations, through time from the Barghi et al. (33) study. Each line depicts the temporal covariance $\mathrm{Cov}(\Delta p_s, \Delta p_t)$ from some reference generation s to a later time t, which varies along the x axis; each line corresponds to a row of the triangle of the temporal covariance matrix with the same color (Right). The ranges around each point are 95% block bootstrap CIs. (B) A lower bound on the proportion of the total variance in allele frequency change explained by linked selection, G(t), as it varies through time t along the x axis. The black line is the G(t) averaged across replicates, with the 95% block bootstrap CI. The other lines are the G(t) for each individual replicate, with colors indicating what subset of the temporal covariance matrix in Right is being included in the calculation of G(t).

Since our covariances are averages over loci, the covariance estimate could be strongly affected by a few outlier regions. To test whether large outlier regions drive the genome-wide signal we see in the Barghi et al. (33) data, we calculate the covariances in 100-kb windows along the genome (we refer to these as windowed covariances throughout) and take the median windowed covariance (and trimmed-mean windowed covariance) as a measure of the genome-wide covariance robust to large-effect loci. These robust estimates (SI Appendix, Table S1 and Fig. S24) confirm the patterns we see using the mean covariance, establishing that genomic temporal covariances are nonzero due to the impact of selection acting across many genomic regions.
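The windowed robustness check can be sketched as follows, assuming per-locus allele frequency changes for two intervals and the loci's genomic positions are already available (function names and toy data are hypothetical). The median and trimmed mean of the per-window covariances are much less sensitive to a single extreme window than the plain mean.

```python
import numpy as np


def windowed_covariances(positions, delta_s, delta_t, window_size=100_000):
    """Cov(delta p_s, delta p_t) computed separately within each genomic window."""
    positions = np.asarray(positions)
    delta_s = np.asarray(delta_s, dtype=float)
    delta_t = np.asarray(delta_t, dtype=float)
    window_ids = positions // window_size
    covs = []
    for window in np.unique(window_ids):
        in_window = window_ids == window
        if in_window.sum() >= 2:  # a covariance needs at least two loci
            covs.append(np.cov(delta_s[in_window], delta_t[in_window])[0, 1])
    return np.array(covs)


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end of the sorted data."""
    ordered = np.sort(np.asarray(values, dtype=float))
    k = int(proportion * len(ordered))
    return ordered[k:len(ordered) - k].mean()


if __name__ == "__main__":
    # Toy data: nine windows with covariance 0.1 and one outlier window with covariance 10.
    base = np.array([-1.0, 0.0, 1.0])
    slopes = [0.1] * 9 + [10.0]
    positions = np.concatenate([w * 100_000 + np.array([0, 10, 20]) for w in range(10)])
    delta_s = np.tile(base, 10)
    delta_t = np.concatenate([c * base for c in slopes])
    covs = windowed_covariances(positions, delta_s, delta_t)
    print(f"mean {covs.mean():.2f}, median {np.median(covs):.2f}, "
          f"trimmed mean {trimmed_mean(covs):.2f}")
```

In the toy data the single outlier window pulls the mean up to about 1.09, while the median and the 10% trimmed mean both stay at 0.1.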

While the presence of positive temporal covariances is consistent with selection affecting allele frequencies over time, this measure is not easily interpretable. We can calculate a more intuitive measure from the temporal covariances to quantify the impact of selection on allele frequency change: the ratio of total covariance in allele frequency change to the total variance in allele frequency change. We denote the change in allele frequency as $\Delta p_t = p_{t+1} - p_t$, where $p_t$ is the allele frequency in generation t. Since the total variation in allele frequency change can be partitioned into variance and covariance components (we correct for biases due to sequencing depth),

$$\mathrm{Var}(p_t - p_0) = \sum_{i=0}^{t-1} \mathrm{Var}(\Delta p_i) + \sum_{i=0}^{t-1} \sum_{j \neq i}^{t-1} \mathrm{Cov}(\Delta p_i, \Delta p_j),$$

and the covariances are zero when drift acts alone, this ratio is a lower bound on how much of the variance in allele frequency change is caused by linked selection (37). We call this measure G(t), defined as

$$G(t) = \frac{\sum_{i=0}^{t-1} \sum_{j \neq i}^{t-1} \mathrm{Cov}(\Delta p_i, \Delta p_j)}{\mathrm{Var}(p_t - p_0)}. \qquad [1]$$

This estimates the impact of selection on allele frequency change between the initial generation 0 and some later generation t, which can be varied to see how this quantity grows through time. When the sum of the covariances is positive, this measure can intuitively be understood as a lower bound on the relative fraction of allele frequency change normally thought of as “drift” that is actually due to selection. Additionally, G(t) can be understood as a short-timescale estimate of the reduction in neutral diversity due to linked selection (or equivalently, the reduction in neutral effective population size needed to account for linked selection) (SI Appendix, section S7). Since the Barghi et al. (33) experiment is sequenced every 10 generations, the numerator uses the covariances estimated between 10-generation blocks of allele frequency change; thus, the strong, unobservable covariances between adjacent generations do not contribute to the numerator of G(t). Had these covariances been measurable on shorter timescales, their cumulative effect would likely have been higher yet (SI Appendix, sections S2 and S8.4 have more details). Additionally, selection inflates the variance in allele frequency change per generation; however, this effect cannot be easily distinguished from drift. For both these reasons, our measure G(t) is quite conservative (we demonstrate this through simulations in SI Appendix, section S8.4). Still, we find a remarkably strong signal: more than 20% of total, genome-wide allele frequency change over 60 generations is the result of selection (Fig. 1B). This proportion of variance attributable to selection builds over time in Fig. 1B as the effects of linked selection are compounded over the generations, unlike genetic drift. Our G(t) starts to plateau as the covariances from earlier generations decay and so no longer contribute as strongly (Fig. 1).
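Equation [1] translates directly into code. The sketch below (hypothetical names; no correction for sampling noise or sequencing depth, which the study applies) computes G(t) for one replicate from a matrix of allele frequencies, with t counted in sampling intervals rather than generations. It relies on the variance partition above: the sample variance of p_t − p_0 equals the sum of all entries in the upper-left t × t block of the covariance matrix, so the off-diagonal part of that block is the numerator of G(t).

```python
import numpy as np


def g_of_t(freqs: np.ndarray) -> np.ndarray:
    """G(t) for t = 1 .. n_intervals, following Eq. [1].

    freqs: allele frequencies with shape (n_loci, n_timepoints) for one replicate.
    Element t-1 of the result is the summed off-diagonal covariances of the first
    t intervals divided by Var(p_t - p_0), both computed across loci.
    """
    deltas = np.diff(freqs, axis=1)
    cov = np.atleast_2d(np.cov(deltas, rowvar=False))
    n_intervals = deltas.shape[1]
    g = np.empty(n_intervals)
    for t in range(1, n_intervals + 1):
        block = cov[:t, :t]
        off_diagonal = block.sum() - np.trace(block)  # sum over i != j
        total_variance = np.var(freqs[:, t] - freqs[:, 0], ddof=1)
        g[t - 1] = off_diagonal / total_variance
    return g


if __name__ == "__main__":
    # Toy data: each locus drifts consistently up or down, plus small noise.
    rng = np.random.default_rng(0)
    n_loci = 5_000
    direction = rng.choice([-1.0, 1.0], size=n_loci)
    freqs = 0.5 + direction[:, None] * 0.01 * np.arange(7) + rng.normal(0, 0.002, (n_loci, 7))
    print(np.round(g_of_t(freqs), 3))
```

G(1) is always zero because a single interval has no covariance terms, which mirrors the text's point that the measure only accumulates once several intervals are observed.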

Additionally, we looked for a signal of temporal autocovariance in Bergland et al. (35), a study that collected Drosophila melanogaster through spring–fall season pairs across 3 years. If there was a strong pattern of genome-wide fluctuating selection, we might expect a pattern of positive covariances between similar seasonal changes (e.g., spring–fall in two adjacent years) and negative covariances between dissimilar seasonal changes (e.g., spring–fall and fall–spring in two adjacent years). However, we find no such signal over years, and in reproducing their original analysis, we find that their number of statistically significant seasonal polymorphisms is not enriched compared with an empirical null distribution created by permuting seasonal labels (we discuss this in more depth in SI Appendix, section S6).

The replicate design of Barghi et al. (33) allows us to quantify another covariance: the covariance in allele frequency change between replicate populations experiencing convergent selection pressures. These between-replicate covariances are created in the same way as temporal covariances: alleles linked to a particular fitness background are expected to have allele frequency changes in the same direction if the selection pressures are similar. Intuitively, where temporal covariances reflect that alleles associated with heritable fitness backgrounds are predictive of frequency changes between generations, replicate covariances reflect that heritable fitness backgrounds common to each replicate predict (under the same selection pressures) frequency changes between replicates; we note that there is not a direct one-to-one correspondence between temporal and replicate covariances, since the latter are driven by a shared selection pressure and the stochastic genetic backgrounds across replicate populations. We measure this through a statistic similar to a correlation, which we call the convergent correlation: the ratio of the average between-replicate covariance across all pairs to the average SD across all pairs of replicates,

$$\mathrm{cor}(\Delta p_s, \Delta p_t) = \frac{\mathbb{E}_{A \neq B}\left[\mathrm{Cov}(\Delta p_{s,A}, \Delta p_{t,B})\right]}{\mathbb{E}_{A \neq B}\left[\sqrt{\mathrm{Var}(\Delta p_{s,A})\,\mathrm{Var}(\Delta p_{t,B})}\right]}, \qquad [2]$$

where A and B are two replicate labels; for the Barghi et al. (33) data, we use $\Delta_{10} p_t$.
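A minimal NumPy sketch of Eq. [2], assuming each replicate's allele frequency changes are stored as one row of an array, and that the expectation over A ≠ B is a simple average over all ordered pairs of distinct replicates (our assumption; the study may weight pairs differently). The function name is hypothetical. In the toy usage example every replicate shares the same "selected" response plus independent noise of equal variance, so the convergent correlation should come out close to 0.5.

```python
from itertools import permutations

import numpy as np


def convergent_correlation(deltas_s: np.ndarray, deltas_t: np.ndarray) -> float:
    """Eq. [2]: mean between-replicate covariance over the mean product of SDs.

    deltas_s, deltas_t: arrays of shape (n_replicates, n_loci) giving the allele
    frequency change in interval s and in interval t for each replicate.
    """
    covariances, sd_products = [], []
    for a, b in permutations(range(deltas_s.shape[0]), 2):  # ordered pairs with A != B
        covariances.append(np.cov(deltas_s[a], deltas_t[b])[0, 1])
        sd_products.append(np.sqrt(np.var(deltas_s[a], ddof=1) * np.var(deltas_t[b], ddof=1)))
    return float(np.mean(covariances) / np.mean(sd_products))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_replicates, n_loci = 4, 20_000
    shared = rng.normal(0, 1, n_loci)                            # response common to all replicates
    deltas = shared + rng.normal(0, 1, (n_replicates, n_loci))   # plus replicate-specific noise
    print(f"convergent correlation: {convergent_correlation(deltas, deltas):.3f}")
```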

We have calculated the convergent correlation for all rows of the replicate covariance matrices. Like temporal covariances, we visualize these through time (Fig. 2A, Left), with each line representing the convergent correlation from a particular reference generation s as it varies with t (shown on the x axis). In other words, each of the colored lines corresponds to the like-colored row of the convergence correlation matrix (Fig. 2A, Right). We find that these convergent correlation coefficients are relatively weak and decay very quickly from an initial value of about 0.1 (95% block bootstrap CIs [0.094, 0.11]) to around 0.01 (95% CIs [0.0087, 0.015]) within 20 generations. This suggests that while a substantial fraction of the initial response is shared over the replicates, this is followed by a rapid decay, a result consistent with the primary finding of the original Barghi et al. (33) study: that alternative loci contribute to longer-term adaptation across the different replicates.

(A) The convergence correlations, averaged across Barghi et al. (33) replicate pairs, through time. Each line represents the convergence correlation $\mathrm{cor}(\Delta p_s, \Delta p_t)$ from a starting reference generation s to a later time t, which varies along the x axis; each line corresponds to a row of the temporal convergence correlation matrix depicted on Right (where the diagonal elements represent the convergence correlations between the same time points across replicate populations). We note that the convergent correlation for the last time point is an outlier; we are unsure as to the cause of this (e.g., it does not appear to be driven by a single pair of replicates). (B) The convergence correlations between individual pairs of replicates in the Kelly and Hughes (38) data (note that the CIs are plotted but are small on this y-axis scale). (C) The convergence correlations between individual pairs of replicates in the data from Castro et al. (39) for the two selection lines (LS1 and LS2) and the control (Ctrl); gray CIs are those using the complete dataset, and blue CIs exclude chromosome 5 (chr5) and chr10, which harbor the two regions Castro et al. (39) found to have signals of parallel selection between LS1 and LS2. Through simulations, we have found that the differences in convergence correlation CI widths between these Drosophila studies and the Longshanks study are due to the differing population sizes.

A benefit of between-replicate covariances is that, unlike temporal covariances, these can be calculated with only two sequenced time points and a replicated study design. This allowed us to assess the impact of linked selection in driving convergent patterns of allele frequency change across replicate populations in two other studies. First, we reanalyzed the selection experiment of Kelly and Hughes (38), which evolved three replicate wild populations of Drosophila simulans for 14 generations adapting to a novel laboratory environment. Since each replicate was exposed to the same selection pressure and shared LD common to the original natural founding population, we expected each of the three replicate populations to have positive convergence correlations. We find that all three convergent correlation coefficients between replicate pairs are significant (Fig. 2B) and average to 0.36 (95% CI [0.31, 0.40]). Additionally, we can calculate the proportion of the total variance in allele frequency change from convergent selection pressure, analogous to our G(t), where the numerator is the convergent covariance and the denominator is the total variance (SI Appendix, section S4). We find that 37% of the total variance is due to shared allele frequency changes caused by selection (95% CI [29%, 41%]); these are similar to the convergence correlation since the variance is relatively constant across the replicates.

Next, we reanalyzed the Longshanks selection experiment, which selected for longer tibia length relative to body size in mice, leading to a response to selection of about 5 SDs over the course of 20 generations (39, 40). This study includes two independent selection lines, Longshanks 1 (LS1) and Longshanks 2 (LS2), and an unselected control line (Ctrl) in which parents were chosen at random. Consequently, this selection experiment offers a useful control to test our convergence correlations: we expect to see significant positive convergence correlations in the comparison between the two Longshanks selection lines but not between each of the control and Longshanks line pairs. We find that this is the case (gray CIs in Fig. 2C), with convergence correlations between each of the Longshanks lines and the control not being statistically different from zero, while the convergence correlation between the two Longshanks lines is strong (0.18) and statistically significant (CIs [0.07, 0.25]).

One finding in the Longshanks study was that two major-effect loci showed parallel frequency shifts between the two selection lines. We were curious to what extent our genome-wide covariances were being driven by these two outlier large-effect loci, so we excluded them from the analysis. Since we do not know the extent to which LD around these large-effect loci affects neighboring loci, we took the conservative precaution of excluding the entire chromosomes these loci reside on (chromosomes 5 and 10) and recalculating the temporal covariances. We find that excluding these large-effect loci has little impact on the CIs (blue CIs in Fig. 2C), indicating that these across-replicate covariances are indeed driven by a large number of loci. This is consistent with a signal of selection on a polygenic trait driving genome-wide change, although we note that large-effect loci can contribute to the indirect change at unlinked loci (41, 42).

The presence of an unselected control line provides an alternative way to partition the effects of linked selection and genetic drift: we can compare the total variance in allele frequency change of the control line (which excludes the effect of artificial selection on allele frequencies) with the total variance in frequency change of the Longshanks selection lines. This allows us to estimate the increase in variance in allele frequency change due to selection, which we can further partition into the effects of selection shared between selection lines and those unique to a selection line by estimating the shared effect through the observed covariance between replicates (Materials and Methods and SI Appendix, section S4 have more details). We estimate that at least 32% (95% CI [21%, 48%]) of the variance in allele frequency change is driven by the effects of selection, of which 14% (95% CI [3%, 33%]) is estimated to be unique to a selection line, and 17% (95% CI [9%, 23%]) is the effect of shared selection between the two Longshanks selection lines.
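The partition described in the preceding paragraph can be illustrated with a back-of-the-envelope calculation. This is a simplified sketch, not the estimator in SI Appendix, section S4: it assumes the control line's variance in allele frequency change reflects drift and sampling noise alone, that each selection line experiences the same drift variance, and that the covariance between the two selection lines measures shared selection. The names `partition_variance` and `VariancePartition` and the input values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariancePartition:
    total_selection: float   # share of Var(Δp) in a selected line attributed to selection
    shared_selection: float  # part of that share common to both selected lines
    unique_selection: float  # remainder, specific to a single selected line


def partition_variance(var_selected, var_control, cov_between_selected):
    """Simplified partition of the variance in allele frequency change.

    Assumes the control line's variance reflects drift and sampling noise only,
    and that each selected line experiences the same drift variance.
    """
    if var_selected <= 0:
        raise ValueError("var_selected must be positive")
    total = (var_selected - var_control) / var_selected  # excess variance over control
    shared = cov_between_selected / var_selected  # covariance between selected lines
    return VariancePartition(total, shared, total - shared)


# Hypothetical inputs, not values from the Longshanks data.
result = partition_variance(var_selected=1.0e-3, var_control=6.0e-4,
                            cov_between_selected=2.0e-4)
print(result)
```

With these made-up numbers, about 40% of the variance is attributed to selection, split into about 20% shared and 20% unique. In this sketch the unique part is obtained by subtraction, so it inherits the noise of both other estimates.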

We observed that in the longest study we analyzed (33), some genome-wide temporal covariances become negative at later time points (the first two rows in Fig. 1A, Left). This shows that alleles that were on average going up in frequency initially are later going down (i.e., that the average direction of selection experienced by alleles has flipped). This might reflect a change in either the environment or the genetic background, due to epistatic relationships among alleles altered by frequency changes (which can occur during an optimum shift; ref. 43) or to recombination breaking up selected combinations of alleles. Such reversals in selection dynamics could be occurring at other time points, but the signal of a change in the direction of selection at particular loci may be washed out when we calculate our genome-wide average temporal covariances. To address this limitation, we calculated the distribution of the temporal covariances over 100-kb windows (Fig. 3 shows these distributions pooling across all replicates, and SI Appendix, Fig. S26 shows individual replicates). The covariance estimate of each genomic window will be noisy, due to sampling and genetic drift, and the neutral distribution of the covariance is complicated by LD, which can extend over long physical distances in evolve-and-resequence and selection studies (44, 45). To address this, we developed a permutation-based procedure that constructs an empirical neutral null distribution by randomly flipping the sign of the allele frequency changes in each genomic window (i.e., a single random sign flip is applied to all loci in a window). This destroys the systematic covariances created by linked selection and creates a sampling distribution of the covariances spuriously created by neutral genetic drift, while preserving the complex dependencies between adjacent loci created by LD.
This empirical neutral null distribution is conservative in the sense that its covariances are more widely dispersed than expected under drift alone, because selection not only creates covariance between time intervals but also inflates the magnitude of allele frequency change within a time interval. We see (Fig. 3 A and B) that there is an empirical excess of windows with positive covariances between close time points compared with the null distribution (a heavier right tail) and that this then shifts to an excess of windows with negative covariances between more distant time points (a heavier left tail).
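The sign-flip permutation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each window holds the allele frequency changes of its loci at two intervals, t and t + k; the null draws one random sign per window and per interval (if both intervals of a window were flipped together, the product, and hence the covariance, would be unchanged); and tail probabilities are measured against the null's 20% and 80% quantiles, as in Fig. 3C. The toy data, window sizes, and function names are hypothetical, and 100 permutations are used rather than 1,000 to keep the example fast.

```python
import random


def window_cov(x, y):
    """Sample covariance, across the loci of one window, of two interval changes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def sign_flip_null(windows, n_perm, seed=0):
    """Empirical null: a random sign per window and per interval, shared by all its loci."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        for dp_t, dp_tk in windows:
            s1 = rng.choice((-1.0, 1.0))  # one sign for every locus at interval t
            s2 = rng.choice((-1.0, 1.0))  # independent sign at interval t + k
            null.append(window_cov([s1 * v for v in dp_t], [s2 * v for v in dp_tk]))
    return null


def tail_probabilities(observed, null, lower_q=0.2, upper_q=0.8):
    """Fractions of observed covariances beyond the null's lower and upper quantiles."""
    ranked = sorted(null)
    lo = ranked[int(lower_q * (len(ranked) - 1))]
    hi = ranked[int(upper_q * (len(ranked) - 1))]
    lower = sum(c < lo for c in observed) / len(observed)
    upper = sum(c > hi for c in observed) / len(observed)
    return lower, upper


def toy_windows(n_windows=100, n_loci=30, seed=1):
    """Hypothetical data: even-numbered windows carry a change that persists across intervals."""
    rng = random.Random(seed)
    windows = []
    for w in range(n_windows):
        effect_sd = 0.02 if w % 2 == 0 else 0.0
        dp_t, dp_tk = [], []
        for _ in range(n_loci):
            effect = rng.gauss(0.0, effect_sd)  # persists into both intervals
            dp_t.append(effect + rng.gauss(0.0, 0.01))
            dp_tk.append(effect + rng.gauss(0.0, 0.01))
        windows.append((dp_t, dp_tk))
    return windows


windows = toy_windows()
observed = [window_cov(x, y) for x, y in windows]
null = sign_flip_null(windows, n_perm=100)
print(tail_probabilities(observed, null))
```

Under the neutral null each tail probability should be about 0.2. In the toy data, the windows carrying a persistent change push the upper tail well above 0.2 while the lower tail stays near zero. Because those same windows also enter the null with random signs, the null is wider than drift alone would produce, which is the conservative property described above.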

(A and B) The distribution of temporal covariances calculated in 100-kb genomic windows from the Barghi et al. (33) study, plotted alongside an empirical neutral null distribution created by recalculating the windowed covariances on 1,000 sign permutations of allele frequency changes within tiles. The number of histogram bins is 88, chosen by cross-validation (SI Appendix, Fig. S25). In A, windowed covariances Cov(Δp_t, Δp_{t+k}) are separated by k = 2 × 10 generations, and in B, the covariances are separated by k = 4 × 10 generations. Each k is an off diagonal from the variance diagonal of the temporal covariance matrix (the cartoon of the upper triangle of the covariance matrix in A and B shows the first diagonal as the variance, and the dark gray indicates which off diagonal of the covariance matrix is plotted in the histograms). (C) The lower and upper tail probabilities of the observed windowed covariances, at the 20% and 80% quantiles of the empirical neutral null distribution, for varying time between allele frequency changes (i.e., which off diagonal k). The CIs are 95% block bootstrap CIs, and the light gray dashed line indicates the 20% tail probability expected under the neutral null. Similar figures for different values of k are in SI Appendix, Fig. S27.

We quantified the degree to which the left and right tails are inflated compared with the null distribution as a function of time and see excesses in both tails in Fig. 3C. This finding is also robust to sign-permuting allele frequency changes at the chromosome level, the longest distance over which gametic LD can extend (SI Appendix, Fig. S29). We see a striking pattern: the windowed covariances not only decay toward zero but in fact become negative through time, consistent with many regions in the genome having had a reversed fitness effect at later time points.

Finally, we used forward-in-time simulations to explore the conditions under which temporal covariances and convergence correlations arise. We show a subset of our results for a model of stabilizing selection on a phenotype, where directional selection is induced by a sudden shift of varying magnitude in the optimum phenotype (Fig. 4A). We find that such selection produces positive temporal covariances (Fig. 4B) and that these positive temporal covariances can compound to generate a large proportion of allele frequency change being due to selection [i.e., a large G(t)] over relatively short time periods, similar to those spanned by the selection datasets we analyzed (Fig. 4C). The magnitude of G(t) increases with the strength of selection (i.e., the variance in fitness), such that stronger selection generates larger proportions of allele frequency change. We find a similar picture for convergence: stronger convergent selection pressures generate larger convergence correlations (Fig. 4D; SI Appendix, Fig. S12 shows how other factors impact convergence correlations).
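G(t) itself can be computed from a temporal covariance matrix. This sketch assumes G(t) is the share of Var(p_t − p_0) contributed by the off-diagonal temporal covariances Cov(Δp_i, Δp_j), i ≠ j, which matches the description of its convergent analogue earlier in this section (covariance in the numerator, total variance in the denominator); the excerpt does not spell the formula out, and the paper's estimator additionally corrects for sampling noise. Var(p_t − p_0) is taken as the sum of all entries of the covariance matrix up to interval t. The toy data give every locus a persistent push (a stand-in for linked selection) plus independent drift noise in each interval; all names and parameter values are hypothetical.

```python
import random


def cov(x, y):
    """Sample covariance across loci."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def temporal_cov_matrix(dp):
    """dp[i][l] is the allele frequency change at locus l during interval i."""
    n_intervals = len(dp)
    return [[cov(dp[i], dp[j]) for j in range(n_intervals)] for i in range(n_intervals)]


def g_trajectory(C):
    """G(t) for t = 1..T: off-diagonal covariance as a share of Var(p_t - p_0)."""
    out = []
    for t in range(1, len(C) + 1):
        total = sum(C[i][j] for i in range(t) for j in range(t))  # Var(p_t - p_0)
        variances = sum(C[i][i] for i in range(t))
        out.append((total - variances) / total)
    return out


def toy_changes(n_intervals=5, n_loci=5000, effect_sd=0.01, drift_sd=0.01, seed=3):
    """Hypothetical data: a per-locus push that repeats every interval, plus fresh noise."""
    rng = random.Random(seed)
    effects = [rng.gauss(0.0, effect_sd) for _ in range(n_loci)]
    return [[e + rng.gauss(0.0, drift_sd) for e in effects] for _ in range(n_intervals)]


g = g_trajectory(temporal_cov_matrix(toy_changes()))
print([round(v, 2) for v in g])
```

With equal effect and noise variances, this toy model gives G(t) ≈ (t − 1)/(t + 1): G(1) is exactly zero (there are no off-diagonal terms yet) and the values climb toward about 0.67 by the fifth interval, illustrating how small positive covariances compound over time.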

Forward-in-time simulations demonstrate how temporal covariances, G(t) trajectories, and convergence correlations arise during optima shifts of two different magnitudes under Gaussian stabilizing selection (GSS). (A) Trait means across 30 replicates before and after optima shifts (solid lines) for two different magnitudes (indicated by color). The new optimal trait values are indicated by the purple and yellow dashed lines. (B) Mean temporal covariance Cov(Δp_5, Δp_t) across 30 simulation replicates, where t varies along the x axis (points), with a loess-smoothed average (solid lines). (C) G(t) trajectories through time for 30 replicate simulations across two optima shifts. The solid lines are loess-smoothed averages. (D) The convergence correlations between two populations (each 1,000 diploids) split from a common population that underwent an optima shift in either the same direction (converge) or opposite directions (diverge) at generation 5. In B–D, directional selection begins at generation 5, when the optimum shifts; this is indicated by the vertical dashed red lines (SI Appendix, section S8.2 has details on these simulations).

Averaging across replicates, these simulation results show that G(t) is relatively insensitive to the number of loci underlying the trait. However, if only a small number of loci influence the trait, the G(t) trajectories are typically much more stochastic across replicates. This reaffirms that the genome-wide linked selection response we see in the Barghi et al. (33) data is highly polygenic (compare Fig. 1B with SI Appendix, Fig. S6). Furthermore, using our simulations we find that sampling only every 10 generations does indeed mean that our estimates of G(t) underestimate the proportional effect of linked selection, as they cannot include the covariance between closely spaced generations (SI Appendix, Fig. S14).

Additionally, we explored other modes of selection with simulations. We find that the long-term dynamics of the covariances under directional truncation selection, which generates substantial epistasis, are richer than those under Gaussian stabilizing selection (GSS) and multiplicative selection (SI Appendix, Fig. S18). We also conducted simulations of purifying selection alone (i.e., background selection) and find that this can also generate positive temporal covariances (SI Appendix, Fig. S16) and, under some circumstances, can even generate convergence correlations (SI Appendix, Fig. S17). Thus, it is unlikely that the signatures of linked selection we see are entirely the result of the novel selection pressure the populations are exposed to; some of this signature may be ongoing purifying selection. Only in the Longshanks experiment does the presence of a control line allow us to conclude that the signature of selection is almost entirely due to the novel selection pressure.

While none of our experiments have selected the populations in divergent directions, in our simulations we find that such selection can generate negative convergent correlations (Fig. 4D). This suggests that selection experiments combining multiple replicates, control lines, as well as divergent selection pressures might be quite informative in disentangling the contribution of particular selection pressures from genome-wide allele frequency changes.

Difference Between Natural Selection and Genetic Drift

Both natural selection and genetic drift drive evolution by changing the gene frequencies of a population over time. Both processes are involved in evolution, and they are not mutually exclusive. However, natural selection is the only one of the two that favors the organisms best adapted to the environment, whereas genetic drift reduces genetic variation.

These variations in genes or alleles are heritable, and genetic variation can result from mutation, gene flow and sexual reproduction.

Natural Selection

Natural selection is a mechanism proposed by Darwin in which the organisms best adapted to their environment are gradually favored. Natural selection occurs when individuals vary genetically, that variation makes some individuals better able to survive and reproduce than others, and those superior traits are heritable.

This process acts on variation produced by mutations, which arise in individuals at random for various reasons. Because of such a mutation, an individual may gain an advantage in meeting environmental challenges and may be better adapted to the environment than others. For example, a superior trait may help an individual escape from predators by running faster than other individuals. Such individuals can reproduce more than others, so the trait passes to the next generation; over many generations this can contribute to the evolution of new species. The frequency of the allele underlying the new trait increases in the population; this process is called natural selection, or survival of the fittest.
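The rise in frequency of an advantageous allele can be sketched with the standard one-locus selection recursion. The sketch assumes a haploid population large enough to ignore drift, two alleles, and a constant selective advantage s for the favored allele; the function names and parameter values are illustrative.

```python
def next_frequency(p, s):
    """One generation of haploid selection: allele A has fitness 1 + s, allele a has fitness 1."""
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def selection_trajectory(p0, s, generations):
    """Frequencies of A from generation 0 up to the given number of generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


traj = selection_trajectory(p0=0.01, s=0.1, generations=100)
print(round(traj[-1], 3))
```

Starting from 1% with s = 0.1, the allele passes 99% within a hundred generations (the printed final frequency is about 0.993). The odds p/(1 − p) grow by a factor of 1 + s each generation, which is why the increase looks slow at first and then rapid.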

Genetic Drift

The variation in allele frequencies within a population due to random sampling is called genetic drift, or the Sewall Wright effect. Because of random sampling, the subset of individuals that reproduces is not necessarily representative of the population, and allele frequencies may shift in either direction. The smaller the population, the stronger the effect of random sampling, so genetic drift is stronger in small populations than in large ones. By chance, some alleles become more common as they happen to be sampled again and again, and others may disappear from small, isolated populations. The outcome of genetic drift for a particular allele, including whether it disappears, is unpredictable (Taylor et al, 1998).
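Genetic drift as random sampling can be simulated with the neutral Wright-Fisher model, a standard idealization (an assumption here, not something the text specifies) in which each generation's 2N gene copies are drawn at random from the previous generation's allele frequency. The sketch follows an allele starting at 50% and reports the fraction of replicate populations in which it has been lost or fixed after 50 generations; the population sizes, seed, and function names are hypothetical.

```python
import random


def wright_fisher(p0, n_diploids, generations, rng):
    """Neutral drift: each generation draws 2N gene copies at random from the last one."""
    copies = 2 * n_diploids
    p = p0
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
        if p in (0.0, 1.0):  # allele lost or fixed: drift can go no further
            break
    return p


def fraction_lost_or_fixed(n_diploids, replicates=100, generations=50, seed=42):
    """Share of replicate populations in which the allele is no longer polymorphic."""
    rng = random.Random(seed)
    ends = [wright_fisher(0.5, n_diploids, generations, rng) for _ in range(replicates)]
    return sum(p in (0.0, 1.0) for p in ends) / replicates


print(fraction_lost_or_fixed(10), fraction_lost_or_fixed(200))
```

In the small population (N = 10) most replicates have lost or fixed the allele within 50 generations, while in the larger one (N = 200) essentially none have, matching the point that drift matters most in small populations. In this model expected heterozygosity decays by a factor of about 1 − 1/(2N) per generation, which quantifies the same effect.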

New generations may diverge from the parental population, which can result either in extinction of the population or in a population better adapted to the environment. In a large population, however, this effect is often negligible. Unlike natural selection, genetic drift does not favor better-adapted organisms.

What is the difference between Natural Selection and Genetic Drift?

• The major difference between natural selection and genetic drift is that natural selection is a process in which better-adapted individuals are favored in response to environmental challenges, whereas genetic drift is random sampling.

• Natural selection occurs due to environmental challenges, whereas genetic drift does not occur due to environmental challenges.

• Natural selection ends up favoring the more successful trait over the detrimental one, whereas under genetic drift even beneficial alleles may disappear completely.

• Natural selection increases the frequency of the trait better adapted to the environment, whereas genetic drift rarely makes populations better adapted to the environment.

• Natural selection can either reduce genetic variation (for example, when it drives a favored allele to fixation) or maintain it (for example, balancing selection), whereas genetic drift steadily erodes genetic variation. Sometimes genetic drift drives variants extinct completely.
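The point that drift can remove even beneficial alleles is easy to check by combining the two processes. This sketch assumes a haploid Wright-Fisher population of 200 gene copies in which, each generation, selection first shifts the expected allele frequency and drift then samples the next generation at random; it follows a single new copy of an allele with selective advantage s until the allele is lost or fixed. Names, seed, and parameter values are hypothetical.

```python
import random


def fate_of_new_mutation(n_copies, s, rng, max_generations=10_000):
    """Follow one new allele (a single copy) until it is lost or fixed."""
    p = 1 / n_copies
    for _ in range(max_generations):
        expected = p * (1 + s) / (1 + s * p)  # selection: the allele has fitness 1 + s
        p = sum(rng.random() < expected for _ in range(n_copies)) / n_copies  # drift
        if p == 0.0:
            return "lost"
        if p == 1.0:
            return "fixed"
    return "segregating"


def fixation_fraction(n_copies, s, replicates, seed=7):
    """Share of independent new mutations that eventually fix."""
    rng = random.Random(seed)
    outcomes = [fate_of_new_mutation(n_copies, s, rng) for _ in range(replicates)]
    return outcomes.count("fixed") / replicates


print(fixation_fraction(n_copies=200, s=0.1, replicates=400))
```

A classical approximation (Haldane) puts the fixation probability of a new beneficial mutation at roughly 2s, so with s = 0.1 only about one in five such mutations should fix and the rest are lost to drift, even though selection favors them. A neutral allele (s = 0) fixes with probability 1/200 in this model.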

Biological Evolution

In biology, the process of evolution is the change in a population's genetic structure over successive generations. Specifically, it is the change in allele frequency over time. The many sub-processes of evolution account for the diversity of life, such as genetic inheritance, which accounts for the continuity of traits, mutation, which accounts for novel traits, and natural selection, which accounts for the environmental filtering of traits.

There are four common mechanisms of evolution. The first mechanism is natural selection, a process in which there is differential survival and/or reproduction of organisms that differ in one or more inherited traits. A second mechanism is genetic drift, a process in which there are random changes to the proportions of two or more inherited traits within a population. A third mechanism is mutation, which is a permanent change in a DNA sequence. Finally, the fourth mechanism is gene flow, which is the incorporation of genes from one population into another.
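The four mechanisms can be combined in a single toy generation step. This sketch assumes a haploid population with two alleles, one-way mutation from a to A at rate mu, an island model of gene flow in which a fraction m of gene copies comes from a source with allele frequency p_migrant, genic selection with fitness 1 + s for A, and Wright-Fisher sampling of n_copies gene copies for drift. The order of events and all parameter names are illustrative assumptions, not part of the text above.

```python
import random


def one_generation(p, s, mu, m, p_migrant, n_copies, rng):
    """Return the frequency of allele A after one generation of the four mechanisms."""
    # Mutation: each a copy mutates to A with probability mu (one-way, for simplicity).
    p = p + mu * (1 - p)
    # Gene flow: a fraction m of gene copies is replaced by migrants.
    p = (1 - m) * p + m * p_migrant
    # Natural selection: A has relative fitness 1 + s, a has fitness 1.
    p = p * (1 + s) / (1 + s * p)
    # Genetic drift: the next generation is a random sample of n_copies gene copies.
    return sum(rng.random() < p for _ in range(n_copies)) / n_copies


rng = random.Random(11)
p = 0.1
for generation in range(20):
    p = one_generation(p, s=0.05, mu=1e-4, m=0.01, p_migrant=0.5, n_copies=1000, rng=rng)
print(round(p, 3))
```

Setting individual parameters to zero (for example `s=0.0`, `mu=0.0` and `m=0.0` for drift alone) switches mechanisms off one at a time, which makes it easy to see how each one shifts allele frequency.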

Evolution may in the long term lead to speciation, whereby a single ancestral species splits into two or more different species. Evidence of speciation is visible in anatomical, genetic and other similarities between groups of organisms, in the geographical distribution of related species, in the fossil record, and in the recorded genetic changes in living organisms over many generations. Speciation stretches back over the 3.5 billion years during which life has existed on Earth. It is thought to occur in multiple ways: slowly, steadily and gradually over time, or rapidly from one long static state to another.

The scientific study of evolution began in the mid-nineteenth century, when research into the fossil record and the diversity of living organisms convinced most scientists that species evolve. The mechanism driving these changes remained unclear until the theory of natural selection was independently proposed by Charles Darwin and Alfred Wallace in 1858. In the early 20th century, Darwinian theories of evolution were combined with genetics, palaeontology and systematics, which culminated in a union of ideas known as the modern evolutionary synthesis. The synthesis became a major principle of biology as it provided a coherent and unifying explanation for the history and diversity of life on Earth.

Evolution is currently applied and studied in various areas within biology such as conservation biology, developmental biology, ecology, physiology, paleontology and medicine. Moreover, it has also made an impact on traditionally non-biological disciplines such as agriculture, anthropology, philosophy and psychology.

A scientific model, or theory, explaining this process is called a theory of evolution (ToE). The current widely accepted theory of evolution is the modern evolutionary synthesis, also called the Neo-Darwinian theory. Sometimes, the theory of evolution is simply shortened to "evolution" (as in, "Evolution explains the diversity of life"). (Evolution Wiki, Wikipedia)

As with other truly revolutionary scientific hypotheses, such as Aristarchus', Copernicus' and Galileo's heliocentric cosmology, Newton's universal theory of gravitation and Einstein's theory of relativity, Darwin's explanation of the origin and diversity of life on Earth through natural selection totally transformed our understanding of the natural world and our place in the universe. Indeed, the evolution and transformation of life on Earth through geological time simply cannot be understood except through Darwinian evolution, any more than the movements of the celestial bodies can be understood without reference to Galileo, Newton, etc. The extraordinary amount of data collected by the life sciences in the one and a half centuries since Darwin published On the Origin of Species has allowed us to develop an integrated understanding of the evolution of life through various physical, chemical, and biological processes over millions of years. MAK110719

page last modified MAK111009, edited RFVS111214


The Modern Synthesis of Genetics and Evolution
Copyright © 1993-1997 by Laurence Moran
[Last Update: January 22, 1993]

Many people do not understand current ideas about evolution. The following is a brief summary of the modern consensus among evolutionary biologists.

The idea that life on Earth has evolved was widely discussed in Europe in the late 1700's and the early part of the last century. In 1859 Charles Darwin supplied a mechanism, namely natural selection, that could explain how evolution occurs. Darwin's theory of natural selection helped to convince most people that life has evolved and this point has not been seriously challenged in the past one hundred and thirty years.

It is important to note that Darwin's book "The Origin of Species by Means of Natural Selection" did two things. It summarized all of the evidence in favor of the idea that all organisms have descended with modification from a common ancestor, and thus built a strong case for evolution. In addition, Darwin advocated natural selection as a mechanism of evolution. Biologists no longer question whether evolution has occurred or is occurring. That part of Darwin's book is now considered to be so overwhelmingly demonstrated that it is often referred to as the FACT of evolution. However, the MECHANISM of evolution is still debated.

We have learned much since Darwin's time and it is no longer appropriate to claim that evolutionary biologists believe that Darwin's theory of Natural Selection is the best theory of the mechanism of evolution. I can understand why this point may not be appreciated by the average non-scientist because natural selection is easy to understand at a superficial level. It has been widely promoted in the popular press and the image of "survival of the fittest" is too powerful and too convenient.

During the first part of this century the incorporation of genetics and population biology into studies of evolution led to a Neo-Darwinian theory of evolution that recognized the importance of mutation and variation within a population. Natural selection then became a process that altered the frequency of genes in a population, and this defined evolution. This point of view held sway for many decades, but more recently the classic Neo-Darwinian view has been replaced by a new concept that includes several other mechanisms in addition to natural selection. Current ideas on evolution are usually referred to as the Modern Synthesis, which is described by Futuyma as follows:

  1. It recognizes several mechanisms of evolution in addition to natural selection. One of these, random genetic drift, may be as important as natural selection.
  2. It recognizes that characteristics are inherited as discrete entities called genes. Variation within a population is due to the presence of multiple alleles of a gene.
  3. It postulates that speciation is (usually) due to the gradual accumulation of small genetic changes. This is equivalent to saying that macroevolution is simply a lot of microevolution.

The major controversy among evolutionists today concerns the validity of point #3 (above). There are many who believe that the fossil record at any one site does not show gradual change but instead long periods of stasis followed by rapid speciation. This model is referred to as Punctuated Equilibrium, and it is widely accepted as true, at least in some cases. The debate is over the relative contributions of gradual versus punctuated change, the average size of the punctuations, and the mechanism. To a large extent the debate is over the use of terms and definitions, not over fundamentals. No new mechanisms of evolution are needed to explain the model.

Some scientists continue to refer to modern thought in evolution as Neo-Darwinian. In some cases these scientists do not understand that the field has changed but in other cases they are referring to what I have called the Modern Synthesis, only they have retained the old name.
