Fixation rate at neutral loci

It is a classical result that the expected time for a neutral mutation to occur and to get fixed is $2 N \mu \frac{1}{2N} = \mu$, where $N$ is the population size and $\mu$ is the neutral mutation rate (wiki).

I am not sure I understand what this actually means. What is the probability of (at least) one fixation event occurring in $\frac{1}{\mu}$ generations? Is it 0.5? Could we calculate the probability that (at least) one fixation event occurs in $n$ generations? For example, after $n=\frac{1}{10\mu}$ generations, is the probability of observing (at least) one fixation event equal to $0.1$, or is it more complicated?

UPDATE

Following @DanielWeissman comments:

Let's assume we start with an initially monomorphic population (a population of clones). In addition, let's assume the population is relatively small, $N\mu < 1$.


First of all, $\mu$ is not the expected time for a mutation to occur and get fixed; it is the rate at which mutations are fixed in the population. The basic result states that if neutral mutations arise at a locus at rate $\mu$ within individuals, mutations at this locus will be fixed in the population at rate $\mu$ as well.

The expected time for a given neutral mutation to fix after arising, given that it is not lost from the population, is approximately $4N_e$, where $N_e$ is the effective population size. Kimura and Ohta (1969) derive this result using a diffusion approximation; Kingman (1982) develops coalescent theory and in the process obtains what in my opinion is a more elegant derivation of the same approximation.
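
This $4N_e$ figure is easy to check numerically. Below is a minimal simulation sketch (mine, not from the cited papers): track one neutral mutant among $2N$ gene copies under Wright-Fisher resampling and average the fixation time over the runs that happen to fix; the mean should come out near $4N$ generations.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_fixation_time(two_N):
    """Time to absorption of one neutral mutant among two_N gene copies
    under Wright-Fisher resampling; returns the number of generations if
    the mutant fixed, else None (lost)."""
    k, t = 1, 0
    while 0 < k < two_N:
        k = rng.binomial(two_N, k / two_N)
        t += 1
    return t if k == two_N else None

N = 50                                    # diploid population size, 2N = 100 copies
times = [conditional_fixation_time(2 * N) for _ in range(50_000)]
fixed = [t for t in times if t is not None]
print(len(fixed))                         # roughly 50_000 / (2N) = 500 fixations
print(np.mean(fixed), 4 * N)              # mean conditional fixation time ~ 4N = 200
```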

Now what is the probability that some mutation goes to fixation in the population as a function of time? Even though the fixation rate in the population is $\mu$, this does not mean that with probability 0.5 some mutation fixes within $1/\mu$ time units. That probability depends on the distribution of fixation events in time. Here's a good way to think about it. If fixation events were evenly spaced and occurred at rate $\mu$, then with probability 1 you would have a fixation event within an interval of length $1/\mu$. If fixation events were highly clumped, then you would have a very low probability of having a fixation event within an interval of length $1/\mu$ -- but if you did have one event, you would probably have many. If fixation events are distributed as a Poisson process -- which I suspect they would be in a purely neutral model -- the waiting time between fixation events would follow an exponential distribution, and thus the probability of having at least one fixation event in a time interval of length $1/\mu$ would be $1-\frac{1}{e}$.
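
To make the last point concrete, here is a minimal Python sketch of the Poisson calculation (the rate value is illustrative): if fixation events occur as a Poisson process at rate $\mu$ per generation, the probability of at least one event in $n$ generations is $1 - e^{-\mu n}$.

```python
import math

def prob_at_least_one_fixation(mu, n):
    """P(at least one fixation in n generations), assuming fixation events
    form a Poisson process with rate mu per generation."""
    return 1.0 - math.exp(-mu * n)

mu = 1e-8                                            # illustrative neutral mutation rate
print(prob_at_least_one_fixation(mu, 1 / mu))        # 1 - 1/e ~ 0.632
print(prob_at_least_one_fixation(mu, 1 / (10 * mu))) # 1 - e^(-0.1) ~ 0.095
```

Under this assumption, the answer for $n = \frac{1}{10\mu}$ is $1 - e^{-0.1} \approx 0.095$: close to, but not exactly, $0.1$.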


Fixation Strategies and Formulations Used in IHC Staining

Fixation plays four critical roles in immunohistochemistry:

  • It preserves and stabilizes cell morphology and tissue architecture
  • It inactivates proteolytic enzymes that could otherwise degrade the sample
  • It strengthens samples so that they can withstand further processing and staining
  • It protects samples against microbial contamination and possible decomposition

The right fixation method requires optimization based on the application and the target antigen to be stained. This means that the optimal fixation method may have to be determined empirically. Common methods of fixation include:

  • Perfusion: Tissues can be perfused with fixative following exsanguination and saline perfusion to allow rapid fixation of entire organs.
  • Immersion: Samples are immersed in fixative which then diffuses into and through the tissue or cell sample. Immersion is often combined with perfusion to ensure thorough fixation throughout the tissue.
  • Freezing: Samples with antigens that are too labile for chemical fixation or exposure to the organic solvents used for de-paraffinization can be embedded in a cryoprotective embedding medium, such as optimal cutting temperature (OCT) compound, and then snap-frozen and stored in liquid nitrogen.
  • Drying: Blood smears for ICC staining are air-dried and waved across a flame to heat-fix the cells to the slide.

While a particular fixative may preserve the immunoreactivity of one antigenic epitope, it may destroy others, even if they are on the same antigen. The guidelines provided here are helpful in determining the appropriate fixative for a particular system, but it is important to remember that each antigen is unique. Therefore, the following considerations should be addressed when choosing a fixative:

  • Type of fixative (formaldehyde, glutaraldehyde, organic solvent, etc.)
  • Rate of penetration and fixation
  • Fixative concentration
  • Fixative pH
  • Fixation temperature
  • Post-fixation treatment


Chemical fixatives crosslink or precipitate sample proteins, which can mask target antigens or prevent antibody accessibility to the tissue target after prolonged fixation. No single fixative is ideal for all tissues, samples or antigens. This means that each fixation procedure must be optimized to assure adequate fixation without altering the antigen or disturbing the endogenous location and the cellular detail of the tissue.

Physical fixation is an alternate approach to prepare samples for staining, and the specific method depends on the sample source and the stability of the target antigen. For example, blood smears are usually fixed by drying, which removes the liquid from the sample and fixes the cells to the slide. Tissues that are too delicate for the rigorous processing involved with paraffin removal and antigen retrieval are first embedded in cryoprotective embedding medium, such as OCT compound, and then snap-frozen and stored in liquid nitrogen until they are sectioned. The following example shows IHC staining in formalin-fixed tissue.

IHC was performed on a formalin fixed, paraffin embedded (FFPE) human colon cancer tissue section. To expose target proteins, heat-induced epitope retrieval (HIER) was performed using 10 mM sodium citrate buffer, pH 6 (e.g., 00-5000, AP-9003-125), for 20 min by heating at 95°C. Following antigen retrieval, cooling to room temperature, and washing, tissues were blocked in 3% BSA (Product # 37525) in PBST for 30 min at room temperature and then probed with an Ezrin monoclonal antibody (Product # MA5-13862) at a dilution of 1:100 for 1 h in a humidified chamber. Tissues were washed extensively with PBS/0.025% Tween-20 (Product # 003005) and endogenous peroxidase activity was quenched with Peroxidase Suppressor (Product # 35000) for 30 min at room temperature. Detection was performed using an HRP-conjugated goat anti-mouse IgG secondary antibody (Product # 31430) at a dilution of 1:500, followed by colorimetric detection using Metal Enhanced DAB Substrate Kit (Product # 34065). Images were taken on a light microscope at 40X magnification.

Formaldehyde

The most widely used chemical fixative is formaldehyde, which shows broad specificity for most cellular targets. This water-soluble, colorless, toxic, and pungent gas reacts with primary amines on proteins and nucleic acids to form partially reversible methylene bridge crosslinks.

Formaldehyde and paraformaldehyde

Most commercial formaldehyde is prepared from paraformaldehyde (PFA, polymeric formaldehyde) dissolved in distilled/deionized water, with up to 10% (v/v) methanol added to stabilize the aqueous formaldehyde. Stabilization is important to prevent oxidation of the formaldehyde to formic acid and its eventual re-polymerization to paraformaldehyde. To avoid using methanol-stabilized formaldehyde for fixation, many protocols recommend making “fresh” formaldehyde from paraformaldehyde immediately before sample fixation.

Formalin vs. formaldehyde

The terms “formalin” and “formaldehyde” are often used interchangeably, although the chemical composition of each fixative is different. Formalin is prepared from formaldehyde, but its percentage denotes a different concentration than that of a true formaldehyde solution. For example, 10% neutral-buffered formalin (NBF, or simply formalin) is really a 4% (v/v) formaldehyde solution. The basis for this difference is that historically, formalin was prepared with commercial-grade stock formaldehyde, which was 37 to 40% (w/v) formaldehyde, by diluting it 1:10 with phosphate buffer at neutral pH.


Glutaraldehyde

Glutaraldehyde is a dialdehyde compound that reacts with amino and sulfhydryl groups and possibly with aromatic ring structures. Fixatives containing glutaraldehyde are stronger protein crosslinkers than formaldehyde. However, they penetrate tissue more slowly, causing extraction of soluble antigens and modification of the tissue architecture. Tissues that have been fixed with a glutaraldehyde-based fixative must be treated or quenched with inert amine-containing molecules prior to the IHC staining because any free, unsaturated aldehyde groups that are available will react covalently with amine-containing moieties such as antibodies (Schiff base formation). The most efficient aldehyde blockers/quenchers are ethanolamine and lysine.

Other fixatives

Mercuric chloride-based fixatives are sometimes used as alternatives to aldehyde-based fixatives to overcome poor cytological preservation. These harsh fixatives work by reacting with amines, amides, amino acids such as cysteine, and phosphate groups in proteins and nucleic acids. The result is protein and nucleic acid coagulation, which can lead to undesirable tissue hardening. The benefits of using these fixatives are more intense IHC staining accompanied by the preservation of cytological detail, allowing for easier morphological interpretation. These fixatives often include neutral salts containing zinc to maintain tonicity, and they can be mixed with other fixatives to provide a balanced, less harsh formulation. Mercuric chloride-based fixatives include Helly's and Zenker's solutions. One disadvantage of mercury-containing fixatives is that sections must be cleared of mercury deposits before IHC staining. Their main disadvantage, however, is that they are highly toxic and corrosive and require special disposal procedures. For this reason, they are no longer used frequently.

Precipitating fixatives include ethanol, methanol, and acetone. These solvents precipitate and coagulate large protein molecules, thereby denaturing them, and can be good for cytological preservation. Such reagents can also permeabilize cells, which may be critical depending on the sample. However, acetone in particular extracts lipids from cells and tissues, which can adversely affect morphology. Despite this, acetone is usually used as a post-fixative for frozen sections that have already been bound to slides. Solvent fixatives are not appropriate for electron microscopy because they can cause severe tissue shrinkage.

Diimidoester fixation using dimethyl suberimidate (DMS), an amine-reactive crosslinker, is a rarely-used alternative to aldehyde-based fixation (Hassel, J. et al., 1974). DMS is a homobifunctional reagent which crosslinks the α and ε-amino groups of proteins to each other. Diimidoesters are unique in that they create amidine linkages with the amines on the target molecules. As a result, DMS does not change the net charge of the protein. The advantages of using DMS as a fixative for both light and electron microscopy include retention of immunoreactivity of the antigen and the lack of aldehyde groups that require blocking.

There are a variety of other fixatives that are used in special situations. These include acrolein and glyoxal, which are similar to formaldehyde, and osmium tetroxide, which is particularly well-suited as a fixative prior to electron microscopy. Other specialty fixatives include carbodiimide and other protein crosslinkers, zinc salt solutions, picric acid, potassium dichromate, and acetic acid.


Neutral Theory and Substitution rates

There seems to be some confusion about what the neutral theory says about substitution rates. NickM seems to think that if the mutation rate is 1.1 x 10^-8 then 40 (neutral) mutations will occur with each birth and 40 will become fixed in each generation, because rate of substitution = mutation rate.

NickM gives a wikipedia reference for a mutation rate of 1.1 x 10^-8.

Anyhow -- another thing you obviously don't understand at all is that even under completely neutral conditions, with no natural selection acting at all, and with nothing but genetic drift going on, *the substitution rate equals the mutation rate*.

If the genome size is 3.2 billion bases, if the human-chimp divergence was 6 my ago, and if the generation time is 15 years, 1% divergence in point mutations takes 32 million mutations. That's 40 mutations/generation.

What NickM doesn't seem to realize is the rate of substitution = 1.1 x 10^-8, not 40 mutations/generation. The 40 is the number of mutations we can expect in each birth given a genome of 3.2 billion bp.

Neutral theory pertains to the mutation rate only, not the number of mutations. What is your degree in?

Ya see Nick in order to become fixed every member in the population has to haz it. Every member, biology 101. Well I guess if the population size is one.

60 Comments:

"Ya see Nick in order to become fixed every member in the population has to haz it"

". fixed effects (mutations escaping drift loss)"

But feel free to cite support for your assertion.

You are an ignorant fuck, Richie:

In population genetics, fixation is the change in a gene pool from a situation where there exist at least two variants of a particular gene (allele) to a situation where only one of the alleles remains.

The fixation probability, the probability that the frequency of a particular allele in a population will ultimately reach unity, is one of the cornerstones of population genetics.

Wow, more silliness. Neutral theory is about many things, but one of the most important topics is the substitution rate, and how it is connected to the mutation rate.

If, say, 40 mutations are happening per individual per generation (which is about what you get if you take mutations-per-site-per-generation times genome length in sites), this is happening in *each individual in the population*.

Thus, a huge number of mutations are happening *in the population* each generation. If effective population size was 10,000, this would be 40 x 10,000 = 400,000 new mutations added to the population each generation.

In a neutral situation, the vast majority of these new mutations are eventually lost from the population by neutral drift.

But, by chance, a few of them will spread to fixation. The odds are essentially the same as the odds of winning the lottery. Each mutation's chance of spreading to fixation is 1/(population size). (I'm using the population size of the haploid loci, if you use diploid population as N then it's 2N).

So, most neutral mutations die out, but a few spread to fixation. On average, it works out that the substitution rate equals the mutation rate.

So, a huge number of substitutions can happen without selection going on at all.

Or, shorter version:
http://www.stat.berkeley.edu/users/terry/Classes/s260.1998/Week13a/week13a/node10.html

If, say, 40 mutations are happening per individual per generation (which is about what you get if you take mutations-per-site-per-generation times genome length in sites), this is happening in *each individual in the population*.

yes, but not the same 40 mutations. In order for those 40 mutations to be fixed every individual, even the parents and grandparents, would have to have them.

That is the meaning of being fixed- UNITY- which means all have to have it.

Thus, a huge number of mutations are happening *in the population* each generation. If effective population size was 10,000, this would be 40 x 10,000 = 400,000 new mutations added to the population each generation.

Yup but to become fixed every individual has to have it.

But, by chance, a few of them will spread to fixation.

maybe and maybe not. The question is how long will it take and how many will become fixed?

Each mutation's chance of spreading to fixation is 1/(population size). (I'm using the population size of the haploid loci, if you use diploid population as N then it's 2N).

Not for neutral mutations. Then it is the mutation rate.

Also if it has a 1/N chance of becoming fixed that means it has a (N - 1)/N chance of getting lost. And that doesn't even take into account any random effects that would wipe out all mutations.

So, a huge number of substitutions can happen without selection going on at all.

How many and how long does it take?

With a mutation rate of 1.1 x 10^-8 that says quite a long, long time.

OK NickM- lookie what I found:

For neutral alleles that do fix, it takes an average of 4N generations to do so.

So what's the problem again, JoeG? None of the links you have just posted have contradicted anything I said.

You claimed there wasn't enough time to get the observed amount of difference between the chimp and human genomes.

First I pointed out that the "10%" difference number comes from including things like indels, which happen in big chunks rather than in point mutations. These required only an indel substitution every 15 generations, not some impossible number.

Then we turned to the point mutations (the 1% difference), and I pointed out that the observed amount of substitution difference between chimp and human approximately equals what we should expect given the time of the chimp-human split and the observed mutation rate. Nothing you have posted since has contradicted this.

Just admit that your original claim was wrong. That's the scientific way.

Or read p. 239 here: http://books.google.com/books?id=ng85sd1UR7EC&lpg=PA72&ots=pqL73i-4iH&dq=4N%20generations&pg=PA239#v=onepage&q=substitution%20rate&f=false

NickM:
So what's the problem again, JoeG? None of the links you have just posted have contradicted anything I said.

You said that there would be a number of mutations that become fixed in one/ each generation. That is obviously false

You claimed there wasn't enough time to get the observed amount of difference between the chimp and human genomes.

There is if you rely on design. There isn't if you rely on anything else.

First I pointed out that the "10%" difference number comes from including things like indels, which happen in big chunks rather than in point mutations.

Something I have known for years.

These required only an indel substitution every 15 generations, not some impossible number.

More like 10 generations and there isn't anything to support a fixation rate that high. And that is just for indels, which is a fraction of the differences that need to be accounted for.

Then we turned to the point mutations (the 1% difference), and I pointed out that the observed amount of substitution difference between chimp and human approximately equals what we should expect given the time of the chimp-human split and the observed mutation rate. Nothing you have posted since has contradicted this.

Everything I posted has contradicted that.

4N generations to get 1 neutral mutation fixed (estimate), where N is the population size.

Haldane published 1 in 300 generations and experiments have it at over 600 generations for BENEFICIAL mutations which will become fixed more readily than any neutral mutation.

So you don't understand anything I have posted, which is pretty obvious because of the strawman you refuted when you first arrived, your avoidance of the rest of the mutations beyond indels, and your obvious fuck-up on the math.

And now we have your reliance on never-heard-of substitution rates for the indels.

But yes I will revise my original claim to say:

Given a 10%+ genetic difference between chimps and humans some number of mutations will have to become fixed each year.

(And all I need is one-
all I need is one-
all I need is 1, 1.
1 is all I need)

Ah, I see your problem. You think that one mutation (indel or point mutation) has to originate, then spread to fixation, then the next one has to repeat the same process.

This would indeed be slow.

Unfortunately for you, many, many mutations can be drifting in a population at the same time. All of them, in fact. First, we have 23 pairs of chromosomes, and the frequency of any variant in the chromosome 1 population can go up or down independent of what happens to the frequency of any variant in the chromosome 2 population.

Second, the chromosomes all experience many crossing-over events each meiosis, so really each chromosome is divided up into many small segments which segregate independently and get recombined each generation.

4N generations to get 1 neutral mutation fixed (estimate), where N is the population size.

That's true, but that's just for one mutation, and just for the mutations that are lucky enough to be fixed (the vast majority are never fixed, and are instead lost, thus their "time to fixation" is infinity).

In reality we have something like (20 new mutations per individual) * (N = population size) entering the population every single generation. Most are lost, a few are fixed. The average rate at which the population fixes mutations is equal to mu, the mutation rate.

You are talking about something different, the average time to fixation for a single mutation which, after the fact, we know has fixed. Doesn't affect my point, nor your wrongness about your overall claim.

YOU are the problem. You require accumulations of mutations and they have to culminate in the individuals of today.

320,000,000+ ACCUMULATED differences.

How do mutations accumulate if they don't become fixed first?

Once lost it can no longer accumulate.

Kimura, Population genetics, molecular evolution, and the neutral theory: "If the mutant is selectively neutral, the probability of ultimate fixation is equal to its initial frequency, that is u=1/(2N) in the diploid population, and therefore, from Eq. (1), we have Ks=v. In other words, for neutral alleles, the rate of evolution is equal to the mutation rate."

It's a very simple derivation. A simple example should suffice. Consider a haploid population (size N) with one neutral mutation on average per individual (say a genome of 10^8 and a mutation rate of 10^-8). The chance of fixation is the inverse of the population size, or 1/N. But we have N mutations per generation. The expected rate of fixation is therefore N × 1/N = 1. That is, one mutation will fix on average per generation.
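
A quick simulation makes this arithmetic concrete. The sketch below is a minimal Python illustration (mine, not anything posted in the original thread): it estimates the fixation probability of a single neutral mutant in a haploid Wright-Fisher population and confirms it is close to 1/N, so that N new mutations per generation yield about one fixation per generation on average.

```python
import numpy as np

rng = np.random.default_rng(1)

def fixes(N):
    """Drift of one neutral mutant in a haploid Wright-Fisher population:
    each generation the N offspring are drawn binomially from the current
    mutant frequency. Returns True if the mutant reaches fixation."""
    k = 1                          # one initial copy
    while 0 < k < N:
        k = rng.binomial(N, k / N)
    return k == N

N, trials = 100, 20_000
p_fix = sum(fixes(N) for _ in range(trials)) / trials
print(p_fix)   # close to 1/N = 0.01

# With ~N new neutral mutations entering per generation (e.g., a genome of
# 1e8 sites at 1e-8 per site), expected fixations per generation ~ N * (1/N) = 1.
```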

Zacho:
That is, one mutation will fix on average per generation.

How is that even possible given a population over 3?

If the offspring, ie the next generation, gets a new mutation, guess what? The parents do not have it so it cannot become fixed until they die. And then it becomes fixed if the next generation has a population of 1.

If the mutation rate is 1.1 x 10^-8 then that is the number of mutations, and since that is not a whole number and is much less than 1, you are obviously clueless.

But thanks for the entertainment.

Talk Origins has a neutral mutation becoming fixed every 4N generations (where N = population size).

Relevant references (from genetic drift- see above) that support my claim, which is really the evolutionary biology claim:

Hedrick, Philip W. (2004). Genetics of Populations, 737, Jones and Bartlett Publishers.

Daniel Hartl, Andrew Clark (2007). Principles of Population Genetics, 4th edition

Wen-Hsiung Li, Dan Graur (1991). Fundamentals of Molecular Evolution, Sinauer Associates.

Kimura, Motoo (2001). Theoretical Aspects of Population Genetics, 232, Princeton University Press.

Masel J, King OD, Maughan H (2007). The loss of adaptive plasticity during long periods of environmental stasis. American Naturalist 169 (1): 38–46.

Joe G: How is that even possible given a population over 3?

The chance of fixation is 1/N, so if the population is 100, then the chance that it will eventually reach fixation is 1/100. However, given our example, 100 mutations are occurring in each generation. So we can expect, on average, that 100 × 1/100 = one will eventually reach fixation. Each generation.

Joe G: Talk Origins has a neutral mutation becoming fixed every 4N generations (where N = population size).

That's the *time* to fixation. With our example, 4N generations ago, an average of 100 mutations occurred, 1/100 of which are now reaching fixation. On average, of course.

Zacho:
The chance of fixation is 1/N, so if the population is 100, then the chance that it will eventually reach fixation is 1/100. However, given our example, 100 mutations are occurring in each generation. So we can expect, on average, that 100 × 1/100 = one will eventually reach fixation. Each generation.

That is just plain stupid. If 100 different mutations are occurring how can one reach fixation in one generation?

You have no idea what you are talking about and it still shows.

Talk Origins has a neutral mutation becoming fixed every 4N generations (where N = population size).

That's the *time* to fixation

No, that is the number of generations. So with a population of 100 that would mean it would take 400 generations for one neutral mutation to reach fixation.

Obviously you are still clueless when it comes to math.

But please feel free to produce some positive evidence for your claim.

Joe G: That is just plain stupid. If 100 different mutations are occurring how can one reach fixation in one generation?

They don't get fixed in one generation. About one in a hundred gets fixed, after 4N generations on average. However, new mutations are constantly occurring. The ones being fixed in the current generation occurred 4N generations ago, on average.

Joe G: So with a population of 100 that would mean it would take 400 generations for one neutral mutation to reach fixation.

That's right. On average, only one of the 100 would reach fixation, and it would take 400 generations, on average, for that to occur. As mutations are being introduced in every generation, that means there is a constant stream of mutations becoming fixed.

Let's leave aside the on average bit, and assume exactly 100 neutral mutations per generation in a population of 100 and a fixation time of 400 generations. In generation 0, we have 100 mutations, over time 99 will be lost, but one will fix in generation 400. In generation 1, we have 100 mutations, over time 99 will be lost, but one will fix in generation 401. In generation 2, we have 100 mutations, over time 99 will be lost, but one will fix in generation 402. And so on. There will be one mutation becoming fixed in each generation, perpetually.
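
A trivial bookkeeping sketch (mine, not Zachriel's) of the pipeline argument above, using the same round numbers: 100 mutations enter each generation, each with a 1/100 chance of fixing about 400 generations later.

```python
# Pipeline bookkeeping: cohorts of 100 neutral mutations enter a population
# of N = 100 each generation; each mutation has a 1/N chance of eventual
# fixation, realized (on average) 4N = 400 generations after it arose.
N, new_per_gen, lag = 100, 100, 400

def expected_fixations(t):
    # mutations fixing at generation t belong to the cohort from t - lag
    return (new_per_gen / N) if t >= lag else 0.0

for t in (0, 399, 400, 401, 1000):
    print(t, expected_fixations(t))  # 0 before the pipeline fills, then 1.0 per generation
```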

Zacho:
They don't get fixed in one generation.

Zacho earlier:
That is, one mutation will fix on average per generation.

Zacho:
About one in a hundred get fixed after 4N generations.

Zacho:
The ones being fixed in the current generation occurred 4N generations ago, on average.

If there even was 4N generations.

So with a population of 100 that would mean it would take 400 generations for one neutral mutation to reach fixation.

OK so you agree with what I said. What is your point?

As mutations are being introduced in every generation, that means there is a constant stream of mutations becoming fixed.

No. That just means there is a constant stream of variation.

Again it is clear that you have no idea what you are talking about.

In generation 0, we have 100 mutations, over time 99 will be lost, but one will fix in generation 400.

Too many unknowns to make that claim. The equations do not take into account the everyday stresses of the real world. And then there is population growth and competition with advantageous traits.

But anyway seeing that you are so confident in your spewage perhaps you should try to get it published. Then maybe someone will care.

Joe G: The equations do not take into account the every day stresses of the real world.

Given the said model of neutral mutations, the rate of fixation of new neutral mutations is equal to the mutation rate. It's a simple and direct result.

Joe G: But anyway seeing that you are so confident in your spewage perhaps you should try to get it published. Then maybe someone will care.

It's already been published, by someone named Motoo Kimura.

Kimura, Population genetics, molecular evolution, and the neutral theory: "If the mutant is selectively neutral, the probability of ultimate fixation is equal to its initial frequency, that is u=1/(2N) in the diploid population, and therefore, from Eq. (1), we have Ks=v. In other words, for neutral alleles, the rate of evolution is equal to the mutation rate."

Zachriel:
Given the said model of neutral mutations, the rate of fixation of new neutral mutations is equal to the mutation rate. It's a simple and direct result.

Except it isn't so simple and not even close to being direct.

Mutation rates vary and they are all very, very small numbers, i.e., 10^-8.

Apparently you have no idea what that means.

But anyway seeing that you are so confident in your spewage perhaps you should try to get it published. Then maybe someone will care.

Zachriel:
It's already been published, by someone named Motoo Kimura.

Unfortunately for you he never says anything about serial fixation of neutral mutations after some threshold has been crossed.

IOW Zacho you are a pretentious liar.

substitution rate = mutation rate

However, the mutation rate for neutral mutations is unknown, which means the substitution rate is also unknown.

"Thus, the rate of fixation for a mutation not subject to selection is simply the rate of introduction of such mutations."

Wow check out troy, linking to shit that we have already discussed and agreed upon.

However, the mutation rate for neutral mutations is unknown, which means the substitution rate is also unknown.

All that said, theoretical musings are fine, but sooner or later there needs to be some confirming experimental evidence, which is still lacking for the neutral theory as it pertains to the fixation of neutral mutations.

Joe G: Except it isn't so simple and not even close to being direct. Mutation rates vary and they are all very, very small numbers, i.e., 10^-8. Apparently you have no idea what that means.

It means that *if* we have a genome of length 10^8 and a mutation rate of 10^-8 per site per generation, then the expected value is one mutation per genome per generation. If there are 100 genomes in the population, then the expected value is 100 mutations across the population per generation. From your original post:

Joe G: What NickM doesn't seem to realize is the rate of substitution = 1.1 x 10^-8, not 40 mutations/generation. The 40 is the number of mutations we can expect in each birth given a genome of 3.2 billion bp.

The rate of substitution is equal to the rate of neutral mutations. That means if the rate is 40 neutral mutations per birth per generation, then the expected value is 40 (previously occurring) mutations becoming fixed across the population in each generation. As Nick said.

Zacho:
That means if the rate is 40 neutral mutations per birth per generation, then the expected value is 40 (previously occurring) mutations becoming fixed across the population in each generation.

Wow, you are either really stupid or really ignorant.

40 is the number of mutations, not the rate.

All that said, theoretical musings are fine, but sooner or later there needs to be some confirming experimental evidence, which is still lacking for the neutral theory as it pertains to the fixation of neutral mutations.

It is amazing what evotards will believe if they think it supports their claims.

BTW Zacho- you need to know the neutral mutation rate, because the mutation rate includes all kinds: neutral, deleterious, and beneficial. And that does not wash with the equation.

Joe G: 40 is the number of mutations, not the rate.

If the rate is 1.1e-8 per base per generation, then if the genome has a length of 3.2e9 bases, the rate is 40 per genome per generation.

Is there a reason why all our comments go into moderation?

Zacho:
If the rate is 1.1e-8 per base per generation, then if the genome has a length of 3.2e9 bases, the rate is 40 per genome per generation.

Wrong- you are confusing the number with the rate.

Not only that you don't have any experimental evidence to support that number.

So you need to start there. And if you don't have said experimental evidence then you don't have anything.

Is there a reason why all our comments go into moderation?

Zachriel: If the rate is 1.1e-8 per base per generation, then if the genome has a length of 3.2e9 bases, the rate is 40 per genome per generation.

Joe G: Wrong- you are confusing the number with the rate.

1.1e-8 mutations per base per generation is a rate. 4e1 per genome per generation is a rate.

Why are you refusing to provide experimental support for your claims?

What is your evidence that there are 40 neutral mutations per genome per generation?

Why are you refusing to provide experimental support for your claims?

What is your evidence that there are 40 neutral mutations per genome per generation?

Now that you finally seem to get that 1.1 x 10^-8 mutations PER BASE PER BIRTH times 3.2 x 10^9 bases in the human genome = 35.2 mutations PER GENOME PER BIRTH, and that *both* of these are rates, we can discuss your above question.

1. The 1.1 x 10^-8 mutations number came from experimental data, see the wikipedia page for the reference. There are several common experiments to determine mutation rates, e.g.:

(a) sequence the parents and the offspring, and count the differences

(b) sequence a man and the DNA of some of his sperm cells (popular back in the day due to the commonness of said sperm cells)

2. Now, how many of these mutations are neutral? Answer: almost all of them, since only a few percent of the genome codes for genes or gene regulation, and even in genes most point mutations are neutral because of the redundancy of the genetic code.

There is some evidence that some "neutral" mutations (like the third position in DNA codons) might not be exactly, completely selectively neutral; some of them might be very slightly beneficial or detrimental. But selection can only "see" mutations with a certain minimum selective value (the value is something like s must be greater than 1/N, I don't remember exactly). So if the selective disadvantage of a mutation is only 1/10,000, it would be effectively neutral in a species like humans with a historic effective population size of something like 10,000. And anything with an s anywhere close to 1/10,000 would also act nearly neutral (i.e., genetic drift would have a big impact on its fate, compared to selection).

NickM:
1. The 1.1 x 10^-8 mutations number came from experimental data, see the wikipedia page for the reference.

2. Now, how many of these mutations are neutral? Answer: almost all of them, since only a few percent of the genome codes for genes or gene regulation, and even in genes most point mutations are neutral because of the redundancy of the genetic code.

Untestable gibberish, as there is no way to (A) test what percentage codes for necessary stuff, and (B) we know of many alleged silent mutations that have an impact; that is due to the way the coding works.

That said you need some experimental data, or real-world data, to support your claim of substitution rates.

You don't have that or for some reason you refuse to present it.

So until you do you don't have anything but a bald assertion.

NickM: But selection can only "see" mutations with a certain minumum selective value (the value is something like s must be greater than 1/N, I don't remember exactly).

That's right. A mutation will be effectively neutral if |s| << 1/(2N), diploid.

And still refusing to provide experimental support for your claims.

Joe G: And still refusing to provide experimental support for your claims.

We didn't make any experimental claims. Rather, we answered a question about how neutral mutations would spread in a population given a certain rate of neutral mutation.

Zacho:
Rather, we answered a question about how neutral mutations would spread in a population given a certain rate of neutral mutation.

Your "answer" is unsupported, meaning it isn't an answer at all.

BTW the "how" part is just by chance, so you didn't answer how.

Zacho:
That means if the rate is 40 neutral mutations per birth per generation, then the expected value is 40 (previously occurring) mutations becoming fixed across the population in each generation.

Pure bullshit, meaning without experimental support. And until you get experimental support it will be pure bullshit.

So either have at it or take a trip down the Shenandoah and out to sea, if you catch my drift.

Joe G: Your "answer" is unsupported, meaning it isn't an answer at all.

Huh? Not only did we explain why the substitution rate is equal to the mutation rate, but we directly cited Kimura. Here it is again:

Kimura, Population genetics, molecular evolution, and the neutral theory: "If the mutant is selectively neutral, the probability of ultimate fixation is equal to its initial frequency, that is u=1/(2N) in the diploid population, and therefore, from Eq. (1), we have Ks=v. In other words, for neutral alleles, the rate of evolution is equal to the mutation rate."

Way to prove your cowardly nature by not even addressing what I post.

Zacho:
Not only did we explain why the substitution rate is equal to the mutation rate, but we directly cited Kimura.

Unfortunately for you Kimura doesn't support your claim of:

That means if the rate is 40 neutral mutations per birth per generation, then the expected value is 40 (previously occurring) mutations becoming fixed across the population in each generation.

You are a lying bluffer. Which means what you are saying is pure bullshit, meaning without experimental support. And until you get experimental support it will be pure bullshit.

Look, all I am asking for is evidence to support your claim:

That means if the rate is 40 neutral mutations per birth per generation, then the expected value is 40 (previously occurring) mutations becoming fixed across the population in each generation.

I have provided references that say 4N generations for 1.

NickM started out with a population of 10,000. Do you understand what that means?

Joe G: NickM started out with a population of 10,000. Do you understand what that means?

Yes, it means the 40 or so neutral mutations reaching fixation today occurred 40,000 generations ago on average.

What is the empirical evidence to support your claim?

Joe G: There seems to be some confusion about what the neutral theory says about substitution rates.

Joe G: And how can we tell? What is the empirical evidence to support your claim?

It's not an empirical claim, but an entailment of the neutral theory model of evolution you introduced in your original post.

Zacho:
It's not an empirical claim, but an entailment of the neutral theory model of evolution you introduced in your original post.

I didn't introduce the neutral theory. Kimura did.

And how was it determined that it is an entailment without any empirical support?

Joe G: I didn't introduce the neutral theory. Kimura did.

You brought up neutral theory in your original post.

Joe G: And how was it determined that it is an entailment without any empirical support?

A theory is a model, in this case, a model of how neutral mutations will propagate through a population. Given that some mutations are neutral (such as synonymous substitutions or bases in a pseudogene), their statistical distribution can be predicted. In particular, the rate of fixation of neutral mutations will be equal to the neutral mutation rate.

Joe G: And how was it determined that it is an entailment without any empirical support?

Do you know what "entailment" means?

Zacho:
Do you know what "entailment" means?

Yes I do. So how do you know what you said is an entailment of the theory?

Zacho:
You brought up neutral theory in your original post.

That is because NickM brought it up in the other thread about chimp and human DNA.

But bringing it up isn't the same as introducing it.

And how was it determined that it is an entailment without any empirical support?

A theory is a model, in this case, a model of how neutral mutations will propagate through a population.

Can't model something scientifically without evidentiary support.

Given that some mutations are neutral (such as synonymous substitutions or bases in a pseudogene), their statistical distribution can be predicted.

A prediction without empirical or evidentiary support is worthless.

In particular, the rate of fixation of neutral mutations will be equal to the neutral mutation rate.

Yup and you can say anything as long as you are not required to support it. And that makes it pure bullshit, as I said before.

Zachriel: Do you know what "entailment" means?

Then tell us what an entailment is.

There are several definitions but I would say you are using it to mean "a necessary accompaniment or consequence", which is why I asked So how do you know what you said is an entailment of the theory?

You avoided that question because you have intellectual integrity issues.

Declaring something to be an entailment and having evidence that it really is are two different things. Not that you could understand that.

Joe G: So how do you know what you said is an entailment of the theory?

Because it follows directly from the premises. Assuming unlinked, random neutral mutations arising at rate μ in a diploid population of size N:

The frequency in the population of a novel mutation is 1/(2N).
The probability of fixation of a mutation is its frequency in the population.
The number of new mutations in the population is 2Nμ.
Therefore, the rate of fixation of neutral mutations is 2Nμ * 1/(2N) = μ.

Some mutations are clearly neutral, such as bases in pseudogenes. And the evidence supports their neutral evolution.

So how do you know what you said is an entailment of the theory?

Because it follows directly from the premises.

But all we have is your word for that. And that means it is meaningless.

You can keep repeating all the unverified equations you want, that doesn't make them correct. Nor does it mean that you understand them.

So no experimental support for the bald declaration of entailment.

When Einstein first published his theory of relativity his calculations contained an entailment. Strange thing is no one really believed him until Eddington confirmed it via a natural experiment. And it has been confirmed over and over again via experimentation.

The neutral theory's entailment is stuck on paper and that renders it moot to the real world.

Declaring something to be an entailment and having evidence that it really is are two different things. Not that you could understand that.

And Zacho obviously doesn't understand that.

Another prediction fulfilled.

Joe G: But all we have is your word for that.

Um, no. An entailment follows from the premises of the model, which we have shown. All you have left is handwaving. Good luck with that.

But all we have is your word for that.

Without empirical evidence all we have is your word for it. And you have admitted your claim does not have empirical support.

An entailment follows from the premises of the model.

An entailment without empirical support is meaningless. And a model without empirical support is a pipe-dream.

You have shown that you are bloviating evotards who have no intention of ever supporting anything you claim.


RESULTS

To assess the support for a recent selective sweep, I follow the approach developed by Pritchard et al. (1999) to estimate the time since the onset of growth in humans. Specifically, I summarize the polymorphism data and obtain a sample of the posterior distribution of the parameters conditional on the summaries being close to (i.e., within a prespecified neighborhood) or equal to the observed value. Similar rejection-sampling methods have been used in other contexts, including estimating the effective population size (Bachtrog and Charlesworth 2002), population parameters (Tavaré et al. 1997; Wall 2000; Fearnhead and Donnelly 2002), and the age of an allele (Tishkoff et al. 2001), as well as demographic inference (Weiss and von Haeseler 1998; Beaumont et al. 2002). For a discussion of the differences between implementations, see Beaumont et al. (2002).
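
To make the scheme concrete, here is a hedged Python sketch of rejection sampling on summary statistics. The simulator below is an invented toy stand-in (not the paper's coalescent sweep model), and the priors and tolerance are illustrative only; the observed summaries are borrowed from the Figure 1A example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_summaries(T, theta):
    """Toy stand-in for the coalescent simulation: recent sweeps (small T)
    reduce diversity (S) and push D negative. Invented for illustration;
    it is NOT the sweep model actually used in the paper."""
    S = rng.poisson(theta * min(0.2 + 2.0 * T, 1.0))
    D = rng.normal(-2.0 * np.exp(-4.0 * T), 0.6)
    H = rng.poisson(1.0 + S * (0.3 + 0.5 * T))
    return S, D, H

def rejection_sample(observed, n_proposals, tol):
    """Keep prior draws whose simulated summaries fall within a prespecified
    neighborhood of the observed ones (exact match on the discrete S)."""
    S_obs, D_obs, H_obs = observed
    kept = []
    for _ in range(n_proposals):
        T = rng.uniform(0.0, 1.0)        # uniform prior on T
        theta = rng.uniform(5.0, 25.0)   # illustrative prior on theta = 4*N*mu
        S, D, H = simulate_summaries(T, theta)
        if S == S_obs and abs(D - D_obs) <= tol and abs(H - H_obs) <= tol:
            kept.append(T)
    return kept                          # approximate posterior sample of T

# Observed summaries borrowed from the Figure 1A example: S = 7, D = -1.78, H = 4.
posterior_T = rejection_sample((7, -1.78, 4), n_proposals=200_000, tol=0.5)
print(len(posterior_T), np.mean(posterior_T))  # support concentrates on recent (small) T
```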

Choice of summaries: Short of being able to use all the information in the data, one would like to use summaries that are sensitive to the parameter of interest (here, the time since the fixation of the beneficial allele, T) and capture different facets of the data. I focus on three statistics: the number of segregating sites (S), a summary of the allele frequency spectrum (D), and a summary of linkage disequilibrium (H). Previous studies have shown two of the statistics, S and D, to be sensitive to T. Specifically, selective sweeps are expected to reduce diversity, thus reducing S, and skew the frequency spectrum toward rare alleles, leading to negative D values (Maynard Smith and Haigh 1974; Braverman et al. 1995; Simonsen et al. 1995). I chose D among various frequency-spectrum summaries because when it is used as a test statistic it, more than other statistics, retains power to reject a neutral model when data are generated under a selective sweep model for larger T values (Kim and Stephan 2002; Przeworski 2002). D is the (approximately) normalized difference between a measure of diversity based on S and π, the mean pairwise difference in the sample (Tajima 1989). Thus, specifying S and D determines π as well.

The number of haplotypes, H, also carries information about T. Its behavior depends on the strength of selection and the recombination rate. If recombination occurs between selected and neutral loci during the selective sweep, then at T = 0, H/(S + 1) will be lower on average than it would be in the absence of selection (Przeworski 2002). In other words, allelic associations will tend to be stronger than they would be in the absence of selection. As T increases, new alleles will arise by mutation. These rare alleles will create new haplotypes, such that H/(S + 1) will rapidly exceed the neutral expectation. The ratio will subsequently decrease (at T ≈ 0.1), as the alleles gradually increase in frequency and recombine onto other backgrounds (Przeworski 2002). If there is no recombination between selected and neutral loci during the selective sweep, most of the alleles will be rare, and H/(S + 1) will be larger than expected under neutrality, with its largest value attained for T > 0.1 (Przeworski 2002).

Figure 1.—(A) A sample from the posterior distribution of T for a simulated data set, when the true time To = 0. Other parameters used to generate the data set are the same as in Table 1 (with a uniform prior on T), with ε set to 0.1 and Mε = 10^4. In this example, S = 7, D = -1.78, and H = 4. (B) A sample from the posterior distribution of T for a simulated data set, when the true time To = 0.2. Other parameters are as in A. In this example, S = 8, D = -0.91, and H = 10.

In summary, while to a rough approximation, S and D are expected to increase monotonically with T, H/(S + 1) tends to have a maximum value at some intermediate T. This suggests that using H as an additional statistic may help to distinguish between recent T values and therefore to refine the estimates of T obtained using D and S alone. I illustrate this in Figure 1 by plotting the posterior distribution of T for two simulated data sets, conditional on S and D (first row), S and H (second row), and S, D, and H (last row). As can be seen, conditioning on all three summaries leads to a tighter distribution around the true value than does the use of only two; this finding is confirmed by more extensive simulations (results not shown).

The extent to which the three summaries are informative about T depends on prior knowledge about the parameters. In particular, detecting a reduction in diversity requires some knowledge of what levels of diversity are expected to be in the absence of selection. Thus, if one has accurate prior knowledge about the population mutation rate θ (= 4Nμ), the decrease in the number of segregating sites and in the number of haplotypes can be highly informative about the time since the selective sweep (S imonsen et al. 1995). When less is known about θ, most of the information about the time since the selective sweep will come from the observed value of D and the value of H given S.

Performance of the method on simulated data

An additional benefit of using distinct aspects of the data is that the approach may be less sensitive to misspecification of the prior distributions. For example, methods that estimate T on the basis of diversity levels alone are highly sensitive to the estimate of θ. If θ is estimated to be higher than it is in reality but no selection has occurred, levels of variation will appear reduced. On this basis, methods may spuriously suggest the recent fixation of a beneficial allele. However, if the data are generated under a neutral model with an elevated mutation rate, the values of D and H will tend to be less likely under a recent selective sweep than under neutrality. Thus, the use of all three summaries may result in less support for a recent T. To examine this, I ran 20 simulations with no selection in which the mean of the prior distribution of θ was twofold larger than the value used to generate the data (parameters as in Table 1 for a uniform prior on T). For none of the simulated data sets was there strong support for a recent selective sweep (results not shown).

Performance of approach: One concern is that the posterior probabilities may not be well estimated. In that respect, an advantage of this method over more efficient ones such as Markov chain Monte Carlo is that it provides independent samples from the posterior distribution, so one can easily assess the accuracy of estimates of the posterior probabilities. In particular, if the sample from the posterior is of size Mε, the sampling error associated with Bj, the observed number of counts in interval j, is binomial (Carlin and Louis 1998) and can be estimated using parameters (Bj/Mε, Mε). This indicates that probabilities on the order of 1/Mε are poorly estimated, while those ≫ 1/Mε are fairly precisely estimated. In the simulations presented here, Mε = 2000 and probabilities of interest are ≫ 5 × 10^-4.
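
As a quick check of this argument (my arithmetic, using the stated Mε = 2000): the binomial standard error of an estimated bin probability p is sqrt(p(1 - p)/Mε).

```python
import math

M_eps = 2000
for p in (5e-4, 5e-3, 0.05, 0.5):     # 5e-4 = 1/M_eps
    se = math.sqrt(p * (1 - p) / M_eps)
    print(f"p = {p:<7g} SE = {se:.2e} relative error = {se / p:.2f}")
# A probability on the order of 1/M_eps carries ~100% relative error, while
# probabilities much larger than 1/M_eps are estimated fairly precisely.
```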

A second question is whether data generated under a selective sweep model with realistic parameters carry much information about the parameter T. Let To be the true time since the fixation of the beneficial allele. If the data are informative, the support for recent times should be stronger in the posterior distribution than in the prior if To is 0 or 0.10, while there should be weaker support for recent times in the absence of selection. As can be seen in Table 1, this is true of almost all simulated runs, whether the prior distribution of T is Exp(1.2) or U(0, 1). To measure the proportion of data sets with strong support for a “recent” selective sweep, I tabulate the proportion of 100 simulated data sets where the posterior Pr(T ≤ 0.2) > 0.50. For To = 0, the proportion is very high. It decreases with To, but even when the beneficial substitution occurred some time ago (To = 0.10), over one-third of simulated data sets strongly support a selective sweep (Table 1). In contrast, one rarely (in <5% of the runs) finds strong support for a recent selective sweep when none has occurred. In summary, the simulated data sets appear to be informative about recent genetic adaptations. In humans, the parameters chosen for the simulations correspond loosely to 25 individuals sequenced for 10 kb in a region of average recombination, comparable to what is currently collected for studies of putatively selected loci (e.g., Enard et al. 2002; Hamblin et al. 2002).

Note further that these tests of performance were carried out for two quite different prior distributions for T. The results are very similar in both implementations, suggesting that the method is robust to the choice of a prior distribution. This is reassuring, as little or nothing is usually known about T—in contrast to other parameters, where there is often independent knowledge to guide the specification of the prior.

In some contexts, one is interested in an estimate of the unscaled time (in generations) since the fixation of the beneficial allele, Tgen = 4NT. As a point estimate, one might consider the mode of the sample from the posterior distribution of Tgen. In Figure 2, I plot the distribution of modes (i.e., the bin with the largest number of counts) for 100 simulated data sets. When the data are generated under a no-selection model, few modes are at recent times (e.g., 6 are at Tgen ≤ 8000 generations). In contrast, when the data are generated under a recent selective sweep model, most modes are close to the true time. For example, if the true Tgen is 0, 94 of the modes are at Tgen ≤ 4000 generations. As the true Tgen increases, the precision of the estimate decreases: thus, if the true Tgen is 4000 generations, only 60 of the modes are within a factor of two of the true value.

Figure 2.—The modes of posterior distribution samples of Tgen = 4NT for 100 simulated data sets. For each data set, the values of Tgen are binned in increments of 2000 from 0 to 80,000; the mode refers to the bin with the highest number of counts. The simulated data sets are the same as summarized in Table 1 for a uniform prior on T; results are similar if the prior is instead Exp(1.2) (results not shown).

Application to tb1: The tb1 locus is responsible for the short branches that distinguish maize from its wild progenitor, teosinte. This trait is thought to have fixed during the domestication process, 5-10 KYA (cf. Wang et al. 1999). I use polymorphism data collected for 2740 bp of maize tb1 by Tenaillon et al. (2001; available from http://bgbox.bio.uci.edu/data/maud1asd.html). I focus on the 14 landraces among the 23 lines that were sequenced; results are similar if the 9 additional inbred lines are included (results not shown). For these data, S = 39, H = 14, and D = -2.25 [statistics were calculated using DNAsp (Rozas and Rozas 1999)]. A sample from the posterior distribution of T is presented in Figure 3A. Over 99.99% of the support is on T ≤ 0.2. Thus, consistent with what is known about the role of tb1 in the domestication of maize, polymorphism data strongly suggest the recent fixation of a beneficial allele.

I also present a sample from the joint posterior distribution of s, the selection coefficient of the favored allele, and Tgen, the time in generations since the fixation of the beneficial allele (Figure 3B). The results suggest a large selection coefficient, in accordance with evidence that the trait was under artificial selection. However, most of the support is on times older than expected from the archeological record (assuming approximately one generation per year for maize). This discrepancy may be due to chance, since few estimates of Tgen will be on the true value even under ideal conditions (see Figure 2). Alternatively, it may reflect an incorrect assumption about the location of the selected site or a salient aspect of the history of maize not captured by the demographic or selective model (see below).


Results

Obtaining the neutral substitution rate

Our investigations apply to a class of evolutionary models (formally described in the Methods) in which reproduction is asexual and the population size and spatial structure are fixed. Specifically, there are a fixed number of sites, indexed i = 1, …, N. Each site is always occupied by a single individual. At each time-step, a replacement event occurs, meaning that the occupants of some sites are replaced by the offspring of others. Replacement events are chosen according to a fixed probability distribution, called the replacement rule, specific to the model in question. Since we consider only neutral mutations that have no phenotypic effect, the probabilities of replacement events do not depend on the current population state.

This class includes many established evolutionary models. One important subclass is spatial Moran processes [23, 33], in which exactly one reproduction occurs each time-step. This class also includes spatial Wright-Fisher processes, in which the entire population is replaced each time-step [36, 37]. In general, any subset R ⊂ {1, …, N} of individuals may be replaced in a given replacement event. Parentage in a replacement event is recorded in an offspring-to-parent map α: R → {1, …, N} (see Methods, [32, 38]), which ensures that each offspring has exactly one parent and allows us to trace lineages over time.

For a given model in this class, we let $e_{ij}$ denote the (marginal) probability that the occupant of site j is replaced by the offspring of site i in a single time-step. Thus the expected number of offspring of site i over a single time-step is $b_i = \sum_{j=1}^N e_{ij}$. The probability that site i dies (i.e., is replaced) in a time-step is $d_i = \sum_{j=1}^N e_{ji}$. The death rate $d_i$ can also be regarded as the rate of turnover at site i. The total expected number of offspring per time-step is denoted $B = \sum_{i=1}^N b_i = \sum_{i=1}^N d_i = \sum_{i,j} e_{ij}$. We define a generation to be N/B time-steps, so that, on average, each site is replaced once per generation.

We use this framework to study the fate of a single neutral mutation, as it arises and either disappears or becomes fixed. The probability of fixation depends on the spatial structure and the initial mutant's location. We let $\rho_i$ denote the probability that a new mutation arising at site i becomes fixed. ($\rho_i$ can also be understood as the reproductive value of site i [39].) We show in the Methods that the fixation probabilities $\rho_i$ are the unique solution to the system of equations

$d_i \rho_i = \sum_{j=1}^N e_{ij} \rho_j \quad (i = 1, \ldots, N), \qquad (1)$

$\sum_{i=1}^N \rho_i = 1. \qquad (2)$

Equation (2) arises because $\rho_i$ equals the probability that the current occupant of site i will become the eventual ancestor of the population, which is true for exactly one of the N sites.
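To make this concrete, Eqs. (1)–(2) can be solved with a short numerical sketch in Python. The function name and matrix convention here are illustrative, not from the paper: e[i, j] holds $e_{ij}$, and one redundant row of the singular system (1) is swapped for the normalization (2).

    import numpy as np

    def fixation_probabilities(e):
        # e[i, j]: probability that the offspring of site i replaces site j
        # in one time-step.
        N = e.shape[0]
        d = e.sum(axis=0)          # d_i = sum_j e_ji, the turnover rate of site i
        A = e - np.diag(d)         # row i encodes sum_j e_ij rho_j - d_i rho_i = 0
        A[-1, :] = 1.0             # swap one redundant row for Eq. (2): sum_i rho_i = 1
        b = np.zeros(N)
        b[-1] = 1.0
        return np.linalg.solve(A, b)

    # Sanity check: in a well-mixed population, rho_i = 1/N for every site.
    N = 4
    print(fixation_probabilities(np.full((N, N), 1.0 / N**2)))   # [0.25 0.25 0.25 0.25]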

To determine the overall rate of substitution, we must take into account the likelihood of mutations arising at each site. The rate at which mutations arise at site i is proportional to the turnover rate $d_i$, because each new offspring provides an independent chance of mutation. Specifically, if mutation occurs at rate u ≪ 1 per reproduction, then new mutations arise at site i at rate $N u d_i / B$ per generation [32]. Thus the fraction of mutations that arise at site i is $d_i / B$.

The overall fixation probability ρ of new mutations, taking into account all possible initial sites, is therefore

$\rho = \sum_{i=1}^N \frac{d_i}{B} \rho_i. \qquad (3)$

The molecular clock rate K is obtained by multiplying the fixation probability ρ by the total rate of mutation per generation:

$K = N u \rho. \qquad (4)$

The units of K are substitutions per generation. Alternatively, the molecular clock can be expressed in units of substitutions per time-step, in which case the formula is $\tilde{K} = B u \rho = u \sum_{i=1}^N d_i \rho_i$.
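Continuing the sketch above, Eqs. (3)–(4) translate directly into code (again with illustrative names; fixation_probabilities is defined in the previous snippet):

    def clock_rate(e, u):
        # Returns (K, K_tilde): substitutions per generation and per time-step.
        N = e.shape[0]
        d = e.sum(axis=0)
        B = d.sum()                      # total expected offspring per time-step
        rho_i = fixation_probabilities(e)
        rho = (d / B) @ rho_i            # Eq. (3): mutations arise at i with weight d_i/B
        return N * u * rho, B * u * rho  # Eq. (4) and the per-time-step variant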

Effects of spatial structure

How does spatial structure affect the rate of neutral substitution? In a well-mixed population, each individual's offspring is equally likely to replace each other individual, meaning that $e_{ij}$ is constant over all i and j (Fig. 2a). In this case, the unique solution to Eqs. (1)–(2) is $\rho_i = 1/N$ for all i, and we recover Kimura's [2] result K = Nu(1/N) = u. Moreover, if each site is equivalent under symmetry, as in Fig. 2b, this symmetry implies that $\rho_i = 1/N$ for all i and K = u as in the well-mixed case.

(a) For a well-mixed population, represented by a complete graph with uniform edge weights, a neutral mutation has a 1/N chance of fixation, where N is the population size. It follows that the rate K of neutral substitution in the population equals the rate u of neutral mutation in individuals. (b) The same result holds for spatial structures in which each site is a priori identical, such as the cycle with uniform edge weights.

However, asymmetric spatial structure can lead to faster (K > u) or slower (K < u) molecular clock rates than a well-mixed population, as shown in Fig. 3. From Eqs. (2) and (4) we can see that K > u is equivalent to the condition $\overline{d\rho} > \bar{d}\,\bar{\rho}$, where the bars indicate averages over i = 1, …, N. This means that the molecular clock is accelerated if and only if $d_i$ and $\rho_i$ are positively correlated over sites; that is, if and only if there is a positive spatial correlation between the arrival of new mutations and the success they enjoy.

This is because the frequency of mutations and the probability of fixation differ across sites. Turnover rates are indicated by coloration, with red corresponding to frequent turnover and consequently frequent mutation. (a) A star consists of a hub and n leaves, so that the population size is N = n+1. Edge weights are chosen so that the birth rates are uniform ($b_i = 1$ for all i). Solving Eqs. (1)–(2), we obtain site-specific fixation probabilities of $\rho_H = 1/(1+n^2)$ and $\rho_L = n/(1+n^2)$ for the hub and each leaf, respectively. From Eq. (4), the molecular clock rate is $K = \frac{2n}{1+n^2} u$, which equals u for n = 1 and is less than u for n ≥ 2. Thus the star structure slows down the rate of neutral substitution, in accordance with Result 3. Intuitively, the slowdown occurs because mutations are more likely to arise at the hub, where their chances of fixation are reduced. (b) A one-dimensional population with self-replacement only in site 1. Solving Eqs. (1)–(2) we find $\rho_1 = 8/15$, $\rho_2 = 4/15$, $\rho_3 = 2/15$ and $\rho_4 = 1/15$. (The powers of two arise because there is twice as much gene flow in one direction as the other.) From Eq. (4), the molecular clock rate is $K = \frac{16}{13} u > u$; thus the molecular clock is accelerated in this case.
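The star example in this caption can be checked against the sketch above. The edge weights below are a hypothetical instantiation chosen so that $b_i = 1$ for every site, with the hub at index 0:

    n = 4                                    # number of leaves; N = n + 1 = 5
    e_star = np.zeros((n + 1, n + 1))
    e_star[0, 1:] = 1.0 / n                  # the hub's offspring spread over the leaves
    e_star[1:, 0] = 1.0                      # each leaf's offspring replaces the hub
    print(fixation_probabilities(e_star))    # [1/17, 4/17, 4/17, 4/17, 4/17]
    K, _ = clock_rate(e_star, u=1.0)
    print(K)                                 # 2n/(1+n^2) = 8/17, i.e. K < u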

These results led us to seek general conditions on the spatial structure leading to faster, slower, or the same molecular clock rates as a well-mixed population. We first find

Result 1. If the death rates $d_i$ are constant over all sites i = 1, …, N, then ρ = 1/N, and consequently K = u.

Thus the molecular clock rate is unaffected by spatial structure if each site is replaced at the same rate (Fig. 4a). This result can be seen by noting that if the $d_i$ are constant over i, then since $\sum_{i=1}^N d_i = B$, it follows that $d_i = B/N$ for each i. Substituting in Eq. (3) yields ρ = 1/N.

(a) Our Result 1 states that the molecular clock has the same rate as in a well-mixed population, K = u, if the rate of turnover $d_i$ is uniform across sites, as in this example ($d_i = 0.2$ for all i). (b) Result 2 asserts that $\rho_i = 1/N$ for all i—again implying K = u—if and only if each site has birth rate equal to death rate, $b_i = d_i$ for all i, as in this example. Nodes are colored according to their rates of turnover $d_i$.

Another condition leading to K = u is the following:

Result 2. If the birth rate equals the death rate at each site ($b_i = d_i$ for all i = 1, …, N), then ρ = 1/N, and consequently K = u. Moreover, $b_i = d_i$ for all i = 1, …, N if and only if the fixation probability is the same from each site ($\rho_i = 1/N$ for all i = 1, …, N).

Thus if births and deaths are balanced at each site, then all sites provide an equal chance for mutant fixation (Fig. 4b). In this case the molecular clock is again unchanged from the baseline value. In particular, if dispersal is symmetric in the sense that $e_{ij} = e_{ji}$ for all i and j, then K = u. Result 2 can be obtained by substituting $\rho_i = 1/N$ for all i into Eq. (1) and simplifying to obtain $b_i = d_i$ for all i (details in Methods). Alternatively, Result 2 can be obtained as a corollary to the Circulation Theorem of Lieberman et al. [23].

Our third result reveals a “speed limit” to neutral evolution in the case of constant birth rates:

Result 3. If the birth rates $b_i$ are constant over all sites i = 1, …, N, then ρ ≤ 1/N, and consequently K ≤ u, with equality if and only if the death rates $d_i$ are also constant over sites.

In other words, a combination of uniform birth rates and nonuniform death rates slows down the molecular clock. An instance of this slowdown is shown in Fig. 3a. Intuitively, the sites at which mutations occur most frequently are those with high death rates $d_i$; because of these high death rates, these sites on the whole provide a reduced chance of fixation. The proof of this result, however, is considerably more intricate than this intuition would suggest (see Methods).

Finally, we investigate the full range of possible values for K with no constraints on birth and death rates. We find the following:

Result 4. For arbitrary spatial population structure (no constraints on $e_{ij}$) the fixation probability can take any value 0 ≤ ρ < 1, and consequently, the molecular clock can take any rate 0 ≤ K < Nu.

This result is especially surprising, in that it implies that the probability of fixation of a new mutation can come arbitrarily close to unity. Result 4 can be proven by considering the hypothetical spatial structure illustrated in Fig. 5 . Any non-negative value of ρ less than 1 can be obtained by an appropriate choice of parameters (details in Methods).

This is proven by considering a population structure with unidirectional gene flow from a hub (H) to N−1 leaves (L). Fixation is guaranteed for mutations arising in the hub ($\rho_H = 1$) and impossible for those arising in leaves ($\rho_L = 0$). The overall fixation probability is equal by Eq. (3) to the rate of turnover at the hub: $\rho = d_H = 1-(N-1)a$. The molecular clock rate is therefore $K = Nu\rho = N[1-(N-1)a]u$. It follows that K > u if and only if a < 1/N. Intuitively, the molecular clock is accelerated if the hub experiences more turnover (and hence more mutations) than the other sites. Any value of ρ greater than or equal to 0 and less than 1 can be achieved through a corresponding positive choice of a less than or equal to 1/(N−1). For a = 1/(N−1) we have K = 0, because mutations arise only at the leaves where there is no chance of fixation. At the opposite extreme, in the limit a → 0, we have K → Nu.
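This construction is easy to reproduce with the earlier sketch. Assuming, as described for Fig. 5, that the hub replaces itself with probability $1-(N-1)a$ and each leaf with probability a, with no other replacements:

    def hub_and_leaves(N, a):
        e = np.zeros((N, N))
        e[0, 0] = 1 - (N - 1) * a    # the hub replaces itself...
        e[0, 1:] = a                 # ...and seeds each of the N-1 leaves
        return e

    for a in [0.001, 0.05, 0.2, 0.25]:
        K, _ = clock_rate(hub_and_leaves(5, a), u=1.0)
        print(a, K)                  # K = N[1-(N-1)a]u: from ~4.98u down to exactly 0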

Application to upstream-downstream populations

To illustrate the effects of asymmetric dispersal on the molecular clock, we consider a hypothetical population with two subpopulations, labeled “upstream” and “downstream” (Fig. 6). The sizes of these subpopulations are labeled $N_\mathrm{U}$ and $N_\mathrm{D}$, respectively. Each subpopulation is well-mixed, with replacement probabilities $e_\mathrm{U}$ for each pair of upstream sites and $e_\mathrm{D}$ for each pair of downstream sites. Dispersal between the subpopulations is represented by the replacement probabilities $e_\mathrm{UD}$ from each upstream site to each downstream site, and $e_\mathrm{DU}$ from each downstream site to each upstream site. We assume there is net gene flow downstream, so that $e_\mathrm{UD} > e_\mathrm{DU}$.

Each subpopulation is well-mixed. The replacement probability $e_{ij}$ equals $e_\mathrm{U}$ if sites i and j are both upstream, $e_\mathrm{D}$ if i and j are both downstream, $e_\mathrm{UD}$ if i is upstream and j is downstream, and $e_\mathrm{DU}$ if i is downstream and j is upstream. We suppose there is net gene flow downstream, so that $e_\mathrm{UD} > e_\mathrm{DU}$. We find that the molecular clock is accelerated, relative to the well-mixed case, if and only if the upstream subpopulation experiences more turnover than the downstream subpopulation: K > u if and only if $d_\mathrm{U} > d_\mathrm{D}$.

Solving Eqs. (1)–(2), we find that the fixation probabilities from each upstream site and each downstream site, respectively, are

$\rho_\mathrm{U} = \frac{e_\mathrm{UD}}{N_\mathrm{U} e_\mathrm{UD} + N_\mathrm{D} e_\mathrm{DU}}, \qquad \rho_\mathrm{D} = \frac{e_\mathrm{DU}}{N_\mathrm{U} e_\mathrm{UD} + N_\mathrm{D} e_\mathrm{DU}}.$

These fixation probabilities were previously discovered for a different model of a subdivided population [40]. Substituting these fixation probabilities into Eq. (4) yields the molecular clock rate:

$K = \frac{N u \left( N_\mathrm{U} d_\mathrm{U} e_\mathrm{UD} + N_\mathrm{D} d_\mathrm{D} e_\mathrm{DU} \right)}{B \left( N_\mathrm{U} e_\mathrm{UD} + N_\mathrm{D} e_\mathrm{DU} \right)}.$

Above, $d_\mathrm{U}$ and $d_\mathrm{D}$ are the turnover rates in the upstream and downstream populations, respectively, and $B = N_\mathrm{U} d_\mathrm{U} + N_\mathrm{D} d_\mathrm{D}$ is the total birth rate per time-step. In Methods, we show that K > u if and only if $d_\mathrm{U} > d_\mathrm{D}$; that is, the molecular clock is accelerated if and only if there is more turnover in the upstream population than in the downstream population.

In the case of unidirectional gene flow, $e_\mathrm{DU} = 0$, the molecular clock rate is simply $K = (d_\mathrm{U}/B) N u$. The quantity $d_\mathrm{U}/B$ represents the relative rate of turnover in the upstream population, and can take any value in the range $0 \le d_\mathrm{U}/B < 1/N_\mathrm{U}$; thus K takes values in the range $0 \le K < (N/N_\mathrm{U}) u$. We note that the upper bound on K is inversely proportional to the size $N_\mathrm{U}$ of the upstream population. The largest possible values of K are achieved when $N_\mathrm{U} = 1$, in which case K can come arbitrarily close to Nu. These bounds also hold if there are multiple downstream subpopulations, since for unidirectional gene flow, the spatial arrangement of downstream sites does not affect the molecular clock rate. In particular, if the hub and leaves in Fig. 5 are each replaced by well-mixed subpopulations, then K is bounded above by $(N/N_\mathrm{H}) u$, where $N_\mathrm{H}$ is the size of the hub subpopulation.
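As a numerical illustration (parameter values hypothetical), the criterion K > u if and only if $d_\mathrm{U} > d_\mathrm{D}$ can be checked with the general-purpose sketch above:

    def upstream_downstream(NU, ND, eU, eD, eUD, eDU):
        N = NU + ND
        e = np.zeros((N, N))
        e[:NU, :NU] = eU             # upstream -> upstream (self-replacement included
        e[NU:, NU:] = eD             # for simplicity; it does not change rho_U/rho_D)
        e[:NU, NU:] = eUD            # upstream -> downstream
        e[NU:, :NU] = eDU            # downstream -> upstream
        return e

    # Net gene flow downstream (eUD > eDU) and more turnover upstream:
    e_ud = upstream_downstream(NU=2, ND=8, eU=0.1, eD=0.004, eUD=0.01, eDU=0.005)
    d = e_ud.sum(axis=0)
    K, _ = clock_rate(e_ud, u=1.0)
    print(d[0], d[-1], K)            # d_U = 0.24 > d_D = 0.052, and K ~ 1.28 > u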

Application to epithelial cell populations

Our results are also applicable to somatic evolution in self-renewing cell populations, such as the crypt-like structures of the intestine. Novel labeling techniques have revealed that neutral mutations accumulate in intestinal crypts at a constant rate over time [41]. The cell population in each crypt is maintained by a small number of stem cells that reside at the crypt bottom and continuously replace each other in a stochastic manner (Fig. 7; [42–44]). We focus on the proximal small intestine in mice, for which recent studies [41, 45] suggest there are ∼5 active stem cells per crypt, each replaced ∼0.1 times per day by one of its two neighbors. In our framework, this corresponds to a cycle-structured population of size 5 with replacement rates 0.05/day between neighbors, so that $d_i = 0.1$/day for all i.

A small number of stem cells (N ∼ 5) reside at the bottom of the intestinal crypt and are replaced at rate d ∼ 0.1 per stem cell per day. Empirical results [41, 45] suggest a cycle structure for stem cells. To achieve the correct replacement rate we set $e_{ij} = 0.05$/day for each neighboring pair. Stem cells in an individual crypt replace a much larger number of progenitor and differentiated cells (∼250 [46]). These downstream progenitor and differentiated cells are replaced about every day [46]. The hierarchical organization of intestinal crypts, combined with the low turnover rate of stem cells, limits the rate of neutral genetic substitutions ($\tilde{K} \approx 0.1 u$ substitutions per day), since only mutations that arise in stem cells can fix.

Only mutations that arise in stem cells can become fixed within a crypt; thus we need only consider the fixation probabilities and turnover rates among stem cells. By symmetry among the stem cells, $\rho_i = 1/5$ for each of the five stem cell sites. The molecular clock rate is therefore $\tilde{K} = u \sum_{i=1}^5 d_i \rho_i = 0.1 u$ substitutions per day. This accords with the empirical finding that, for a neutral genetic marker with mutation rate $u \approx 1.1 \times 10^{-4}$, substitutions accumulate at a rate $\tilde{K} \approx 1.1 \times 10^{-5}$ per crypt per day [41].

Does crypt architecture limit the rate of genetic change in intestinal tissue? Intestinal crypts in mice contain ∼250 cells and replace all their cells about once per day [46]. If each crypt were a well-mixed population, the molecular clock rate would be $\tilde{K} = B u / N \approx u$ substitutions per day. Thus the asymmetric structure of these epithelial crypts slows the rate of neutral genetic substitution tenfold.

Application to the spread of ideas

Our results can also be applied to ideas that spread by imitation on social networks. In this setting, a mutation corresponds to a new idea that could potentially replace an established one. Neutrality means that all ideas are equally likely to be imitated.

To investigate whether human social networks accelerate or slow the rate of idea substitution, we analyzed 973 Twitter networks from the Stanford Large Network Dataset Collection [47]. Each of these “ego networks” represents follower relationships among those followed by a single “ego” individual (who is not herself included in the network). We oriented the links in each network to point from followee to follower, corresponding to the presumed direction of information flow. Self-loops were removed. To ensure that fixation is possible, we eliminated individuals that could not be reached, via outgoing links, from the node with greatest eigenvector centrality. The resulting networks varied in size from 3 to 241 nodes.

To model the spread of ideas on these networks, we set $e_{ij} = 1/L$ if j follows i and zero otherwise, where L is the total number of links. This can be instantiated by supposing that at each time-step, one followee-follower link is chosen with uniform probability. The follower either adopts the idea of the followee, with probability 1−u, or innovates upon it to create a new idea, with probability u, where u ≪ 1. With these assumptions, the resulting rate of idea substitution (as a multiple of u) depends only on the network topology and not on any other parameters.
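This model drops straight into the earlier sketch. The helper below is hypothetical but follows the stated rule $e_{ij} = 1/L$:

    def idea_substitution_rate(follows, N):
        # follows: list of (i, j) pairs meaning "j follows i", so ideas flow i -> j.
        L = len(follows)
        e = np.zeros((N, N))
        for i, j in follows:
            e[i, j] = 1.0 / L            # one link is chosen uniformly per time-step
        K, _ = clock_rate(e, u=1.0)
        return K                         # K in units of u

    # A 3-node network where everyone follows everyone has uniform in-degree,
    # so Result 1 gives K = u exactly:
    links = [(i, j) for i in range(3) for j in range(3) if i != j]
    print(idea_substitution_rate(links, 3))   # 1.0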

We found that the mean value of K among these ego networks is 0.557u, with a standard deviation of 0.222u. 19 of the 973 networks (2%) have K > u. Two networks have K = u exactly; each of these has N = 3 nodes and uniform in-degree $d_i$, thus K = u follows from Result 1 for these networks. We found a weak but statistically significant negative relationship between the network size N and the value K/u (slope ≈ −0.00164 with 95% confidence interval (−0.0023, −0.001) based on the bootstrap method; R ≈ −0.45). This negative relationship persists even if small networks with less than 10 nodes are removed (slope ≈ −0.00156 with 95% confidence interval (−0.0023, −0.0009); R ≈ −0.43). In summary, while some Twitter ego-networks accelerate the substitution of neutral innovations, the vast majority slow this rate (Figs. 8 and 9).

(a–e) Five of the 973 networks analyzed, including those with (a) the largest value of K, (b) the smallest value of K, and (c) the fewest nodes. (f) A scatter plot of K/u versus N reveals a weak negative correlation (slope ≈ −0.00164 with 95% confidence interval (−0.0023, −0.001) based on the bootstrap method; R ≈ −0.45). The colored dots on the scatter plot correspond to the networks shown in (a–e). The dashed line corresponds to K/u = 1, above which network topology accelerates neutral substitution.

Network structure accelerates idea substitution (K > u) if and only if there is a positive spatial correlation between the generation of new ideas (which for our model occurs proportionally to the rate $d_i$ of incoming ideas) and the probability of fixation $\rho_i$. Panels (a) and (b) show the networks with the slowest (K ≈ 0.667u) and fastest (K ≈ 1.085u) rates of idea substitution, respectively, among networks of size 13. The coloration of nodes corresponds to their rate of turnover $d_i$, with warmer colors indicating more rapid turnover. The size of nodes corresponds to their fixation probability $\rho_i$.

One possible explanation for the rarity of networks that accelerate idea substitution has to do with the intrinsic relationship between the turnover rates $d_i$ and the site-specific fixation probabilities $\rho_i$. From Eq. (1), we see that $\rho_i$ can be written as the product $(1/d_i) \times (\sum_{j=1}^N e_{ij} \rho_j)$, where the first factor can be interpreted as the “attention span” of node i and the second can be interpreted as its influence. While these two factors are not strictly independent, we would not necessarily expect a systematic relationship between them in our Twitter network ensemble. In the absence of such a relationship, $\rho_i$ is inversely related to $d_i$, which implies K < u. In other words, the most fertile nodes (in terms of generating new ideas) are also the most fickle (in terms of adopting the ideas of others); thus many new ideas are abandoned as soon as they are generated. This heuristic argument suggests that K > u, while possible, might be an uncommon occurrence in networks drawn from statistical or probabilistic ensembles.


Contents

The process of genetic drift can be illustrated using 20 marbles in a jar to represent 20 organisms in a population. [8] Consider this jar of marbles as the starting population. Half of the marbles in the jar are red and half are blue, with each colour corresponding to a different allele of one gene in the population. In each new generation the organisms reproduce at random. To represent this reproduction, randomly select a marble from the original jar and deposit a new marble with the same colour into a new jar. This is the "offspring" of the original marble, meaning that the original marble remains in its jar. Repeat this process until there are 20 new marbles in the second jar. The second jar will now contain 20 "offspring", or marbles of various colours. Unless the second jar contains exactly 10 red marbles and 10 blue marbles, a random shift has occurred in the allele frequencies.

If this process is repeated a number of times, the numbers of red and blue marbles picked each generation will fluctuate. Sometimes a jar will have more red marbles than its "parent" jar and sometimes more blue. This fluctuation is analogous to genetic drift – a change in the population's allele frequency resulting from a random variation in the distribution of alleles from one generation to the next.

It is even possible that in any one generation no marbles of a particular colour are chosen, meaning they have no offspring. In this example, if no red marbles are selected, the jar representing the new generation contains only blue offspring. If this happens, the red allele has been lost permanently in the population, while the remaining blue allele has become fixed: all future generations are entirely blue. In small populations, fixation can occur in just a few generations.

The mechanisms of genetic drift can be illustrated with a simplified example. Consider a very large colony of bacteria isolated in a drop of solution. The bacteria are genetically identical except for a single gene with two alleles labeled A and B. A and B are neutral alleles, meaning that they do not affect the bacteria's ability to survive and reproduce; all bacteria in this colony are equally likely to survive and reproduce. Suppose that half the bacteria have allele A and the other half have allele B. Thus A and B each have allele frequency 1/2.

The drop of solution then shrinks until it has only enough food to sustain four bacteria. All other bacteria die without reproducing. Among the four who survive, there are sixteen possible combinations for the A and B alleles:

Since all bacteria in the original solution are equally likely to survive when the solution shrinks, the four survivors are a random sample from the original colony. The probability that each of the four survivors has a given allele is 1/2, and so the probability that any particular allele combination occurs when the solution shrinks is

$\left(\frac{1}{2}\right)^4 = \frac{1}{16}.$

(The original population size is so large that the sampling effectively happens with replacement). In other words, each of the sixteen possible allele combinations is equally likely to occur, with probability 1/16.

Counting the combinations with the same number of A and B, we get the following table.

A B Combinations Probability
4 0 1 1/16
3 1 4 4/16
2 2 6 6/16
1 3 4 4/16
0 4 1 1/16

As shown in the table, the total number of combinations that have the same number of A alleles as of B alleles is six, and the probability of this combination is 6/16. The total number of other combinations is ten, so the probability of unequal number of A and B alleles is 10/16. Thus, although the original colony began with an equal number of A and B alleles, it is very possible that the number of alleles in the remaining population of four members will not be equal. Equal numbers is actually less likely than unequal numbers. In the latter case, genetic drift has occurred because the population's allele frequencies have changed due to random sampling. In this example the population contracted to just four random survivors, a phenomenon known as population bottleneck.

The probabilities for the number of copies of allele A (or B) that survive (given in the last column of the above table) can be calculated directly from the binomial distribution, where the "success" probability (the probability that a given survivor carries a given allele) is 1/2. The probability that there are k copies of A (or B) alleles in the combination is given by

$\binom{n}{k} \left(\frac{1}{2}\right)^k \left(1-\frac{1}{2}\right)^{n-k} = \binom{n}{k} \left(\frac{1}{2}\right)^n$

where n=4 is the number of surviving bacteria.
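The table above can be reproduced in a few lines (a minimal sketch using only the Python standard library):

    from math import comb

    n = 4                                    # number of surviving bacteria
    for k in range(n + 1):                   # k = copies of allele A among survivors
        print(k, n - k, comb(n, k), f"{comb(n, k)}/16")
    # Prints the rows of the table: 4 0 1 1/16, then 3 1 4 4/16, and so on.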

Mathematical models of genetic drift can be designed using either branching processes or a diffusion equation describing changes in allele frequency in an idealised population. [9]

Wright–Fisher model Edit

Consider a gene with two alleles, A or B. In diploid populations consisting of N individuals there are 2N copies of each gene. An individual can have two copies of the same allele or two different alleles. We can call the frequency of one allele p and the frequency of the other q. The Wright–Fisher model (named after Sewall Wright and Ronald Fisher) assumes that generations do not overlap (for example, annual plants have exactly one generation per year) and that each copy of the gene found in the new generation is drawn independently at random from all copies of the gene in the old generation. The formula to calculate the probability of obtaining k copies of an allele that had frequency p in the last generation is then [10] [11]

$\frac{(2N)!}{k!\,(2N-k)!}\, p^k q^{2N-k}$

where the symbol "!" signifies the factorial function. This expression can also be formulated using the binomial coefficient,

$\binom{2N}{k} p^k q^{2N-k}.$
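A minimal Wright–Fisher simulation makes this sampling assumption explicit (a sketch; function and variable names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def wright_fisher(N, p0, generations):
        # Each generation, all 2N gene copies are drawn independently at random
        # from the allele pool of the previous generation.
        p = p0
        trajectory = [p]
        for _ in range(generations):
            k = rng.binomial(2 * N, p)       # copies of the focal allele drawn
            p = k / (2 * N)
            trajectory.append(p)
        return trajectory

    print(wright_fisher(N=10, p0=0.5, generations=20))   # frequencies drift toward 0 or 1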

Moran model Edit

The Moran model assumes overlapping generations. At each time step, one individual is chosen to reproduce and one individual is chosen to die. So in each timestep, the number of copies of a given allele can go up by one, go down by one, or stay the same. This means that the transition matrix is tridiagonal, so mathematical solutions are easier to obtain for the Moran model than for the Wright–Fisher model. On the other hand, computer simulations are usually easier to perform using the Wright–Fisher model, because fewer time steps need to be calculated. In the Moran model, it takes N timesteps to get through one generation, where N is the effective population size. In the Wright–Fisher model, it takes just one. [12]

In practice, the Moran and Wright–Fisher models give qualitatively similar results, but genetic drift runs twice as fast in the Moran model.
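For comparison, one Moran event changes the allele count by at most one. A sketch for a haploid population of size N (reusing the numpy generator from the previous snippet):

    def moran_step(k, N, rng):
        # One individual reproduces and one is replaced, both chosen uniformly
        # at random from the current population; k = copies of the focal allele.
        p = k / N
        k += rng.random() < p                # +1 if the reproducer carries the allele
        k -= rng.random() < p                # -1 if the replaced individual carries it
        return k

    k, N = 10, 20
    while 0 < k < N:                         # run until fixation or loss
        k = moran_step(k, N, rng)
    print("fixed" if k == N else "lost")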

Other models of drift Edit

If the variance in the number of offspring is much greater than that given by the binomial distribution assumed by the Wright–Fisher model, then given the same overall speed of genetic drift (the variance effective population size), genetic drift is a less powerful force compared to selection. [13] Even for the same variance, if higher moments of the offspring number distribution exceed those of the binomial distribution then again the force of genetic drift is substantially weakened. [14]

Random effects other than sampling error Edit

Random changes in allele frequencies can also be caused by effects other than sampling error, for example random changes in selection pressure. [15]

One important alternative source of stochasticity, perhaps more important than genetic drift, is genetic draft. [16] Genetic draft is the effect on a locus of selection on linked loci. The mathematical properties of genetic draft are different from those of genetic drift. [17] The direction of the random change in allele frequency is autocorrelated across generations. [2]

The Hardy–Weinberg principle states that within sufficiently large populations, the allele frequencies remain constant from one generation to the next unless the equilibrium is disturbed by migration, genetic mutations, or selection. [18]

However, in finite populations, no new alleles are gained from the random sampling of alleles passed to the next generation, but the sampling can cause an existing allele to disappear. Because random sampling can remove, but not replace, an allele, and because random declines or increases in allele frequency influence expected allele distributions for the next generation, genetic drift drives a population towards genetic uniformity over time. When an allele reaches a frequency of 1 (100%) it is said to be "fixed" in the population and when an allele reaches a frequency of 0 (0%) it is lost. Smaller populations achieve fixation faster, whereas in the limit of an infinite population, fixation is not achieved. Once an allele becomes fixed, genetic drift comes to a halt, and the allele frequency cannot change unless a new allele is introduced in the population via mutation or gene flow. Thus even while genetic drift is a random, directionless process, it acts to eliminate genetic variation over time. [19]

Rate of allele frequency change due to drift Edit

Assuming genetic drift is the only evolutionary force acting on an allele, after t generations in many replicated populations, starting with allele frequencies of p and q, the variance in allele frequency across those populations is

$V_t \approx pq \left( 1 - e^{-\frac{t}{2N_e}} \right).$
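This approximation can be checked against replicate simulations (a sketch reusing the wright_fisher function defined above, with Ne = N for an idealised population):

    t, N, p = 10, 20, 0.5
    finals = [wright_fisher(N, p, t)[-1] for _ in range(20000)]
    predicted = p * (1 - p) * (1 - np.exp(-t / (2 * N)))
    print(np.var(finals), predicted)         # agree to within sampling noise (~0.055)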

Time to fixation or loss Edit

Assuming genetic drift is the only evolutionary force acting on an allele, at any given time the probability that an allele will eventually become fixed in the population is simply its frequency in the population at that time. [21] For example, if the frequency p for allele A is 75% and the frequency q for allele B is 25%, then given unlimited time the probability A will ultimately become fixed in the population is 75% and the probability that B will become fixed is 25%.

The expected number of generations for fixation to occur is proportional to the population size, such that fixation is predicted to occur much more rapidly in smaller populations. [22] Normally the effective population size, which is smaller than the total population, is used to determine these probabilities. The effective population size (Ne) takes into account factors such as the level of inbreeding, the stage of the lifecycle in which the population is the smallest, and the fact that some neutral genes are genetically linked to others that are under selection. [13] The effective population size may not be the same for every gene in the same population. [23]

One forward-looking formula used for approximating the expected time before a neutral allele becomes fixed through genetic drift, according to the Wright–Fisher model, is

$\bar{T}_\text{fixed} = \frac{-4N_e (1-p) \ln(1-p)}{p}$

where T is the number of generations, Ne is the effective population size, and p is the initial frequency for the given allele. The result is the number of generations expected to pass before fixation occurs for a given allele in a population with given size (Ne) and allele frequency (p). [24]

The expected time for the neutral allele to be lost through genetic drift can be calculated as [10]

$\bar{T}_\text{lost} = \frac{-4N_e\, p}{1-p} \ln p.$

When a mutation appears only once in a population large enough for the initial frequency to be negligible, the formulas can be simplified to [25]

$\bar{T}_\text{fixed} = 4N_e$

for the average number of generations expected before fixation of a neutral mutation, and

$\bar{T}_\text{lost} = 2 \left( \frac{N_e}{N} \right) \ln(2N)$

for the average number of generations expected before the loss of a neutral mutation. [26]
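These approximations are easy to evaluate numerically (a sketch; the formulas are those given above, with p set to the frequency of a single new mutant copy):

    import numpy as np

    def t_fix(Ne, p):
        # Expected generations to fixation, conditional on eventual fixation.
        return -4 * Ne * (1 - p) * np.log(1 - p) / p

    def t_loss(Ne, p):
        # Expected generations to loss, conditional on eventual loss.
        return -4 * Ne * p * np.log(p) / (1 - p)

    Ne = N = 1000
    p0 = 1 / (2 * N)                         # a single new mutant copy
    print(t_fix(Ne, p0))                     # ~ 4 * Ne = 4000 generations
    print(t_loss(Ne, p0))                    # ~ 2 * (Ne/N) * ln(2N) ~ 15 generations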

Time to loss with both drift and mutation Edit

The formulae above apply to an allele that is already present in a population, and which is subject to neither mutation nor natural selection. If an allele is lost by mutation much more often than it is gained by mutation, then mutation, as well as drift, may influence the time to loss. If the allele prone to mutational loss begins as fixed in the population, and is lost by mutation at rate m per replication, then the expected time in generations until its loss in a haploid population is given by

where γ is Euler's constant. [27] The first approximation represents the waiting time until the first mutant destined for loss, with loss then occurring relatively rapidly by genetic drift, taking time Ne ≪ 1/m. The second approximation represents the time needed for deterministic loss by mutation accumulation. In both cases, the time to loss is dominated by mutation via the term 1/m, and is less affected by the effective population size.

In natural populations, genetic drift and natural selection do not act in isolation; both phenomena are always at play, together with mutation and migration. Neutral evolution is the product of both mutation and drift, not of drift alone. Similarly, even when selection overwhelms genetic drift, it can only act on variation that mutation provides.

While natural selection has a direction, guiding evolution towards heritable adaptations to the current environment, genetic drift has no direction and is guided only by the mathematics of chance. [28] As a result, drift acts upon the genotypic frequencies within a population without regard to their phenotypic effects. In contrast, selection favors the spread of alleles whose phenotypic effects increase survival and/or reproduction of their carriers, lowers the frequencies of alleles that cause unfavorable traits, and ignores those that are neutral. [29]

The law of large numbers predicts that when the absolute number of copies of the allele is small (e.g., in small populations), the magnitude of drift on allele frequencies per generation is larger. The magnitude of drift is large enough to overwhelm selection at any allele frequency when the selection coefficient is less than 1 divided by the effective population size. Non-adaptive evolution resulting from the product of mutation and genetic drift is therefore considered to be a consequential mechanism of evolutionary change primarily within small, isolated populations. [30] The mathematics of genetic drift depend on the effective population size, but it is not clear how this is related to the actual number of individuals in a population. [16] Genetic linkage to other genes that are under selection can reduce the effective population size experienced by a neutral allele. With a higher recombination rate, linkage decreases and with it this local effect on effective population size. [31] [32] This effect is visible in molecular data as a correlation between local recombination rate and genetic diversity, [33] and negative correlation between gene density and diversity at noncoding DNA regions. [34] Stochasticity associated with linkage to other genes that are under selection is not the same as sampling error, and is sometimes known as genetic draft in order to distinguish it from genetic drift. [16]

When the allele frequency is very small, drift can also overpower selection even in large populations. For example, while disadvantageous mutations are usually eliminated quickly in large populations, new advantageous mutations are almost as vulnerable to loss through genetic drift as are neutral mutations. Not until the allele frequency for the advantageous mutation reaches a certain threshold will genetic drift have no effect. [29]

A population bottleneck is when a population contracts to a significantly smaller size over a short period of time due to some random environmental event. In a true population bottleneck, the odds for survival of any member of the population are purely random, and are not improved by any particular inherent genetic advantage. The bottleneck can result in radical changes in allele frequencies, completely independent of selection. [35]

The impact of a population bottleneck can be sustained, even when the bottleneck is caused by a one-time event such as a natural catastrophe. An interesting example of a bottleneck causing unusual genetic distribution is the relatively high proportion of individuals with total rod cell color blindness (achromatopsia) on Pingelap atoll in Micronesia. After a bottleneck, inbreeding increases. This increases the damage done by recessive deleterious mutations, in a process known as inbreeding depression. The worst of these mutations are selected against, leading to the loss of other alleles that are genetically linked to them, in a process of background selection. [2] For recessive harmful mutations, this selection can be enhanced as a consequence of the bottleneck, due to genetic purging. This leads to a further loss of genetic diversity. In addition, a sustained reduction in population size increases the likelihood of further allele fluctuations from drift in generations to come.

A population's genetic variation can be greatly reduced by a bottleneck, and even beneficial adaptations may be permanently eliminated. [36] The loss of variation leaves the surviving population vulnerable to any new selection pressures such as disease, climatic change or shift in the available food source, because adapting in response to environmental changes requires sufficient genetic variation in the population for natural selection to take place. [37] [38]

There have been many known cases of population bottleneck in the recent past. Prior to the arrival of Europeans, North American prairies were habitat for millions of greater prairie chickens. In Illinois alone, their numbers plummeted from about 100 million birds in 1900 to about 50 birds in the 1990s. The declines in population resulted from hunting and habitat destruction, but a consequence has been a loss of most of the species' genetic diversity. DNA analysis comparing birds from the mid-century to birds in the 1990s documents a steep decline in the genetic variation in just the last few decades. Currently the greater prairie chicken is experiencing low reproductive success. [39]

However, the genetic loss caused by bottleneck and genetic drift can increase fitness, as in Ehrlichia. [40]

Over-hunting also caused a severe population bottleneck in the northern elephant seal in the 19th century. Their resulting decline in genetic variation can be deduced by comparing it to that of the southern elephant seal, which was not so aggressively hunted. [41]

Founder effect Edit

The founder effect is a special case of a population bottleneck, occurring when a small group in a population splinters off from the original population and forms a new one. The random sample of alleles in the just-formed new colony is expected to grossly misrepresent the original population in at least some respects. [42] It is even possible that the number of alleles for some genes in the original population is larger than the number of gene copies in the founders, making complete representation impossible. When a newly formed colony is small, its founders can strongly affect the population's genetic make-up far into the future.

A well-documented example is found in the Amish migration to Pennsylvania in 1744. Two members of the new colony shared the recessive allele for Ellis–Van Creveld syndrome. Members of the colony and their descendants tend to be religious isolates and remain relatively insular. As a result of many generations of inbreeding, Ellis–Van Creveld syndrome is now much more prevalent among the Amish than in the general population. [29] [43]

The difference in gene frequencies between the original population and colony may also trigger the two groups to diverge significantly over the course of many generations. As the difference, or genetic distance, increases, the two separated populations may become distinct, both genetically and phenetically, although not only genetic drift but also natural selection, gene flow, and mutation contribute to this divergence. This potential for relatively rapid changes in the colony's gene frequency led most scientists to consider the founder effect (and by extension, genetic drift) a significant driving force in the evolution of new species. Sewall Wright was the first to attach this significance to random drift and small, newly isolated populations with his shifting balance theory of speciation. [44] Following after Wright, Ernst Mayr created many persuasive models to show that the decline in genetic variation and small population size following the founder effect were critically important for new species to develop. [45] However, there is much less support for this view today since the hypothesis has been tested repeatedly through experimental research and the results have been equivocal at best. [46]

The role of random chance in evolution was first outlined by Arend L. Hagedoorn and A. C. Hagedoorn-Vorstheuvel La Brand in 1921. [47] They highlighted that random survival plays a key role in the loss of variation from populations. Fisher (1922) responded to this with the first, albeit marginally incorrect, mathematical treatment of the 'Hagedoorn effect'. [48] Notably, he expected that many natural populations were too large (an N ~ 10,000) for the effects of drift to be substantial and thought drift would have an insignificant effect on the evolutionary process. The corrected mathematical treatment and term "genetic drift" was later coined by a founder of population genetics, Sewall Wright. His first use of the term "drift" was in 1929, [49] though at the time he was using it in the sense of a directed process of change, or natural selection. Random drift by means of sampling error came to be known as the "Sewall–Wright effect," though he was never entirely comfortable seeing his name given to it. Wright referred to all changes in allele frequency as either "steady drift" (e.g., selection) or "random drift" (e.g., sampling error). [50] "Drift" came to be adopted as a technical term in the stochastic sense exclusively. [51] Today it is usually defined still more narrowly, in terms of sampling error, [52] although this narrow definition is not universal. [53] [54] Wright wrote that the "restriction of 'random drift' or even 'drift' to only one component, the effects of accidents of sampling, tends to lead to confusion." [50] Sewall Wright considered the process of random genetic drift by means of sampling error equivalent to that by means of inbreeding, but later work has shown them to be distinct. [55]

In the early days of the modern evolutionary synthesis, scientists were beginning to blend the new science of population genetics with Charles Darwin's theory of natural selection. Within this framework, Wright focused on the effects of inbreeding on small relatively isolated populations. He introduced the concept of an adaptive landscape in which phenomena such as cross breeding and genetic drift in small populations could push them away from adaptive peaks, which in turn allow natural selection to push them towards new adaptive peaks. [56] Wright thought smaller populations were more suited for natural selection because "inbreeding was sufficiently intense to create new interaction systems through random drift but not intense enough to cause random nonadaptive fixation of genes." [57]

Wright's views on the role of genetic drift in the evolutionary scheme were controversial almost from the very beginning. One of the most vociferous and influential critics was colleague Ronald Fisher. Fisher conceded genetic drift played some role in evolution, but an insignificant one. Fisher has been accused of misunderstanding Wright's views because in his criticisms Fisher seemed to argue Wright had rejected selection almost entirely. To Fisher, viewing the process of evolution as a long, steady, adaptive progression was the only way to explain the ever-increasing complexity from simpler forms. But the debates have continued between the "gradualists" and those who lean more toward the Wright model of evolution where selection and drift together play an important role. [58]

In 1968, Motoo Kimura rekindled the debate with his neutral theory of molecular evolution, which claims that most of the genetic changes are caused by genetic drift acting on neutral mutations. [6] [7]

The role of genetic drift by means of sampling error in evolution has been criticized by John H. Gillespie [59] and William B. Provine, who argue that selection on linked sites is a more important stochastic force.


Neutral Theory Sets the “Conservation Prior”

Neutral theory serves as the null hypothesis for molecular-evolutionary theory and, particularly, for the coalescent process (Wakeley 2003; Duret 2008), yet “the value of neutral theory in conservation has gone unrecognized” (Rosindell et al. 2011, p. 346). Though Rosindell et al. (2011) are referencing Hubbell’s ecological theory in this case, the point also applies to the neutral theory of molecular evolution. In making the connection between Kimura’s theory and Hubbell’s ecological neutral theory, Rosindell et al. (2011) draw special attention to exchangeability as the common conceptual theme. Whether it is genomic loci or individuals within a community, neutrality implies that substituting one allele or individual for another does not impact the evolutionary fitness of either. Thus, when we observe departures from exchangeability, we are alerted to active processes such as human-mediated climate change, deforestation, the introduction of chemical pollutants, and so on (the list is depressingly long), which have disrupted the neutral expectations of population connectivity and gene flow.

The time has arrived for conservation genetics to adopt an explicit hypothesis-driven framework that draws from Kimura’s neutral theory of molecular evolution. Shafer et al. (2015) have described the need for setting “conservation priors” (p. 84) to guide the design of research questions that are actionable and achievable for improving population and species viability. Our expectations for variation at the molecular level as a function of Ne and migration can be used in planning the experimental designs of conservation genetic research that uses model-based hypothesis testing to detect likely migration corridors (Aguillon et al. 2017), infer population declines that may explain present-day reduced genetic variation (Figueiró et al. 2017), or evaluate the efficacy of restoration programs (Li et al. 2014). These actionable hypotheses can be advanced through a synthesis of inventorial, functional, and especially process-driven conservation genomics: a synthesis that leverages the power of the many molecular-evolutionary insights that Kimura bestowed on the field 50 years ago.


Fixation of a deleterious allele at one of two "duplicate" loci by mutation pressure and random drift

We consider a diploid population and assume two gene loci with two alleles each, A and a at one locus and B and b at the second locus. Mutation from wild-type alleles A and B to deleterious alleles a and b occurs with mutation rates $v_a$ and $v_b$, respectively. We assume that alleles are completely recessive and that only the double recessive genotype aabb shows a deleterious effect, with relative fitness $1-\epsilon$. Then, it can be shown that if $v_a > v_b$, mutant a becomes fixed in the population by mutation pressure and a mutation-selection balance is ultimately attained with respect to the B/b locus alone. The main aim of this paper is to investigate the situation in which $v_a = v_b$ exactly. In this case a neutral equilibrium is attained and either locus can drift to fixation for the mutant allele. Diffusion models are developed to treat the stochastic process involved, whereby the deleterious mutant eventually becomes fixed at one of the two duplicated loci by random sampling drift in finite populations. In particular, the equation for the average time until fixation of mutant a or b is derived, and this is solved numerically for some combinations of the parameters $4N_e v$ and $4N_e \epsilon$, where v is the mutation rate ($v_a = v_b = v$) and $N_e$ is the effective size of the population. Monte Carlo experiments have been performed (using a device termed "pseudo sampling variable") to supplement the numerical analysis.


1. Introduction

The relationship between genotype and phenotype, the ways in which this map conditions the adaptive dynamics of populations, or the imprints that life histories leave in the genomes of organisms are essential questions to be solved before a complete evolutionary theory can be achieved. Genotypes, which encode much of the information required to construct organisms, are occasionally affected by mutations that modify the phenotype, their visible expression and the target of natural selection. Many mutations are neutral instead [1], varying the regions of the space of genotypes that can be accessed by the population [2] and conditioning its evolvability [3] but leaving the phenotype unchanged. The relation between genotype and phenotype is not one-to-one, but many-to-many. In particular, genotypes encoding a specific phenotype may form vast, connected networks that often span the whole space of possible genotypes. The existence of these networks in the case of proteins was postulated by Maynard Smith [4] as a requirement for evolution by natural selection to occur. Subsequent research has shown that these networks do exist for functional proteins [5], for other macromolecules like RNA [6], and generically appear in simple models of the genotype–phenotype map mimicking regulatory gene networks [7], metabolic reaction networks [8] or the self-assembly of protein quaternary structure [9].

Nevertheless, systematic explorations of the topological structure of neutral networks (NNs) have been undertaken only recently, despite the fact that some of the implications of NN structure on sequence evolution were identified long ago. For instance, Kimura's neutral theory [1] postulated that the number of neutral substitutions in a given time interval was Poisson distributed. That assumption had an underlying hypothesis that was not explicitly stated at the time, namely that the number of neutral mutations available to any genotype was constant, independent of the precise genotype, of time or of the expressed phenotype. In other words, NNs were assumed to be homogeneous in degree. A consequence was that the variance of the number of mutations accumulated should equal the mean, and the dispersion index (R, the ratio between the variance and the mean) must then be equal to 1. Very early, however, it was observed that R was significantly larger than 1 in almost all cases analysed [10–12]. The appearance of short bursts of rapid evolution was ascribed to episodes of positive Darwinian selection [12] that may reflect fluctuations in population size in quasi-neutral environments, where epistatic interactions would become relevant [13].

The fact is that NNs are highly heterogeneous. Some genotypes are brittle and easily yield a different phenotype under mutation. They have one or few neighbours within the NN. Other genotypes, instead, are robust and can stand a very large number of mutations while maintaining their biological or chemical function. The existence of variations in the degree of neutrality of genotypes (in their robustness) was soon put forth as a possible explanation for the overdispersion of the molecular clock [13]. Nowadays, the distributions of robustness of the genotypes in several different NNs have been measured, turning out to be remarkably broad [14–16]. The effects of fluctuating neutral spaces in the overdispersion of the molecular clock have been investigated in realistic models of evolution for proteins [17] and quasi-species [18], and also from a theoretical viewpoint [19].

In this contribution, we explore three features of NNs whose consequences in the rate of fixation of mutations have not been systematically investigated. They are (i) the correlations in neutrality between neighbouring genotypes, (ii) the degree of redundancy of phenotypes (the size of NN), and (iii) the fitness of the current phenotype in relation to accessible alternatives. First, the degree of neutrality is not randomly distributed in the NN. Thermodynamical arguments [20,21] and analyses of full NNs [22] indicate that genotypes tend to cluster as a function of their robustness, implying that NNs belong to the class of the so-called assortative networks. It is known that populations evolving on NNs tend to occupy maximally connected regions in order to minimize the number of mutations changing their phenotype [2]. In assortative networks, neutral drift entails a canalization towards mutation–selection equilibrium, progressively increasing the rate of fixation of neutral mutations through a dynamical process that we dub phenotypic entrapment. Second, there is abundant evidence coming from computational genotype–phenotype maps [9,23�] and from empirical reconstruction of NNs [25] that the average robustness of a given phenotype grows with the size of its associated NN: here, we quantify the effect of a systematic difference in average degree on the probability of fixation of neutral mutations. Third, the difference in fitness between genotypes in the current NN and their mutational neighbours affects the probability that a mutation (be it neutral, beneficial or deleterious) gets fixed in the population, and with it, as we explicitly show, the rate of the molecular clock during intervals of strictly neutral drift.

With the former goals in mind, we develop an out-of-equilibrium formal framework to describe the dynamics of homogeneous, infinite populations on generic NNs. We demonstrate that the population keeps memory of its past history because, as time elapses, the likelihood that it visits genotypes of increasingly higher robustness augments in a precise way that we calculate. This is a consequence of assortativity and a dynamic manifestation of the 'friendship paradox' [26] described in social networks (your friends have more friends than you). As a result, the probability that the population leaves the network explicitly depends on the elapsed time. Further, the decline of this probability with time entails a systematic acceleration of the rate of accumulation of neutral mutations. The degree of entrapment is higher, the larger the NN and the broader the difference between the fitness of the current phenotype and that of accessible alternatives. These results are fairly general and have implications in the derivation of effective models of phenotypic change and in the calibration of molecular clocks.


Methods

Simulations

I conducted 200 replicate forward-time simulations of a metapopulation adapting to a heterogeneous spatial environment (Figure 1) with SLiM v. 3.2 (Haller and Messer 2017) to create SNP data for each individual. The simulations resulted in a population that had isolation-by-distance structure along an environmental gradient (e.g., isolation by environment; Wang and Bradburd 2014). For simplicity in interpreting the results, only one type of genomic heterogeneity was simulated on each LG, such that each LG evolved approximately independently. Each of the 9 LGs was 50,000 bases and 50 cM in length. The base recombination rate $N_e r = 0.01$ (unless manipulated as described below) gave a resolution of 0.001 cM between proximate bases. The recombination rate was scaled to mimic the case where SNPs were collected across a larger genetic map than what was simulated (similar to a SNP chip), but still low enough to allow signatures of selection to arise in neutral loci linked to selected loci (in the simulations, 50,000 bases × (r = 1e-05) × 100 = 50 cM; in humans, 50,000 bp would correspond to 0.05 cM). Thus, SNPs at the opposite ends of linkage groups were likely to have a recombination rate between them of 0.5 (unlinked), but there would otherwise be some degree of linkage among SNPs within linkage groups. For all LGs, the population-scaled mutation rate $N_e\mu$ equaled 0.001. For computational efficiency, 1000 individuals were simulated, with scaling of mutation rate and recombination rate as described above (Fisher 1930; Wright 1931, 1938; Crow and Kimura 1970; Bürger 2000). In the first generation, individuals were placed randomly on a spatial map between the coordinates 0 and 1. Individuals dispersed a distance given by a bivariate normal distribution with zero mean and variance (Table 2).

Example landscape simulation. Each box is an individual, colored by their phenotypic value. The background is the selective environment. This output was generated after 1900 generations of selection by the environment, resulting in a correlation of 0.52 between the phenotype and the environment.


