Epistasis Across Chromosomes and Individuals 'Homozygous for Interactions'

Apologies for any failures in nomenclature. I'm a mathematician who is making a foray into genetics for a masters thesis. Specifically, I'm generating artificial diploid genetic sequence data and phenotype data based on known epistatic interactions.

Primary Question

I am familiar with the concept that having multiple copies of a single allele (e.g. one from each parent) can actually increase expression and thereby change the quantitative phenotype (at least in some cases?). Can this also happen with epistatic effects? For example, consider the case that locus 1 ($L_1$) and locus 2 ($L_2$) exhibit some positive interaction ($\beta$) on the trait ($Y$). $$E[Y]=\mu+\beta(L_1L_2)$$ If both chromosomes contain the allele in question at both loci, then does this individual exhibit more of a trait increase than if he only had one chromosome with the epistatic alleles?

Secondary question

Can alleles interact epistatically even when they are present on different chromosomes? e.g. assume allele A at locus 1 interacts with allele B at locus 2. Individual has A at locus 1 and b at locus 2 on chromosome 1. He also has a at locus 1 and B at locus 2 on chromosome 2. Do A and B from the different chromosomes interact? My intuition is yes but I want verification.

Many thanks!!!

Question 1: The phenomena you describe, in which it matters whether you have one or two copies of an allele (e.g., the AA phenotype being different from the Aa phenotype), are known as dominance effects. Dominance effects can interact with epistatic effects (in which the phenotypic effect of one locus depends on the genotype at another locus).

One good example is the interaction between the Agouti locus and the Mc1R locus in the oldfield mouse (Peromyscus polionotus) in determining coat color. Dark-color homozygotes at the Agouti locus have dark coats irrespective of the allele at the Mc1R locus, whereas color in Agouti heterozygotes or light-color homozygotes depends on the genotype at the Mc1R locus. The figure below, from Steiner et al. (2007), illustrates:

Question 2: Epistatically interacting loci need not be on the same chromosome. The loci discussed above provide a perfectly good example. Mc1R is on chromosome 1, whereas Agouti is on chromosome 7, yet they interact epistatically.
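For the simulation setting described in the question, here is a minimal sketch of how the interaction model could be evaluated. It assumes one particular convention that the answer above does not prescribe: each locus is coded as the number of copies of the interacting allele (0, 1, or 2), and the interaction term is the product of those counts. The function name `expected_trait` and the values of `mu` and `beta` are hypothetical.

```python
def expected_trait(hap1, hap2, mu=10.0, beta=1.5):
    """Expected phenotype under an allele-count coding of the interaction.

    hap1, hap2: tuples (allele at locus 1, allele at locus 2) for the two
    homologous chromosomes, with 1 = interacting allele (A or B), 0 = other.
    mu and beta are illustrative values, not estimates from any data set.
    """
    l1 = hap1[0] + hap2[0]  # copies of A at locus 1 (0, 1, or 2)
    l2 = hap1[1] + hap2[1]  # copies of B at locus 2 (0, 1, or 2)
    return mu + beta * l1 * l2

cis = expected_trait((1, 1), (0, 0))         # AB / ab: A and B on the same chromosome
trans = expected_trait((1, 0), (0, 1))       # Ab / aB: A and B on different chromosomes
double_hom = expected_trait((1, 1), (1, 1))  # AB / AB: two copies at both loci

print(cis, trans, double_hom)
```

Under this coding the cis and trans arrangements give the same expected value, because only the genotype at each locus enters the model, and the double homozygote receives four times the interaction term of the double heterozygote. A model with dominance-by-epistasis effects, like the mouse coat-color example, would need additional terms.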

Extensive epistasis for olfactory behaviour, sleep and waking activity in Drosophila melanogaster

Epistasis is an important feature of the genetic architecture of quantitative traits, but the dynamics of epistatic interactions in natural populations and the relationship between epistasis and pleiotropy remain poorly understood. Here, we studied the effects of epistatic modifiers that segregate in a wild-derived Drosophila melanogaster population on the mutational effects of P-element insertions in Semaphorin-5C (Sema-5c) and Calreticulin (Crc), pleiotropic genes that affect olfactory behaviour and startle behaviour and, in the case of Crc, sleep phenotypes. We introduced Canton-S B (CSB) third chromosomes with or without a P-element insertion at the Crc or Sema-5c locus in multiple wild-derived inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) and assessed the effects of epistasis on the olfactory response to benzaldehyde and, for Crc, also on sleep. In each case, we found substantial epistasis and significant variation in the magnitude of epistasis. The predominant direction of epistatic effects was to suppress the mutant phenotype. These observations support a previous study on startle behaviour using the same D. melanogaster chromosome substitution lines, which concluded that suppressing epistasis may buffer the effects of new mutations. However, epistatic effects are not correlated among the different phenotypes. Thus, suppressing epistasis appears to be a pervasive general feature of natural populations to protect against the effects of new mutations, but different epistatic interactions modulate different phenotypes affected by mutations at the same pleiotropic gene.


It has often been proposed that the fitness reduction caused by the joint presence of several deleterious alleles in a given genotype may be larger than expected under a between-loci multiplicative fitness model. Such enhancement of deleterious effects, denoted synergistic, reinforcing or negative epistasis, underlies the mutational deterministic hypothesis (MDH) of the evolution of sex. This hypothesis considers that, under synergistic epistasis, selection preferentially removes genotypes formed by gametes loaded with several deleterious alleles, and this process is more efficient if sex and recombination shuffle genes so that such gametes are produced anew at each generation. Therefore, sexual reproduction and recombination could have evolved because they allow a reduction of the number of segregating deleterious alleles at mutation-selection balance (e.g. Kondrashov, 1988; Otto & Lenormand, 2002). It has been shown that, in many species with sexual anisogamous reproduction, the rate of deleterious mutation is so small that the advantage of sex accounted for by the MDH would not cancel the two-fold cost of anisogamy (Crow, 1970; Kondrashov, 1988; Keightley & Eyre-Walker, 2000; García-Dorado et al., 2003). However, synergistic epistasis might still be responsible for a substantial advantage of less costly shuffling systems, such as recombination or isogamous sexual reproduction. Synergistic epistasis has been assayed a number of times but the results, briefly discussed below, are not generally consistent.

Accelerated viability decay has been found in some long-term mutation accumulation experiments with Drosophila melanogaster, and it has been ascribed to synergistic epistasis (Mukai, 1969). However, such accelerated decay is not a general phenomenon (García-Dorado & Caballero, 2002; Fry, 2004). Similarly, using two insect species, synthetic combinations of deleterious mutations showed strong synergy for fitness traits, but this only became apparent when the number of deleterious mutations that were brought together was so large that the corresponding genotypes should be rare in natural populations (Whitlock & Bourguet, 2000; Rivero et al., 2003). Furthermore, similar experiments using specific combinations of deleterious mutations for RNA viruses, bacteria, yeast and nematode, although occasionally detecting between-loci interactions, did not show a consistent trend towards synergy (Elena & Lenski, 1997; Visser et al., 1997; Peña et al., 2000; Peters & Keightley, 2000; Wloch et al., 2001; Burch et al., 2003).

Synergy will reduce the fitness of genotypes homozygous for several deleterious alleles below the product of the homozygous fitness separately computed for each deleterious allele, so that the mean of log-fitness components will show an accelerated decay with increasing inbreeding coefficient F. This is the rationale for a second experimental approach in which synergistic epistasis is sought by assaying the linearity of the depression of log-fitness (or log-fitness components) on the inbreeding coefficient (F) in segregating populations. A considerable number of experiments have been performed with different organisms in which populations with different inbreeding levels were synchronously assayed for some fitness trait, and where the depression usually turned out to be linear on F, suggesting no synergy. However, natural selection could have purged genotypes homozygous for larger numbers of deleterious alleles, thus hiding the effects of synergy (see Willis, 1993). In fact, estimates of the inbreeding depression rate (δ) obtained by practising inbreeding are commonly smaller than those obtained by forcing chromosomal homozygosity (Dobzhansky et al., 1963; Malogolowkin-Cohen et al., 1964). Regarding our specific biological material, D. melanogaster, δ estimates for viability obtained through inbreeding (δ ≈ 0.7; López-Fanjul & Villaverde, 1989; García et al., 1994) are smaller than those obtained by forcing chromosomal homozygosis (δ ≈ 1, δ ≈ 2, δ ≈ 2.3, δ ≈ 2.8, δ ≈ 1.2, δ ≈ 1, in Temin et al., 1969; Mukai & Yamaguchi, 1974; Seager & Ayala, 1982; Mukai & Nagano, 1983; Kusakabe & Mukai, 1984; Kusakabe et al., 2000, respectively; results adjusted for the whole genome). This might suggest purging selection in inbreeding experiments, although it should be noted that the estimates correspond to different populations that may conceal different inbreeding loads.
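The linearity test described above can be written compactly. The symbols $W$ (fitness or a fitness component) and $\gamma$ (a curvature coefficient) are introduced here for illustration only and are not taken from the source. Under a multiplicative model across loci, each locus contributes independently to log-fitness, so the expected log-fitness declines linearly with the inbreeding coefficient:

$$E[\ln W(F)] = \ln W_0 - \delta F$$

Synergistic epistasis adds downward curvature. In the simplest sketch this is a quadratic term with $\gamma > 0$:

$$E[\ln W(F)] = \ln W_0 - \delta F - \gamma F^2$$

Asking whether the depression of log-fitness is linear on F then amounts to asking whether $\gamma$ can be distinguished from zero.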

To prevent natural selection, several experiments have been performed with different Drosophila species using the methodology proposed by Greenberg & Crow (1960). Thus, using crossing schemes involving balancer stocks, the inbreeding depression rate was estimated from different genomic F values obtained by forcing homozygosity for different numbers of whole chromosomes. In this way, Spassky et al. (1965) found that the viability of individuals made homozygous for both the second and third chromosomes was, on average, less than the product of the viability of individuals made homozygous for each chromosome. However, another similar experiment found no evidence of synergy (Seager & Ayala, 1982), and Temin et al. (1969) found roughly linear inbreeding depression of log-viability for different levels of inbreeding for chromosomes II and III, with weak and nonsignificant synergy affecting quasinormal (QN) chromosomes. Thus, evidence for synergy is generally inconsistent and most detected cases of synergistic epistasis were associated with high F values or with laboratory genotypes carrying many deleterious mutations.

The evolutionary relevance of negative synergy depends on the amount of load required for it to occur. For example, if synergy only appears for rare genotypes carrying many deleterious alleles, it will rarely confer on sex the advantage required to compensate for the two-fold cost of anisogamy. Similarly, synergy that only occurs between loci on different chromosomes cannot be invoked as a cause for the evolution of recombination. Our purpose is to inquire into the generality and nature of within-chromosome synergy by assaying the linearity of the inbreeding depression for different levels of inbreeding involving chromosome II (FII) and representing a moderate total inbreeding (F) when considered relative to the whole genome. We assayed chromosome II viability relative to that of a common marker genotype (Cy/L²) for genotypes: (i) carrying two chromosomes sampled from different females (F = 0), (ii) carrying two chromosomes sampled from the gametes produced by the same female (FII = 0.5, i.e. F ≈ 0.2 for the whole genome), and (iii) being homozygous for one chromosome (FII = 1, i.e. F ≈ 0.4 for the whole genome). In order to avoid differences between the three assays caused by natural selection, chromosomes missing or being excluded from one assay were also removed from the other two, so that the three groups of genotypes analysed (FII = 0, 0.5, 1) include exactly the same sample of chromosomes II. To evaluate the advisability of assaying synergy for fecundity, the genetic relationship of viability and fecundity with competitive fitness was investigated in a subsample of the above genotypes.

In this section, you will explore the following questions:

  • What is the relationship between Mendel’s law of segregation and independent assortment in terms of genetics and the events of meiosis?
  • How can the forked-line method and probability rules be used to calculate the probability of genotypes and phenotypes from multiple gene crosses?
  • How do linkage, cross-over, epistasis, and recombination violate Mendel’s laws of inheritance?

Connection for AP® Courses

As was described previously, Mendel proposed that genes are inherited as pairs of alleles that behave in a dominant and recessive pattern. During meiosis, alleles segregate, or separate, such that each gamete is equally likely to receive either one of the two alleles present in the diploid individual. Mendel called this phenomenon the law of segregation, which can be demonstrated in a monohybrid cross. In addition, genes carried on different chromosomes sort into gametes independently of one another. This is Mendel’s law of independent assortment. This law can be demonstrated in a dihybrid cross involving two different traits located on different chromosomes. Punnett squares can be used to predict genotypes and phenotypes of offspring involving one or two genes.

Although chromosomes sort independently into gametes during meiosis, Mendel’s law of independent assortment refers to genes, not chromosomes. In humans, single chromosomes may carry more than 1,000 genes. Genes located close together on the same chromosome are said to be linked genes. When genes are located in close proximity on the same chromosome, their alleles tend to be inherited together unless recombination occurs. This results in offspring ratios that violate Mendel’s law of independent assortment. Genes that are located far apart on the same chromosome are likely to assort independently. The rules of probability can help to sort this out (pun intended). The law states that alleles of different genes assort independently of one another during gamete formation.

Information presented and the examples highlighted in the section support concepts outlined in Big Idea 3 of the AP® Biology Curriculum Framework. The Learning Objectives listed in the Curriculum Framework provide a transparent foundation for the AP® Biology course, an inquiry-based laboratory experience, instructional activities, and AP® exam questions. A learning objective merges required content with one or more of the seven Science Practices.

Big Idea 3 Living systems store, retrieve, transmit and respond to information essential to life processes.
Enduring Understanding 3.A Heritable information provides for continuity of life.
Essential Knowledge 3.A.3 The chromosomal basis of inheritance provides an understanding of the pattern of passage (transmission) of genes from parent to offspring.
Science Practice 2.2 The student can apply mathematical routines to quantities that describe natural phenomena.
Learning Objective 3.14 The student is able to apply mathematical routines to determine Mendelian patterns of inheritance provided by data.
Essential Knowledge 3.A.4 The inheritance pattern of many traits cannot be explained by simple Mendelian genetics.
Science Practice 6.5 The student can evaluate alternative scientific explanations.
Learning Objective 3.15 The student is able to explain deviations from Mendel’s model of the inheritance of traits.
Essential Knowledge 3.A.4 The inheritance pattern of many traits cannot be explained by simple Mendelian genetics.
Science Practice 6.3 The student can articulate the reasons that scientific explanations and theories are refined or replaced.
Learning Objective 3.16 The student is able to explain how the inheritance patterns of many traits cannot be accounted for by Mendelian genetics.
Essential Knowledge 3.A.4 The inheritance pattern of many traits cannot be explained by simple Mendelian genetics.
Science Practice 1.2 The student can describe representations and models of natural or man-made phenomena and systems in the domain.
Learning Objective 3.17 The student is able to describe representations of an appropriate example of inheritance patterns that cannot be explained by Mendel’s model of the inheritance of traits.

The Science Practice Challenge Questions contain additional test questions for this section that will help you prepare for the AP exam. These questions address the following standards:
[APLO 3.11][APLO 3.15][APLO 3.14][APLO 3.17][APLO 3.12]

Mendel generalized the results of his pea-plant experiments into four postulates, some of which are sometimes called “laws,” that describe the basis of dominant and recessive inheritance in diploid organisms. As you have learned, more complex extensions of Mendelism exist that do not exhibit the same F2 phenotypic ratios (3:1). Nevertheless, these laws summarize the basics of classical genetics.

Pairs of Unit Factors, or Genes

Mendel proposed first that paired unit factors of heredity were transmitted faithfully from generation to generation by the dissociation and reassociation of paired factors during gametogenesis and fertilization, respectively. After he crossed peas with contrasting traits and found that the recessive trait resurfaced in the F2 generation, Mendel deduced that hereditary factors must be inherited as discrete units. This finding contradicted the belief at that time that parental traits were blended in the offspring.

Alleles Can Be Dominant or Recessive

Mendel’s law of dominance states that in a heterozygote, one trait will conceal the presence of another trait for the same characteristic. Rather than both alleles contributing to a phenotype, the dominant allele will be expressed exclusively. The recessive allele will remain “latent” but will be transmitted to offspring by the same manner in which the dominant allele is transmitted. The recessive trait will only be expressed by offspring that have two copies of this allele (Figure 12.15), and these offspring will breed true when self-crossed.

Since Mendel’s experiments with pea plants, other researchers have found that the law of dominance does not always hold true. Instead, several different patterns of inheritance have been found to exist.

Figure 12.15 The child in the photo expresses albinism, a recessive trait.

Equal Segregation of Alleles

Observing that true-breeding pea plants with contrasting traits gave rise to F1 generations that all expressed the dominant trait and F2 generations that expressed the dominant and recessive traits in a 3:1 ratio, Mendel proposed the law of segregation. This law states that paired unit factors (genes) must segregate equally into gametes such that offspring have an equal likelihood of inheriting either factor. For the F2 generation of a monohybrid cross, the following three possible combinations of genotypes could result: homozygous dominant, heterozygous, or homozygous recessive. Because heterozygotes could arise from two different pathways (receiving one dominant and one recessive allele from either parent), and because heterozygotes and homozygous dominant individuals are phenotypically identical, the law supports Mendel’s observed 3:1 phenotypic ratio. The equal segregation of alleles is the reason we can apply the Punnett square to accurately predict the offspring of parents with known genotypes. The physical basis of Mendel’s law of segregation is the first division of meiosis, in which the homologous chromosomes with their different versions of each gene are segregated into daughter nuclei. The role of the meiotic segregation of chromosomes in sexual reproduction was not understood by the scientific community during Mendel’s lifetime.

Independent Assortment

Mendel’s law of independent assortment states that genes do not influence each other with regard to the sorting of alleles into gametes, and every possible combination of alleles for every gene is equally likely to occur. The independent assortment of genes can be illustrated by the dihybrid cross, a cross between two true-breeding parents that express different traits for two characteristics. Consider the characteristics of seed color and seed texture for two pea plants, one that has green, wrinkled seeds (yyrr) and another that has yellow, round seeds (YYRR). Because each parent is homozygous, the law of segregation indicates that the gametes for the green/wrinkled plant all are yr, and the gametes for the yellow/round plant are all YR. Therefore, the F1 generation of offspring all are YyRr (Figure 12.16).


  1. ppYY, Ppyy, ppYY, ppyy yielding white flowers with yellow peas, purple flowers with yellow peas, and white flowers with green peas. You can find this with a 3×3 Punnett square.
  2. PPYY, PpYy, ppYY, ppyy yielding purple flowers with yellow peas, white flowers with yellow peas, and white flowers with green peas. You can find this with a 2×2 Punnett square.
  3. Ppyy, PpYy, ppYY, ppyy yielding purple flowers with green peas, purple flowers with yellow peas, white flowers with yellow peas, and white flowers with green peas. You can find this with a 3×3 Punnett square.
  4. PpYY, PpYy, ppYY, ppYy yielding purple flowers with yellow peas, and white flowers with yellow peas. You can find this with a 2×2 Punnett square.

For the F2 generation, the law of segregation requires that each gamete receive either an R allele or an r allele along with either a Y allele or a y allele. The law of independent assortment states that a gamete into which an r allele sorted would be equally likely to contain either a Y allele or a y allele. Thus, there are four equally likely gametes that can be formed when the YyRr heterozygote is self-crossed, as follows: YR, Yr, yR, and yr. Arranging these gametes along the top and left of a 4 × 4 Punnett square (Figure 12.16) gives us 16 equally likely genotypic combinations. From these genotypes, we infer a phenotypic ratio of 9 round/yellow:3 round/green:3 wrinkled/yellow:1 wrinkled/green (Figure 12.16). These are the offspring ratios we would expect, assuming we performed the crosses with a large enough sample size.

Because of independent assortment and dominance, the 9:3:3:1 dihybrid phenotypic ratio can be collapsed into two 3:1 ratios, characteristic of any monohybrid cross that follows a dominant and recessive pattern. Ignoring seed color and considering only seed texture in the above dihybrid cross, we would expect that three quarters of the F2 generation offspring would be round, and one quarter would be wrinkled. Similarly, isolating only seed color, we would assume that three quarters of the F2 offspring would be yellow and one quarter would be green. The sorting of alleles for texture and color are independent events, so we can apply the product rule. Therefore, the proportion of round and yellow F2 offspring is expected to be (3/4) × (3/4) = 9/16, and the proportion of wrinkled and green offspring is expected to be (1/4) × (1/4) = 1/16. These proportions are identical to those obtained using a Punnett square. Round, green and wrinkled, yellow offspring can also be calculated using the product rule, as each of these genotypes includes one dominant and one recessive phenotype. Therefore, the proportion of each is calculated as (3/4) × (1/4) = 3/16.
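The agreement between the Punnett square and the product rule can be checked by brute force. The following is a small sketch, not part of the original text: it enumerates the 16 cells of the YyRr × YyRr square and compares the counts with the product-rule fractions. The helper name `phenotype` is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

gametes = ["YR", "Yr", "yR", "yr"]  # four equally likely gametes from a YyRr parent

def phenotype(g1, g2):
    """Phenotype of the offspring formed by two gametes (Y and R dominant)."""
    color = "yellow" if "Y" in (g1[0], g2[0]) else "green"
    texture = "round" if "R" in (g1[1], g2[1]) else "wrinkled"
    return texture, color

# Enumerate all 16 cells of the 4 x 4 Punnett square.
counts = Counter(phenotype(g1, g2) for g1, g2 in product(gametes, repeat=2))
for pheno, n in counts.most_common():
    print(pheno, n)

# Product rule: multiply the two monohybrid probabilities.
p_dominant, p_recessive = Fraction(3, 4), Fraction(1, 4)
round_yellow = p_dominant * p_dominant      # 9/16
round_green = p_dominant * p_recessive      # 3/16
wrinkled_green = p_recessive * p_recessive  # 1/16
```

Dividing each count by 16 gives the same fractions as the product rule, which is the point the paragraph above makes.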

The law of independent assortment also indicates that a cross between yellow, wrinkled (YYrr) and green, round (yyRR) parents would yield the same F1 and F2 offspring as in the YYRR x yyrr cross.

The physical basis for the law of independent assortment also lies in meiosis I, in which the different homologous pairs line up in random orientations. Each gamete can contain any combination of paternal and maternal chromosomes (and therefore the genes on them) because the orientation of tetrads on the metaphase plane is random.

Forked-Line Method

When more than two genes are being considered, the Punnett-square method becomes unwieldy. For instance, examining a cross involving four genes would require a 16 × 16 grid containing 256 boxes. It would be extremely cumbersome to manually enter each genotype. For more complex crosses, the forked-line and probability methods are preferred.

To prepare a forked-line diagram for a cross between F1 heterozygotes resulting from a cross between AABBCC and aabbcc parents, we first create rows equal to the number of genes being considered, and then segregate the alleles in each row on forked lines according to the probabilities for individual monohybrid crosses (Figure 12.17). We then multiply the values along each forked path to obtain the F2 offspring probabilities. Note that this process is a diagrammatic version of the product rule. The values along each forked pathway can be multiplied because each gene assorts independently. For a trihybrid cross, the F2 phenotypic ratio is 27:9:9:9:3:3:3:1.
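Because the forked-line diagram is the product rule drawn as a tree, it is easy to mimic in a few lines of code. The sketch below assumes three independently assorting genes with complete dominance, as in the AaBbCc × AaBbCc cross above; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each monohybrid fork: dominant phenotype 3/4, recessive phenotype 1/4.
fork = {"dominant": Fraction(3, 4), "recessive": Fraction(1, 4)}
genes = ["A", "B", "C"]

# Follow every path through the forks and multiply the values along it.
paths = {}
for combo in product(fork, repeat=len(genes)):
    prob = Fraction(1)
    for branch in combo:
        prob *= fork[branch]
    paths[combo] = prob

# Express each path probability as a count out of 64 offspring.
ratio = sorted((int(p * 64) for p in paths.values()), reverse=True)
print(ratio)  # [27, 9, 9, 9, 3, 3, 3, 1]
```

Adding a fourth gene only requires extending `genes`, which is why this approach scales better than drawing a larger Punnett square.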

Probability Method

While the forked-line method is a diagrammatic approach to keeping track of probabilities in a cross, the probability method gives the proportions of offspring expected to exhibit each phenotype (or genotype) without the added visual assistance. Both methods make use of the product rule and consider the alleles for each gene separately. Earlier, we examined the phenotypic proportions for a trihybrid cross using the forked-line method; now we will use the probability method to examine the genotypic proportions for a cross with even more genes.

For a trihybrid cross, writing out the forked-line method is tedious, albeit not as tedious as using the Punnett-square method. To fully demonstrate the power of the probability method, however, we can consider specific genetic calculations. For instance, for a tetrahybrid cross between individuals that are heterozygotes for all four genes, and in which all four genes are sorting independently and in a dominant and recessive pattern, what proportion of the offspring will be expected to be homozygous recessive for all four genes? Rather than writing out every possible genotype, we can use the probability method. We know that for each gene, the fraction of homozygous recessive offspring will be 1/4. Therefore, multiplying this fraction for each of the four genes, (1/4) × (1/4) × (1/4) × (1/4), we determine that 1/256 of the offspring will be quadruply homozygous recessive.

For the same tetrahybrid cross, what is the expected proportion of offspring that have the dominant phenotype at all four loci? We can answer this question using phenotypic proportions, but let’s do it the hard way, using genotypic proportions. The question asks for the proportion of offspring that are 1) homozygous dominant at A or heterozygous at A, and 2) homozygous dominant at B or heterozygous at B, and so on. Noting the “or” and “and” in each circumstance makes clear where to apply the sum and product rules. The probability of a homozygous dominant at A is 1/4 and the probability of a heterozygote at A is 1/2. The probability of the homozygote or the heterozygote is 1/4 + 1/2 = 3/4 using the sum rule. The same probability can be obtained in the same way for each of the other genes, so that the probability of a dominant phenotype at A and B and C and D is, using the product rule, equal to 3/4 × 3/4 × 3/4 × 3/4, or 81/256. If you are ever unsure about how to combine probabilities, returning to the forked-line method should make it clear.
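The two tetrahybrid calculations can be reproduced with exact fractions. This is a minimal sketch under the same assumptions as the text (four independently assorting genes, complete dominance, both parents heterozygous at every gene); the variable names are illustrative.

```python
from fractions import Fraction

# Genotype probabilities at one locus from an Aa x Aa cross.
p_hom_dominant = Fraction(1, 4)
p_heterozygote = Fraction(1, 2)
p_hom_recessive = Fraction(1, 4)

n_genes = 4

# Sum rule: dominant phenotype = homozygous dominant OR heterozygous.
p_dominant_phenotype = p_hom_dominant + p_heterozygote

# Product rule: the same event must occur at every independent locus.
all_recessive = p_hom_recessive ** n_genes
all_dominant = p_dominant_phenotype ** n_genes

print(all_recessive)  # 1/256
print(all_dominant)   # 81/256
```

Using `Fraction` rather than floating-point numbers keeps the results in the same form as the hand calculation, which makes it easier to spot an arithmetic slip.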

Rules for Multihybrid Fertilization

Predicting the genotypes and phenotypes of offspring from given crosses is the best way to test your knowledge of Mendelian genetics. Given a multihybrid cross that obeys independent assortment and follows a dominant and recessive pattern, several generalized rules exist; you can use these rules to check your results as you work through genetics calculations (Table 12.5). To apply these rules, first you must determine n, the number of heterozygous gene pairs (the number of genes segregating two alleles each). For example, a cross between AaBb and AaBb heterozygotes has an n of 2. In contrast, a cross between AABb and AABb has an n of 1 because A is not heterozygous.

General Rules for Multihybrid Crosses
General Rule: Value for n Heterozygous Gene Pairs
Number of different F1 gametes: $2^n$
Number of different F2 genotypes: $3^n$
Given dominant and recessive inheritance, the number of different F2 phenotypes: $2^n$
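The three rules in the table can be checked by enumeration for small n. The sketch below is an illustration rather than part of the original table: it writes every gene with the placeholder alleles "A" (dominant) and "a" (recessive), and the function name `count_outcomes` is hypothetical.

```python
from itertools import product

def count_outcomes(n):
    """Count gametes, genotypes, and phenotypes for a cross of n-fold heterozygotes."""
    alleles_per_gene = [("A", "a")] * n  # one (dominant, recessive) pair per gene
    gametes = set(product(*alleles_per_gene))
    genotypes = set()
    phenotypes = set()
    for g1, g2 in product(gametes, repeat=2):
        # Sort each allele pair so that Aa and aA count as the same genotype.
        genotype = tuple("".join(sorted(pair)) for pair in zip(g1, g2))
        genotypes.add(genotype)
        # Dominant phenotype whenever at least one dominant allele is present.
        phenotypes.add(tuple("A" in pair for pair in genotype))
    return len(gametes), len(genotypes), len(phenotypes)

for n in range(1, 5):
    print(n, count_outcomes(n))
```

For each n from 1 to 4 the three counts should equal 2^n, 3^n, and 2^n, matching the table.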

Linked Genes Violate the Law of Independent Assortment

Although all of Mendel’s pea characteristics behaved according to the law of independent assortment, we now know that some allele combinations are not inherited independently of each other. Genes that are located on separate non-homologous chromosomes will always sort independently. However, each chromosome contains hundreds or thousands of genes, organized linearly on chromosomes like beads on a string. The segregation of alleles into gametes can be influenced by linkage, in which genes that are located physically close to each other on the same chromosome are more likely to be inherited as a pair. However, because of the process of recombination, or “crossover,” it is possible for two genes on the same chromosome to behave independently, or as if they are not linked. To understand this, let’s consider the biological basis of gene linkage and recombination.

Homologous chromosomes possess the same genes in the same linear order. The alleles may differ on homologous chromosome pairs, but the genes to which they correspond do not. In preparation for the first division of meiosis, homologous chromosomes replicate and synapse. Like genes on the homologs align with each other. At this stage, segments of homologous chromosomes exchange linear segments of genetic material (Figure 12.18). This process is called recombination, or crossover, and it is a common genetic process. Because the genes are aligned during recombination, the gene order is not altered. Instead, the result of recombination is that maternal and paternal alleles are combined onto the same chromosome. Across a given chromosome, several recombination events may occur, causing extensive shuffling of alleles.

When two genes are located in close proximity on the same chromosome, they are considered linked, and their alleles tend to be transmitted through meiosis together. To exemplify this, imagine a dihybrid cross involving flower color and plant height in which the genes are next to each other on the chromosome. If one homologous chromosome has alleles for tall plants and red flowers, and the other chromosome has alleles for short plants and yellow flowers, then when the gametes are formed, the tall and red alleles will go together into a gamete and the short and yellow alleles will go into other gametes. These are called the parental genotypes because they have been inherited intact from the parents of the individual producing gametes. But unlike the case in which the genes are on different chromosomes, there will be no gametes with tall and yellow alleles and no gametes with short and red alleles. If you create the Punnett square with these gametes, you will see that the classical Mendelian prediction of a 9:3:3:1 outcome of a dihybrid cross would not apply. As the distance between two genes increases, the probability of one or more crossovers between them increases, and the genes behave more like they are on separate chromosomes. Geneticists have used the proportion of recombinant gametes (the ones not like the parents) as a measure of how far apart genes are on a chromosome. Using this information, they have constructed elaborate maps of genes on chromosomes for well-studied organisms, including humans.
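The effect of distance on gamete types can be sketched numerically. The code below assumes the standard two-locus relationship in which a recombination fraction (called `rec_frac` here, a name introduced for illustration) splits gametes into parental and recombinant classes. A value of 0 corresponds to the completely linked case in the example above, and 0.5 corresponds to independent assortment.

```python
def gamete_frequencies(rec_frac):
    """Gamete frequencies for a parent carrying tall-red on one homolog
    and short-yellow on the other.

    rec_frac is the recombination fraction between the two genes (0 to 0.5).
    """
    if not 0 <= rec_frac <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    parental = (1 - rec_frac) / 2  # each of the two parental gamete types
    recombinant = rec_frac / 2     # each of the two recombinant gamete types
    return {
        "tall-red": parental,
        "short-yellow": parental,
        "tall-yellow": recombinant,
        "short-red": recombinant,
    }

print(gamete_frequencies(0.0))  # complete linkage: only parental gametes
print(gamete_frequencies(0.5))  # unlinked: all four gametes equally frequent
```

Intermediate values give an excess of parental gametes, which is the signal geneticists use to estimate how far apart two genes lie.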

Mendel’s seminal publication makes no mention of linkage, and many researchers have questioned whether he encountered linkage but chose not to publish those crosses out of concern that they would invalidate his independent assortment postulate. The garden pea has seven pairs of chromosomes, and some have suggested that his choice of seven characteristics was not a coincidence. However, even if the genes he examined were not located on separate chromosomes, it is possible that he simply did not observe linkage because of the extensive shuffling effects of recombination.


Testing the Hypothesis of Independent Assortment

To better appreciate the amount of labor and ingenuity that went into Mendel’s experiments, proceed through one of Mendel’s dihybrid crosses.

Question: What will be the offspring of a dihybrid cross?

Background: Consider that pea plants mature in one growing season, and you have access to a large garden in which you can cultivate thousands of pea plants. There are several true-breeding plants with the following pairs of traits: tall plants with inflated pods, and dwarf plants with constricted pods. Before the plants have matured, you remove the pollen-producing organs from the tall/inflated plants in your crosses to prevent self-fertilization. Upon plant maturation, the plants are manually crossed by transferring pollen from the dwarf/constricted plants to the stigmata of the tall/inflated plants.

Hypothesis: Both trait pairs will sort independently according to Mendelian laws. When the true-breeding parents are crossed, all of the F1 offspring are tall and have inflated pods, which indicates that the tall and inflated traits are dominant over the dwarf and constricted traits, respectively. A self-cross of the F1 heterozygotes results in 2,000 F2 progeny.

Test the hypothesis: Because each trait pair sorts independently, the ratios of tall:dwarf and inflated:constricted are each expected to be 3:1. The tall/dwarf trait pair is called T/t, and the inflated/constricted trait pair is designated I/i. Each member of the F1 generation therefore has a genotype of TtIi. Construct a grid analogous to Figure 12.16, in which you cross two TtIi individuals. Each individual can donate four combinations of two alleles: TI, Ti, tI, or ti, meaning that there are 16 possible combinations in the grid. Because the T and I alleles are dominant, any individual having one or two of those alleles will express the tall or inflated phenotype, respectively, regardless of whether it also has a t or i allele. Only individuals that are tt or ii will express the dwarf and constricted phenotypes, respectively. As shown in Figure 12.19, you predict that you will observe the following offspring proportions: tall/inflated:tall/constricted:dwarf/inflated:dwarf/constricted in a 9:3:3:1 ratio. Notice from the grid that when considering the tall/dwarf and inflated/constricted trait pairs in isolation, they are each inherited in 3:1 ratios.

Test the hypothesis: You cross the dwarf and tall plants and then self-cross the offspring. For best results, this is repeated with hundreds or even thousands of pea plants. What special precautions should be taken in the crosses and in growing the plants?

Analyze your data: You observe the following plant phenotypes in the F2 generation: 2706 tall/inflated, 930 tall/constricted, 888 dwarf/inflated, and 300 dwarf/constricted. Reduce these findings to a ratio and determine if they are consistent with Mendelian laws.
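One way to check the counts above against the 9:3:3:1 prediction is a chi-square goodness-of-fit calculation. The Python sketch below is only an illustration: the function name is hypothetical, and the 0.05 significance level with its tabulated critical value for three degrees of freedom (7.815) is an assumed convention rather than something specified in the exercise.

```python
# Observed F2 counts from the text, in the order
# tall/inflated, tall/constricted, dwarf/inflated, dwarf/constricted.
observed = [2706, 930, 888, 300]
expected_ratio = [9, 3, 3, 1]


def chi_square_statistic(observed_counts, ratio):
    """Pearson chi-square statistic for counts against an expected ratio."""
    total = sum(observed_counts)
    ratio_sum = sum(ratio)
    statistic = 0.0
    for obs, part in zip(observed_counts, ratio):
        expected = total * part / ratio_sum
        statistic += (obs - expected) ** 2 / expected
    return statistic


# Assumption: 0.05 significance level; 4 classes give 3 degrees of freedom.
CRITICAL_DF3_ALPHA05 = 7.815

stat = chi_square_statistic(observed, expected_ratio)
consistent = stat < CRITICAL_DF3_ALPHA05
print(round(stat, 3), consistent)
```

A statistic well below the critical value means the deviations from 9:3:3:1 are of the size expected from random segregation alone.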

Form a conclusion: Were the results close to the expected 9:3:3:1 phenotypic ratio? Do the results support the prediction? What might be observed if far fewer plants were used, given that alleles segregate randomly into gametes? Try to imagine growing that many pea plants, and consider the potential for experimental error. For instance, what would happen if it was extremely windy one day?



In the shepherd’s-purse plant (Capsella bursa-pastoris), seed shape is controlled by two genes, A and B. When both the A and B loci are homozygous recessive (aabb), the seeds are ovoid. However, if the dominant allele for either or both of these genes is present, the seeds are triangular. Based on this information, what are the expected phenotypic ratios for a cross between plants that are heterozygous for both traits?

What is the expected ratio of phenotypes from a dihybrid cross? How do you explain the difference between the expected dihybrid cross ratio and ratio observed in the shepherd’s-purse plant?


Mendel’s studies in pea plants implied that the sum of an individual’s phenotype was controlled by genes (or as he called them, unit factors), such that every characteristic was distinctly and completely controlled by a single gene. In fact, single observable characteristics are almost always under the influence of multiple genes (each with two or more alleles) acting in unison. For example, at least eight genes contribute to eye color in humans.


Eye color in humans is determined by multiple genes. Use the Eye Color Calculator to predict the eye color of children from parental eye color.

  1. Both parents are homozygous for the dominant trait of brown eyes.
  2. Both parents are heterozygous, having the green trait on the green-blue eye gene.
  3. Both parents are heterozygous with the recessive trait of brown eyes.
  4. Both parents are homozygous having the green trait on the green-blue eye gene.

In some cases, several genes can contribute to aspects of a common phenotype without their gene products ever directly interacting. In the case of organ development, for instance, genes may be expressed sequentially, with each gene adding to the complexity and specificity of the organ. Genes may function in complementary or synergistic fashions, such that two or more genes need to be expressed simultaneously to affect a phenotype. Genes may also oppose each other, with one gene modifying the expression of another.

In epistasis, the interaction between genes is antagonistic, such that one gene masks or interferes with the expression of another. “Epistasis” is a word composed of Greek roots that mean “standing upon.” The alleles that are being masked or silenced are said to be hypostatic to the epistatic alleles that are doing the masking. Often the biochemical basis of epistasis is a gene pathway in which the expression of one gene is dependent on the function of a gene that precedes or follows it in the pathway.

An example of epistasis is pigmentation in mice. The wild-type coat color, agouti (AA), is dominant to solid-colored fur (aa). However, a separate gene (C) is necessary for pigment production. A mouse with a recessive c allele at this locus is unable to produce pigment and is albino regardless of the allele present at locus A (Figure 12.20). Therefore, the genotypes AAcc, Aacc, and aacc all produce the same albino phenotype. A cross between heterozygotes for both genes (AaCc x AaCc) would generate offspring with a phenotypic ratio of 9 agouti:3 solid color:4 albino (Figure 12.20). In this case, the C gene is epistatic to the A gene.

Epistasis can also occur when a dominant allele masks expression at a separate gene. Fruit color in summer squash is expressed in this way. Homozygous recessive expression of the W gene (ww) coupled with homozygous dominant or heterozygous expression of the Y gene (YY or Yy) generates yellow fruit, and the wwyy genotype produces green fruit. However, if a dominant copy of the W gene is present in the homozygous or heterozygous form, the summer squash will produce white fruit regardless of the Y alleles. A cross between white heterozygotes for both genes (WwYy × WwYy) would produce offspring with a phenotypic ratio of 12 white:3 yellow:1 green.

Finally, epistasis can be reciprocal such that either gene, when present in the dominant (or recessive) form, expresses the same phenotype. In the shepherd’s purse plant (Capsella bursa-pastoris), the characteristic of seed shape is controlled by two genes in a dominant epistatic relationship. When the genes A and B are both homozygous recessive (aabb), the seeds are ovoid. If the dominant allele for either of these genes is present, the result is triangular seeds. That is, every possible genotype other than aabb results in triangular seeds, and a cross between heterozygotes for both genes (AaBb x AaBb) would yield offspring with a phenotypic ratio of 15 triangular:1 ovoid.

As you work through genetics problems, keep in mind that any single characteristic that results in a phenotypic ratio that totals 16 is typical of a two-gene interaction. Recall the phenotypic inheritance pattern for Mendel’s dihybrid cross, which considered two non-interacting genes: 9:3:3:1. Similarly, we would expect interacting gene pairs to also exhibit ratios expressed as 16 parts. Note that we are assuming the interacting genes are not linked; they are still assorting independently into gametes.
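The three epistatic ratios described above (9:3:4, 12:3:1, and 15:1) can be reproduced by enumerating the 16 cells of a dihybrid Punnett square. The following Python sketch assumes unlinked genes and complete dominance within each gene; the function names are hypothetical, and each phenotype rule simply encodes the verbal description given in the text.

```python
from collections import Counter
from itertools import product


def dihybrid_phenotype_counts(phenotype_of):
    """Count phenotypes over the 16 gamete pairings of a double heterozygote cross.

    Each gamete is (dominant allele at gene 1?, dominant allele at gene 2?).
    """
    gametes = list(product([True, False], repeat=2))
    counts = Counter()
    for (a1, b1), (a2, b2) in product(gametes, repeat=2):
        has_dominant_1 = a1 or a2
        has_dominant_2 = b1 or b2
        counts[phenotype_of(has_dominant_1, has_dominant_2)] += 1
    return dict(counts)


def mouse_coat(has_A, has_C):
    # Recessive epistasis: cc is albino regardless of the A locus.
    if not has_C:
        return "albino"
    return "agouti" if has_A else "solid"


def squash_fruit(has_W, has_Y):
    # Dominant epistasis: any W allele gives white fruit.
    if has_W:
        return "white"
    return "yellow" if has_Y else "green"


def capsella_seed(has_A, has_B):
    # Either dominant allele gives triangular seeds; only aabb is ovoid.
    return "triangular" if (has_A or has_B) else "ovoid"


for rule in (mouse_coat, squash_fruit, capsella_seed):
    print(rule.__name__, dihybrid_phenotype_counts(rule))
```

Because every rule is evaluated over the same 16 equally likely cells, each set of counts totals 16, which is the pattern the paragraph above points to.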


For an excellent review of Mendel’s experiments and to perform your own crosses and identify patterns of inheritance, visit the Mendel’s Peas web lab.


We are discovering huge amounts of variation in 45S rRNA genes on every level. At the gross level of total copy number, variation in 45S rRNA gene copy number is largely responsible for an over 10% variation in genome size among A. thaliana accessions [35] and the relative size of the two rDNA clusters varies greatly among accessions [36]. At the sequence level, there is variation in the conserved catalytic subunits themselves both within and among accessions. Furthermore, these rRNA gene variants readily express and make for a heterogeneous rRNA pool in the cell, the functional significance of which is completely unknown. Ribosome heterogeneity has been studied mainly in the context of the regulation of the ribosomal proteins, the diversity and activity of other ribosome-associated factors and, although not fully understood, the modifications that the rRNA subunits undergo after transcription [56, 57]. In eukaryotes, there have been few attempts to study heterogeneity at the sequence level of the rRNA subunits: in the parasite Plasmodium, two structurally distinct 18S rRNAs are differentially expressed during its life cycle [58, 59]; in humans, the 28S rRNAs have been shown to be heterogeneous in both mono- and polysomal fractions [26, 60]; similarly, in both the sea urchin Paracentrotus lividus [61] and A. thaliana, several transcribed 5S rRNA variants are readily incorporated in functional ribosomes [62]. Our study provides the most comprehensive catalogue of rRNA gene variants to date, and will hopefully be useful for investigating their possible adaptive role, either at the level of transcriptional regulation, rRNA stability or translational efficiency.

Irrespective of their functional significance, these variants, although rarely homogenized throughout an entire rDNA cluster, can be used as markers of the expression of a particular rDNA cluster. Our findings make it clear that the silencing phenomenon known as nucleolar dominance occurs both within [32] and among natural lines of A. thaliana [51]. Furthermore, we demonstrate that dominance is a property neither of the parental strain nor of the chromosomal position (i.e. chromosome 2 versus 4), but rather of the specific “allelic” content of each rRNA gene cluster—including, perhaps, flanking DNA. Indeed, a recent study implicates centromere-proximal sequences in this regulation [63].

However, the molecular basis of the epistatic and allelic interactions among rDNA clusters remains unknown. Interestingly, rDNAs derived from different species in the genus Brassica follow a hierarchical dominance relationship [13] similar to the one among the many alleles at the self-incompatibility locus (S-locus) in Brassicaceae [64]. In Arabidopsis halleri, dominance at the S-locus is largely controlled by a set of small non-coding RNAs produced by dominant S-alleles that target a repertoire of more recessive S-alleles, resulting in their epigenetic silencing [65, 66]. Since uniparental rRNA gene silencing involves proteins of the short interfering RNA (siRNA)-directed DNA methylation (RdDM) pathway in the hybrid plant Arabidopsis suecica [67, 68], and there is evidence that non-coding RNAs can act in trans to silence other rRNA gene repeats in mice [69–71], it is tempting to speculate that a mechanism similar to the one described for the S-locus might explain how the rDNA clusters “talk” to each other. Surprisingly, in A. thaliana Col-0, mutations in the RdDM pathway components RNA polymerase IV (nrpd1), Dicer-like-3 (dcl2/3/4), and DNA methyltransferases DRM1 and 2 (drm1/2) have at most a negligible effect on the silencing of rDNA-2 (Additional file 5: Figure S5 [11]). In contrast, DNA maintenance methyltransferase MET1 (the ortholog of mammalian DNMT1), which is responsible for cytosine methylation in the CG context independently of siRNAs, is needed to silence rDNA-2 [11]. However, these results do not necessarily exclude the involvement of the RdDM pathway in the silencing of rDNA clusters in A. thaliana. In A. suecica, for instance, re-establishment of nucleolar dominance was observed in T2 progeny of one of the three RNA-dependent RNA polymerase RDR2-RNAi lines [67]. As appears to be the case for transposable elements, the establishment of silencing may well be distinct from its maintenance, and many semi-redundant mechanisms may be involved [72].

Additional Activities You May Want to Incorporate in this Simulation

Analysis of Class Data

  • You can use the table shown below to collect information on the phenotype of all the baby dragons produced by the pairs of students in a class and then use these data for a class discussion of questions such as:
  • Is any phenotypic trait observed in all the babies of this mother and father? If so, what is the genetic explanation for this? (To answer this last question, students may want to use Punnett squares to figure out possible genotypes and phenotypes of the baby dragons.)
  • Are any two of the baby dragons produced by these dragon parents phenotypically identical? (For the discussion of this question, you may want to calculate the large number of possible combinations of phenotypic characteristics (over 500), and you may want to relate the simulation results to the phenotypic differences between human siblings.)
  • Are male baby dragons more likely to lack horns, as predicted (see question 6 in the Student Handout)?

Comparison to Human Inheritance

You may want to have students identify examples of traits in humans that have the same pattern of inheritance as specific traits in this simulation.


Coordinated Polygenic Epistasis.

Throughout the paper, we assume a polygenic pairwise epistasis model: $$y_i = \sum_{j=1}^{M} G_{ij}\beta_j + \sum_{j \le j'} G_{ij} G_{ij'} \Omega_{jj'} + \varepsilon_i, \qquad [1]$$ where $y_i$ is the phenotype for individual $i$, $G_{ij}$ is the genotype for individual $i$ at marker $j$, and $\varepsilon_i$ is the residual error, assumed to be independent and identically distributed (i.i.d.) Gaussian. The vector β contains the marginal polygenic effects. For $j \neq j'$, $\Omega_{jj'}$ is the pairwise interaction effect between SNPs $j$ and $j'$, so Ω is the matrix of all pairwise SNP epistasis effects in the genome.
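To make the model in Eq. 1 concrete, here is a minimal Python sketch that generates phenotypes from genotypes, main effects, and an upper-triangular interaction matrix. Everything beyond the equation itself is an assumption for illustration: the function names, the coding of genotypes as 0/1/2 allele counts drawn as two independent allele copies, and the toy parameter values.

```python
import random


def simulate_genotypes(n_individuals, allele_freqs, rng):
    """Diploid genotypes coded 0/1/2 (assumption: two independent allele draws)."""
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def simulate_phenotypes(genotypes, beta, omega, noise_sd, rng):
    """Phenotypes under Eq. 1; only omega[j][k] with j <= k is used."""
    m = len(beta)
    phenotypes = []
    for g in genotypes:
        additive = sum(g[j] * beta[j] for j in range(m))
        epistatic = sum(
            g[j] * g[k] * omega[j][k] for j in range(m) for k in range(j, m)
        )
        phenotypes.append(additive + epistatic + rng.gauss(0.0, noise_sd))
    return phenotypes


# Toy, hypothetical parameters: three SNPs, one interacting pair (SNP 0 with SNP 1).
rng = random.Random(0)
beta = [0.5, -0.2, 0.1]
omega = [[0.0, 0.3, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
G = simulate_genotypes(5, [0.3, 0.5, 0.2], rng)
y = simulate_phenotypes(G, beta, omega, noise_sd=1.0, rng=rng)
print(G, y)
```

Note that with 0/1/2 coding the product $G_{ij} G_{ij'}$ does not track which chromosome copy carries each allele, so the interaction term is the same whether the two alleles sit on the same or on different homologs.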

The standard additive model assumes no epistasis, that is, Ω = 0. In this model, SNP $j$ always has the same effect $\beta_j$ on the phenotype, regardless of genetic background or environment. In the polygenic setting, where there are many more SNPs than individuals (M > N), total heritability can still be reliably estimated by the random-effect model genome-based restricted maximum likelihood (GREML), which models $\beta_j$ as i.i.d. Gaussian (40).

Epistatic models go further by allowing nonzero Ω. To date, epistatic tests have focused either on candidate SNPs or on genome-wide screens for SNP pairs, which reduce the number of tested terms so that M < N and facilitate simple fixed-effect models (21). More recently, random-effect models akin to GREML have become popular for estimating the total size of Ω (i.e., the heritability from pairwise epistasis) (25–28). Another recent approach tests for interaction between a single SNP and a genome-wide kinship matrix, a useful compromise that provides SNP-level resolution and also aggregates genome-wide signal (22).

While these methods are useful for characterizing the existence and impact of epistasis, all are limited by the assumption that β and Ω are independent. We are interested in an orthogonal question: when are β and Ω deeply intertwined by latent interacting pathways? Conceptually, β and Ω encode all relevant information, so the goal is to decode the presence of interacting pathways from these parameters. Concretely, we prove that these pathways exist if, and only if, the coordinated epistasis (CE) γ is nonzero, where γ is defined as: $$\gamma = \operatorname{Cov}_{j<j'}\left(\beta_j \beta_{j'},\ \Omega_{jj'}\right). \qquad [2]$$ Intuitively, γ<0 negatively skews the total polygenic effect relative to additivity. This is equivalent to dampening the positive marginal phenotypic effects in aggregate, that is, antagonism between positive main effects. Conversely, γ>0 positively skews the population, increasing the probability of extremely high phenotypes. Note that γ = 0 does not imply that epistasis is absent; rather, it implies that epistasis effects do not necessarily systematically align with main effects, as has been implicitly assumed by prior polygenic models of epistasis (25–28).
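As a rough illustration of the definition in Eq. 2, the Python sketch below computes the covariance between products of main effects and interaction effects over all SNP pairs with $j < j'$. The function name is hypothetical, and dividing by the number of pairs (a population covariance) is an assumed normalization; the text does not specify one.

```python
def coordination(beta, omega):
    """Covariance of beta[j]*beta[k] with omega[j][k] over pairs j < k."""
    m = len(beta)
    pairs = [
        (beta[j] * beta[k], omega[j][k]) for j in range(m) for k in range(j + 1, m)
    ]
    if not pairs:
        raise ValueError("need at least two SNPs")
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    # Assumption: population covariance (divide by the number of pairs).
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / len(pairs)


# Toy example: interactions proportional to the product of main effects.
beta = [1.0, 1.0, -1.0]
omega = [[0.0, 0.5, -0.5],
         [0.0, 0.0, -0.5],
         [0.0, 0.0, 0.0]]
print(coordination(beta, omega))
```

In this toy case every interaction has the same sign as the product of the two main effects, so the covariance is positive; an all-zero Ω gives exactly zero.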

As a stylized example, imagine there are two genetically independent pathways that are each sufficient for type 2 diabetes (T2D), one based on body mass index (BMI) and one based on pancreas function. Then γ<0 is expected, because high-BMI cases are not likely to also have high pancreas risk, which is rare. On the other hand, if a disease requires high risk across distinct systems, then γ>0 is expected. As another stylized example, if asthma requires both immune components and tobacco exposures, then the impact of a smoking-risk SNP on asthma is greater in the presence of immune risk factors.

We provide a more rigorous exploration of coordination in SI Appendix. In particular, we prove that several biologically plausible epistasis models induce CE, including a trans genetic regulation model (32) (SI Appendix, section 4.1), a polygenic generalization of molecular buffers (16) (SI Appendix, section 4.2), and gene–environment interaction with heritable environment (41) (SI Appendix, section 4.3 and Fig. S1). We also show that within-pathway coordination contributes to CE and can cause γ ≠ 0 (SI Appendix, Example 3 and Fig. S2). We also study CE as a function of the number of latent pathways; in particular, γ = 0 results in the mathematically natural many-pathway limit, hence CE typically indicates the existence of a parsimonious set of interacting pathways (SI Appendix, Example 4).

The EO Estimator for CE.

We have defined our target, the CE γ, as a function of the true genetic effects β and Ω. However, these parameters are not known; even worse, they are high dimensional and cannot be accurately estimated. Nonetheless, we develop a simple and powerful method to test for nonzero γ.

The key idea in the even/odd (EO) test is that randomly defined pathways can act as proxies for true pathways. These random pathways will have a significant interaction if, and only if, there truly are latent pathways that interact. Intuitively, this randomized approach has power to detect true signals because many interacting SNP pairs are appropriately assigned under the random partition; the probability of incorrectly assigning all interacting pairs is negligible. Furthermore, the EO test is calibrated because incorrectly assigned SNP pairs do not interact and hence do not cause bias. In other words, correctly assigned SNP pairs will exist and will suffice to drive interactions between the even and odd proxy pathways. Although the incorrectly assigned SNPs will cause power loss and downward bias for |γ|, this bias can be corrected post hoc if we assume an infinitesimal epistatic model (SI Appendix, Proposition 1). In this case, exactly half of all interacting SNP pairs will be captured by the random SNP partition, which allows us to perfectly estimate the aggregate contribution of SNPs whose pairwise interactions are not captured by the EO partition. Mathematically, these facts are related to other randomized approaches for high-dimensional estimation, including GREML, randomized linear algebra, and compressed sensing (42–46).

We propose estimating γ by regressing on the interaction between polygenic risk scores (PRS) built specifically from even-indexed and odd-indexed chromosomes ($\mathrm{PRS}_e$ and $\mathrm{PRS}_o$, respectively): $$y \sim \alpha_o \mathrm{PRS}_o + \alpha_e \mathrm{PRS}_e + \gamma_{eo}\, \mathrm{PRS}_o \cdot \mathrm{PRS}_e. \qquad [3]$$ The ordinary least squares estimate $\hat{\gamma}_{eo}$ is the EO estimator of the coordination γ. We prove that $\gamma_{eo} = 0$ if, and only if, γ = 0, assuming the model in Eq. 1 and that there are many causal SNPs (SI Appendix, section 3). Therefore, we can simply use a regression test for $\hat{\gamma}_{eo}$ to test for the existence of CE. Furthermore, we prove that the EO estimate is unbiased and consistent when causal SNPs are in linkage equilibrium (SI Appendix, Proposition 1). Full details, assumptions, and guarantees of the EO test are provided in SI Appendix.
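The regression in Eq. 3 can be sketched in plain Python as an ordinary least squares fit solved through the normal equations. This is only an illustration under stated assumptions: the function names are hypothetical, an intercept is included (as the formula notation suggests but does not state), the simulated scores are made up, and the sketch returns point estimates only, without the standard errors a real test would need.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_eo_regression(y, prs_o, prs_e):
    """OLS fit of y on [1, PRS_o, PRS_e, PRS_o * PRS_e].

    Returns (intercept, alpha_o, alpha_e, gamma_eo).
    """
    rows = [[1.0, o, e, o * e] for o, e in zip(prs_o, prs_e)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return tuple(solve_linear_system(xtx, xty))


# Hypothetical simulated scores with a true interaction coefficient of 0.3.
rng = random.Random(0)
prs_o = [rng.gauss(0.0, 1.0) for _ in range(2000)]
prs_e = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.5 * o + 0.5 * e + 0.3 * o * e + rng.gauss(0.0, 1.0)
     for o, e in zip(prs_o, prs_e)]
print(fit_eo_regression(y, prs_o, prs_e))
```

The last element of the returned tuple plays the role of $\hat{\gamma}_{eo}$; in practice one would use a statistics package that also reports its standard error and P value.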

Our choice of even and odd chromosomes is clearly arbitrary. Crucially, we prove that any partition of chromosomes into two groups without linkage disequilibrium (LD) gives identical results under a perfectly infinitesimal model (SI Appendix, section 3.2). Therefore, for concreteness, simplicity, and to emphasize the arbitrariness and flexibility of the EO test, we have chosen to define it by the case where one PRS is built specifically from SNPs on even chromosomes and the other is built specifically from SNPs on odd chromosomes.

With finite genomes, however, results may depend heavily on the precise choice of chromosomes used to build $\mathrm{PRS}_o$ and $\mathrm{PRS}_e$. Nonetheless, it is important that SNPs are independent across splits to avoid false positives from nonlinear single-variant effects, that is, dominant or recessive variants. Hence, we always evaluate chromosome-level genome partitions. This is analogous to our choice to exclude the $\Omega_{jj}$ terms from CE; in both cases, CE implies nonlinear effects of each individual SNP and the aggregate PRS, but the converse is not true. In practice, we use a more flexible form for the EO test, where we jointly fit and test the interactions between all (22 choose 2) chromosome-specific PRS, which simultaneously obviates LD concerns and the arbitrariness of assigning chromosomes based on their canonical indices. Nonetheless, our method does not assume the true causal pathways are located in contiguous or independent genomic regions; rather, this is a condition on our chosen partition of the genome that is used in the EO test. In particular, both CE and EO remain valid when causal SNPs for each pathway are distributed across the genome randomly, and/or when SNPs contribute to multiple pathways (SI Appendix, section 3.3).
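The "all (22 choose 2)" form can be pictured as building one interaction column per unordered pair of chromosome-specific PRS. The Python sketch below only constructs those columns; the function name and the dictionary layout are hypothetical, and the joint fit and test described in the text are not implemented here.

```python
from itertools import combinations


def pairwise_interaction_columns(chrom_prs):
    """Map each chromosome pair (a, b), a < b, to the product of their PRS.

    chrom_prs: dict from chromosome index to a list of per-individual scores.
    """
    columns = {}
    for a, b in combinations(sorted(chrom_prs), 2):
        columns[(a, b)] = [x * y for x, y in zip(chrom_prs[a], chrom_prs[b])]
    return columns


# Hypothetical scores for three individuals on 22 autosomes.
chrom_prs = {c: [0.1 * c, -0.2 * c, 0.05 * c] for c in range(1, 23)}
columns = pairwise_interaction_columns(chrom_prs)
print(len(columns))
```

With 22 autosomes this yields 231 interaction columns, each formed from two different chromosomes, so no column multiplies SNPs that are in LD through physical proximity.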

EO Test Can Distinguish CE from Population Structure and Uncoordinated Epistasis.

To examine the properties of the EO test, we simulate data under three biologically plausible genetic architectures: additivity, isotropic (i.e., uncoordinated) epistasis, and CE. For each architecture, we also test the impact of population structure, assortative mating, and adjustment with principal components (see Materials and Methods). In each case, we perform a GWAS and then build ordinary PRS as well as PRS based only on variants from odd chromosomes and even chromosomes ($\mathrm{PRS}_o$ and $\mathrm{PRS}_e$, respectively). We then test PRS in a second dataset.

We first tested the correlation between $\mathrm{PRS}_o$ and $\mathrm{PRS}_e$, called $\theta_{eo}$, which is related to an existing test for assortative mating (47). We found that the test for $\theta_{eo} \neq 0$ reliably indicated the presence of assortative mating or uncorrected population structure. Note that after adjusting for PCs, the test for $\theta_{eo} \neq 0$ was roughly null in the presence of population structure (Table 1).

Table 1. Polygenic simulations under additivity, isotropic epistasis, or CE assuming random mating, population structure, or assortative mating

However, our main focus is on the EO test for $\gamma_{eo} \neq 0$. We found that the EO test was calibrated under both the additive and isotropic interaction models, as expected, with false-positive rates near 0.05 at a nominal P < 0.05 threshold. In particular, this is a clear demonstration of a setting where significant pairwise epistasis exists and yet CE does not exist, thereby drawing a clear distinction between CE and the more general concept of epistasis. Further, the EO test has power of roughly 95% under the CE model (Table 1).

We did not observe substantial false positives in the EO test under assortative mating. However, we did observe high false-positive rates for the EO test under uncorrected population structure (0.95), although after correcting for PCs, the test is calibrated with an empirical false-positive rate near 0.05 (Table 1). In practice, we recommend adjusting for population-structure proxies when running the EO test, as is standard in human genetics.

We then sought to more comprehensively profile the behavior of the EO test across a range of simulation parameters. First, we varied the sample size, n (SI Appendix, Fig. S3A). As expected, power at nominal P < 0.05 grew with sample size, from roughly 5% at n = 1,000, up to roughly 95% at n = 10,000 (our baseline), and reaching nearly 100% at n = 20,000. Next, we varied the additive heritability, which affects the EO test power because it governs the accuracy of the PRS estimates (SI Appendix, Fig. S3B). As expected, power increased with heritability, growing from 35% at $h^2 = 0.1$ to nearly 100% at $h^2 = 0.7$ (our baseline is $h^2 = 0.5$). Finally, we varied the strength of the CE parameter, γ. As expected, power grew with |γ|, reaching 100% when we doubled the parameter relative to our baseline (SI Appendix, Fig. S3C). Furthermore, when γ = 0, we recovered calibrated tests with a roughly 5% false-positive rate. Our simulations had similar power regardless of the sign of γ, as expected because the sign of the phenotype is arbitrary in these simulations.

An essential feature of the EO test is that it is unbiased even when the even/odd partition of SNPs is chosen randomly with respect to the partition of SNPs into the causal interacting pathways (under conditions stated above). Nonetheless, the power to detect CE increases when the even/odd partition more closely aligns with the causal partition; we demonstrate this in our simulations by directly matching the two partitions (see Materials and Methods). As expected, the EO test power increases from 95 to 100% when we use this more precise SNP partition (Table 1).

Our above simulations used minor allele frequencies (MAFs) drawn i.i.d. from a uniform (0.01,0.5) distribution. To assess a more realistic MAF spectrum, we repeated our baseline simulations using MAFs drawn randomly from the MAF spectrum we observed in UKBB, excluding rare variants with MAF < 0.01 for simplicity. Nonetheless, the simulation results were nearly identical (SI Appendix, Table S1).

Finally, we assessed the EO test under third-order epistasis (SI Appendix, Table S2). First, when interacting SNP triples were distributed uniformly at random across the genome, as in the isotropic model of pairwise epistasis, the EO test had a positive rate near 0.05. This is appropriately calibrated behavior in the informal sense that there is no relationship between main effects and epistasis effects and in the formal sense of the definition of CE (Eq. 2, above). Second, we simulated a three-pathway model where the product of all three pathways, but not any two, had an effect on the phenotype. In this setting, the main and interaction effects are related, but the formal CE definition is still 0. In the simulation, we observe power near 8% at P < 0.05 significance. This is expected in our simulations that use finite genomes: for any simulated dataset, the pathways will have some nonzero mean that induces pairwise interaction, resulting in an average γ estimate that is zero but overdispersed (this is related to similar issues in GREML with finite numbers of causal SNPs) (48). Overall, the EO test is calibrated under isotropic third-order epistasis and has some signal under higher-order coordinated epistasis.

Testing for CE with the EO Test in the UKBB.

We next tested for CE with the EO test in the UKBB (49). We studied 21 quantitative and 5 binary traits (SI Appendix, Table S3) chosen to represent a range of trait classes including anthropometric, disease, and blood traits. We specifically analyzed the subjects classified as “White British” and filtered out related individuals to minimize population structure bias while retaining large sample sizes (max n = 342,816). We calculated PRS for each chromosome for each sample using a standard thresholding approach (50) (see Materials and Methods). The total PRS is then the sum of the individual chromosomes’ PRS. We evaluated a range of PRS P value thresholds and corrected for this source of multiple testing using a Benjamini–Hochberg false discovery rate (FDR) threshold of 10%. We used 10-fold cross-validation across individuals to minimize bias from in-sample overfitting.
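For readers unfamiliar with the Benjamini–Hochberg step-up procedure mentioned above, here is a minimal Python sketch at the 10% FDR level used in the text. The function name and the example P values are hypothetical, and this is the textbook procedure, not necessarily the exact implementation used in the analysis.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest P value is <= k * fdr / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical P values, e.g. one per PRS P-value threshold.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.10))
```

The step-up rule rejects every hypothesis ranked at or below the largest rank that passes its threshold, even if an individual P value in between would not pass on its own.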

Unlike in perfectly infinitesimal models, in real data, the EO test depends on the specific partition of chromosomes used to estimate γ. To minimize bias from choosing a single split, for example, even versus odd, in practice we jointly test interactions across all distinct pairs of chromosome-specific PRS with an F-test (see Materials and Methods). Following common practice (51–56), we report the significant results per phenotype.

We discovered 18 traits with significant CE (Table 2), including the complex diseases asthma, cardiovascular disease (CVD), eczema, and T2D. We also detected CE for the complex quantitative traits basal metabolic rate, bone mineral density (BMD), lung function (forced expiratory volume [FEV]/forced vital capacity [FVC]), and height, as well as 10 blood traits: glucose, low density lipoprotein (LDL), platelet distribution width, mean platelet volume (MPV), red blood cell distribution width, sphered cell volume, triglycerides (TG), and counts of monocytes, lymphocytes, and platelets (PLAT). Taken together, these results show that CE contributes to the genetic architecture of multiple complex traits.

Table 2. EO test results for 18 traits with significant CE in the UKBB

To assess the potential for impact by population structure and assortative mating, we calculate the correlation between each pair of chromosome-specific PRS ($\theta_{eo}$). In simulations, we found that $\theta_{eo} > 0$ indicates the presence of structured genotypes, which may be due to assortative mating and/or uncorrected population stratification. Although correction for PCs was sufficient for calibrated inference in simulations (Table 1), we nonetheless studied the relationship between $\theta_{eo}$ and $\gamma_{eo}$ as a metric for possible confounding of the EO test by stratification. This caution is motivated by the complexities of population structure in massive datasets like the UKBB (57–60). However, we found no evidence that population stratification drove our CE inference: larger values of θ did not correlate with γ values or their corresponding P or log(P) values (SI Appendix, Fig. S4). Overall, even with these conservative extra tests we did not identify evidence that our EO test for CE was substantially confounded by uncorrected structure or assortative mating.

As a final assessment, we repeated our EO test for 100 random permutations of the PRS among samples. The resulting EO-test FDR was null, as expected (SI Appendix, Fig. S5). Furthermore, we used these permutations to construct empirical P values for the minimum EO test across all PRS thresholds as an alternative to FDR correction. This gives qualitatively similar results to our primary FDR-based analysis, although computational costs bound the minimum attainable empirical P values (SI Appendix, Fig. S6). These analyses do not rule out the possibility of subtle confounding, but they do provide further support for the calibration of the EO test in practice.
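The permutation logic described above can be sketched in Python as follows. The helper names are hypothetical; the statistic is assumed to be oriented so that larger values are more extreme (for example, the negative log of the minimum EO P value across thresholds), and the add-one correction is a common convention rather than something stated in the text.

```python
import random


def permute_among_samples(values, rng):
    """Return a shuffled copy of per-sample values (e.g. PRS), leaving the input intact."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def empirical_p_value(observed_stat, null_stats):
    """Share of permutation statistics at least as extreme as the observed one.

    Assumption: add-one correction, so the smallest attainable value is
    1 / (number of permutations + 1).
    """
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return (exceed + 1) / (len(null_stats) + 1)


# Hypothetical example with 100 permutation statistics.
rng = random.Random(0)
null_stats = [rng.random() for _ in range(100)]
print(empirical_p_value(2.0, null_stats))
```

With 100 permutations the smallest attainable empirical P value under this convention is 1/101, which illustrates how computational cost bounds the minimum attainable empirical P value.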

Replicating EO Test Results with External PRS.

Although we have cross-validated our PRS, we wished to additionally check that our results are not specific to a single population by overfitting dataset-specific confounders (57, 58, 60). We investigated this by constructing PRS using external summary statistics to remove the potential impact of cohort-specific artifacts. We selected eight traits that have large external GWAS summary statistics: asthma, T2D, CVD, height, BMI, TG, educational attainment, and LDL; additionally, we constructed PRS for nine other blood traits that were only available for a specific P-value threshold (see Materials and Methods and SI Appendix, Table S3). We tested replication for the 10 traits in this list with internally significant CE.

We applied the EO test as described above and found that CE replicated for LDL, PLAT, and MPV (Table 2). This validates CE in the sense that our results are not specific to the main effect–size estimates derived from the UKBB dataset. However, significant CE did not replicate for several traits. Broadly, the CE P values were less significant using external PRS, likely due to a combination of winner’s curse and the fact that external studies generally analyze datasets smaller than UKBB and do not perfectly match UKBB in terms of environment and genetic background.

To further strengthen confidence in CE, we performed an alternate replication analysis by directly comparing the internal and external PRS-interaction effect estimates across all pairs of chromosomes (Fig. 2). This test identified highly significant CE replication for 7/10 tested traits. This test is more powerful than simply applying the EO test to the external PRS because it tests a narrower alternative hypothesis. Specifically, the EO test asks whether any chromosome pairs interact, whereas the direct replication test asks whether the external interaction effect estimates correlate with their internal counterparts. Conceptually, this is related to one- versus two-sided tests, as the former add power when prior knowledge of the estimand’s sign is available.

CE replication in UKBB. For each of the 10 phenotypes with internally significant CE and with available external PRS, we plot internal interaction estimates (x-axis) against external estimates (y-axis) for each pair of chromosomes. Chromosomes that do not contribute to a particular PRS are excluded. We use the PRS P-value threshold for each PRS to minimize its CE P value; this does not cause inflation because we are not testing the size of these interaction estimates but rather their correlation. This indirect replication test is Bonferroni significant for 7/10 traits when correcting for the number of tested traits. Effect sizes are displayed after each chromosome-specific PRS is centered and scaled, and the P values and red lines correspond to regressions with intercepts constrained to 0.
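A regression with the intercept constrained to 0, as used for the red lines, can be sketched as below. This is an illustrative implementation under simple least-squares assumptions, not the authors' code; `origin_regression` is a hypothetical name, and `x` and `y` stand for the internal and external interaction estimates across chromosome pairs.

```python
import math


def origin_regression(x, y):
    """Least-squares fit of y = b * x with the intercept fixed at 0.

    Returns the slope b, its standard error, and the t statistic (b / se).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain a nonzero value")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx
    # One fitted parameter, so the residual variance uses n - 1 df.
    rss = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (len(x) - 1) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, se, t_stat
```

A positive slope with a large t statistic would indicate that external estimates track their internal counterparts in sign and relative size, which is the narrower alternative this replication test targets.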

In addition to strengthening confidence in the existence of CE, these replication analyses also suggest that CE can be reliably tested using internally constructed PRS in the future. This can be essential for applications to under-studied traits, populations, or environmental contexts (61–65).

Tissue-Specific Coordination in UKBB.

Having demonstrated the existence of CE for several traits, we now consider the possibility that interacting pathways are enriched in trait-relevant tissues. Specifically, we test for tissue-specific enrichment of CE across the above 26 traits and 13 tissue-specific genomic annotations: 7 based on specifically expressed genes (adipocytes, blood cells, brain, hippocampus, liver, muscles, and pancreas) (66, 67) and 6 based on tissue-specific chromatin-marker patterns (adipose, brain, hippocampus, liver, pancreas, and skeletal muscle) (68, 69) as used in ref. 51.

For each tissue–trait pair, the tissue-specific EO test asks whether the coordination mediated through a specific tissue exceeds the genome-wide average. This test is conducted by first creating a tissue-specific PRS for each chromosome ($\mathrm{TPRS}_i$) and testing it for interaction with the standard PRS on other chromosomes ($\mathrm{PRS}_j$, $i \neq j$). To focus our test on truly tissue-specific effects, rather than global effects that happen to be tagged by tissue-specific regions, we adjust for the ordinary chromosome-level PRS interactions ($\mathrm{PRS}_i \ast \mathrm{PRS}_j$) as well as the main effects of each chromosome’s PRS and TPRS. We use a P = 0.05 threshold with a conservative Bonferroni correction to account for the multiple tested tissues. We only evaluate phenotypes with significant ordinary CE (Table 2).

We found 53 instances of tissue-specific CE enrichment using internal PRS (Table 3 and SI Appendix, Table S4). This includes nine tissues enriched for LDL CE, several of which are essentially positive controls: the liver is a key regulator of LDL metabolism and adipocytes and muscles are primary sinks for the triglycerides carried by LDL (70). Further confirming these results, adipocyte CE-enrichment replicates when using external LDL PRS.

Tissue-specific CE in the UKBB

Another biologically plausible set of tissue–trait pairs is for MPV, which also had the strongest ordinary CE of any trait. First, we find significant enrichment for liver, consistent with its known role as the main producer of thrombopoietin (TPO), the main regulator of PLAT production (71). Second, we find significant CE enrichment for muscles, which also modestly produce TPO (71); furthermore, PLATs are important in healing muscles. Third, we found suggestive CE with brain tissues; while the underlying biology is less obvious, there is evidence that TPO affects brain development (72). Additionally, recent complementary studies have found evidence that brain tissue plays a role in MPV (73). All three of these tissues replicate for CE enrichment when using external PRS.

We also found tissue-specific CE for complex disease. For example, CVD has CE enrichment for brain, liver, muscle, and adipose, each of which has a clear link to CVD: several brain regions have been implicated in the genetic basis of BMI, a CVD risk factor (74, 75), and liver, muscle, and adipose are all deeply involved in energy homeostasis (76). While none of these tissues replicate, this likely reflects a lack of power, as even ordinary CE was not significant in either of our external replication tests.

To add further confidence in tissue-specific CE, we tested five permutations of the TPRS across samples. We observed that the resulting P values were appropriately null (SI Appendix, Fig. S7).

In general, the tissue-specific EO test can improve power over the ordinary EO test when the correct tissue is identified, despite the fact that the TPRS almost necessarily explain less trait variation than the overall PRS. This is consistent with our simulation result where the EO test power increased under the Oracle approach that partitions SNPs into the true pathways, even though this yields PRS with weaker predictive power. For example, the empirical EO test P value for sphered cell volume is 0.03 (SI Appendix, Table S3), but its P value for blood-cell CE is 0.003; however, the overall PRS explains 1.5% of trait variance while the blood-cell TPRS explains only 0.5%. Similar properties can be observed for many plausible tissue–trait pairs, for example, in the liver-specific CE for glucose and triglycerides. This is further evidence that CE is truly enriched in specific tissues for some traits and illustrates that CE captures a genetic axis that is partially distinct from additive variance.

Tissue-Pair CE in UKBB.

We next ask whether CE can be detected between pairs of tissues based on the hypothesis that pathways may be tissue specific. Rather than test the interaction between tissue-specific $\mathrm{PRS}_i$ and global $\mathrm{PRS}_j$, we now test for interaction between tissue 1–specific $\mathrm{PRS}_i$ and tissue 2–specific $\mathrm{PRS}_j$. For tissue-specific CE, we adjust for the same covariates as above and now additionally adjust for $\mathrm{PRS}_i \ast \mathrm{TPRS}_j$ for both tested TPRS and all $i \neq j$. In particular, these tissue–tissue interaction tests are statistically independent of the EO tests and the tissue-specific tests in the above sections under the null hypothesis for tissue–tissue interaction.

If tissue–tissue CE exists, it will likely cause tissue-specific CE; hence, to improve power, we only test tissue–trait pairs that are significant in the above TPRS test (internal or external, Table 3). This reduction in testing dimension is particularly important here because the number of tests scales by the number of evaluated tissues squared. As each phenotype now has a different number of tests, and this test is the most complex and high dimensional we consider, we use an aggressive Bonferroni correction and adjust for all tested trait-tissue-tissue triples, performed separately for internal and external PRS.
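The bookkeeping behind this correction can be sketched as follows. This is a minimal illustration with hypothetical names and inputs; it assumes tissue pairs are unordered and distinct, which the text does not state explicitly.

```python
from itertools import combinations


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def tissue_pair_triples(significant_tissues):
    """Enumerate (trait, tissue 1, tissue 2) triples to be tested.

    significant_tissues: dict mapping each trait to the tissues that passed
    the tissue-specific test (hypothetical input format).
    Assumption: pairs are unordered and a tissue is not paired with itself.
    """
    triples = []
    for trait, tissues in sorted(significant_tissues.items()):
        for t1, t2 in combinations(sorted(tissues), 2):
            triples.append((trait, t1, t2))
    return triples
```

For instance, a trait with three significant tissues contributes three pairs, while a trait with six contributes fifteen, so restricting to tissues that passed the earlier test keeps the Bonferroni denominator manageable.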

We find five Bonferroni-significant examples of tissue–tissue CE (Table 4 and SI Appendix, Table S5). Two highlight the role of muscles in PLAT, one interacting with adipocytes (internal is significant; external P = 0.02) and another with hippocampus (external is significant; internal P = 0.01); despite the nominal internal/external replication, the biological interpretation of these tissue pairs is unclear. Two other tissue pairs highlight brain interaction with blood cells and liver for MPV; again, the role of the brain is not clear, but both blood cells and liver have strong connections to MPV biology. Finally, we find evidence for liver–pancreas CE driving FEV1/FVC, which is particularly striking as the P value for this tissue–tissue CE exceeds either P value for these tissues’ individual CE.

Tissue-Pair CE in the UKBB

Puzzling Inheritance Patterns Explained

There are many examples of epistasis. One of the first to be described in humans is the Bombay phenotype, involving the ABO blood group system. Individuals with this phenotype lack a protein called the H antigen (genotype hh), which is used to form A and B antigens. Even though such individuals may have A or B genes, they appear to be blood group O because they lack the H antigen.

Another well-known example is coat color in mice. Two coat-color loci are involved. At locus A, color is dominant over albino (lack of pigment). At locus B, the coat color agouti is dominant over black. A mouse that is homozygous for the albino gene will show no pigment regardless of its genotype at the other locus. Thus the A and B loci are epistatic.
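The masking described above can be made concrete with a short sketch. The allele labels (uppercase for dominant, lowercase for recessive) and the assumption that the two loci assort independently are illustrative choices, not details given in the text.

```python
from collections import Counter
from itertools import product


def coat_color(locus_a, locus_b):
    """Coat color from a two-locus genotype, e.g. coat_color("Aa", "bb").

    Locus A: "A" (color) is dominant over "a" (albino).
    Locus B: "B" (agouti) is dominant over "b" (black).
    """
    if "A" not in locus_a:
        # aa mice make no pigment, so the genotype at locus B is masked.
        return "albino"
    return "agouti" if "B" in locus_b else "black"


def dihybrid_cross():
    """Phenotype counts among the 16 gamete combinations of AaBb x AaBb.

    Assumes independent assortment of the two loci (illustrative assumption).
    """
    gametes = [a + b for a, b in product("Aa", "Bb")]  # AB, Ab, aB, ab
    counts = Counter()
    for g1, g2 in product(gametes, repeat=2):
        counts[coat_color(g1[0] + g2[0], g1[1] + g2[1])] += 1
    return counts
```

Under these assumptions the cross gives 9 agouti, 3 black, and 4 albino, rather than the 9:3:3:1 ratio expected when the two loci act independently on the phenotype.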

It is likely that the phenomenon of lack of penetrance, in which a dominant gene fails to be expressed, is often due to epistasis. There are many cases where dominant disorders, such as polydactyly (in which individuals have extra fingers or toes), appear to "skip generations." The nonexpression of the dominant gene is likely due to the alleles the individual has at an independent locus that is epistatic to the polydactyly locus. Lack of penetrance may also be accompanied by variable expressivity, where a gene is only partially expressed. As the molecular basis of these disorders becomes known, the reason for nonpenetrance will be easier to determine.

Such interactions between loci probably occur in the genetic etiology of complex traits such as the psychiatric disorders schizophrenia and manic depression. David Lykken, a genetic psychologist at the University of Minnesota, coined the term "emergenesis" to describe multiple gene interactions involved in a specific complex trait. After comparing EEG (electroencephalogram, or "brain wave") data from identical and fraternal twins, Lykken concluded that multiple-level interactions of independent or partly independent genes must be involved.

Epistatic interactions make it difficult to identify loci conferring risk for complex disorders, and they may be a major reason that researchers have made only slow progress in mapping susceptibility genes for complex disorders. To locate interacting loci involved in the genetic origins of complex diseases requires collecting DNA samples from a large number of families where two or more individuals have the disorder. Such large-scale studies are usually difficult to conduct.

Monohybrid Inheritance

There are different forms of genes called alleles. These different forms of genes are what create variations amongst living things. There are two types of allele: dominant and recessive. Let's look at an example to understand this.

One type of gene is for eye color. The allele for brown is dominant and thus represented as 'B' (capital letter). The allele for blue eyes is recessive and represented as 'b' (lower case).

The MOTHER has brown eyes and the genotype Bb. The FATHER has blue eyes and the genotype bb.

If they have a baby, the baby will get an allele from each parent. You can work out the possible outcomes by representing it in a Punnett square.

Two of the outcomes are Bb. This means the baby has a 50% chance of having brown eyes. Even though it has a b (blue eye) allele, it has one B allele, and as this is dominant, the baby will have this characteristic. In order to get blue eyes, the baby must have two recessive blue alleles (bb).
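The same Punnett square can be worked out in a few lines of code. This is a small sketch using the simplified one-gene model of eye color from the example; the function names are made up for illustration.

```python
from collections import Counter
from itertools import product


def punnett(mother, father):
    """Count the equally likely offspring genotypes for one gene.

    Each parent passes on one of its two alleles, so there are four boxes.
    Sorting writes the dominant (uppercase) allele first: "bB" becomes "Bb".
    """
    return Counter("".join(sorted(m + f)) for m, f in product(mother, father))


def eye_color(genotype):
    """Brown (B) is dominant, so one B allele is enough for brown eyes."""
    return "brown" if "B" in genotype else "blue"


squares = punnett("Bb", "bb")
for genotype, count in squares.items():
    print(genotype, eye_color(genotype), f"{count}/4")
```

For the Bb mother and bb father, this gives two Bb boxes (brown) and two bb boxes (blue), matching the 50% chance described above.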

Materials and Methods

CeMEE derivation

The panel was derived in three stages (Figure 1). First, 16 wild isolates (AB1, CB4507, CB4858, CB4855, CB4852, CB4856, MY1, MY16, JU319, JU345, JU400, N2 (ancestral), PB306, PX174, PX179, and RC301 obtained from the Caenorhabditis Genetics Center) were inbred by selfing for 10 generations to ensure homozygosity, then intercrossed to funnel variation into a single multiparental hybrid population, as described in Teotónio et al. (2012). Each of the four funnel phases comprised multiple pairwise, reciprocal crosses at moderate population sizes (see Supplemental Material, and figure S1 and supporting information of Teotónio et al. (2012) for full details of replication and population sizes).

CeMEE derivation. The multiparental intercross funnel phase comprised four stages of pairwise crosses and progeny mixing, carried out in parallel at controlled population sizes. One hybridization cycle for a single-founder cross is inset at left: in each cycle, multiple reciprocal crosses were initiated, increasing in replicate number and census size each filial generation. Progeny were first sib-mated, then reciprocal lines were merged by intercrossing, and the pooled progeny were expanded (for three to four generations) before commencing the next reduction cycle. The resulting multiparental hybrid population was archived by freezing, and samples were thawed and maintained for 140 nonoverlapping generations of mixed selfing and outcrossing under standard laboratory conditions to generate the A140 population. Hermaphrodites were then sampled from the A140 and selfed to generate the A140 RILs. Additionally, the outbred A140 population was evolved for a further 50 generations under the same conditions (CA) or under adaptation to a salt gradient with varying sex ratios (GT, GM, and GA lines; Theologidis et al. 2014). See Materials and Methods for description of subpanels, and Teotónio et al. (2012) for details of replicate numbers and population sizes for each funnel generation. CA, control adapted lines; CeMEE, C. elegans multiparental experimental evolution panel; GA, gradual adaptation androdioecious; GM, gradual adaptation monoecious; GT, gradual adaptation trioecious; RILs, recombinant inbred lines.

Second, the multiparental hybrid population was evolved for 140 discrete generations at population sizes of (outcrossing rate ), to obtain the A140 population [as reported in Teotónio et al. (2012), Chelo and Teotónio (2013), and Chelo et al. (2013)]. Sex-determination mutations were then mass introgressed into the A140, while maintaining genetic diversity, to generate monoecious (obligately selfing hermaphrodites) and trioecious (partial selfing with males, females, and hermaphrodites) populations, as detailed in Theologidis et al. (2014). Further replicated experimental evolution was carried out for 50 generations under two environmental regimes: (1) a control regime (conditions as before), with the wild-type androdioecious reproductive system (CA50 collectively) and (2) a gradual exposure to an increasing gradient of NaCl, from 25 mM (standard NGM-lite medium; United States Biological) to 305 mM until generation 35 and thereafter, varying reproductive system (GX50, where X is androdioecious, monoecious, or trioecious). Although trioecious populations started evolution with only of hermaphrodites, by generation 50 they were abundant [50%; see figure S7 in Theologidis et al. (2014)]. Androdioecious populations maintained outcrossing rates of > 0.4 until generation 35, soon after losing males to finish with an outcrossing rate of ∼0.2 by generation 50 [figure S5 in Theologidis et al. (2014)]. This complex experimental evolution scheme was designed to study the effects of reproductive system on the genetics and evolution of complex traits; however, here we consider this structure only insofar as it is relevant to the mapping of quantitative traits in the panel as a whole.

Finally, hermaphrodites were inbred by selfing to obtain RILs. Population samples ( individuals) were thawed from −80° and maintained under standard laboratory conditions for two generations. At the third generation, single hermaphrodites were picked at the late-third to early-fourth larval stage (L3/L4) and placed in wells of 12-well culture plates, containing M9 medium (25 mM NaCl) seeded with Escherichia coli. Lines were propagated at 20° and 80% relative humidity (RH) by transferring a single L3/L4 individual for 16 (A140 population) or 13 generations (4–7 days between transfers). At each passage, parental plates were kept at 4° to prevent growth until offspring production was verified, and in cases of failure two additional transfers were attempted before declaring line extinction. Inbreeding was done in several blocks from 2012 to 2016, in two different locations. A total of 709 RILs were obtained and archived at −80°. Full designation of CeMEE RILs and subpanels are in File S1 and File S2.

Sequencing and genotyping

Full details of sequencing, genotype calling, and variant filtration can be found in the supplemental material. In brief, founders were sequenced to ≥ 30× depth with Illumina 50 or 100 bp paired-end reads, and variants were called against the WS245 C. elegans N2 reference genome (GATK 3.3-0 HaplotypeCaller; McKenna et al. 2010). After depth, quality, zygosity, and frequency filtering, we arrived at a final set of 388,201 founder SNP markers at which to genotype RILs.

RILs were sequenced with 100 or 150 bp paired-end reads to a mean depth of 5.1×. Genotypes were imputed by Hidden Markov Model (HMM), considering the 16 founder states and mean base qualities of reads. After removing closely related lines, we retained 178 A140 RILs, 118 CA50 RILs (from three replicate populations), 127 GA50 RILs (three replicates), and 79 GT50 RILs (two replicates). The 98 GM50 RILs (two replicates) are derived from monoecious populations and are highly related on average, grouping together into a small number of “isotypes.” To prevent the introduction of strong structure, we discarded all but five below a panel-wide pairwise identity threshold for the purposes of trait mapping (taking the line with greatest sequence coverage for each isotype, grouped by mean pairwise identity among lines of all other subpanels + 5 SD). In total, the CeMEE comprises 507 RILs from five subpanels, with 352,583 of the founder markers segregating within it. Raw and filtered founder variant calls are in File S3 and File S4, and imputed RIL genotypes are in File S5.

We estimated residual heterozygosity for 25 A140 lines sequenced to > 20× coverage (single sample calls using GATK 3.3-0 HaplotypeCaller, variant filtration settings MQ < 50.0 || DP < 5 || MQRankSum < −12.5 || SOR < 6 || FS > 60.0 || ReadPosRankSum < −8.0 || QD < 10.0 || DP > mean × 3). Mean heterozygosity in these lines at founder sites is 0.095% (SD 0.042%, range 0.033–0.18%).

Genetic marker sets

Four subsets of the 352,583 founder SNPs segregating in the CeMEE panel are used for analysis (referenced in the corresponding sections), which we define here:

1. Subset of 248,668 markers used for GWAS with MAF > 0.05 in phenotyped lines.

2. Subset of 88,508 markers pruned of strong local LD (generated by LDAK in a two-pass window-based filtering on $r^2$ < 0.98; see Heritability and phenotype prediction), used for analysis of interchromosomal LD and panel structure.

3. Subset of 4960 markers used for 2D testing, with MAF > 0.05, weak local LD (Plink --indep-pairwise, window = 200 kb, step = 10, $r^2$ < 0.5), missing or ambiguous imputed genotypes, and filtering across marker pairs for the presence of all four two-locus homozygote classes at a frequency of in at least one test.

4. Subset of 256,535 diallelic sites shared between the CeMEE and CeNDR panel with no missing or heterozygous calls.
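The MAF > 0.05 criterion used for the GWAS and 2D-testing marker sets can be sketched as below. The encoding is an assumption made for illustration: homozygous calls are coded 0 (reference) or 1 (alternate), with `None` for a missing genotype; function names are hypothetical.

```python
def minor_allele_frequency(calls):
    """MAF at one marker from homozygous calls coded 0 or 1.

    Missing calls (None) are ignored; a marker with no calls returns 0.0.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / len(observed)
    return min(alt_freq, 1.0 - alt_freq)


def filter_markers(genotypes, min_maf=0.05):
    """Return ids of markers whose MAF exceeds min_maf.

    genotypes: dict mapping marker id -> list of calls, one per line
    (hypothetical input format).
    """
    return [
        marker
        for marker, calls in genotypes.items()
        if minor_allele_frequency(calls) > min_maf
    ]
```

Because MAF is computed on the lines actually included, the same marker can pass in one subset of lines and fail in another, which is why the GWAS set is defined with respect to phenotyped lines.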

CeMEE genetic structure

Differentiation from natural isolates and founders:

We compared similarity within and between the CeMEE RILs and 152 sequenced wild isolates from the CeNDR panel (release 20160408). The distributions for all pairwise genotype and haplotype (% identity at 0.33 cM scale in map distance) distances are plotted in Figure S1 in File S22, using marker set 4.

LD ($r^2$) was computed for founders and CeMEE RILs at the same set of sites (marker set 2, additionally filtered to MAF > 1/16, then subsampled by a factor of 10 for computational tractability), and plotted against genetic distances [obtained by linear interpolation from the N2/CB4856 map, scaled to distances (Rockman and Kruglyak 2009)]. To assess the presence of subtle, long-range LD in the form of interchromosomal structure, we compared mean $r^2$ among chromosomes to a null distribution generated by permutation, where associations between chromosomes are randomized within each RIL genome and the contribution of allele frequency differentiation between subpanels is controlled. In each permutation (n = 5000), RIL genotypes (marker set 2) were randomly subsampled to equal size across chromosomes, split by chromosome, then shuffled within each subpanel, before taking the mean correlation across chromosomes as the test statistic (or omitting all single and pairwise chromosome combinations). The effect of local LD pruning is to reduce the weighting of long haplotypes in strong LD, to better assay weak interactions involving loci distributed throughout the majority of the genome. Permutation code is in File S6 (interchromLD.R).
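The logic of this permutation test can be sketched in simplified form. This is not the code in File S6: it omits the subsampling to equal marker numbers, uses mean $r^2$ over all between-chromosome marker pairs as the statistic, and assumes a hypothetical input layout (a dict of chromosome name to per-line lists of 0/1 calls).

```python
import random
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def mean_interchrom_r2(geno):
    """Mean r^2 over all marker pairs located on different chromosomes."""
    values = []
    for c1, c2 in combinations(sorted(geno), 2):
        cols1 = list(zip(*geno[c1]))  # one tuple of calls per marker
        cols2 = list(zip(*geno[c2]))
        values.extend(r_squared(a, b) for a in cols1 for b in cols2)
    if not values:
        raise ValueError("need markers on at least two chromosomes")
    return sum(values) / len(values)


def permute_within_subpanels(geno, subpanels, rng):
    """Reassign each chromosome among lines of the same subpanel.

    Shuffling chromosomes independently breaks between-chromosome
    associations while keeping subpanel allele frequencies intact.
    """
    permuted = {}
    for chrom, rows in geno.items():
        new_rows = list(rows)
        for panel in sorted(set(subpanels)):
            idx = [i for i, p in enumerate(subpanels) if p == panel]
            shuffled = idx[:]
            rng.shuffle(shuffled)
            for src, dst in zip(shuffled, idx):
                new_rows[dst] = rows[src]
        permuted[chrom] = new_rows
    return permuted


def permutation_p(geno, subpanels, n_perm=1000, seed=0):
    """One-sided empirical P value for excess interchromosomal LD."""
    rng = random.Random(seed)
    observed = mean_interchrom_r2(geno)
    hits = sum(
        mean_interchrom_r2(permute_within_subpanels(geno, subpanels, rng)) >= observed
        for _ in range(n_perm)
    )
    return (1 + hits) / (1 + n_perm)
```

Shuffling within subpanels rather than across the whole panel is the step that controls for allele frequency differentiation between subpanels.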

Reconstruction of ancestral haplotypes and genetic map expansion:

For each RIL, founder haplotypes were inferred with the RABBIT HMM framework implemented in Mathematica (Zheng et al. 2015), conditioning on the recombination frequencies observed for the N2/CB4856 RILs (scaled to map length coordinates are in File S7 WS220_geneticMap.txt) (Rockman and Kruglyak 2009). Realized map expansion was estimated by maximum likelihood for each chromosome, before full marginal reconstruction (explicitly modeling recombination on the X chromosome and autosomes) using posterior decoding under the fully-dependent homolog model (depModel). Under this model, appropriate for fully inbred diploids, chromosome homologs are assumed to have identical ancestral origins (prior identity by descent probability ), and the recombination junction density (transition probability) is given by the estimated map expansion ( ) and genotyping error rates (set to for founders and for RILs based on likelihood from a parameter sweep). Sites called as heterozygous or missing in the founders, or unresolved to by the genotype imputation HMM in RILs, were set to NA (missing data) before reconstruction. To summarize performance, per marker posterior probabilities were filtered to > 0.2, and haplotype lengths and breakpoints were estimated from run lengths of marker assignments, taking the single best haplotype (if present), maintaining haplotype identity (if multiple assignments of equal probability), or the first among equals otherwise.

To test reconstruction accuracy as a function of haplotype length, we performed matched simulations varying only the number of generations of random mating (code in File S8, RABBIT_simulations.R). Starting from a single population representing all founders [N = 1000, corresponding to the expected effective population size during experimental evolution (Chelo and Teotónio 2013)], mating occurred at random with equal contribution to the next generation. Recombination between homologous chromosomes occurred at a rate of 50 cM, with full crossover interference, and the probability of meiotic crossover based on distances between marker pairs obtained by linear interpolation of genetic positions (Rockman and Kruglyak 2009). For each chromosome, 10 simulations were run sampling at 10, 25, 50, 100, 150, 200, 250, and 300 generations, and haplotype reconstruction was carried out as above. Maximum likelihood estimates of realized map expansion for simulations were used to calibrate a model for prediction of the effective number of generations in the RILs. With increasing generation number, map expansion was progressively underestimated due to unresolved small recombination events (e.g., 14% mean deviation at generation 300). Given this, we used a second-degree polynomial regression of estimated map expansion on the known number of generations, which was significantly preferred over a linear fit by likelihood ratio test (LRT).

Population stratification:

Population stratification was assessed using principal component (PC) decomposition, and supervised and unsupervised discriminant analysis of PCs (DAPC; Jombart et al. 2010). In all cases, decomposition was of genotypes pruned of strong local LD (marker set 2), mean centered, and scaled to unit variance.

Of the first 50 PCs, 10 are individually significantly associated with subpanel identity by ANOVA (linear regression of each PC on subpanel identity, tested against an intercept only model by LRT, P < 0.05 after Bonferroni correction). Seven of the top 10 PCs are significant, though others up to 38 are also associated, showing that multiple sources of structure contribute to the major axes of variation.

For DAPC [R package adegenet, Jombart (2008)], we used 100 rounds of 10-fold cross-validation to determine the number of PCs required for optimal subpanel assignment accuracy (the mean of per-group correct assignments). This value (40 PCs) was then used to infer groups by unsupervised k-means clustering (default settings of 10 starts, iterations) with the number of groups selected on the Bayesian Information Criterion (BIC).



In the experimental evolution scheme under which the CeMEE RILs were generated, a hermaphrodite’s contribution to the next generation is the number of viable embryos that survive bleaching (laid, but unhatched, or in utero) that subsequently hatch to L1 larvae 24-hr later. We treat this phenotype as fertility, and measured it for individual worms of 230 RILs. Full details are provided in File S22. In brief, we used manually-scored plate-based assays of the number of viable embryos produced by single adult hermaphrodites, with two independent plates for most RILs, which we consider as replicates for estimation of repeatability (see below). In total, the median number of measurements per line was 43 (range 4–84). Final trait values were the Box–Cox transformed line coefficients from a Poisson generalized linear model (log link) with fixed categorical effects of plate row, column, and edge (exterior rows and columns), and the count of offspring per worm as response variable (model S1 in supplemental material). Data for 227 RILs passed filtering (raw data are in File S9, model coefficients are in File S10), coming from RILs of three subpanels (170 A140, 45 GA50, and 12 GT50). Subpanel explains 4% of the variance in this trait, with GA50 RILs having higher mean fertility than the A140 (linear regression of trait values on subpanel identity, regression coefficient = 0.43, see Figure S2 in File S22).

Adult hermaphrodite body size:

The area of adult hermaphrodite worms was measured using a Multi-Worm Tracker (Swierczek et al. 2011). Data were generated in two laboratory locations over several years, recording the relative humidity and temperature at the time of assay (see supplemental material for full details). Final trait values were the Box–Cox transformed line coefficients from a linear model incorporating fixed effects of year, nested within location, and humidity and temperature, nested within location (model S3 in Supplemental Methods). Data for 410 RILs passed filtering, with two independent thaw blocks for most RILs (raw data are in File S11 and final trait values are in File S12). Data come from RILs of three subpanels (165 A140, 118 CA50, and 127 GA50), which explain 17% of variance in this trait. GA50 RILs are much larger than the A140 (regression coefficient = 0.94, see Figure S2 in File S22), and this is not driven by technical covariates: data acquisition for A140 RILs and GA50 RILs was relatively balanced with respect to location and time, and GA50 RILs are significantly larger across all five laboratory/year blocks.

Fertility and body size show significant phenotypic and genetic correlations [Figure S2 in File S22 see also Poullet et al. (2016)], justifying the latter being considered a fitness-proximal trait. For 202 lines with data for both traits, the phenotypic correlation = 0.35 (Spearman’s ρ for the final trait model coefficients used for QTL mapping, ), and genetic correlation ( / where is genetic covariance between size and fertility, was estimated by restricted/residual maximum likelihood (REML) [R package sommer, Covarrubias-Pazaran (2016)] using unweighted additive genetic similarity A (see below).

Heritability and phenotype prediction


Repeatability was estimated from ANOVA of the line replicate means for each trait as $R = s_A^2/(s^2 + s_A^2)$, where $s_A^2$ = (mean square among lines − mean square error)/$n_0$, $s^2$ is the mean square error, and $n_0$ is a coefficient correcting for a varying number of observations (1–4 plate means) per line (Lessells and Boag 1987; Sokal and Rohlf 1995). Assuming equal variance and equal proportions of environmental and genetic variance among replicates, R represents an upper bound on broad-sense heritability (Falconer 1981; Hayes and Jenkins 1997). Fertility data were square root-transformed to decouple the mean and variance.
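The repeatability calculation can be sketched as follows, using the standard one-way ANOVA formulation of Lessells and Boag (1987), in which $n_0 = \left[\sum n_i - \sum n_i^2 / \sum n_i\right]/(a - 1)$ for $a$ lines with $n_i$ observations each. The function name and input layout are hypothetical, and this is an illustration rather than the analysis code.

```python
def repeatability(groups):
    """Repeatability R = s_A^2 / (s^2 + s_A^2) from a one-way ANOVA.

    groups: list of lists, the replicate measurements for each line.
    Requires at least two lines and at least one line with replicates.
    """
    a = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if a < 2 or n_total <= a:
        raise ValueError("need >= 2 lines and some within-line replication")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_among = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (n_total - a)  # the error mean square, s^2
    # n0 reduces to the common group size when the design is balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)
    s2_among = (ms_among - ms_within) / n0
    return s2_among / (ms_within + s2_among)
```

When replicates within a line are identical and lines differ, R is 1; when the among-line mean square is smaller than the error mean square, the estimate is negative, which is usually read as no detectable repeatability.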


In inbred, isogenic lines, broad-sense heritability can also be estimated by linear mixed-effects model (LMM) from the covariance between genetic and phenotypic variances. However, the measurement of genetic similarity is subject to a number of assumptions and is (almost) always, at best, an approximation (Speed and Balding 2015).

A first assumption is that all markers are the causal alleles of phenotypic variation. However, it is unavoidable that markers tag the (unknown) causal alleles to different degrees due to variable LD. A second, usually implicit, assumption in calculating genetic similarity is the weight given to markers as a function of allele frequency. Greater weight has typically been given to rare alleles in human research, which has support under scenarios of both selection and neutrality (Pritchard 2002). A third assumption, related to the first two, is the relationship between LD and causal variation. If the relationship is positive—causal variants being enriched in regions of high LD—then heritability estimated from all markers will be upwardly biased, since the signal from causal variation contributes disproportionately to genetic similarity (Speed et al. 2012).

The use of whole-genome sequencing largely addresses the first assumption, given (as here) very high marker density and an accurate reference genome, although in the absence of full de novo genomes from long-read data for each individual, the contribution of large-scale copy number and structural variation, and new mutation, will remain unknown. To account for the second and third assumptions, we used LDAK (v5.0) to explicitly account for LD in the CeMEE (decay half-life = 200 kb, min-cor (minimum squared correlation coefficient) = 0.005, min-obs (minimum percentage of non-missing data per marker) = 0.95; Speed et al. 2012). Heritability estimates were not sensitive to variation in the decay parameter over a 10-fold range or to the measurement unit (physical or genetic), and we used physical distance. Across the set of 507 RILs, 88,508 segregating markers were used after local LD-based pruning (marker set 2) and, of these, 22,984 markers received nonzero weights (File S13). LD weighting can magnify the effects of genotyping errors. We tested the effect of excluding 17,740 markers with particularly low local LD (mean over a 20-marker window < 0.3, or the ratio of mean to that of the window mean < 0.3) before estimation of LD weights. Heritability estimates were largely unchanged (within the reported intervals), as were our general conclusions on variance components and model performance.


Given m SNPs, genetic similarity is calculated by first centering and scaling S, the n × m genotype matrix, where $S_{ij}$ is the number of minor alleles carried by line i (of n) at marker j with frequency $f_j$, to give X:

$$X_{ij} = (S_{ij} - 2f_j)\,[2f_j(1-f_j)]^{\alpha/2} \tag{1}$$

The additive genetic similarity matrix (GSM) A is then $A = XX^{T}/m$. Here, α scales the relationship between allele frequency and effect size (Speed et al. 2012): $\alpha = -1$ corresponds to the assumption of equal variance explained per marker (an inverse relationship of effect size and allele frequency), while common alleles are given greater weight at α > 0. We tested a range of α values and report results that maximized prediction accuracy. With Y the mean centered vector of n phenotype values scaled to unit variance, the additive model fit for estimating genomic heritability ($h^2$) is then:

$$Y = X\beta + e, \qquad \beta_j \sim N(0, \sigma_g^2/m), \qquad e \sim N(0, \sigma_e^2 I) \tag{2}$$

where β represents random SNP effects capturing genetic variance and e is the residual error capturing environmental variance. Given Y and A, heritability can be estimated from REML estimates of genetic and residual variance as $h^2 = \sigma_g^2/(\sigma_g^2 + \sigma_e^2)$. Note that we use the terms $h^2$ and genomic heritability interchangeably here for convenience, although in some cases nonadditive covariances are included. We assume RILs are fully inbred.
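As a rough illustration of the similarity calculation, the sketch below builds an additive GSM from a small genotype matrix. It assumes the LDAK-style scaling $X_{ij} = (S_{ij} - 2f_j)[2f_j(1-f_j)]^{\alpha/2}$ and $A = XX^{T}/m$; the function name `additive_gsm` and the toy genotypes are hypothetical, and no LD weighting is applied.

```python
def additive_gsm(genotypes, alpha=-1.0):
    """Return A = X X^T / m for an n x m matrix of minor-allele counts (0, 1, 2).

    Assumed scaling: X_ij = (S_ij - 2 f_j) * (2 f_j (1 - f_j)) ** (alpha / 2),
    with f_j the allele frequency of marker j estimated from the sample.
    Markers must be segregating (0 < f_j < 1), otherwise the scaling is undefined.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency per marker: mean allele count divided by 2.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scaled = [
        [(row[j] - 2 * freqs[j]) * (2 * freqs[j] * (1 - freqs[j])) ** (alpha / 2)
         for j in range(m)]
        for row in genotypes
    ]
    # Similarity between lines i and k is the mean product of scaled genotypes.
    return [
        [sum(a * b for a, b in zip(scaled[i], scaled[k])) / m for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 fully inbred lines (genotype 0 or 2) at 3 markers.
toy_genotypes = [[0, 2, 0], [2, 2, 0], [0, 0, 2], [2, 0, 2]]
A = additive_gsm(toy_genotypes, alpha=-1.0)
```

With α = −1 every marker is standardized to unit variance, so each contributes equally to A regardless of allele frequency; for fully inbred lines the diagonal is close to 2.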

The existence of near-discrete recombination rate domains across chromosomes has led to a characteristic structure of nucleotide variation, correlated with gene density and function (Cutter et al. 2009). Variation also varies widely among chromosomes (Rockman et al. 2010; Andersen et al. 2012). This heterogeneity is not captured by aggregate genome-wide similarity with equal marker weighting (Speed et al. 2012; Goddard et al. 2016). To better reflect observed LD, markers were first weighted by the amount of genetic variation they tag along chromosomes (Speed et al. 2012). Given m weights $w_1, \ldots, w_m$, genetic effects for the basic model become:

$$\beta_j \sim N(0, w_j\sigma_g^2/W) \tag{3}$$

where W is a normalizing constant. Second, we jointly measured the variance explained by individual chromosomes (and by genetic variation in recombination rate domains within each chromosome), which can potentially improve the precision of heritability estimation if causal variants are not uniformly distributed by allowing variance to vary among partitions. Third, we tested epistatic as well as additive genetic similarity with (1) the entrywise (Hadamard) product of additive GSMs, giving the probability of allele pair sharing (Henderson 1985; Jiang and Reif 2015), (2) higher exponents up to fourth-order interactions, and (3) haplotype-based similarity at multi-gene scale. Additional similarity components (additive or otherwise) are added as random effects to the above model to obtain independent estimation of variance components (see supplemental materials for details).
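The epistatic similarity matrices described here are simple entrywise transformations of an additive GSM. A minimal sketch, assuming a precomputed additive GSM stored as a list of lists (the helper name `hadamard_power` and the example matrix are hypothetical):

```python
def hadamard_power(gsm, order=2):
    """Entrywise power of a similarity matrix.

    order=2 gives the Hadamard product of the GSM with itself (pairwise,
    additive-by-additive similarity); order=3 or 4 gives higher-order terms.
    """
    return [[value ** order for value in row] for row in gsm]


# Hypothetical additive GSM for three lines.
additive = [[2.0, 0.5, -1.0],
            [0.5, 2.0, 0.0],
            [-1.0, 0.0, 2.0]]
pairwise = hadamard_power(additive, order=2)
fourth_order = hadamard_power(additive, order=4)
```

Note that this is an entrywise power, not a matrix power: each pair of lines is treated independently, and the result stays symmetric.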

Model fit was assessed by phenotype predictions from leave-one-out cross-validation, calculating the genomic best linear unbiased prediction (GBLUP) (Meuwissen et al. 2001; VanRaden 2008) for each RIL and returning the squared correlation coefficient ($r^2$) between observed and predicted trait values (carried out in LDAK). To avoid bias associated with sample size, all models were unconstrained (nonerror variance components were allowed to vary outside 0–1 during convergence) unless otherwise noted, which generally gave better likelihood for multi-component models.
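The leave-one-out procedure can be sketched in plain Python. This is a simplified illustration, not LDAK's implementation: it assumes a single additive GSM, a mean-centered phenotype, and a fixed heritability `h2` supplied by the user rather than re-estimated by REML in each fold. All names and the toy data are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def loo_gblup_r2(gsm, y, h2):
    """Squared correlation between observed y and leave-one-out GBLUP predictions."""
    n = len(y)
    lam = (1 - h2) / h2  # sigma_e^2 / sigma_g^2 implied by the assumed h2
    preds = []
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        # Covariance of the training lines, up to the factor sigma_g^2.
        v = [[gsm[a][b] + (lam if a == b else 0.0) for b in keep] for a in keep]
        weights = solve(v, [y[k] for k in keep])
        # Prediction for the held-out line from its similarity to training lines.
        preds.append(sum(gsm[i][k] * w for k, w in zip(keep, weights)))
    return pearson(y, preds) ** 2


# Hypothetical toy example: two pairs of closely related lines.
toy_gsm = [[1.0, 0.9, 0.0, 0.0],
           [0.9, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.9],
           [0.0, 0.0, 0.9, 1.0]]
toy_y = [1.0, 1.2, -1.0, -1.2]
r2 = loo_gblup_r2(toy_gsm, toy_y, h2=0.5)
```

Each held-out line is predicted only from its similarity to the remaining lines, so a high $r^2$ here reflects that related lines in the toy data have similar phenotypes.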

1D tests:

For single trait, single marker association, we fitted LMMs:

$$Y = X\beta + g + e, \qquad g \sim N(0, \sigma_g^2 A), \qquad e \sim N(0, \sigma_e^2 I) \tag{4}$$

where X is the matrix of fixed effects (SNP genotype) and β is the effect on phenotypic variation that is estimated. g are the random effects describing genetic covariances (Equation 3) accounting for nonindependence among tests due to an assumed polygenic contribution to phenotype, with A the GSM from the most predictive additive fit found for each trait, and e is residual error. The above model was compared to a null model excluding genotype effects by LRT (fit using the LIMIX Python package). GWAS P-values for size and fertility are in File S14, using genetic similarities in File S15 and File S16.

To assess the mapping resolution and power of the CeMEE panel, we carried out GWAS according to the model above for simulated phenotypes. We simulated a single additive locus ($h^2$ from 1 to 30%) and a background polygenic component of equal variance (scenarios of 10, 100, or 1000 loci), chosen at random from SNPs with MAF > 0.05, with genetic and environmental effect sizes drawn independently from the standard normal distribution (code is in File S17, GWAS_simulations). GWAS was carried out 1000 times for each scenario, controlling for relatedness with LD-weighted additive genetic similarity (A). Power was estimated from a binomial generalized linear model considering all three polygenic scenarios together. Precision, the fraction of significant QTL that are true positives, was assessed after masking a 1-cM window around the simulated causal SNP. Detection intervals around QTL were defined as a drop in the logarithm of odds (LOD) score of 2 (Morton 1955), and were calculated from similarly powered markers with ≥ MAF, with P-values converted to LOD scores as $\chi^2/(2\ln 10)$, where $\chi^2$ is the 1-d.f. quantile corresponding to P (Nyholt 2000). True positives were defined as cases where the exact simulated site was detected, and false positives were 2-LOD drop QTL among all other markers detected at a 5% significance permutation threshold (see below).
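The LOD conversion and the 2-LOD drop interval can be sketched as follows. The conversion assumes the standard relationship LOD = χ²/(2 ln 10) for a 1-d.f. test, and the interval is taken as the contiguous run of markers around the peak whose LOD stays within the drop. Both functions and the example values are hypothetical illustrations, not the code in File S17.

```python
import math
from statistics import NormalDist


def p_to_lod(p_value):
    """Convert a P-value from a 1-d.f. chi-square test to a LOD score."""
    # For 1 d.f., the chi-square quantile is the square of the normal quantile
    # at p/2 (using the lower tail keeps precision for very small P-values).
    z = NormalDist().inv_cdf(p_value / 2)
    return z * z / (2 * math.log(10))


def lod_drop_interval(positions, lods, drop=2.0):
    """Return (left, right) positions of the contiguous region around the peak
    in which the LOD score stays within `drop` units of the maximum."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    floor = lods[peak] - drop
    left = right = peak
    while left > 0 and lods[left - 1] >= floor:
        left -= 1
    while right < len(lods) - 1 and lods[right + 1] >= floor:
        right += 1
    return positions[left], positions[right]


# Hypothetical marker positions and LOD scores along one chromosome.
interval = lod_drop_interval([1, 2, 3, 4, 5], [0.5, 3.5, 5.0, 3.2, 1.0])
lod_at_p05 = p_to_lod(0.05)
```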

All 507 lines were used for simulation, and all GWAS tested 248,668 markers with MAF > 0.05 (marker set 1; code is in File S18). Significance thresholds were established by permutation (Anderson and Ter Braak 2003), with phenotypes generated by permuting phenotype residuals given the estimated relatedness among lines to ensure exchangeability in the presence of polygenic causal effects or structure (A), using the R package mvnpermute (Abney 2015). Significance level α is the corresponding percentile of the minimum P-values from 1000 permutations.
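Once the minimum P-value from each permuted genome scan has been collected, the genome-wide threshold is just a quantile of that list. A minimal sketch (the function name and input are hypothetical, and the relatedness-aware permutation step performed by mvnpermute is not reproduced here):

```python
import math


def permutation_threshold(min_p_values, alpha=0.05):
    """P-value threshold from per-permutation minimum P-values.

    Returns the order statistic such that roughly a fraction `alpha` of the
    permuted scans have a minimum P-value at or below it.
    """
    ordered = sorted(min_p_values)
    k = max(math.floor(alpha * len(ordered)) - 1, 0)
    return ordered[k]


# Hypothetical null distribution: 100 minimum P-values, evenly spaced.
null_min_p = [i / 100 for i in range(1, 101)]
threshold = permutation_threshold(null_min_p, alpha=0.05)
```

An observed marker is then called significant at level α if its P-value is at or below `threshold`.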

Given correlation between traits, we also tested phenotype residuals for each trait after linear regression on the other, and a multi-trait LMM fitting general and specific effects. No markers were significant in any case (analysis not shown).

2D tests:

We tested for epistasis over a reduced search space (marker set 3), on the assumption of complete homozygosity, for a total of 19,913,422 marker pairs (inter- and intrachromosomal). We used a two-level hierarchical procedure, first testing a full linear model (main and interaction effects) against a reduced model (intercept only) by ANOVA, taking as summary statistic the P-value from an LRT. Significance at level 1 was tested against a null distribution generated by full phenotype permutation (i.e., no additive or interaction effects), with thresholds taken from the minimum values seen for each chromosome pair across permutations. We then tested the interaction term specifically for marker pairs significant at level 1 using a parametric bootstrap (Bůžková et al. 2011): the full model was fitted to responses sampled with replacement from the additive (main effects only) model (n = 10,000), taking the interaction P-value as test statistic, and then comparing the observed statistic to the null distribution (significance declared at P < 0.01). Code is in File S19. We initially ignored relatedness for 2D testing, then fitted LMMs as above (Equation 2) with genetic covariance A for candidate interactions (R package hglm; Shen et al. 2014). From eight candidate interactions for size, we excluded two for which interaction P-values by LMM were almost an order of magnitude higher. The six remaining candidates changed little (three were lower by LMM). For fertility, interaction P-values were largely insensitive to relatedness (lower for six of eight cases by LMM). Models were also fitted to raw trait values (in addition to the power transformed values) to assess scale effects. One interaction for fertility was significant for transformed values only and was excluded. The amount of phenotypic variance explained by interactions for each trait was estimated from the adjusted $R^2$ of a linear model jointly fitting all main and two-locus interaction effects. These estimates were similar to those from LMM variance components, fitting random effects corresponding to additive and additive-by-additive genetic similarity separately at candidate interactions and background markers (point estimates were 6% lower for size and < 1% lower for fertility).
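To make the nested model comparison concrete, the sketch below computes likelihood-ratio statistics for one marker pair by ordinary least squares, assuming homozygous genotypes coded 0/1 and Gaussian errors, so that the statistic is n·ln(RSS_reduced / RSS_full). It ignores relatedness and does not perform the permutation or bootstrap steps; the function names and toy data are hypothetical, not the code in File S19.

```python
import math


def rss(design, y):
    """Residual sum of squares from least squares via the normal equations."""
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(design, y)) for a in range(p)]
    # Gaussian elimination with partial pivoting on the augmented system.
    aug = [xtx[a] + [xty[a]] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, p):
            f = aug[r][c] / aug[c][c]
            aug[r] = [u - f * w for u, w in zip(aug[r], aug[c])]
    beta = [0.0] * p
    for a in reversed(range(p)):
        tail = sum(aug[a][b] * beta[b] for b in range(a + 1, p))
        beta[a] = (aug[a][p] - tail) / aug[a][a]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in design]
    return sum((v - f) ** 2 for v, f in zip(y, fitted))


def pair_lr_stats(g1, g2, y):
    """LR statistics for a marker pair: full vs. intercept-only, and the
    interaction term alone (full vs. additive)."""
    n = len(y)
    null = [[1.0] for _ in y]
    additive = [[1.0, a, b] for a, b in zip(g1, g2)]
    full = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    rss_null, rss_add, rss_full = rss(null, y), rss(additive, y), rss(full, y)
    return {
        "full_vs_null": n * math.log(rss_null / rss_full),
        "interaction": n * math.log(rss_add / rss_full),
    }


# Hypothetical data: the trait is elevated only when both loci carry allele 1.
g1 = [0, 0, 0, 0, 1, 1, 1, 1]
g2 = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0.1, -0.1, 0.0, 0.2, -0.2, 0.1, 2.1, 1.9]
stats = pair_lr_stats(g1, g2, y)
```

Because the three models are nested, the full-vs-null statistic is never smaller than the interaction statistic; the second level of the procedure exists precisely because a large full-vs-null statistic can be driven by main effects alone.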

We also tested for excess weaker polygenic interactions by taking the sum of log likelihood ratios (LRs) for each marker against all other markers on one other chromosome (2D sum tests). Significance was tested at a single threshold (LR > 16, around the maximum value seen among pairwise interaction null P-values), using the equivalent of the above hierarchical procedure: LRs for the full vs. reduced model were first summed for each marker and compared to a null distribution generated by full phenotype permutation. Candidate markers significant at level 1 (α = 0.01) were then tested for significance of the interaction terms against a distribution of LR sums from null additive models for tests with LR > 16 in the observed data. This was repeated 1000 times, with permutation order fixed across bootstraps to maintain correlation structure, and significance was declared at P < 0.01. Code is in File S20, 2D_sumLRbootstrap.R.
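The sum statistic itself is simple to express. The sketch below reads the description as summing, for one focal marker, the LRs of all pairwise tests that exceed the threshold, and then comparing that sum with sums obtained under the null. The +1 correction in the empirical P-value is a common convention assumed here, not something stated in the text, and all names are hypothetical (this is not File S20).

```python
def sum_lr_above(lr_values, threshold=16.0):
    """Sum of likelihood ratios exceeding the threshold for one focal marker."""
    return sum(lr for lr in lr_values if lr > threshold)


def empirical_p(observed, null_sums):
    """Fraction of null sums at least as large as the observed sum.

    Uses the (count + 1) / (N + 1) convention so the estimate is never zero;
    this correction is an assumption of the sketch.
    """
    exceed = sum(1 for s in null_sums if s >= observed)
    return (exceed + 1) / (len(null_sums) + 1)


# Hypothetical LRs for one marker against four markers on another chromosome.
observed_sum = sum_lr_above([3.0, 17.0, 20.0, 16.0])
p_value = empirical_p(observed_sum, null_sums=[10.0, 40.0, 5.0])
```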

Data availability

Sequence data are available from the National Center for Biotechnology Information Sequence Read Archive under BioProject PRJNA381203. Raw and processed phenotype and genotype data, and analysis scripts are provided as supplemental material (see DataDocument) and archived in FigShare under DOI: 10.6084/m9.figshare.5466574.v1. RILs are available from the authors.

Synergism between inbreeding and parasitism

There was no significant interaction, and thus no evidence for synergism between the inbreeding coefficient and the consequences of parasitic infection in this study. The parasite reduced the fitness of all inbred clones equally. Previously, Coltman et al. (1999) reported an interaction between inbreeding and parasitism: inbred Soay sheep showed a higher mortality when parasitized by gastrointestinal nematodes. They suggest that parasitism intensifies selection against inbreeding. Addressing similar questions, Stevens et al. (1997) examined the effect of a tapeworm parasite on inbred Tribolium lines. Although inbred beetles did not have significantly different infection intensities than noninbred individuals, inbreeding increased parasite prevalence significantly. In our study, the probability of becoming infected (prevalence) at different levels of inbreeding could not be studied, as within each replicate either all or none of the tested animals were parasitized. This is because we kept the conditions such that only vertical transmission of O. bayeri occurred.

With respect to parasitism, the result of the present study is consistent with the result of Peters (1999), who exposed Arabidopsis thaliana individuals to five levels of chemical (EMS) mutagenesis and three levels of Pseudomonas syringae infection and measured different fitness components. Although he found a negative effect of mutation and of the pathogen, there was neither a synergistic interaction among mutations nor a synergistic effect of mutations and pathogens on the characters measured. Thus, as in the present study, the consequences of infection and of deleterious mutations seemed to be independent of each other. In contrast to Peters' (1999) study, the present study did show a nonlinear decrease in fitness with increasing numbers of mutations.

Consistent with our findings, another study on D. magna (Haag et al., 2003), which was specifically designed to test whether parasitic infections increase selection against inbred genotypes, also found no evidence for a synergism between parasites and inbreeding. Haag et al. (2003) included 14 genetic backgrounds, but contrasted only controls with one level of inbreeding. Some of the clones they used were the G1x and G1s clones, which were used for breeding the experimental clones of the study presented here. In contrast to our study, Haag et al. (2003) had higher power to find a possible synergism between parasites and inbreeding, but their design did not allow testing for epistasis across different levels of inbreeding. Although their experiment clearly showed an absence of synergism between parasites and inbreeding, they found that the effect of parasites on inbreeding depression depended on the genetic background. In our experiment, this was tested by the three-way interaction, which was only marginally significant.

In conclusion, this study observed epistasis using three increasingly inbred clones of D. magna in a competition experiment. The clones with the highest inbreeding coefficient showed a significant decrease in fitness compared with the other two, less inbred clones. However, no synergism between inbreeding and a parasite was found; the parasite reduced the fitness of all clones equally, independent of their inbreeding coefficient. Thus, this study suggests that mutations and one environmental factor (a parasite) may contribute to the maintenance of sex independently, rather than synergistically.
