
Linkage and LD: quantitative or qualitative?


My understanding is that the concept "genetic linkage" can be expressed in quantitative form, like:

A predisposing gene X was found in close genetic linkage to Y.

And that linkage disequilibrium (LD) is a quality, like being pregnant; you are either in LD or you are not:

A predisposing gene X was found in close genetic linkage to Y and in linkage disequilibrium with Z.

Is this correct?


Linkage disequilibrium (LD) occurs when there is a non-random association, or correlation, between the alleles (genotypes) at two loci. Note that I used the word correlation; this is a quantitative measure.

Some genotypes may well correlate perfectly (R=1), i.e. they are always inherited together. Others may not be in 'perfect' linkage (e.g. R=0.9), but are still considered to be in LD because there is a strong correlation between the genotypes (I think 0.8 is generally seen in published papers as the 'cut-off').

Both the correlation coefficient and the D' value are derived from D, which (simply) denotes the difference between the observed frequency of a haplotype and the frequency expected if the two loci were independent. Either measure can therefore be used, but I think it is more common (and more interpretable) to report the squared correlation coefficient (or, to be more precise, the coefficient of determination, R^2).
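To make the relationship between these measures concrete, here is a minimal sketch using the standard textbook definitions of D, D' and R^2 for two biallelic loci. The function name and the example frequencies are illustrative assumptions, not values from any real data set.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were independent.
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: the rarer allele B only ever occurs with A.
print(ld_measures(0.1, 0.5, 0.1))
```

With these made-up frequencies D' is 1 while R^2 is only about 0.11, which is one reason it helps to say which measure, and which value, is meant when reporting LD.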

If you'd like an example: I might be interested to know if any SNPs are in LD with rs10757278 (located on 9p21, associated with heart disease). Using SNAP (from the Broad Institute) I can input my search, choose my options (e.g. use 1000 Genomes data and an R^2 cut-off of 0.8) and search. ~5 SNPs are found to be in 'perfect' LD (R^2=1), but a further ~40 are still considered to be in LD with the input SNP because their R^2 values are above 0.8.

So in summary, both statements can be used correctly, but it is always more informative to state the degree of LD (otherwise it might be assumed that the loci are in perfect correlation).


Polygenic Inheritance

Mendel performed his experiments on the garden pea plant, whose traits show complete dominance, and from these the laws of inheritance were established. Other scientists performed experiments on different plants and animals and found deviations from the Mendelian ratios. From these experiments and observations, different patterns of inheritance, called gene interactions, were discovered. This field of study is known as post-Mendelian or neo-Mendelian genetics. In this article, we shall study the concept of polygenic inheritance.

Polygenic Inheritance or Quantitative Inheritance:

These characters are determined by two or more gene pairs that have an additive or cumulative effect. Such genes are called polygenes: two or more different pairs of nonallelic genes, present at different loci, which influence a single phenotypic character. They are also called quantitative genes, cumulative genes or multiple factors.

A single phenotypic character governed by more than one pair of genes is called a polygenic character or quantitative character. Polygenic or quantitative characters show continuous variation. Galton (1883) predicted that in human populations, characters such as height, skin colour and intelligence show continuous variation in expression rather than only two contrasting expressions.

In cumulative or polygenic inheritance each gene has a certain amount of effect, so the greater the number of dominant genes, the greater the expression of the character. It is generally believed that chromosomes or parts of chromosomes were duplicated during evolution, which resulted in multiple copies of the same gene. Note that Mendel studied qualitative inheritance, where complete dominance is observed.

Polygenic Inheritance in Wheat Kernel Colour:

The Swedish geneticist H. Nilsson-Ehle discovered polygenic inheritance. He crossed a red-kernelled variety of wheat with a white-kernelled variety. In the F1 generation all plants had grains with a colour intermediate between red and white. In the F2 generation five different phenotypic expressions (darkest red, medium red, intermediate red, light red and white) appeared in the ratio 1:4:6:4:1. Nilsson-Ehle suggested that kernel colour in wheat is controlled by two pairs of genes, Aa and Bb. Genes A and B determine the red colour, whereas a and b do not produce red pigment and are expressed as a white kernel.
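The 1:4:6:4:1 ratio can be reproduced with a short calculation. This is a minimal sketch, assuming that every dominant allele contributes equally and additively and that the gene pairs assort independently; the function name is made up for illustration.

```python
from math import comb


def additive_f2_ratio(n_gene_pairs):
    """F2 phenotypic ratio from a cross of two full heterozygotes.

    Entry k of the returned list is the number of F2 combinations
    carrying exactly k dominant alleles (k = 0 is the fully
    recessive class).
    """
    n_alleles = 2 * n_gene_pairs
    return [comb(n_alleles, k) for k in range(n_alleles + 1)]


# Two gene pairs (Aa and Bb), as proposed for wheat kernel colour.
print(additive_f2_ratio(2))  # [1, 4, 6, 4, 1]
```

The list runs from white (no dominant alleles) to the darkest red (four dominant alleles), and its entries add up to the 16 cells of the F2 checkerboard.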

Polygenic Inheritance in Human Skin Colour:

The presence of melanin pigment is responsible for the colour of the skin in a human being. Each dominant gene is responsible for the synthesis of a fixed amount of melanin. The amount of melanin synthesized is directly proportional to the number of dominant genes.

The amount of melanin that develops in a person is determined by three pairs of genes, A, B and C. These are present at three different loci, and each dominant gene is responsible for the synthesis of a fixed amount of melanin. The genotype of a pure black parent, in which melanin production is highest, is AABBCC, while that of a pure white parent (also called albino), in which no melanin is formed, is aabbcc.

Mulattoes, i.e. the F1 offspring, produce 2^3 = 8 different types of gametes. Let us consider an intermediate mulatto whose genotype is AaBbCc. A cross between two such individuals gives 2^6 = 64 combinations in the F2 generation, but there are only 7 phenotypes owing to the cumulative effect of each dominant gene.

When we analyze all possible combinations and plot the frequency distribution of colour, with the number of dominant genes in the various shades on the x-axis and the frequency of each shade on the y-axis, we often get a bell-shaped curve in polygenic inheritance.

This means that most people fall in the middle of the phenotypic range, such as that of skin colour, while very few people are at the extremes, such as pure white or pure dark. At one end of the curve are individuals who are recessive for all the alleles (for example, aabbcc); they are rare. At the other end are individuals who are dominant for all the alleles (for example, AABBCC); they are also rare. In the middle of the curve are individuals who have a combination of dominant and recessive alleles (for example, AaBbCc or AaBBcc). The graph also shows that the expression level of the phenotype depends on the number of contributing alleles and is hence quantitative.
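The counts behind this curve can be checked by listing every gamete and every F2 combination. The sketch below assumes independent assortment of the three gene pairs and equal, additive effects of the dominant alleles; the helper name is illustrative.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """All gametes of a parent given as one allele pair per locus,
    e.g. ["Aa", "Bb", "Cc"]."""
    return ["".join(alleles) for alleles in product(*genotype)]


parent_gametes = gametes(["Aa", "Bb", "Cc"])
print(len(parent_gametes))  # 8

# Cross two AaBbCc parents and count the dominant (upper-case)
# alleles in each of the resulting combinations.
classes = Counter()
for egg, sperm in product(parent_gametes, repeat=2):
    classes[sum(ch.isupper() for ch in egg + sperm)] += 1

distribution = [classes[k] for k in range(7)]
print(sum(distribution))  # 64
print(distribution)       # [1, 6, 15, 20, 15, 6, 1]
```

The seven classes run from aabbcc (no dominant alleles) to AABBCC (six), and the peak of 20 combinations in the middle is what gives the curve its bell shape.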

Other examples are height in human beings and cob length in maize.

Comparative Study of Qualitative and Quantitative Inheritance:

Qualitative Inheritance:

  • Qualitative characters are classical Mendelian traits which have two contrasting expressions and are controlled by a single pair of genes, e.g. tall and dwarf pea plants. Because a qualitative character is determined by a single pair of genes, such traits are called monogenic traits. The inheritance of monogenic traits or qualitative characters is called qualitative or monogenic inheritance.
  • A qualitative trait is expressed qualitatively, which means that the phenotype falls into different categories. These categories do not necessarily have a certain order.
  • Qualitative inheritance was first studied by Mendel.

Characteristics of Qualitative Inheritance:

  • Qualitative or monogenic inheritance deals with the inheritance of qualitative characters which have two contrasting expressions, e.g. tall and dwarf pea plants.
  • Each character is controlled by a single pair of contrasting alleles.
  • There is no intermediate type.
  • Each character has two distinct contrasting expressions, i.e. two distinct phenotypes.
  • The degree of expression remains the same whether one or both dominant alleles are present.
  • Single-effect genes are seen.
  • It is not influenced by environmental factors.
  • It shows a discontinuous pattern of inheritance.
  • Individuals of the F1 generation resemble the dominant parent.
  • Individuals of the F2 generation are in the ratio 3:1. An intermediate expression is absent.
  • It is concerned with individual matings and their progeny.
  • Analysis of this inheritance can be done by counting and finding ratios.
  • Examples: inheritance of qualitative characters like height, seed coat and seed colour of the pea plant.

Quantitative Inheritance:

Quantitative or polygenic inheritance deals with the inheritance of quantitative characters like height, weight, skin colour and intelligence in the human population, which exhibit continuous variation. A few characters in plants, such as height and the size, shape and number of seeds and fruits, also exhibit quantitative inheritance.

In quantitative inheritance each gene has a certain amount of effect, and the greater the number of dominant genes, the greater the degree of expression of the character. The gradation in the expression of the character is determined by the number of gene pairs, and all the gene pairs have an additive or cumulative effect.

Quantitative or polygenic inheritance was first studied by J. Kolreuter (1760) in the case of height in tobacco and by F. Galton (1883) in the case of height and intelligence in human beings. Nilsson-Ehle (1908) obtained the first experimental proof of polygenic inheritance in the case of kernel colour in wheat. Possible origins of polygenic inheritance are the duplication of a chromosome or part of one, an increase in chromosome number (polyploidy), or mutations producing genes with a similar effect.



In: Genetic Epidemiology, Vol. 22, No. 4, 2002, p. 298-312.

Research output : Contribution to journal › Article › peer-review

Unified sampling approach for multipoint linkage disequilibrium mapping of qualitative and quantitative traits

Rapid development in biotechnology has enhanced the opportunity to deal with multipoint gene mapping for complex diseases, and association studies using quantitative traits have recently generated much attention. Unlike the conventional hypothesis-testing approach for fine mapping, we propose a unified multipoint method to localize a gene controlling a quantitative trait. We first calculate the sample size needed to detect linkage and linkage disequilibrium (LD) for a quantitative trait, categorized by decile, under three different modes of inheritance. Our results show that sampling trios of offspring and their parents from either extremely low (EL) or extremely high (EH) probands provides greater statistical power than sampling in the intermediate range. We next propose a unified sampling approach for multipoint LD mapping, where the goal is to estimate the map position (τ) of a trait locus and to calculate a confidence interval along with its sampling uncertainty. Our method builds upon a model for an expected preferential transmission statistic at an arbitrary locus conditional on the sampling scheme, such as sampling from EL and EH probands. This approach is valid regardless of the underlying genetic model. The one major assumption for this model is that no more than one quantitative trait locus (QTL) is linked to the region being mapped. Finally we illustrate the proposed method using family data on total serum IgE levels collected in multiplex asthmatic families from Barbados. An unobserved QTL appears to be located at τ̂ = 41.93 cM with 95% confidence interval of (40.84, 43.02) through the 20-cM region framed by markers D12S1052 and D12S1064 on chromosome 12. The test statistic shows strong evidence of linkage and LD (chi-square statistic = 18.39 with 2 df, P-value = 0.0001).
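The summary statistics quoted at the end of the abstract can be sanity-checked with a few lines of arithmetic. This is only a reader's sketch: it uses the closed-form upper tail of a chi-square distribution with 2 degrees of freedom, and it assumes (the abstract does not say so) that the confidence interval is a symmetric normal-theory interval of the form estimate ± 1.96 × SE.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 df.
    For 2 df the survival function has the closed form exp(-x / 2)."""
    return math.exp(-x / 2.0)


p_value = chi2_sf_2df(18.39)
print(round(p_value, 4))  # 0.0001

# Reported 95% confidence interval for the map position (in cM).
lower, upper = 40.84, 43.02
midpoint = (lower + upper) / 2.0
half_width = (upper - lower) / 2.0

# Assumption: symmetric normal-theory interval.
implied_se = half_width / 1.96
print(round(midpoint, 2), round(implied_se, 2))  # 41.93 0.56
```

The midpoint of the interval matches the reported estimate of 41.93 cM, and under the stated assumption the implied standard error is roughly 0.56 cM.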



Notation and Data

The notation used in the various quantitative TDT papers on which our comments are based is not consistent from one author to another, and we adopt a unifying notation that is loosely based on that of these papers. In accordance with standard statistical practice, we use upper case notation for random variables and the corresponding lower case notation for the observed values of these random variables. To focus on the main points in this expository review, we assume a specific (and restricted) form of data. We assume that the data concern a marker locus “A,” having two possible alleles, denoted by A and a, and consist of information on n family trios, with complete marker locus genotype information on the two parents and the child in each trio. The value of the quantitative trait of interest is known for the child in each trio but not for the parents. We assume, in line with the original qualitative TDT, that all parental mating types are informative (i.e., contain at least one Aa parent). The observed number of A alleles in the child in trio i is denoted by xi (i = 1, 2,…, n), and the observed value of the quantitative trait of interest in the child in trio i is denoted by yi.

We do not consider here the extent to which the comments made below carry over to data other than those described above, for example cases where several children are observed in each family, where parental phenotype information is available, and where the data contain families with noninformative mating types. We restrict our analysis in this way so as to highlight the main features of the testing procedures that we discuss without getting into the analyses required for forms of data more complicated than those we consider.

The null hypothesis tested is “no linkage (or no linkage disequilibrium) between the marker locus and a locus involved with the quantitative trait.” Under this null hypothesis, the mean number of A alleles, Xi, in the child in trio i will depend on the parental mating type, being 0.5 if it is aa×Aa, 1.0 if it is Aa×Aa, and 1.5 if it is AA×Aa. The null hypothesis variance of Xi also depends on the parental mating type, being 0.25 if it is aa×Aa or AA×Aa and 0.5 if it is Aa×Aa. We frequently use the convenient Abecasis [5] notation Wi (“within family”) to describe Xi minus its null hypothesis mean as computed from the mating type in trio i. The null hypothesis mean of Wi is zero and the null hypothesis variance of Wi is the same as that for Xi, and depends on the mating type in trio i. When discussing the typical family we drop the suffix i and use the generic notation W, w, X, x, Y, and y.
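The null means and variances quoted above follow directly from Mendelian transmission, and can be verified by listing the four equally likely transmissions in each informative mating type. The sketch below is illustrative; the function names are not taken from the papers under review.

```python
from fractions import Fraction
from itertools import product


def null_moments(parent1, parent2):
    """Null-hypothesis mean and variance of X, the number of A alleles
    in the child, for parental genotypes given as strings like "Aa"."""
    # Each parent transmits either of its two alleles with probability
    # 1/2, so the four (allele1, allele2) pairs are equally likely.
    counts = [(a1 == "A") + (a2 == "A") for a1, a2 in product(parent1, parent2)]
    n = len(counts)
    mean = Fraction(sum(counts), n)
    variance = Fraction(sum(c * c for c in counts), n) - mean ** 2
    return mean, variance


def within_family_score(x, parent1, parent2):
    """W = X minus its null-hypothesis mean for the trio's mating type."""
    mean, _ = null_moments(parent1, parent2)
    return x - mean


for mating in (("aa", "Aa"), ("Aa", "Aa"), ("AA", "Aa")):
    mean, variance = null_moments(*mating)
    print(mating, float(mean), float(variance))
```

For example, a child with two A alleles from an Aa×Aa mating has w = 2 − 1 = 1, while the same child from an AA×Aa mating has w = 2 − 1.5 = 0.5.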


Results

We evaluated different methods to handle LD in the framework of our proposed two-step processing strategy. In step 1, we performed NPL analysis over 1,000 replicates of the simulated data at different LD thresholds for families with 2, 3 or 4 affected siblings and zero or two ungenotyped parents to determine which subset of SNPs retained full IC while reducing the LOD score bias. In step 2, we performed NPL analysis over 1,000 replicates of the simulated data at different LD thresholds using the subset of SNPs obtained from step 1 for all 9 study designs, with zero or two ungenotyped parents. We evaluated different approaches at varying degrees of LD (D' and r^2) in terms of reducing the LOD score bias. In general, with complete data where both parents are genotyped, there was no LOD score inflation in any study design. Thus the discussion that follows focuses mainly on the results where both parents are ungenotyped.

Step 1

Table 1 summarizes the average maximum LOD score (MLS) and average IC obtained using the three techniques, compared with the unadjusted subset of SNPs. The IC values obtained with the unadjusted subset of SNPs were 0.72, 0.80 and 0.84 for the 2, 3 or 4 sibling data, respectively.

Table 1

Step 1: Summary descriptive statistics of the average maximum NPL LOD scores and the average IC with ungenotyped parents.

Subset | Ave # SNPs | 2 affected sibs: MLS (SD) | 2 affected sibs: IC | 3 affected sibs: MLS (SD) | 3 affected sibs: IC | 4 affected sibs: MLS (SD) | 4 affected sibs: IC
Unadjusted | 6012 | 13.27 (2.77) | 0.72 | 14.41 (3.64) | 0.80 | 7.86 (2.20) | 0.84
MAF ≥ 0.05 | 5387 | 13.49 (2.66) | 0.72 | 14.77 (3.64) | 0.80 | 8.06 (2.23) | 0.84
MAF ≥ 0.10 | 4616 | 13.87 (2.64) | 0.72 | 15.00 (3.61) | 0.80 | 8.18 (2.25) | 0.84
MAF ≥ 0.20 | 3203 | 15.01 (2.69) | 0.72 | 15.47 (3.61) | 0.79 | 7.90 (2.42) | 0.84
r^2 ≥ 0.95 | 4596 | 5.47 (1.79) | 0.72 | 4.93 (2.06) | 0.80 | 2.96 (1.55) | 0.85
MAF ≥ 0.05 & r^2 ≥ 0.95 | 3713 | 9.62 (2.75) | 0.72 | 14.01 (3.43) | 0.80 | 5.02 (1.95) | 0.85

Removing uninformative SNPs reduced the number of markers while maintaining the level of IC; however, the LOD score inflation remained unchanged compared to the baseline LOD score inflation. Removing redundant SNPs resulted in no loss of IC, showed moderate reduction of the LOD score bias and outperformed the combination technique; however, a large number of markers was still selected. When the two techniques were combined, 38% fewer SNPs than the baseline subset of SNPs were selected while maintaining IC. In addition, this combination technique showed moderate reduction in LOD score bias. In general, for all marker subsets, the average MLS remained above the null expectation, pointing to the need for further adjustment or SNP selection to reduce the bias introduced by ignoring LD in dense marker sets. We carried forward the marker subset created by the combination approach to be evaluated in step 2 because it achieved the same information content while reducing bias. Setting the results from the combination approach in step 1 as the baseline for step 2 allowed us to evaluate the different approaches and measure how much bias each approach could remove when starting from a moderate baseline of bias.

Step 2

In step 2, we examined four different approaches to handle LD among dense SNPs, using the subset of 3,713 SNPs selected in step 1 as the baseline marker subset. We compared the resulting average MLS from these approaches to the baseline and considered an approach to show reduction of bias when its observed average MLS was at least 10% below the baseline. Table 2 shows the average MLS with ungenotyped parents, where LOD score bias was observed. As a note, with the addition of one or two unaffected siblings, we observed lower LOD score bias as shown in Table 1. For example, we observed 9.62 (SD = 2.75), 8.66 (SD = 2.49) and 4.41 (SD = 1.80) average MLS for families of 2 affected sibs and zero, one or two unaffected sibs, respectively, with ungenotyped parents. Similar trends were observed for other study designs with 3 or 4 affected siblings. In terms of IC, we observed increased average IC as the size of a sibship in a family increased.

Table 2

Summary of average maximum LOD scores and average IC using the baseline marker subset from step 1 for the 9 study designs with ungenotyped parents.

Number of Sibs | MLS (SD) | Average IC
2 Affected | 9.62 (2.75) | 0.72
3 Affected | 14.01 (3.43) | 0.80
4 Affected | 5.02 (1.95) | 0.85
2 Affected + 1 Unaffected | 8.66 (2.49) | 0.80
3 Affected + 1 Unaffected | 5.41 (2.12) | 0.85
4 Affected + 1 Unaffected | 3.17 (1.55) | 0.88
2 Affected + 2 Unaffected | 4.41 (1.80) | 0.85
3 Affected + 2 Unaffected | 3.18 (1.57) | 0.88
4 Affected + 2 Unaffected | 1.89 (1.29) | 0.90
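The "at least 10% below the baseline" criterion used in step 2 amounts to a one-line comparison. The sketch below is a reader's illustration rather than the authors' code; the function names are made up, the baseline is the 2-affected-sib value from Table 2, and the adjusted value of 1.20 is the marker-thinning result reported for 8 SNPs per cM.

```python
def relative_reduction(adjusted_mls, baseline_mls):
    """Fraction by which the average MLS dropped relative to baseline."""
    return 1.0 - adjusted_mls / baseline_mls


def shows_bias_reduction(adjusted_mls, baseline_mls, min_drop=0.10):
    """True when the adjusted average MLS is at least min_drop
    (10% by default) below the baseline average MLS."""
    return adjusted_mls <= (1.0 - min_drop) * baseline_mls


baseline = 9.62  # 2 affected sibs, ungenotyped parents (Table 2)
print(shows_bias_reduction(1.20, baseline))          # True
print(round(relative_reduction(1.20, baseline), 2))  # 0.88
```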

In Tables 3 and 4, we summarize the average MLS using the four methods of handling LD at varying LD cut points for families with 2, 3 or 4 affected siblings and ungenotyped parents. Table 3 shows results obtained using D' thresholds, and Table 4 shows results obtained using r^2 thresholds. In addition, similar patterns across the LD cut points were observed, with further reduction of the LOD score bias, for families with 1 or 2 additional siblings who were not affected (data not shown). As the number of unaffected siblings in a family increased, the inflation of LOD scores diminished further and in some cases was eliminated completely.

Table 3

Step 2 using D' LD threshold and MT: Summary descriptive statistics of average maximum NPL LOD scores and average IC for families with 2, 3 or 4 affected siblings and ungenotyped* parents.

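Method | LD threshold | Ave # SNPs | 2 affected sibs: MLS (SD) | 2 affected sibs: IC | 3 affected sibs: MLS (SD) | 3 affected sibs: IC | 4 affected sibs: MLS (SD) | 4 affected sibs: IC
Unadjusted | - | 3713 | 9.62 (2.75) | 0.72 | 14.01 (3.43) | 0.80 | 5.02 (1.95) | 0.85
MT | 8 SNPs per 1 cM | 480 | 1.20 (0.96) | 0.72 | 1.34 (1.01) | 0.80 | 0.36 (0.47) | 0.85
MT | 4 SNPs per 1 cM | 259 | 0.65 (0.67) | 0.71 | 0.60 (0.63) | 0.80 | 0.22 (0.36) | 0.85
MT | 2 SNPs per 1 cM | 135 | 0.66 (0.68) | 0.67 | 0.76 (0.74) | 0.79 | 0.26 (0.38) | 0.84
MT | 1 SNP per 1 cM | 68 | 0.76 (0.73) | 0.61 | 1.05 (0.88) | 0.76 | 0.34 (0.47) | 0.83
RE | 0.7 | 409 | 0.44 (0.53) | 0.73 | 0.54 (0.61) | 0.80 | 0.2 (0.33) | 0.85
RE | 0.5 | 309 | 0.37 (0.50) | 0.72 | 0.45 (0.55) | 0.80 | 0.18 (0.34) | 0.85
RE | 0.3 | 200 | 0.43 (0.54) | 0.7 | 0.58 (0.60) | 0.80 | 0.23 (0.39) | 0.85
RE | 0.1 | 62 | 0.73 (0.73) | 0.61 | 1.18 (0.96) | 0.76 | 0.44 (0.57) | 0.83
SNPLINK | 0.7 | 531 | 0.65 (0.78) | 0.73 | 0.66 (0.79) | 0.80 | 0.3 (0.46) | 0.85
SNPLINK | 0.5 | 401 | 0.75 (0.81) | 0.72 | 0.73 (0.83) | 0.80 | 0.31 (0.46) | 0.85
SNPLINK | 0.3 | 287 | 0.85 (0.88) | 0.71 | 0.81 (0.84) | 0.80 | 0.33 (0.47) | 0.85
SNPLINK | 0.1 | 120 | 0.65 (0.74) | 0.64 | 0.70 (0.81) | 0.77 | 0.31 (0.46) | 0.83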

*With complete data where both parents are genotyped, the unadjusted average MLS for 2, 3 or 4 affected sibs are 0.58, 0.5 and 0.47.

Table 4

Step 2 using r^2 LD threshold: Summary descriptive statistics of average maximum NPL LOD scores and average IC for families with 2, 3 or 4 affected siblings and ungenotyped* parents.

Method | LD threshold | Ave # SNPs | 2 affected sibs: MLS (SD) | 2 affected sibs: IC | 3 affected sibs: MLS (SD) | 3 affected sibs: IC | 4 affected sibs: MLS (SD) | 4 affected sibs: IC
Unadjusted | - | 3713 | 9.62 (2.75) | 0.72 | 14.01 (3.43) | 0.80 | 5.02 (1.95) | 0.85
RE | 0.7 | 1562 | 1.39 (0.98) | 0.73 | 2.00 (1.28) | 0.80 | 0.73 (0.70) | 0.85
RE | 0.5 | 1240 | 1.21 (0.90) | 0.73 | 1.67 (1.17) | 0.81 | 0.59 (0.63) | 0.85
RE | 0.3 | 892 | 0.45 (0.53) | 0.73 | 0.66 (0.68) | 0.81 | 0.26 (0.39) | 0.85
RE | 0.1 | 435 | 0.32 (0.44) | 0.72 | 0.40 (0.49) | 0.80 | 0.18 (0.34) | 0.85
MERLINLD | 0.7 | 562 | 1.35 (0.97) | 0.73 | 1.83 (1.22) | 0.80 | 0.62 (0.61) | 0.85
MERLINLD | 0.5 | 575 | 1.12 (0.96) | 0.73 | 1.53 (1.19) | 0.80 | 0.53 (0.59) | 0.85
MERLINLD | 0.3 | 542 | 0.55 (0.61) | 0.74 | 0.77 (0.77) | 0.80 | 0.32 (0.43) | 0.85
MERLINLD | 0.1 | 423 | 0.51 (0.58) | 0.74 | 0.69 (0.70) | 0.81 | 0.29 (0.41) | 0.85
SNPLINK | 0.7 | 2596 | 3.94 (1.85) | 0.73 | 5.10 (2.11) | 0.80 | 2.06 (1.23) | 0.85
SNPLINK | 0.5 | 2351 | 3.63 (1.81) | 0.73 | 4.88 (2.09) | 0.80 | 1.97 (1.20) | 0.85
SNPLINK | 0.3 | 2057 | 3.04 (1.73) | 0.73 | 4.10 (1.95) | 0.80 | 1.56 (1.07) | 0.85
SNPLINK | 0.1 | 1519 | 2.69 (1.64) | 0.73 | 3.57 (1.87) | 0.80 | 1.28 (0.98) | 0.85

*With complete data where both parents are genotyped, the unadjusted average MLS for 2, 3 or 4 affected sibs are 0.58, 0.5 and 0.47

Marker Thinning (MT) Algorithm

For the four MT cut points with 8, 4, 2 or 1 SNPs per 1 cM region, we observed average MLS values of 1.20 (SD = 0.96), 0.65 (SD = 0.67), 0.66 (SD = 0.68) and 0.76 (SD = 0.73), respectively (Table 3). The corresponding average IC values were 0.72, 0.71, 0.67 and 0.61, respectively. Compared to the baseline LOD score of 9.62 (SD = 2.75), all four subsets of markers resulted in greatly reduced LOD score bias. At the 8 and 4 SNPs per cM cut points, there was negligible loss of information; however, at the two lower cut points, there was some loss of information compared to the baseline IC of 0.72. Nevertheless, none of these sets of markers completely eliminated the LOD score bias, as compared to the average MLS values observed with complete data where both parents are genotyped. Similar trends were observed for families with 3 or 4 siblings.
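As an illustration of what thinning to a fixed number of SNPs per 1 cM region might look like, here is a minimal sketch. It is not the implementation evaluated here: the text does not state which SNPs are retained within a region, so picking evenly spaced markers within each 1-cM bin is an assumption, and the function name and example map positions are hypothetical.

```python
import math
from collections import defaultdict


def thin_markers(positions_cm, snps_per_cm):
    """Return the indices of retained SNPs, keeping at most
    snps_per_cm markers in each 1-cM bin of the genetic map."""
    bins = defaultdict(list)
    for idx, pos in enumerate(positions_cm):
        bins[math.floor(pos)].append(idx)

    kept = []
    for members in bins.values():
        members.sort(key=lambda i: positions_cm[i])
        if len(members) <= snps_per_cm:
            kept.extend(members)
        else:
            # Assumption: take evenly spaced markers within the bin.
            step = len(members) / snps_per_cm
            kept.extend(members[int(j * step)] for j in range(snps_per_cm))
    return sorted(kept)


# Hypothetical dense map: 20 SNPs per cM over 2 cM.
positions = [i / 20 for i in range(40)]
print(thin_markers(positions, 4))  # [0, 5, 10, 15, 20, 25, 30, 35]
print(thin_markers(positions, 1))  # [0, 20]
```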

Recursive Elimination (RE) Algorithm

Using D' cut points, we observed a substantial reduction of the LOD score bias as compared to the baseline LOD score of 9.62 (SD = 2.75) shown in Table 3. Compared to the observed average MLS with complete data, using the RE approach eliminated the bias at D' > 0.3, but we did not observe the same pattern at the lowest D' cut point. Some loss of information was observed at the D' 0.3 (IC = 0.70) and 0.1 (IC = 0.61) cut points compared to the baseline. With r^2 thresholds applied to the data with 2 affected siblings, we also observed reduction of the LOD score bias at all four r^2 cut points (Table 4). When compared to the complete data, the LOD score bias was no longer observed at the two lowest r^2 cut points. In addition, the IC values did not change from the baseline IC of 0.72. In general, for both LD measures, similar trends were observed for families with 3 or 4 affected siblings.
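The general idea of removing SNPs until no retained pair exceeds an LD cut point can be sketched as a single greedy pass. This is explicitly not the RE algorithm evaluated here, whose details are not given in this section; it is a simplified stand-in, and the function name and the small LD matrix are hypothetical.

```python
def greedy_ld_prune(ld, threshold):
    """Walk the SNPs in map order and keep a SNP only if its pairwise
    LD (D' or r^2) with every SNP kept so far is below the threshold.

    ld is a symmetric square matrix (list of lists) of pairwise LD.
    Returns the indices of the retained SNPs.
    """
    kept = []
    for i in range(len(ld)):
        if all(ld[i][j] < threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical pairwise LD values for three SNPs.
ld = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
print(greedy_ld_prune(ld, 0.7))  # [0, 2]
print(greedy_ld_prune(ld, 0.1))  # [0]
```

Lower cut points retain fewer SNPs, which is the same direction as the marker counts in Tables 3 and 4.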

SNPLINK

Compared to the baseline LOD score of 9.62, SNPLINK-adjusted marker subsets reduced the LOD score bias when using D' thresholds (Table 3); however, none of the cut points completely eliminated the bias. There was some information loss only at the lowest cut point (IC = 0.64) compared to the baseline IC of 0.72. In contrast, using r^2 thresholds, we observed only moderate reduction of the LOD score bias (Table 4), and there was no loss of information at any threshold. Relatively similar patterns of reduction were observed in families with 3 or 4 siblings across all cut points for both LD measures.

MERLIN-LD

We applied the clustering method implemented in MERLIN, MERLIN-LD, using four r^2 cut points. There were 562, 575, 542 and 423 clusters formed at the 0.7, 0.5, 0.3 and 0.1 r^2 cut points, respectively (Table 4). Across all thresholds, the IC remained unchanged from the baseline IC. For families with 2 affected siblings and ungenotyped parents, we observed a substantial reduction of LOD score bias as compared to the baseline LOD score of 9.62 (Table 4). At the two lowest cut points, we observed elimination of the LOD score bias as compared to the average MLS with complete data.


Abstract

A major challenge in current biology is to understand the genetic basis of variation for quantitative traits. We review the principles of quantitative trait locus mapping and summarize insights about the genetic architecture of quantitative traits that have been obtained over the past decades. We are currently in the midst of a genomic revolution, which enables us to incorporate genetic variation in transcript abundance and other intermediate molecular phenotypes into a quantitative trait locus mapping framework. This systems genetics approach enables us to understand the biology inside the 'black box' that lies between genotype and phenotype in terms of causal networks of interacting genes.


Linkage disequilibrium clustering-based approach for association mapping with tightly linked genomewide data

Ecological Genetics Research Unit, Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, Department of Biosciences, University of Helsinki, Helsinki, Finland

Zitong Li, Ecological Genetics Research Unit, Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, Department of Biosciences, University of Helsinki, Helsinki, Finland.


Abstract

Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster-based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimensional reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single- and multilocus models that can efficiently conduct the association tests on such high-dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.
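The idea of "clusters of loci connected by high LD" can be illustrated with a small sketch that treats SNPs as nodes, joins two SNPs when their r^2 reaches a threshold, and reads off the connected components. This is not the authors' implementation: the windowing and principal component steps are omitted, using the squared correlation of 0/1/2 genotype codes as the LD measure is an assumption, and all names and data are made up.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_clusters(genotypes, threshold):
    """Group SNPs into connected components of the graph whose edges
    join SNP pairs with r^2 >= threshold (union-find)."""
    n = len(genotypes)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if r_squared(genotypes[i], genotypes[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical genotypes (rows = SNPs, columns = individuals).
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [2, 1, 0, 1, 2, 0],
    [0, 0, 1, 2, 2, 1],
]
print(ld_clusters(snps, 0.8))  # [[0, 1, 2], [3]]
```

In the approach described above, each such cluster would then be summarized by principal components before association testing.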



Complex and long-range linkage disequilibrium and its relationship with QTL for Marek’s Disease resistance in chicken populations

Chicken long-range linkage disequilibrium (LRLD) and LD blocks, and their relationship with previously described Marek’s Disease (MD) quantitative trait loci regions (QTLRs), were studied in an F6 population from a full-sib advanced intercross line (FSAIL), and in eight commercial pure layer lines. Genome-wide LRLD was studied in the F6 population using random samples of non-syntenic and syntenic marker pairs genotyped with the Affymetrix HD 600K SNP array. To illustrate the relationship with QTLRs, LRLD and LD blocks in and between the MD QTLRs were studied using all possible marker pairs of all array markers in the QTLRs, with the same F6 QTLR genotypes and genotypes of the QTLR elements’ markers in the eight lines used in the MD mapping study. LRLD was defined as r² ≥ 0.7 over a distance ≥ 1 Mb, and 1.5% of all syntenic marker pairs were classified as LRLD. Complex fragmented and interdigitated LD blocks were found, over distances ranging from a few hundred to a few million bases. Extensive high, long-range, and complex LD was found between two of the MD QTLRs. Cross-QTLR STRING networks and gene interactions suggested possible origins of this exceptional LD between QTLRs. Thus, causative mutations can be located at a much larger distance from a significant marker than previously appreciated. LRLD range and LD block complexity may be used to identify mapping errors, and should be accounted for when interpreting genetic mapping studies. All sites in high LD with a significant marker should be considered as candidates for the causative mutation.
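The LRLD definition above (r² ≥ 0.7 over at least 1 Mb) can be written as a simple filter. This is a minimal sketch, assuming all markers lie on one chromosome and that r² is the squared correlation of genotype codes. The function name and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def classify_lrld(genotypes, positions_bp, r2_min=0.7, dist_min_bp=1_000_000):
    """Return (i, j, r2, distance_bp) for syntenic marker pairs that meet both thresholds."""
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    hits = []
    n = len(positions_bp)
    for i in range(n):
        for j in range(i + 1, n):
            dist = abs(positions_bp[j] - positions_bp[i])
            if dist >= dist_min_bp and r2[i, j] >= r2_min:
                hits.append((i, j, float(r2[i, j]), dist))
    return hits


# Toy data (hypothetical): markers 0 and 2 are identical and 2 Mb apart.
rng = np.random.default_rng(1)
a = rng.integers(0, 3, size=200)
b = rng.integers(0, 3, size=200)
genotypes = np.column_stack([a, b, a])
positions = [0, 500_000, 2_000_000]
lrld_pairs = classify_lrld(genotypes, positions)
print(lrld_pairs)
```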


Discussion

This study provides an overview of LD in the Thoroughbred using a high density SNP panel. Validation work by Khatkar et al. (2008) on their cattle data suggests that our sample size of more than 800 horses is more than sufficient to obtain an unbiased picture of LD in our population. The pattern of decline of LD with distance in this population is consistent with that reported by Wade et al. (2009) in a sample of 24 Thoroughbreds, with both data sets exhibiting a decrease in r² from ∼0.6 to 0.2 when the distance between markers is increased to 0.5 Mb. The LD observed is higher at short distances and more extensive than that observed in human populations (Shifman et al. 2003). Linkage disequilibrium declines more slowly in our population than in the range of cattle populations studied by de Roos et al. (2008), with r² remaining above 0.3 for distances up to 185 kb in our data, compared with a maximum distance of 35 kb in the cattle data.

The mean value of r² between non-syntenic SNPs was 0.0018, and this provides an approximation of the LD that can be expected by chance, assuming that the markers used have not undergone simultaneous selection. The value observed here is lower than, but of a similar magnitude to, that observed by Khatkar et al. (2008) in a sample of over 1500 cattle (0.0032). The mean non-syntenic r² value reflects both sampling of animals and genetic sampling (drift), and hence may be expected to decrease with increases in both sample size and Ne. Therefore, the larger non-syntenic value in Australian Holstein–Friesian cattle may instead reflect a lower Ne in this cattle population. The low LD seen between non-syntenic SNPs in our population suggests that the LD created by admixture during breed formation (Hill et al. 2002) has declined to negligible levels for these markers. A similar decline of LD between non-syntenic markers was observed in Coopworth sheep approximately ten generations after the foundation of the breed through crossing (McRae et al. 2002). At distances greater than 100 Mb, average r² between syntenic SNPs is reduced to non-syntenic or background levels, and is no longer a function of distance. This is expected, as the recombination rate at such distances approaches 0.5.

By using Sved’s (1971) formula for the expectation of r², a non-linear regression model was fitted to the data to describe the relationship between linkage distance and LD. Without making any assumptions about the value of r² at the intercept, estimates of a and b, as predicted using Eq. 3 and averaged over all autosomes, were 2.25 and 103, respectively. Parameter a determines the value of expected r² when the line crosses the y-axis (i.e. when the distance between markers is effectively zero). Our estimate of a supports an alternative version of Sved’s (1971) equation, derived by Tenesa et al. (2007), which takes into account mutation and sets a equal to two, whilst at the same time raising the question of whether fixing a to unity in the model (as in Abasht et al. (2009), Toosi et al. (2010) and Zhao et al. (2005)) is appropriate. The impact of such model assumptions is explored in Corbin et al. (2010). The heterogeneity of variance associated with the observed r², such that the variance of r² declined with increasing distance between markers, may also have affected our results. We observed a significant negative relationship between chromosome length (cM) and estimates of b from the non-linear model, suggesting LD is higher in longer chromosomes. This contrasts with the findings of Tenesa et al. (2007), who observed a positive relationship, but is in keeping with the observations of Khatkar et al. (2008) and Muir et al. (2008) in domestic livestock species.
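Eq. 3 is not reproduced in this excerpt. The sketch below assumes the usual Sved-type form E[r²] = 1/(a + 4bd), with d the linkage distance in Morgans, which agrees with the description of a as setting the intercept (expected r² = 1/a at d = 0). The function name and the 1 cM per Mb conversion are illustrative assumptions, not values from the paper.

```python
def expected_r2(distance_morgans, a=2.25, b=103.0):
    """Expected r2 under an assumed Sved-type model: 1 / (a + 4 * b * d)."""
    return 1.0 / (a + 4.0 * b * distance_morgans)


# Intercept: expected r2 when markers are effectively at the same position.
print(expected_r2(0.0))

# 185 kb converted to Morgans under an assumed rate of 1 cM per Mb.
d_185kb = 185_000 / 1_000_000 * 0.01
print(expected_r2(d_185kb))
```

Under these assumptions the fitted curve gives an expected r² of about 0.33 at 185 kb, in line with the reported distance at which r² stays above 0.3.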

Our estimate of b (103) is an estimate of Ne assuming constant population size. However, this assumption is difficult to sustain, and therefore, b more likely represents a conceptual average Ne over the period inferred from the marker distance range (see, for example, Toosi et al. (2010)). For this reason, Fig. 5 shows the results following the approach of Hayes et al. (2003) by calculating historical Ne, assuming linear population growth. The pattern observed shows a decrease in Ne up until around 20 generations ago, followed by an increase until one generation ago. The interpretation of such trends is difficult, with the observed dip in Ne potentially representing any one of a number of scenarios, including a founder event, an immigration event, a hybridization event or any combination of these (Wang 2005). Therefore, it is useful to consider our observation in the context of what is known about the Thoroughbred’s demographic history.
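The historical-Ne calculation is commonly summarized as follows: markers separated by c Morgans reflect Ne about 1/(2c) generations ago, and rearranging E[r²] = 1/(a + 4·Ne·c) gives Ne = (1/r² − a)/(4c). The sketch below assumes that form; the exact implementation behind Fig. 5 is not given in this excerpt, and the input r² value is hypothetical.

```python
def historical_ne(mean_r2, c_morgans, a=1.0):
    """Return (generations_ago, Ne) implied by mean r2 at linkage distance c.

    Assumes E[r2] = 1 / (a + 4 * Ne * c) and that distance c reflects
    Ne roughly 1 / (2c) generations in the past.
    """
    generations_ago = 1.0 / (2.0 * c_morgans)
    ne = (1.0 / mean_r2 - a) / (4.0 * c_morgans)
    return generations_ago, ne


# Hypothetical input: mean r2 of 1/11 at 2.5 cM, with a = 2.
t, ne = historical_ne(1.0 / 11.0, 0.025, a=2.0)
print(t, ne)
```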

Documentary evidence suggests that the Thoroughbred was derived from a cross between sires originating from the Mediterranean Middle East and British native breeds, and the breed was established during the seventeenth century (Hill et al. 2002). It is not clear from published literature what effects an admixture like this would have on patterns of estimated Ne prior to the crossing event, although clues may be observed in Toosi et al. (2010). However, what may be predicted is that such a crossing event would appear as a bottleneck in the population, creating an initially high level of LD. Therefore, one might infer from our results that the lowest point of the curve reflects the point at which the breed was formed. This approximately coincides with the findings of Mahon & Cunningham (1982) that Thoroughbreds born in the 1960s were separated from seventeenth century founders by an average of 21.5 generations. Cunningham et al. (2001) also found evidence for a population bottleneck at the time of breed formation.

The reliability of this method depends both on the technical implementation (Corbin et al. 2010) and, as discussed above, on the demographic history of the breed. Some calibration of the accuracy of the Ne profile presented can be obtained by comparison with values obtained from pedigree analyses. For example, Cunningham et al. (2001) calculate the effective number of studbook founders of the Thoroughbred to be 28.2. As this relies on calculating the long-term contributions of the founders, quantitative genetic theory (Woolliams & Bijma 2000) suggests that the Ne for this generation is twice this value if in HWE, providing an estimate of 56 soon after breed formation. This may be compared with the minimum Ne of 88 obtained in this analysis, which gives fair agreement. A further estimate of reliability can be obtained by comparing the mean inbreeding of 0.125 (SE 0.005) obtained by Mahon & Cunningham (1982) for the 21.5 generations from breed foundation to 1964 with the accumulated inbreeding for generations four to 25 (assuming four generations since 1964) using , with Ne being estimated from Fig. 5. The value obtained of 0.112 is remarkably close. Therefore, our minimum of Ne ≈ 90 is of the correct magnitude, and the increase in Ne over the last ten generations may be explained by an increase in actual population size. In Thoroughbreds, with low reproductive rate of the mare and the ban upon use of artificial insemination, there is a greater likelihood that increases in census size will be translated into effective population sizes. The trend in Ne observed in the most recent generations should be interpreted with caution because of the technical limitations of the methods.
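The expression used for accumulated inbreeding is missing from this excerpt. A standard textbook form, assumed here, is F = 1 − ∏(1 − 1/(2·Ne,t)) over the generations considered. The sketch uses a constant Ne of 90 for 22 generations as a stand-in, because the per-generation values from Fig. 5 are not available in the text.

```python
def accumulated_inbreeding(ne_per_generation):
    """Accumulated inbreeding F = 1 - prod(1 - 1 / (2 * Ne_t)) over generations."""
    heterozygosity_retained = 1.0
    for ne in ne_per_generation:
        heterozygosity_retained *= 1.0 - 1.0 / (2.0 * ne)
    return 1.0 - heterozygosity_retained


# Stand-in profile (assumption): Ne = 90 in each of generations 4 to 25.
f_constant = accumulated_inbreeding([90.0] * 22)
print(round(f_constant, 3))

# Pedigree-based check quoted in the text: twice the effective number of founders.
print(2 * 28.2)
```

With this stand-in profile the result falls in the same range as the 0.112 and 0.125 values quoted above. A varying Ne profile would shift it somewhat.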

Implications for genome-wide association studies, marker-assisted selection and genomic selection

The extent of LD in a population can be used to estimate the SNP density required for GWAS to be effective, as well as giving some indication as to the likely precision with which the QTL region will be located. The required sample size is said to be inflated by 1/r² when it is necessary to rely on marker-QTL LD, rather than on the QTL itself (Du et al. 2007), and this has prompted authors to propose thresholds for useful LD. The term ‘useful LD’ has been described as the proportion of QTL variance explained by a marker (Zhao et al. 2005), and the consensus is that an average r² > 0.3 will permit reasonable sample sizes to be employed for GWAS (Ardlie et al. 2002; Du et al. 2007; Khatkar et al. 2008). In this dataset, markers 185 kb apart achieve an average LD of r² = 0.3, and this corresponds to approximately 14 500 evenly spaced markers across the genome. However, because markers with r² = 1 will likely be excluded in genomic selection, and given the high variability of r² values at small distances, this is likely to be an underestimation of the actual number of SNPs needed. Indeed, in this study, whilst markers separated by less than 250 kb had a mean r² of 0.32 (after the exclusion of those pairs in complete LD), less than half the SNP pairs exhibited r² values of greater than 0.3. With MAS also relying on close and consistent linkage between markers and QTL, the high LD observed here is promising. Genomic selection (GS) appears to be effective at lower average r² than that required for GWAS, with simulation results demonstrating accuracies of up to 0.65 with an average r² between adjacent SNPs as low as 0.2 and a trait heritability of 0.1 (Calus et al. 2008). Deterministic equations derived by Daetwyler et al. (2009) demonstrate that the accuracy of GS can be expressed as a function of the effective number of loci (Me) in a population. Me relates to the number of independent chromosome segments and, given our current Ne estimate of ∼180 and assuming a random mating population, the Me for our population is ∼1500 (Meuwissen 2009). Thus, we are now able to predict the potential accuracy of GS in this population for a range of scenarios.
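The arithmetic behind these figures can be sketched as follows. The genome length of roughly 2.7 Gb is an assumption (it is not stated in the text), and the accuracy function uses a commonly cited form of the deterministic prediction, sqrt(N·h² / (N·h² + Me)), which may differ in detail from the equations the authors applied. The function names and the training-set size are hypothetical.

```python
import math


def markers_needed(genome_length_bp, spacing_bp):
    """Number of evenly spaced markers needed to cover the genome at a given spacing."""
    return math.ceil(genome_length_bp / spacing_bp)


def inflated_sample_size(n_at_qtl, r2):
    """Sample size needed when testing a marker in LD (r2) with the QTL."""
    return n_at_qtl / r2


def gs_accuracy(n_records, h2, me):
    """Assumed deterministic GS accuracy: sqrt(N * h2 / (N * h2 + Me))."""
    return math.sqrt(n_records * h2 / (n_records * h2 + me))


# Assumed genome length of ~2.7 Gb, with the 185 kb spacing from the text.
print(markers_needed(2_700_000_000, 185_000))

# A study needing 1000 animals at the QTL itself, with marker-QTL r2 = 0.3.
print(inflated_sample_size(1000, 0.3))

# Hypothetical training set of 1000 records, h2 = 0.1, Me = 1500 from the text.
print(gs_accuracy(1000, 0.1, 1500))
```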

In summary, we used dense SNP genotype data to characterize LD and make inferences regarding ancestral Ne for a large sample of Thoroughbred horses. In the population studied, LD extended for long distances, reaching baseline levels at around 50 Mb. From the decay in LD with distance, we inferred ancestral Ne and observed a decrease in Ne since the distant past, which reached a minimum of ∼90 around 20 generations ago, followed by an increase until the present time. Such an approach could be used to investigate the demographic histories and rates of inbreeding of horse breeds with less extensive pedigree records than the Thoroughbred. The results indicate that genomic methodologies that rely on LD between markers and QTL have the potential to perform well within Thoroughbred populations genotyped for the 50-K SNP chip.


Basic Genetics

David P. Clark, … Michelle R. McGehee, in Molecular Biology (Third Edition), 2019

8.1 Recombination During Meiosis Ensures Genetic Diversity

However, the alleles A, B, and C (or a, b, and c) do not always stay together during reproduction. Swapping of segments of the chromosomes can occur by breaking and rejoining of the neighboring DNA strands. Note that the breaking and joining occurs in equivalent regions of the two chromosomes and neither chromosome gains or loses any genes overall. The point at which the two strands of DNA cross over and recombine is called a chiasma (plural, chiasmata). The genetic result of such crossing over, the shuffling of different alleles between the two members of a chromosomal pair, is called recombination (Fig. 2.21). The farther apart two genes are on the chromosome, the more likely a crossover will form between them and the higher will be their frequency of recombination. Recombination frequency is an important value for a geneticist, and the values range from 0% (or 0), which means that the two genes are so close together that they are always found in the same progeny after a mating, to 50% (or 0.5), which means that the two genes are so far apart that they appear to be on separate chromosomes.
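A minimal sketch of how a recombination frequency is obtained from progeny counts, and of why it levels off at 0.5 for distant genes. The second function uses Haldane's mapping function, a standard model that assumes crossovers occur independently; it is not part of the textbook passage, and the counts are hypothetical.

```python
import math


def recombination_frequency(n_recombinant, n_total):
    """Fraction of progeny carrying a recombinant combination of alleles."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_recombinant / n_total


def haldane_rf(map_distance_morgans):
    """Expected recombination frequency under Haldane's mapping function."""
    return 0.5 * (1.0 - math.exp(-2.0 * map_distance_morgans))


# Hypothetical cross: 30 recombinant progeny out of 100 scored.
print(recombination_frequency(30, 100))

# Close genes recombine rarely; very distant genes approach 0.5.
for d in (0.01, 0.5, 5.0):
    print(d, round(haldane_rf(d), 4))
```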

Figure 2.21 . Linkage of Genes and Recombination During Meiosis

At the top, the two members of a chromosome pair are shown, each carrying different alleles. Because the three alleles A, B, and C are on the same molecule of DNA, they will tend to stay together. So if the offspring inherits allele A from one parent, it will usually get alleles B and C, rather than b and c. If recombination occurs during meiosis, the DNA breaks and the chromosomes rejoin such that part of one chromosome is exchanged with the homologous partner. Now, the offspring can receive allele A with alleles b and c from one parent.

This type of recombination occurs during meiosis, the process that reduces the genome from diploid to haploid. The process of meiosis is divided into two parts, meiosis I and meiosis II (Fig. 2.22). Table 2.01 describes the events that occur at each stage of meiosis. In a typical diploid organism, there are two homologs for each chromosome inside the normal cell. After a single round of DNA replication, there are four different copies of each chromosome, two copies of homolog 1 and two copies of homolog 2. These are attached at their centromeres and form what is called a tetrad.

Figure 2.22 . Meiosis Forms Haploid Gametes

This figure demonstrates how the diploid cell forms four haploid gametes during a special cell division called meiosis. Only one homologous chromosome (red and green) is shown for clarity, but it has undergone DNA replication to create two copies of each homolog.

Division | Stage of Meiosis | Substage of Prophase I | Chromosome Structure
--- | --- | --- | ---
MEIOSIS I | Prophase I | Leptonema | Tetrads begin to condense
 | | Zygonema | Homologous chromosomes begin to pair up
 | | Pachynema | Homologous chromosomes are fully paired; recombination occurs
 | | Diplonema | Homologous chromosomes separate (except at the centromere); chiasmata are visible
 | | Diakinesis | Paired chromosomes condense further and attach to spindle fibers
 | Metaphase I | | Paired tetrads align at the middle of the cell
 | Anaphase I | | Homologous chromosomes split so that two copies move to each half of the cell
 | Telophase I | | Two new nuclei form, each containing a set of two sister chromatids
MEIOSIS II | Prophase II | | Each chromosome condenses once again and starts to attach to spindle fibers
 | Metaphase II | | Chromosomes align in the center of each new cell
 | Anaphase II | | Each of the sister chromatids separates and moves to each side of the new cell
 | Telophase II | | Chromosomes decondense and two new nuclei form
 | Cytokinesis | | Each of the two cells divides, completely forming four new cells, each containing one copy of each chromosome (a haploid genome)

When cells are in the substages of meiosis I, the genetic information on each of the four copies of the chromosomes aligns perfectly, matching gene for gene along the entire length of each chromosome. When the chromosomes are in this state, genetic information is exchanged with the other copies, forming new genetic combinations. The alignment stage, called synapsis, occurs due to a set of conserved proteins that link the chromosomes. These proteins form a structure where the DNA of each pair of homologous chromosomes is linked together with a zipper-like structure consisting of lateral elements and central elements connected by transverse fibers (Fig. 2.23).

Figure 2.23 . Synaptonemal Complex

The synaptonemal complex is a set of proteins that link the two homologous chromosomes during the zygonema stage of meiosis I. Only one chromosome pair is shown for clarity. The red and green chromosomes form a homologous pair.

Genetic linkage is often defined, from a molecular viewpoint, as the tendency of alleles carried by the same DNA molecule to be inherited together. However, if two genes are very far apart on a very long DNA molecule, linkage may not be observed in practice. In this example, consider a long chromosome, carrying all five genes, A, B, C, D, and E. It can be observed that A is linked to B and C, and that C and D are linked to E, but that no linkage is observed between A and E (Fig. 2.24). Given that A is on the same DNA molecule as B and that B is on the same DNA molecule as C, etc., it can be deduced that A, B, C, D, and E must all be on the same chromosome. In genetic terminology, it is said that A, B, C, D, and E are all in the same linkage group. Even though the most distant members of a linkage group may not directly show linkage to each other, their relationship can be deduced from their mutual linkage to intervening genes.

Figure 2.24 . Linkage Groups

In this example chromosome, genes A and E are linked even though the recombination frequencies suggest they are not linked. After this parent mates, the percentage of progeny that have a recombination event between the labeled genes is indicated above the chromosome. When the progeny are assayed for the presence of both the A and C alleles, only 30% have this combination of genes. When progeny are assayed for the presence of both the C and E alleles, about 25% of the progeny have this combination. Since A is linked to C, and C is linked to E, it can be deduced that A and E are in the same linkage group.
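The deduction described here is a transitive closure: genes that are pairwise linked, directly or through intervening genes, fall into one linkage group. A minimal sketch using a union-find structure follows. The linked pairs are taken from the example in the text, while gene F and the function name are hypothetical additions for illustration.

```python
def linkage_groups(genes, linked_pairs):
    """Group genes so that any chain of observed pairwise linkage joins them."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in linked_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


# Observed linkage from the example: A-B, A-C, C-E, D-E. A-E is not observed.
# Gene F is a hypothetical unlinked gene, included for contrast.
genes = ["A", "B", "C", "D", "E", "F"]
pairs = [("A", "B"), ("A", "C"), ("C", "E"), ("D", "E")]
print(linkage_groups(genes, pairs))
```

A and E end up in the same group even though no direct A-E linkage was supplied, which mirrors the reasoning in the caption.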

The exchange of alleles between homologous chromosomes and independent assortment of chromosomes during anaphase provide all the new combinations of genes to make each person unique.


