What are haplotype blocks and what is the effect of hybridization on these?

In this PDF, there is a quick definition of haplotype blocks.

A haplotype block is a set of closely linked alleles/markers on a chromosome that, over evolutionary time, tend to be inherited together.

Also look at this definition:

We defined a haplotype block as a region over which a very small proportion (<5%) of comparisons among informative SNP pairs show strong evidence of historical recombination. [We allow for 5% because many forces other than recombination (both biological and artifactual) can disrupt haplotype patterns, such as recurrent mutation, gene conversion, or errors of genome assembly or genotyping.]

Although in the abstract they say:

haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed.

But the first definition confuses me because it really looks like the definition of linkage disequilibrium:

linkage disequilibrium is the non-random association of alleles at different loci in a given population.

Which means that they are inherited together (non-randomly inherited). The other definitions don't seem to agree in the same paper…
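To make the comparison with LD concrete, here is a minimal sketch of the usual pairwise LD statistics (D, D' and r²) for two biallelic loci. The function name and argument names are only illustrative. It assumes the haplotype frequency is already known (phased data), which is not usually the case with raw genotypes.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: marginal frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # deviation from independence
    # D' rescales D by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Only three of the four possible haplotypes are present here, so D' is 1
# even though r2 is well below 1.
print(ld_stats(0.3, 0.3, 0.6))
```

LD in this sense is a statistic on a pair of loci, whereas a block is a region defined by applying some rule to many such pairwise values, which is why the two definitions look so similar.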

Just as a reminder a haplotype is: "The set of alleles found on a single sequence is referred to as a haplotype." (Hahn, M. W. 2018. Molecular Population Genetics. Sinauer Associates, Incorporated, Sunderland, Massachusetts.)


  1. What exactly are haplotype blocks, and how do they differ from LD? How are haplotype blocks estimated?

  2. Why is this an important concept (what does it tell us, and why would we want to study it specifically)?

  3. What influences haplotype blocks, and how is that information useful in a genomic analysis (can we account for it)?

Here are 2 relevant resources on the subject:

  • Daly, M. J., J. D. Rioux, S. F. Schaffner, T. J. Hudson, and E. S. Lander. 2001. High-resolution haplotype structure in the human genome. Nature Genetics 29:229-232.
  • Patil, N., A. J. Berno, D. A. Hinds, W. A. Barrett, J. M. Doshi, C. R. Hacker, C. R. Kautzer, D. H. Lee, C. Marjoribanks, D. P. McDonough, B. T. N. Nguyen, M. C. Norris, J. B. Sheehan, N. Shen, D. Stern, R. P. Stokowski, D. J. Thomas, M. O. Trulson, K. R. Vyas, K. A. Frazer, S. P. A. Fodor, and D. R. Cox. 2001. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719-1723.

Association of the Lactase Persistence Haplotype Block With Disease Risk in Populations of European Descent

Among people of European descent, the ability to digest lactose into adulthood arose via strong positive selection of a highly advantageous allele encompassing the lactase gene. Lactose-tolerant and intolerant individuals may have different disease risks due to the shared genetics of their haplotype block. Therefore, the overall objective of the study was to assess the genetic association of the lactase persistence haplotype to disease risk. Using data from the 1000 Genomes Project, we estimated the size of the lactase persistence haplotype block to be 1.9 Mbp containing up to 9 protein-coding genes and a microRNA. Based on the function of the genes and microRNA, we studied health phenotypes likely to be impacted by the lactase persistence allele: prostate cancer status, cardiovascular disease status, and bone mineral density. We used summary statistics from large genome-wide meta-analyses (32,965 bone mineral density, 140,306 prostate cancer and 184,305 coronary artery disease subjects) to evaluate whether the lactase persistence allele was associated with these disease phenotypes. Despite the fact that previous work demonstrated that the lactase persistence haplotype block harbors increased deleterious mutations, these results suggest little effect on the studied disease phenotypes.

Keywords: diet; human evolution; lactose; lactose tolerance; phenotype; physiological traits; population genetics; selective sweep.

Copyright © 2020 Joslin, Durbin-Johnson, Britton, Settles, Korf and Lemay.


Expanded Manhattan plot of lactase persistence haplotype block on chromosome 2 encompassing nine…

Wilcoxon rank sum tests showing beta signed p-values for (A) prostate cancer…

Wilcoxon rank sum tests showing beta signed p-values for (A) femoral neck…

Interaction of FKBP5, a stress-related gene, with childhood trauma increases the risk for attempting suicide

Childhood trauma is associated with hypothalamic-pituitary-adrenal (HPA) axis dysregulation and is a known risk factor for suicidal behavior. In this study we sought to determine whether the impact of childhood trauma on suicide risk might be modified by FKBP5, an HPA-axis regulating gene. Sixteen FKBP5 haplotype-tagging single nucleotide polymorphisms (SNPs) were genotyped in a sample of African Americans: 398 treatment-seeking patients with substance dependence (90% men; 120 suicide attempters) and 432 nonsubstance-dependent individuals (40% men; 21 suicide attempters). In all, 474 participants (112 suicide attempters) also completed the Childhood Trauma Questionnaire (CTQ). Primary haplotype analyses were conducted with the four SNPs implicated in earlier studies: rs3800373, rs9296158, rs1360780, and rs9470080. We found that childhood trauma was associated with suicide attempt (P < 0.0001). Although there was no main effect of the two major yin yang haplotypes in the four-SNP haplotype blocks, there was a haplotype influence on suicide risk (P = 0.006) only in individuals exposed to high levels of childhood trauma. In this group, 51% with two copies of the risk haplotype, 36% with one copy, and 20% with no copies had attempted suicide. The total logistic regression model accounted for 13% of the variance in attempted suicide. Analyses of the 16 SNPs showed significant main effects on suicide attempt of rs3777747, rs4713902, and rs9470080 and interactive effects of rs3800373, rs9296158, and rs1360780 with CTQ score on suicide attempt. These data suggest that childhood trauma and variants of the FKBP5 gene may interact to increase the risk for attempting suicide.


FKBP5 11 single nucleotide polymorphism (SNP) haplotype block structure and four SNP haplotype…

The interaction of FKBP5 yin yang diplotypes with high levels of childhood trauma…

Frequencies of FKBP5 haplotypes in individuals who have attempted suicide compared with nonsuicide…

Interaction between FKBP5 haplotypes and childhood trauma influence on suicide attempt. Haplotype H2…

Blocks and Haplotypes

Haploview generates blocks whenever a file is opened, but these blocks can be edited and redefined in a number of ways. In the Analysis menu, you can clear all the blocks in order to start over, define blocks based on one of several automated methods or customize the parameters of those algorithms. Additionally, the blocks can be edited by hand.

Confidence Intervals [DEFAULT]

The default algorithm is taken from Gabriel et al, Science, 2002. 95% confidence bounds on D prime are generated and each comparison is called "strong LD", "inconclusive" or "strong recombination". A block is created if 95% of informative (i.e. non-inconclusive) comparisons are "strong LD". This method by default ignores markers with MAF < 0.05. The MAF cutoff and the confidence bound cutoffs can be edited by choosing "Customize Block Definitions" (Analysis menu). This definition allows for many overlapping blocks to be valid. The default behavior is to sort the list of all possible blocks and start with the largest and keep adding blocks as long as they don't overlap with an already declared block.
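The 95% rule in this paragraph can be sketched as follows. This is only an illustration of the decision step, not Haploview's implementation. It assumes each marker pair has already been labelled from its D' confidence bounds, and the function name, the label strings and the use of `>=` at the threshold are all assumptions.

```python
def is_block(pair_calls, min_strong_fraction=0.95):
    """Decide whether a candidate region qualifies as a block.

    pair_calls: one label per marker pair in the region, each being
    "strong LD", "inconclusive" or "strong recombination".
    """
    # Inconclusive pairs carry no information and are left out of the ratio
    informative = [c for c in pair_calls if c != "inconclusive"]
    if not informative:
        return False
    strong = sum(1 for c in informative if c == "strong LD")
    return strong / len(informative) >= min_strong_fraction


calls = ["strong LD"] * 19 + ["strong recombination"] + ["inconclusive"] * 5
print(is_block(calls))  # 19 of 20 informative pairs are in strong LD
```

The MAF filter and the choice among overlapping candidate blocks described above would sit around this test and are not shown.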

Four Gamete Rule

This is a variant on the algorithm described in Wang et al, Am. J. Hum. Genet., 2002. For each marker pair, the population frequencies of the 4 possible two-marker haplotypes are computed. If all 4 are observed with at least frequency 0.01, a recombination is deemed to have taken place. Blocks are formed by consecutive markers where only 3 gametes are observed. The 1% cutoff can be edited to make the definition more or less stringent.
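A rough sketch of the four-gamete test on phased, biallelic haplotypes is given below. The greedy left-to-right scan used to group markers into blocks is an assumption made for illustration, as are the function names; the text above does not spell out how Haploview delimits the blocks.

```python
from collections import Counter


def recombined(haps, i, j, min_freq=0.01):
    """True if all four two-marker haplotypes at markers i and j reach min_freq."""
    counts = Counter((h[i], h[j]) for h in haps)
    n = len(haps)
    common = [g for g, c in counts.items() if c / n >= min_freq]
    return len(common) == 4  # assumes biallelic markers, so 4 is the maximum


def four_gamete_blocks(haps, min_freq=0.01):
    """Greedy scan: extend a block until a marker shows recombination with it.

    haps: list of equal-length phased haplotypes (strings or sequences).
    Returns a list of (first_marker, last_marker) index pairs.
    """
    n_markers = len(haps[0])
    blocks, start = [], 0
    for m in range(1, n_markers + 1):
        at_end = m == n_markers
        if at_end or any(recombined(haps, k, m, min_freq) for k in range(start, m)):
            if m - start >= 2:  # a block needs at least two markers
                blocks.append((start, m - 1))
            start = m
    return blocks


# Markers 0 and 2 show all four gametes, so the block stops after marker 1
print(four_gamete_blocks(["000", "011", "110", "111"]))
```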

Solid Spine of LD

This internally developed method searches for a "spine" of strong LD running from one marker to another along the legs of the triangle in the LD chart (this would mean that the first and last markers in a block are in strong LD with all intermediate markers but that the intermediate markers are not necessarily in LD with each other).

Markers can be removed from blocks by clicking on the marker number (along the top of the D prime graph). Blocks can be defined by hand by clicking and dragging along the marker number row. A hand-defined block that overlaps an existing block takes precedence and deletes the existing block.

View haplotypes for selected blocks by clicking on the "Haplotypes" tab or selecting "Haplotypes" from the Display menu. Haplotypes are estimated using an accelerated EM algorithm similar to the partition/ligation method described in Qin et al, 2002, Am J Hum Genet. This creates highly accurate population frequency estimates of the phased haplotypes based on the maximum likelihood as determined from the unphased input.
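To illustrate the idea of estimating haplotype frequencies from unphased input, here is a plain EM sketch for just two biallelic SNPs. It is not the accelerated partition/ligation variant mentioned above. The genotype coding (0, 1 or 2 copies of allele "1") and the function name are assumptions made for the example.

```python
def em_two_snp(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), the number of copies (0, 1 or 2) of
    allele "1" at SNP 1 and SNP 2 in each diploid individual.
    Returns a dict mapping haplotype (a1, a2) to its estimated frequency.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}  # start from equal frequencies
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # genotype -> the two alleles
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10;
                # weight the two phases by the current frequency estimates
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # At most one heterozygous site, so the phase is unambiguous
                a, b = alleles[g1], alleles[g2]
                counts[(a[0], b[0])] += 1
                counts[(a[1], b[1])] += 1
        # M-step: expected haplotype counts become the new frequencies
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq


sample = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 10
print(em_two_snp(sample))
```

With more markers the number of possible haplotypes grows quickly, which is the motivation for partitioning the markers and ligating the partial solutions.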

The haplotype display shows each haplotype in a block with its population frequency and connections from one block to the next. In the crossing areas, a value of multiallelic D' is shown. This represents the level of recombination between the two blocks. Note that the value of multiallelic D' is computed for only the haplotypes ("alleles") currently displayed. This usually does not have a strong effect, as the rare haplotypes contribute only slightly to the overall value. Above the haplotypes are marker numbers along with a tick beneath haplotype tag SNPs (htSNPs).

The display can be edited using the controls at the bottom of the screen to display only more common haplotypes or to adjust the connecting lines. By default, alleles are displayed using A,C,G,T along with the special symbol 'X' which represents a fairly rare situation in which only one allele is unambiguously observed in phased data. The 'X' represents the allele of unknown identity. The display can also be changed to show the alleles numerically from 1-4 with 8 being the equivalent of 'X', or as blue and red boxes, with blue being the major allele and red the minor.

Haplotype tag SNPs are no longer displayed by default in the Haplotypes tab. It is recommended that all tagging be done via the Tagger tab. The block-by-block tags can be displayed by ticking the "Show tags in blocks" option in the Display menu.


Microarray data source

In a previous study, we surveyed genetic variation associated with differences in isoform-level expression in humans (3). We characterized this effect in a sample of 57 unrelated HapMap individuals of European ancestry (17) for which ∼4 million single nucleotide polymorphism (SNP) genotypes are available. Lymphoblast cells derived from these individuals were grown in triplicate, and RNA was extracted from each culture and hybridized onto an Affymetrix Human Exon array (n = 171). The resulting probe fluorescence intensities were used for the present analysis. We restricted our analysis to probes targeting core exons because of their high-confidence annotation.

Effect of mismatches on hybridization

Probe expression signals were quantile-normalized and GC-background corrected using the Affymetrix Power Tools (APT) software package (Affymetrix). To investigate how mismatches affect probe-to-target hybridization on the Affymetrix Human Exon array, we took advantage of the high-resolution genotyping information available from HapMap cell lines and identified 6110 probes that were targeted to a region with only one SNP in at least 1 of the 57 HapMap individuals. These probes were selected because the exon and gene they targeted were considered expressed. Expression of an exon or gene was established using the detected above background (DABG) metric generated by Affymetrix. This metric represents the probability that an exon or gene is expressed below the background. We used false discovery rate (FDR) correction (18) to establish the significance threshold for expression above background at DABG ≤0.02 and DABG ≤0.043 for exons and genes, respectively. Next, we categorized each of these probes into 25 bins, depending on the position of the SNP within the target region (from 5′ to 3′ end). For each of these bins, we determined the fold change between the average probe intensity derived from individuals with a perfect complementary target region and the average probe intensity from individuals with one mismatch (Figure 1).
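The per-bin comparison described above might look like the following sketch. The record layout, the function name and the direction of the ratio (perfect match over mismatch, on untransformed intensities) are assumptions; the text does not state them.

```python
from collections import defaultdict
from statistics import mean


def fold_change_by_position(records):
    """Fold change of perfect-match over mismatch intensity per SNP position.

    records: iterable of (snp_position, has_mismatch, intensity), where
    snp_position is the bin (1 to 25) of the SNP within the probe target.
    """
    perfect, mismatch = defaultdict(list), defaultdict(list)
    for pos, has_mismatch, intensity in records:
        (mismatch if has_mismatch else perfect)[pos].append(intensity)
    # Report only bins where both groups of individuals are represented
    return {
        pos: mean(perfect[pos]) / mean(mismatch[pos])
        for pos in sorted(perfect)
        if pos in mismatch
    }


records = [
    (1, False, 200.0), (1, False, 100.0), (1, True, 50.0),
    (13, False, 300.0), (13, True, 30.0), (13, True, 30.0),
]
print(fold_change_by_position(records))
```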

Masking procedure

We have previously shown (3, 19) that SNPs located within probe targets affect their hybridization to Affymetrix Human Exon array probes and consequently cause erroneous expression estimates. To mitigate this effect, we devised a simple procedure that consists of removing from the analysis all probes whose target region contains a known SNP. In total, we found that 21 843 of the 1 096 799 core probe target regions overlapped at least one polymorphic HapMap II SNP (release 21).
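A minimal sketch of such a mask, under the assumption that probes and SNPs lie on one chromosome and that probe targets are given as inclusive start and end coordinates. The tuple layout and function name are hypothetical.

```python
from bisect import bisect_left


def mask_probes(probes, snp_positions):
    """Return the ids of probes whose target region contains no known SNP.

    probes: list of (probe_id, start, end) with inclusive coordinates.
    snp_positions: iterable of SNP coordinates on the same chromosome.
    """
    snps = sorted(snp_positions)
    kept = []
    for probe_id, start, end in probes:
        i = bisect_left(snps, start)  # first SNP at or after the probe start
        if i < len(snps) and snps[i] <= end:
            continue  # a SNP falls inside the target region: drop the probe
        kept.append(probe_id)
    return kept


probes = [("p1", 100, 124), ("p2", 200, 224), ("p3", 300, 324)]
print(mask_probes(probes, [110, 250, 324]))
```

Sorting the SNP positions once and using a binary search keeps the lookup cheap when there are around a million probes.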

Preprocessing and summarization of hybridization data

To study how probe-to-target hybridization is affected by SNPs, we generated two data sets of exon and gene expression estimates. The APT software package was used to quantile-normalize and GC-background correct each data set at the probe level. The average probe set (representing exons) and meta-probe set (representing genes) expression scores (averaged from triplicates) for each data set were computed using the probe logarithmic error intensity model (Affymetrix). The first data set consisted of probe set and meta-probe set expression estimates produced by summarizing all core probes, regardless of polymorphic probe target regions. The second data set was generated by implementing our masking procedure (see above). Thus, probe set and meta-probe set expression scores, for this last data set, were estimated from probes where no HapMap SNP overlapped their target region.

Association analyses

For each of the two data sets, the first generated from the full core probe list and the second from the masked core probe list, we examined probe, exon, and transcript expression estimates (averaged from triplicate samples for each individual) for association with flanking HapMap SNPs (release 21). One of the objectives of our previous analysis (3) was to identify possible cis-regulatory determinants of differential alternative splicing. The presence of linkage disequilibrium in humans has created haplotype blocks, where SNPs in close proximity to each other escape rearrangements due to recombination. Therefore, assuming physical proximity of a regulatory variant to the target and to limit the cost of multiple testing, we only tested for SNPs within a 50-kb region flanking either side of the gene containing either the probe or probe set. It should be noted that the SNPs associated with a change in microarray hybridization intensity may either be the actual causative SNPs, or simply be in linkage disequilibrium (part of the same haplotype block) with the causative SNP. We measured the level of association between expression scores (probes, probe sets and meta-probe sets) and the genotypes of a given SNP using linear regression analysis, implemented in the Plink software package (20), under a codominant genetic model. This model considers genotypes AA, AB and BB as the independent discrete variable. The genotypes are encoded as 0, 1 and 2, respectively, whereas expression scores were considered a quantitative trait and treated as the dependent variable in the linear regression. Raw P-values were obtained from the linear regression using the standard asymptotic t-statistic. To correct for testing multiple SNPs against each probe set and meta-probe set expression values, we carried out permutation tests (21) followed by 5% FDR correction. Permutation analyses were performed using the ‘label swapping’ and ‘adaptive permutation’ options implemented in Plink. The ‘label swapping’ option is used to preserve the haplotype block structure and the ‘adaptive permutation’ algorithm allows for computationally efficient permutation analyses (20). Subsequently, we performed FDR corrections of 5% on the empirical P-values (from permutations) for association of genotype to the expression at the probe set (P-value < 9.73 × 10⁻⁹) and meta-probe set levels (P-value < 6.07 × 10⁻⁷).
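The regression and label-swapping steps can be sketched in plain Python as follows. This is an illustration of the general technique, not Plink's code: it uses a fixed number of permutations rather than the adaptive scheme, and the function names, the seed and the permutation count are assumptions.

```python
import math
import random


def slope_t(genotypes, expression):
    """Slope and t-statistic for expression regressed on genotype (0, 1, 2)."""
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, expression))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, t


def permutation_p(genotypes, expression, n_perm=1000, seed=0):
    """Empirical two-sided P-value from shuffling the expression labels."""
    rng = random.Random(seed)
    observed = abs(slope_t(genotypes, expression)[1])
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        # Genotypes stay attached to individuals; only the trait is swapped
        rng.shuffle(shuffled)
        if abs(slope_t(genotypes, shuffled)[1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


geno = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1]
print(slope_t(geno, expr), permutation_p(geno, expr, n_perm=200))
```

Because only the trait values are shuffled, each individual's genotypes at neighbouring SNPs stay together, so the correlation between SNPs in the same block is the same in every permuted data set.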

Evaluation of SNP mask

To evaluate how SNPs in probe–target regions impacted our association analyses, we estimated the proportion of false-positive and false-negative associations due to polymorphic probe target regions. We treated the association results for the masked data set as the reference (true) data set because they were derived from expression estimates free of influence from known SNPs. This reference data set (see Supplementary Tables 1 and 2) enables us to evaluate the four scenarios described in Table 1. Associations of probe set or meta-probe set, which were significant (P-value below the thresholds) and non-significant (P-value above thresholds) in both masked and unmasked data sets, were classified as true positives and true negatives, respectively. We consider a result a false-positive when a significant association is found in the unmasked data set, but becomes non-significant after masking probes containing SNPs (masked data set). Conversely, associations that were non-significant in the unmasked data set but significant in the masked data set were categorized as false-negatives. The false-positive and -negative rates are computed by: FPR = FP/(FP + TP) and FNR = FN/(FN + TN), respectively. In order to avoid the problem of reduced coverage within the masked data, the above analysis does not include probe sets which were entirely ‘masked’ due to the presence of SNPs.
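The four scenarios and the two rates, exactly as defined in the paragraph above, can be written out as a short sketch. The function name and the use of parallel boolean lists are assumptions made for illustration.

```python
def mask_error_rates(unmasked_sig, masked_sig):
    """Classify associations against the masked reference and compute the rates.

    unmasked_sig, masked_sig: parallel lists of booleans, True where the
    association was significant in the unmasked / masked data set.
    """
    tp = fp = tn = fn = 0
    for unmasked, masked in zip(unmasked_sig, masked_sig):
        if unmasked and masked:
            tp += 1  # significant in both
        elif unmasked and not masked:
            fp += 1  # significance disappears after masking
        elif masked:
            fn += 1  # significance appears only after masking
        else:
            tn += 1  # non-significant in both
    # Rates as defined in the text: FPR = FP/(FP + TP), FNR = FN/(FN + TN)
    fpr = fp / (fp + tp) if fp + tp else 0.0
    fnr = fn / (fn + tn) if fn + tn else 0.0
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn, "FPR": fpr, "FNR": fnr}


unmasked = [True, True, True, False, False, False, False, False]
masked = [True, True, False, True, False, False, False, False]
print(mask_error_rates(unmasked, masked))
```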


Incorporating Linkage Information to Map Targets of Selection in E&R Data

While most E&R studies assume independence among SNPs (Turner et al. 2011; Burke et al. 2010; Orozco-terWengel et al. 2012; Topa et al. 2015), it has been recognized that the inclusion of linkage information could improve the mapping of selection targets. Kessner and Novembre (2015) used linkage information from the founder haplotypes to determine haplotype frequencies in moderately sized (200 kb) windows for the evolved populations. The increase in accuracy of their approach stems from a more accurate frequency estimate compared to single marker analyses. In another recent approach Terhorst et al. (2015) describe a multi-locus model of selection for replicated time-series data. Here, a small number of SNPs adjacent to each focal SNP are used to increase the information content of the data by taking into account the local haplotype structure and recombination. While this improves the inference of the selected SNP, the actual haplotype structure in the evolved populations is a nuisance parameter and remains ultimately unknown.

Our approach differs from these two methods by primarily focusing on the reconstruction of selected haplotypes. Furthermore, we validate for the first time reconstructed haplotypes experimentally by sequencing evolved flies. We anticipate that time series trajectories of selected haplotypes will not only facilitate the mapping of targets of selection, but also provide the unique opportunity to match the observed patterns against the expectations of classic population genetics models.

Impact of IBD Regions

Natural D. melanogaster populations have low levels of linkage disequilibrium (Mackay et al. 2012; Langley et al. 2012; Franssen et al. 2015). Nevertheless, it is becoming increasingly clear that large genomic regions can be shared among individuals from the same population. Such regions of identity by descent (IBD) are due to sampling of related individuals, which is to be expected from local population structure. In the context of adaptation, however, they have a severe impact on the mapping of selected alleles. The comparison of the reconstructed haplotype to the haplotype-block identified from a subset of the founder chromosomes (Franssen et al. 2015) is a particularly good demonstration of how IBD blocks could result in wrong conclusions. Since the selected chromosome shared a large IBD region with another chromosome, which does not include the target of selection, the IBD region may be incorrectly identified as the target of selection if the selected haplotype is not known. While sequencing of all founder haplotypes would avoid this problem, previous suggestions to increase the number of founder haplotypes for a reliable mapping of selection targets (Kofler and Schlötterer 2013; Baldwin-Brown et al. 2014) argue against this strategy. Furthermore, experiments starting from freshly established isofemale lines contain multiple haplotypes (at least four with their resulting recombinants), which further complicates the inference of founder haplotypes. The haplotype reconstruction method introduced here seems a more promising and resource efficient approach.

One further challenge of IBD regions for the reconstruction of selected haplotypes arises when they are shared between the selected chromosome and multiple non-selected haplotypes resulting in an intermediate frequency of the IBD region. In this setting, the selected haplotype will not carry any haplotype specific markers in the IBD region and thus prevent the extension of the haplotype-block across the IBD region. Therefore, it is still possible that the length of the selected haplotype-block is severely underestimated.

Limitations of the Haplotype Reconstruction

Our approach to reconstruct haplotypes is targeted at selected alleles that are present at low frequencies in the founder population. While this seems to be the predominant genomic response in D. melanogaster populations adapting to hot environments (Tobler et al. 2014; Franssen et al. 2015), other studies have found that selected alleles are at intermediate frequencies (Turner et al. 2011; Turner and Miller 2012). Such common alleles are expected to occur in multiple chromosomal backgrounds with few sites being in high LD, which results in fewer hitchhikers obscuring signal from the selection targets (e.g., supplementary fig. S33 in Kofler and Schlötterer 2013). The reconstruction of high frequency selected clusters may therefore be more limited. On the other hand, if only a small number of high frequency candidate SNPs emerge from the analysis, the identification of the actual targets of selection is substantially simplified relative to cases of low starting frequencies.

Introgression and selection shaped the evolutionary history of sympatric sister-species of coral reef fishes (genus: Haemulon)

Closely related marine species with large overlapping ranges provide opportunities to study mechanisms of speciation, particularly when there is evidence of gene flow between such lineages. Here, we focus on a case of hybridization between the sympatric sister-species Haemulon maculicauda and H. flaviguttatum, using Sanger sequencing of mitochondrial and nuclear loci, as well as 2422 single nucleotide polymorphisms (SNPs) obtained via restriction site-associated DNA sequencing (RADSeq). Mitochondrial markers revealed a shared haplotype for COI and low divergence for CytB and CR between the sister-species. On the other hand, complete lineage sorting was observed at the nuclear loci and most of the SNPs. Under neutral expectations, the smaller effective population size of mtDNA should lead to fixation of mutations faster than nDNA. Thus, these results suggest that hybridization in the recent past (0.174-0.263 Ma) led to introgression of the mtDNA, with little effect on the nuclear genome. Analyses of the SNP data revealed 28 loci potentially under divergent selection between the two species. The combination of mtDNA introgression and limited nuclear DNA introgression provides a mechanism for the evolution of independent lineages despite recurrent hybridization events. This study adds to the growing body of research that exemplifies how genetic divergence can be maintained in the presence of gene flow between closely related species.

Keywords: RADSeq; Tropical Eastern Pacific; gene flow; genomics; hybridization; mito-nuclear discrepancies; speciation.


Heterogeneity in diversification rates among lineages is a major factor shaping biodiversity. Yet, biological and environmental factors underlying this variation are incompletely understood 1 . Adaptive radiations are prime study systems to learn about these factors as they are characterized by a rapid origin of many species showing a diversity of ecological adaptations. This process requires high levels of heritable variation in traits related to ecological and reproductive isolation. However, adaptive radiations are often too rapid for the emergence of new relevant mutations between successive speciation events 2 and are thus more likely to stem from standing variation. Hybridization between species can instantaneously boost genetic variation, which may facilitate speciation and adaptive radiation 3,4,5,6,7,8,9 .

Hybridization among members of an adaptive radiation has been suggested to potentially facilitate speciation events, an idea known as the ‘syngameon hypothesis’ 3 . Introgression of traits involved in adaptation or reproductive isolation has been demonstrated among members of several adaptive radiations (for example, traits related to host shift in Rhagoletis fruit flies 10 , wing patterns in Heliconius butterflies 11,12 or beak shape in Darwin’s finches 13 ). In other radiations, the hybrid ancestry of some species has been inferred, but a direct link between introgressed traits and speciation awaits further testing (for example, cichlid fishes of Lakes Tanganyika 14,15,16 , Malawi 17 , Victoria 18,19 and Barombi Mbo 20 ).

Another hypothesis for a perhaps more fundamental role of hybridization in adaptive radiation, distinct from the ‘syngameon hypothesis’, is the idea that hybridization between distinct lineages may seed the onset of an entire adaptive radiation 3 . Such hybridization can be common when allopatric lineages come into secondary contact 4 , and selection against hybrids may be weak during colonization of new environments. In this situation, the formation of a hybrid swarm, if coincident with ecological opportunity, may accelerate adaptive radiation by (i) providing functional genetic variation that can recombine into novel trait combinations favoured by selection and mate choice, and (ii) breaking genetic correlations that constrained the evolvability of parental lineages 3 . In addition, hybridization may facilitate speciation when multiple fixed differences that confer reproductive isolation between the two parental species decouple and segregate in a hybrid swarm, such that selection against incompatible gene combinations can generate more than two new reproductively isolated species 21,22,23 .

This ‘hybrid swarm origin of adaptive radiation’ hypothesis has been more challenging to test. So far the only adaptive radiation for which a hybrid origin has been robustly demonstrated is the Hawaiian silverswords, which have radiated from an allopolyploid hybrid population between two North American tarweed species 24 . Because gene and genome duplication are also proposed to facilitate adaptive radiation 25 , it is difficult to distinguish between effects of hybridization per se and those of gene or genome duplication in this case. Evidence consistent with a hybrid swarm origin of entire radiations has also been found in Alpine whitefish 26 , the ‘mbuna’ group of the Lake Malawi cichlid fish radiation 27 , and allopolyploid Hawaiian endemic mints 28,29 and possibly other polyploid plant radiations on Hawaii 30 . However, it remains to be tested if, in these systems, hybridization occurred before or after the radiation had started, and if hybridization-derived polymorphisms played a role in speciation and adaptive diversification.

The Lake Victoria Region Superflock of cichlid fish (LVRS) is a group of 700 haplochromine cichlid species endemic to the region around Lake Victoria and nearby western rift lakes in East Africa that started diversifying about 100–200 thousand years ago 31,32,33,34 . It includes several adaptive radiations, one in each of the major lakes of the region (Lakes Victoria, Edward, Albert and Kivu). The largest of them is in Lake Victoria, which has at least 500 endemic species that evolved in the past 15,000 years 34,35,36 . Each radiation comprises enormous diversity in habitat occupation, trophic ecology, colouration and behaviour. The high diversification rate but also the high nuclear genomic variation in the LVRS despite its young age 34,37 suggest that large amounts of standing genetic variation must have been present at the onset of the radiation 2,3 . Previous work showing cytonuclear discordance in phylogenetic reconstructions between LVRS and several riverine cichlid species raised the possibility of ancient hybridization between divergent species at the base of the radiation but could not demonstrate it 37 . Hybridization on secondary contact is not unlikely, as allopatric cichlid species, divergent by even millions of years, readily produce fertile offspring in the lab 38 .

Using genomic data from riverine haplochromine cichlids sampled from all major African drainage systems, and representative species from all lineages within the Lake Victoria region, we demonstrate here that the LVRS evolved from a hybrid swarm. All lake radiations show very similar proportions of mixed ancestry derived from two distantly related haplochromine lineages that had evolved in isolation from one another in different river systems for more than a million years before hybridizing in the Lake Victoria region. We find evidence that this hybridization event facilitated subsequent adaptive radiation by providing genetic variation that has been recombined and sorted into many new species. Variants that were fixed between the parental lineages show accentuated differentiation between young Lake Victoria species, but appear in many new combinations in the different species. Notably, each of the two major allele classes of an opsin gene involved in adaptation and speciation among Lake Victoria cichlids 39,40 is likely derived from one of the two parental lineages. This indicates that a major part of the variation at this gene segregating in the LVRS stems from hybridization between these lineages. Our results suggest that hybridization between relatively distantly related species, when coincident with ecological opportunity, may facilitate rapid adaptive radiation. Thus, hybridization, even in the distant past, may have important implications for understanding variation in extant species richness between lineages as well as variation in recent rates of diversification.


Human genomes are diploid and, for their complete description and interpretation, it is necessary not only to discover the variation they contain but also to arrange it onto chromosomal haplotypes. Although whole-genome sequencing is becoming increasingly routine, nearly all such individual genomes are mostly unresolved with respect to haplotype, particularly for rare alleles, which remain poorly resolved by inferential methods. Here, we review emerging technologies for experimentally resolving (that is, 'phasing') haplotypes across individual whole-genome sequences. We also discuss computational methods relevant to their implementation, metrics for assessing their accuracy and completeness, and the relevance of haplotype information to applications of genome sequencing in research and clinical medicine.


Even in the relatively short time (10 years) since genomic data have been applied to population genetic questions in nonmodel organisms, population genomics has already helped answer a wide variety of questions in the biology of wildlife species. There has been a relatively slow uptake of population genomics results in influencing policy decisions and wildlife management actions (Shafer et al., 2015), with a number of factors contributing to significant time lags: researchers learning how to apply population genomics in wildlife species, studies being completed through publication of results, communicating results and interpretation of genomic data to conservation practitioners, integrating genomic results into the many sources of information that influence policy decisions or conservation actions, etc. Nonetheless, a decade on, examples of direct connections between population genomics research and wildlife conservation actions are now rapidly accumulating (Walters & Schwartz, 2020). A remaining question, however, is whether population genomics can help stem the tide of cataclysmic biodiversity declines given the accelerating urgency of the problems.

Population genomics research is by nature intensive and focused on one or a few species. It has, therefore, been applied to wildlife species that are high-profile or of significant economic interest, such as captive populations or salmonid fish (Waples et al., 2020), although the decreasing costs of genomic studies and proliferation of resources such as reference genome assemblies have allowed these techniques to spread across taxa, and this trend will continue. Future directions include expanding the “omics” toolkit to include transcriptomics, epigenomics or proteomics, which may improve our understanding of adaptive capacity in wildlife populations and the role of gene expression, epigenetics and phenotypic plasticity in population fitness. There may also be a role for genetic engineering techniques in wildlife, such as gene therapy or gene drive approaches to cause alleles to spread in a population (Breed et al., 2019; Rode et al., 2019). In species that suffer from a well-understood, relatively simple genetic problem, it could be conceivable to use a “rescue drive”, an attempt to spread a favoured allele into a population to increase fitness (Rode et al., 2019). However, this approach carries numerous poorly understood risks, including the pitfalls associated with focusing management on a narrow set of genetic factors (Kardos & Shafer, 2018). Another approach is to use gene drive techniques to control or eradicate invasive species that negatively affect native wildlife (Rode et al., 2019). While invasive species can often require active management, and some level of risk may be acceptable compared to taking no action, the risks of such eradication or suppression drives are still poorly known.

A future need in conservation is to understand how population genomics tools can be applied more broadly beyond single focal species, for instance at the ecosystem level (Breed et al., 2019). One avenue is metagenomics or metabarcoding approaches, where genetic samples include multiple species, for instance with eDNA (Goldberg & Parsley, 2020). Population genomics focused on species that are central to ecosystem interactions may also reveal the community effects of genomic diversity (Hand et al., 2015). These may often be plants, such as the dominant tree species in a forest ecosystem in which many other species are affected by its genetics, and genomics tools can be important for seed sourcing in restoration efforts (Breed et al., 2019). In other cases, wildlife species may play a similar role.

The field of population genomics continues to change rapidly, with technological and analytical advances expanding the tools that are available in wildlife biology at the same time as the need for conservation knowledge and action becomes more urgent. While it may be very difficult to keep up to date with all of the changes, it is critical for both researchers and wildlife professionals to maintain a broad understanding of the population genomics tools that are available and to foster communication between wildlife scientists and practitioners.
