GWAS: marker map based on old genome assembly - can I use the new assembly to look for SNP positions?

GWAS: marker map based on old genome assembly - can I use the new assembly to look for SNP positions?

We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

I have an old set of pig SNP data and the positions were mapped using v10.2 of the assembly.

I submitted a GWAS paper for review, and the reviewer wants to know if any of the significant SNPs that were located in previously unknown genes are now in known genes, with the release of the latest assembly last year.

I was under the impression that I had to use the assembly the marker map was based on to see where SNPs are located. Is this correct or can I use the new assembly?

Many thanks in advance!

Why haven't you tried using LiftOver to convert the coordinates?

GWAS: Fast-forwarding gene identification and characterization in temperate Cereals: lessons from Barley – A review

Understanding the genetic complexity of traits is an important objective of small grain temperate cereals yield and adaptation improvements. Bi-parental quantitative trait loci (QTL) linkage mapping is a powerful method to identify genetic regions that co-segregate in the trait of interest within the research population. However, recently, association or linkage disequilibrium (LD) mapping using a genome-wide association study (GWAS) became an approach for unraveling the molecular genetic basis underlying the natural phenotypic variation. Many causative allele(s)/loci have been identified using the power of this approach which had not been detected in QTL mapping populations. In barley (Hordeum vulgare L.), GWAS has been successfully applied to define the causative allele(s)/loci which can be used in the breeding crop for adaptation and yield improvement. This promising approach represents a tremendous step forward in genetic analysis and undoubtedly proved it is a valuable tool in the identification of candidate genes. In this review, we describe the recently used approach for genetic analyses (linkage mapping or association mapping), and then provide the basic genetic and statistical concepts of GWAS, and subsequently highlight the genetic discoveries using GWAS. The review explained how the candidate gene(s) can be detected using state-of-art bioinformatic tools.


Genome positions are best represented in BED format. UCSC provides tools to convert BED file from one genome assembly to another.

Binary liftOver tool

Provide BED format file (e.g. input.bed)

NOTE: Use the 'chr' before each chromosome name

unlifted.bed file will contain all genome positions that cannot be lifted. The reason for that varies. See Various reasons that lift over could fail

Web interface

Alternatively, you can lift over BED file in web interface at: Link Web interface can tell you why some genome position cannot be lifted if you click "Explain failure messages"


General descriptive statistics

General descriptive statistics of the traits analyzed are detailed in Table 1. There were five traits studied in spring barley, and five traits in winter wheat. The number of plots in powdery mildew and ramularia were less than for other traits because only one plot was recorded per year and per location in this study. In general, the distribution of all traits showed close to normal distributions, but powdery mildew was skewed as 90% of all records were scored as no infection. The PCA pointed out that the first two principal components explained 33% and 12% of the total variance among markers for spring barley, and 54% and 11% for winter wheat (Fig. 1). Based on the genomic information, the results showed that lines, in general, were grouped in families in both spring barley and winter wheat. The heat-map of the genomic relationship (data not shown as similar results were reported 26,27 ) using similar dataset also lead to the same result for both crops 26,27 . The square root of narrow-sense plot genomic heritabilities for all traits are given in Fig. 2 for barley and in Fig. 3 for winter wheat. For spring barley, the narrow-sense plot genomic heritability was 24% in yield, 11% in powdery mildew, and 13% in ramularia. For winter wheat, the narrow-sense plot genomic heritability was 33% in yield, 39% in protein content, and 12% in Zeleny value. The phenotypic correlation of traits were generally lower than genetic correlation across different traits in both species (Tables 2 and 3).


Phenotypic evaluation of seedling trait architecture under contrasting growth conditions

In total, ten traits were recorded under two different moisture treatments. Additionally, DSI was calculated and used as a derived trait for GWAS. A wide range of phenotypic variation was detected for all traits (Table 1, Additional file 1: Table S3). The phenotypic BLUEs of all traits along with their variance decreased drastically under stress treatment (Fig. 1). The lowest reductions were observed for Sl and Sdw (7 and 9%, respectively), while the strongest reductions were obtained for Trv and Rdw (50 and 36%, respectively). Considerable reductions in the coefficient of variation (CV) in the osmotic stress treatment were observed for the four image-based root traits, ranging from 4% (Rth) to 51% (Trv) (Additional file 1: Table S3). Furthermore, heritability values were lower under stress (range 0.20–0.68) compared to non-stress conditions (range 0.37–0.78 Additional file 1: Table S4), which is comparable to the manually recorded traits [69]. In both treatments, Trl and Nmr had the lowest heritability. In non-stress conditions, all further traits exhibited heritabilities > 0.5, the highest displayed by Rsr (0.78) and Rth (0.75), while in stress conditions, seven traits showed heritabilities < 0.5. The three traits with higher heritability in stress were Rl, Sdw and Sl (0.68, 0.54 and 0.53, respectively). The lower heritability values and coefficients of variation under imposed stress conditions indicate the complexity of the genotypic response to stress and environmental variation. Further, the reduction response was most pronounced for Trv (50.5%), followed by Rdw and Rl (35.9 and 28.6%, respectively), which suggests that root biomass components were affected most when stress was imposed, while Rth and Nmr were less strong reduced (17.4 and 11.9%, respectively). Least affected were Sl and Sdw (7.1 and 9.1%, respectively), demonstrating most accessions were investing more into shoot biomass under osmotic stress.

Box plots for root and seedling traits. Centre lines show the medians box limits indicate the 25th and 75th percentiles as determined by R software whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles, outliers are represented by dots data points are plotted as open circles. Trait values for Rsc, Rss, Rthc and Rths are transformed by multiplying by 100 for visualizations

Significant positive correlations were observed among seedling traits within both treatments, except for Rs (Fig. 2). Among the image-based traits, Trv showed the strongest positive correlation with Rdw across treatments, whereas Nmr and Rth displayed moderate to low-correlation values. In general, stress reduced most of the correlations among various traits but they showed similar relationships under both treatments in line with [21].

Correlations between root and seedling traits in non-stress or osmotic stress conditions. Correlations are displayed as heatmap and as numerical value. Red = negative correlation, blue = positive correlation. The part above the diagonal presents correlations of traits only within non-stress treatment and below the diagonal only within stress treatment. Along the diagonal correlations between the same trait in both treatment are displayed. Correlations values above 0.2 and below −0.2 are significant (P < 0.01)

Influence of population structure

The present panel comprised 223 genotypes from a well-studied population and ten additional genotypes. Population structure in this population was extensively investigated by [46] and revealed six subgroups based on row type and geographic origin. In GWAS, the kinship was fully sufficient to control for the confounding effects of population structure. Nevertheless, we have investigated population structure in the extended panel using Structure 2.0. In consistence with [46] the collection clustered according to row type and origin (Additional file 1: Table S5, Additional file 2: supplementary note and Figure S2). All nine additional two-rowed genotypes clustered within group 5 (European two-rowed genotypes), while the only additional six-rowed genotype clustered among the six-rowed European group 9. We tested if the main factor of population structure (row type) affected our phenotypic traits. In control, only one trait was significantly different (Trlc), while under stress, six traits showed differences, in all cases two-rowed genotypes were performing better (Additional file 1: Table S6).

Genome-wide association scans for root and shoot traits

A set of 4966 mapped and quality-filtered SNP markers were used for GWAS, which were evenly distributed over all seven chromosomes with an average spacing of 4.97 cM. The number of markers varied among chromosomes with a minimum of 483 SNPs on chromosome 1H and a maximum of 967 SNPs on chromosome 5H (Additional file 1: Table S7).

In total, 519 marker-trait associations were detected (Additional file 1: Table S8), based on 323 SNPs that were associated with one (234 SNPs) and or two and up to five traits (89 SNPs). For all traits analyzed, significant associations were detected for nineteen traits across all seven chromosomes (Figs. 3 and 4), while no significant associations were identified for Nmrs and Dsi.

Manhattan plots of 12 out of 21 root and shoot traits. Horizontal axis presents seven chromosomes (1H–7H) of the barley genome. Vertical axis shows -log10(P) values of the marker-trait associations. Horizontal green line shows the threshold value based on FDR (0.05). Additionally, dashed line signifies threshold of -log(p) = 4.0

Manhattan plots of 9 out of 21 root and shoot traits. Horizontal axis presents seven chromosomes (1H–7H) of the barley genome. Vertical axis shows -log10(P) values of the marker-trait associations. Horizontal green line shows the threshold value based on FDR (0.05). Additionally, dashed line signifies threshold of -log(p) = 4.0

Associated SNPs in close distance were grouped into QTL regions based on the average LD decay of

3.5 cM (data not shown) due to differential LD blocks for individual chromosomes and thus a variable decay across the chromosomes (Additional file 2: Figure S3). This enabled us to detect 65 QTL regions (Fig. 5, Additional file 1: Table S8). The highest number of QTL was identified on chromosomes 2H (13), 5H (11), 3H (10) and 7H (10), and the lowest found on 6H (3). In total, 26 out of all 65 QTL regions correspond to traits from both treatments, while 27 and 12 correspond to traits from non-stress or stress treatment, respectively. This may reflect the reduced heritabilities obtained under stress conditions.

Genetic positions (cM) of 65 QTL regions for root and shoot seedling architecture placed on a schematic map of the seven barley chromosomes along with the corresponding QTL name (see Additional file 1: Table S5 for all details). QTL-hotspots are highlighted in green, root specific QTL in orange, stress-specific QTL in pink and the remaining non-specific QTL in black. Centromeric regions are indicated by red segments

Further, we defined a QTL as a hotspot QTL if at least five out of all ten traits were associated to this region. In total, we identified eleven hotspot QTL and observed a concentration of five hotspots alone on 2H (QTL-2H-3, QTL-2H-6, QTL-2H-7, QTL-2H-8, QTL-2H-11), while the remaining were located on 1H (QTL-1H-3), 4H (QTL-4H-4), 5H (QTL-5H-1, QTL-5H-6) and 7H (QTL-7H-6, QTL-7H-10). All hotspots were associated with traits from both treatments.

Candidate gene exploration and testing

By educated guess, a list of candidate genes (CGs) was established (Additional file 1: Table S8). We identified developmental, flowering time, stress-related and root-affecting CGs from the recently annotated barley genome assembly [70]. Additionally, root-morphology related gene homologues from Arabidopsis, rice and maize were identified. Thereby, CGs near to our QTL regions were identified which we think are suitable candidates for the associated traits. However, the approach is not accurate, in particular in the centromeric regions where recombination is low and QTL interval is large. To further test polymorphisms within the potential CGs for associations we assembled the polymorphisms of these CGs by using two approaches i) utilization of GBS data for the IPK barley collection [84] and ii) re-sequencing CG fragments within the association panel.

Out of 113 potential CGs (Additional file 1: Table S8), 128 SNPs from 39 CGs were retrieved by GBS approach. However, after filtering for missing data and MAF, 71 SNPs from 29 CGs were tested for associations to our 21 traits with the same model as in GWAS. Association between traits and CG-SNPs with –log(p)-value > 2 were obtained for 17 different CGs (Additional file 1: Table S9), revealing a number of 12 CGs that can be excluded as CGs as they had weaker associations. For nine CGs, the –log(p)-value in GWAS was higher in the corresponding QTL region. This suggests that we miss either the causal SNP or we have not hit the right CG and therefore these nine genes remain potential CGs. Nevertheless, 15 associations corresponding to eight CGs were of similar –log(p)-value as in GWAS (difference maximal 0.3 lower) or of even higher –log(p)-value and therefore support the gene as a CG. The most striking result came from a SNP of PIN7 associated with Rthc, where the –log(p)-value was 5.92 while in GWAS the highest was only 4.02 in the corresponding QTL region.

In the second approach, fragments of twelve CGs were re-sequenced for the majority of accessions from the GWAS panel (Additional file 1: Table S2). These genes comprised two flowering time genes (HvPpd-H1 and HvCEN) and ten root growth related candidate genes, two of them chosen outside of QTL regions detected in this study. From the twelve CG fragments, eight showed significant trait associations (Additional file 1: Table S10, Additional file 2: Fig. S4), while four genes were rejected as CGs (HvCEN, ABP1, PRC2, PIN2). Six CGs had associations similar or higher compared to GWAS and are supported as the right CG (HvPpD-H1, TRX-m3, HvEXPB1, WOX5, PIN5, HvARF04). The two other genes remain potential CGs (HvCKX5, GNOM).

Two CGs were covered by both approaches, GBS and re-sequencing. Both revealed a rejection of PIN2 as CG behind root-specific QTL-7H-9 and both suggest a role of TRX-m3 as a CG for hotspot QTL-2H-6. Nevertheless, some associations from TRX-m3 based on re-sequencing were similar or higher compared to GWAS and therefore support TRX-m3 as a CG, while by GBS approach the gene remains a potential CG. However, only one SNP form the gene was available in GBS data compared to seven from re-sequencing.

In summary, 39 CGs were tested for associations to phenotypic traits by one or two of the CG-association approaches. In total, fourteen CGs are supported as candidates, while ten remain potential CGs (association not as high as in GWAS) and fifteen can be rejected as a CG (Table 2).

Structural genomic variation in rose

Interspecific genomic variation

In the genus Rosa the 2C DNA amount in diploid species ranges from 0.78 pg in Rosa xanthina (section Pimpinellifoliae) to 1.29 pg in ‘Félicité and Perpétue’ (hybrid Sempervirens) 67 and from 0.73 pg in R. zhongdianensis (section Pimpinellifoliae) to 1.77 pg in R. brunonii (section Synstylae) 68 . The recently sequenced R. chinensis is among the largest genomes known among diploid roses (estimated at 1.16 pg 67 to 1.67 pg 68 ). Roses from the Pimpinellifoliae section generally have the smallest genome size, while Synstylae roses have the largest genomes 68 .

Genome size variation in angiosperms is typically associated with two types of events: whole-genome duplication (WGD) or transposable element amplification 69,70 . With regard to the latter, approximately 68% of the R. chinensis reference genome sequence consists of transposable elements, especially long-terminal repeat retrotransposons like Gypsy and Copia elements 5 . For most transposable element families, a two-fold higher abundance was found in Rosa as compared to Fragaria vesca 5 , explaining a substantial part of the genome size difference between R. chinensis and F. vesca. Shallow shotgun sequencing of a comprehensive sets of species across the genus Rosa and subsequent clustering and quantification of repetitive sequences will reveal if species-specific repetitive elements exist. More extensive (re)sequencing will be needed to ascertain whether differential amplification of transposable elements can explain the variation in genome size among rose species.

Besides the role of transposable elements in genome size evolution, the insertion of copia elements into specific protein-coding genes 5,71 gave rise to two of the most important horticultural traits: double flower and recurrent blooming (see below). Rose breeding in general may have favoured species from sections with large genomes, as transposon activity is known to create allelic diversity 72 . A related question is whether retrotransposon activity is higher in tetraploid taxa compared to diploids, and whether this has also led to functional variation that is useful for the ornamental value of roses.

Whole-genome assembly and gene prediction has revealed no signs of recent WGD events in R. chinensis 4,5 . In addition, despite the difference in genome size, the 240 Mb F. vesca genome contains 34,809 predicted genes 73 , while the 560 Mb ‘Old Blush’ R. chinensis genome has only fractionally more predicted genes (39,669 genes 5 36,377 genes 4 ).

Rose comparative genomics

From an evolutionary point of view, rose is a very interesting model species as it includes species at several ploidy levels as well as many cultivars with a hybrid origin 74 . The sequencing of thousands of individuals in Arabidospis thaliana 75 , rice 76 and maize 77 has demonstrated extensive differences in genome constitution even among accessions within a species, leading to the definition of ‘core’ genes (which are present in all members of a species), and ‘distributed’ or ‘dispensable’ genes (present in a subset of members). The ‘pan-genome’ represents the full genome complement across all sampled members.

Metagenome-like assembly strategies in rice 76 and an analysis of 3010 re-sequenced rice accessions 78 revealed that ‘distributed’ gene families showed enrichment in regulation of immune and defence responses. Other studies also unveiled their role in adaptation to abiotic and biotic stresses 79 , species diversification and development of novel gene functions 80 . (Meta)genome assembly and gene annotation of species across the Rosa genus and in closely related species, and subsequent comparative genomics will be required to define whether rose also has conserved and lineage-, section- and/or species-specific genes. One of the possible hypotheses is that resistance (R) genes behave as a dispensable group gene family, while susceptibility (S) genes generally would be members of core gene families. For such questions to be answered, it will be necessary to collect and analyse genome sequences of species and accessions across the genus that represent the taxonomic diversity, but also the diversity in various abiotic and biotic conditions in which roses grow.

Among the most interesting cases to study in roses is the question of resistance genes. The sequencing of bacterial artificial chromosome (BAC) clones around the Rdr1 locus, contributing to the rose black spot resistance, in R. multiflora (9 TIR-NBS-LRR (TNL) genes) and R. rugosa (11 TNL genes) enabled both rearrangements and duplications in this locus to be discovered 81,82 . A larger analysis of clusters of resistance genes in roses is needed to understand how their organization and evolution can be associated with resistance levels.

Genome-Wide Association Study (GWAS) for Growth Rate and Age at Sexual Maturation in Atlantic Salmon (Salmo salar)

Growth and age at sexual maturation are among the most important economic traits in Atlantic salmon (Salmo salar) aquaculture, with fish having been continuously selected for these traits since the beginning of selective breeding programs in Norway in the 1970s [1&ndash3]. Early sexual maturation is considered a serious drawback for aquaculture as it retards growth for several months while affecting flesh quality and overall production times [4,5]. Late maturation on the other hand, is a desirable trait in many fish breeding programs because it can lead to larger body size and higher fecundity [6&ndash8] however, late maturation also means longer generation times, which could be disadvantageous for aquaculture production where short generation intervals are considered to be beneficial since increased growth rate and decreased age at maturity allows farm animal producers to shorten a production cycle. These traits, growth and age at sexual maturation, are complex physiological processes controlled by genetic and environmental factors, including qualitative and quantitative aspects of nutritional availability, behavioral interactions conditioned by intra- and inter-specific demographics and seasonal changes [9,10]. The effect of multiple factors on these traits and their heritable variation between and within populations of Atlantic salmon has been described by Garcia de Leaniz et al. [11]. In spite of the high number of environmental factors that may directly or indirectly influence growth, the heritabilities of growth and age at maturity have been reported to show moderate levels in most cases [2,3,11&ndash15]. This scenario makes artificial selection for these traits plausible, which allows their improvement by means of selective breeding.

Advances in genomic research have significantly improved the tools available for the study of commercially important traits using quantitative trait loci (QTL) analyses. The use of molecular markers spread across the genome and the effects of allele segregation are employed to determine the number, positions and the magnitude of the QTL affecting a trait by statistical associations between marker genotype and particular trait phenotypes. The ability to determine the chromosomal regions that affect economically important traits has led to the implementation of selective breeding based on genetic selection practices by identifying animals with favorable genotypes. In Atlantic salmon aquaculture, marker assisted selection (MAS) could be a valuable addition to current selective breeding programs by improving the accuracy of selection, and therefore the genetic gain [16]. This is particularly true for those traits like disease resistance, flesh quality and pigment uptake that are difficult to measure on the actual broodstock [17]. In this regard, a practical example of MAS in Atlantic salmon is the use of a QTL explaining a high proportion of the total variance for resistance against Infectious Pancreatic Necrosis virus [18&ndash20], which is being implemented into commercial breeding programs.

In the past decades, QTL in salmonids have been mainly detected by means of linkage mapping methods using a relatively small number of microsatellite markers distributed across the genome. Consequently, most identified QTL span a large chromosomal region, from which the identification of a causative gene(s) is problematic. Several studies have described QTL associated with growth and/or sexual maturation in salmonid species including Atlantic salmon [21&ndash25], rainbow trout [26&ndash29], coho salmon [30,31] and Arctic charr [24,32]. Yet, information regarding candidate genes located in these QTL regions is scarce for most species. In rainbow trout, however, a Clock gene was localized in a strong spawning time QTL [33]. Previous studies carried out in other salmonid species like rainbow trout and Arctic charr have shown an apparent link between QTL for sexual maturation and growth [24,26,28] however, recent evidence suggests that in Atlantic salmon these traits are independent of one another [34].

With the emergence of new sequencing technologies it is possible to obtain thousands of single nucleotide polymorphism (SNP) markers for population genotyping, which has allowed the construction of high density genetic maps [35]. The current SNP array technologies provide better tools for the identification of QTL and specific markers associated with traits of interest than was possible using microsatellite markers. For Atlantic salmon in particular, a 6.5K SNP was developed [36,37] resulting in a SNP linkage map with &sim 5,500 markers [38]. We previously made use of this SNP array to conduct QTL analyses for body weight at different times during a production cycle of Atlantic salmon [39] and subsequently for age at sexual maturation [34]. We were curious to determine if similar results (i.e., genomic locations and specific genetic markers) would be obtained using a different statistical approach, namely a genome-wide association study (GWAS). GWAS evaluates the association of genetic markers with a trait relying on the levels of linkage disequilibrium (LD) between the markers and the genetic variation affecting the trait, testing for association of each SNP and therefore, making possible the identification of specific alleles associated with the trait.

Here, we present the results of our GWAS in the Cermaq (formerly Mainstream) Canada Atlantic salmon broodstock population making use of the Atlantic salmon 6.5K SNP array [36,37]. We analyzed growth in this commercial population in terms of days to reach 5 kg and also age at sexual maturation. As GWAS does not rely on associations within a family as QTL analysis does, we were able to increase the power to detect association between markers and traits by including an additional 192 Atlantic salmon in all analyses and an additional 160 samples (80 grilse and 80 normally maturing Atlantic salmon) for the grilsing GWAS. To our knowledge, the present study represents the first GWAS for growth and age at sexual maturation in a salmonid species.

Materials and Methods

Samples and phenotype data

At this time the Canadian Council on Animal Care does not have formal guidelines for the use of DNA obtained from tissues. Therefore, for the purposes of this research an Animal Care permit from the University Animal Care Committee at Simon Fraser University was not required. Samples came from a commercial breeding program developed by Cermaq (formerly Mainstream) Canada and based on the Mowi strain of Atlantic salmon, which was derived from a breeding program established using Norwegian populations [40]. In November/December 2005, 130 single pair mating families were established. At the fry stage (February 2006) 120 offspring from each family were pooled (15,600 fish in total) and grown communally at the Oceans Farms Hatchery, Vancouver Island. In September/October 2006, 5,000 of the fish were PIT (passive integrated transponder) tagged and phenotypic measurements were taken until early 2009. DNA extracted from fin clips allowed the posterior parental assignment by the use of microsatellites [41]. Five full-sib families comprising 279 individuals (including parents), were selected for analysis. The five families named F007, F023, F076, F088 and F107 contained 51, 62, 45, 46 and 65 progeny, respectively [39].We included an additional 192 Atlantic salmon parents from the same broodstock year (BY) 2005 in all of the analyses. For the grilsing analysis, we also included an additional 160 samples (80 grilse/80 normally maturing Atlantic salmon).

Body weight measurements were taken as described in our previous analysis [39]. Based on the weight measurements taken at times during the production cycle, the number of days required to reach a market weight of 5 kg was calculated for all fish. Maturation times were classified as: precocious (&le 12 months of age), grilse (36 months of age, at 1 st sea winter (SW)), normally maturing (48 to 60 months of age at second SW or third SW), and late-maturing fish (>60 months). For our analyses, maturity was defined as a binary trait in which grilse were recorded as 1 and normally maturing fish were recorded as 0 for the late maturation analysis, fish maturing after 60 months were recorded as 1 and those normally maturing as 0. Sex was recorded during the confirmation of gonad maturation status for fish maturing up to 60 months old. The sex of late maturing fish was determined by the presence of the sdY gene according to Eisbrenner et al. [42].

SNP array and linkage mapping

The SNP data used for this analysis have been described previously [39]. Five families were selected for SNP genotyping at CIGENE, Norwegian University of Life Sciences, Ås, Norway using an Atlantic salmon 6.5K Illumina iSelect SNP-array [36,37]. Analyses were based on an Atlantic salmon linkage map, which contains &sim5,650 SNP markers and was constructed using genotyping data from 143 families comprising 3,297 fish [38]. This map contains 29 linkage groups, which were assigned to their specific chromosome number according to the nomenclature established by Phillips et al. [43]. A total of 471 fish were genotyped for the GWAS. Quality control (QC) was performed and those markers with a call rate lower than 95% and a minimum allele frequency lower than 0.05 were filtered and excluded from the analysis.

Genome-wide association study

GWAS was carried out using the GenABEL library implemented in R statistical environment ( Considering the presence of closely related fish in our sample, we used the GRAMMAS approach (Genome-wide association using Mixed Model and Score test) [44,45]. Thus, we used the &ldquopolygenic&rdquo function [46] to fit three different univariate additive polygenic models which were defined by means of the following general formula:where Y is the vector of phenotypic records (days to 5 kg, late maturation and grilsing) b is the vector of fixed effects (sex for days to 5 kg and late maturation) a is the fixed effect of the SNP genotype u is the random additive genetic effect X and Z are design matrices for b and u, respectively S is the design vector for a and e is the vector of random residuals. For the three models, a and e were assumed to be and , respectively, where A is the additive genomic kinship matrix, is the polygenic additive variance, I is an identity matrix and is the residual variance.

In order to take the relatedness between individuals into account by means of a (co)variance matrix, the kinship matrix A was calculated using genomic data. The genomic kinship matrix A was estimated using marker data with the ibs (identity by state) function and weight = freq option of GenABEL. The residuals obtained from the model were used to perform an association test by means of a simple least squares method [45&ndash47]. Genome-wide significance was assessed by the use of two different methods: first, empirically using 200 permutations and markers with p-values &le 0.05 were considered to be genome-wide significant, and second, by the Bonferroni method, in which the conventional p-value was divided by the number of tests performed. A SNP was considered to have genome-wide significance at p < 0.05/N and have chromosome-wide significance at p < 0.05/Nc, where N is the total number of SNPs used in our study and Nc is the number of SNPs in a particular chromosome.

Linkage disequilibrium

The levels of linkage disequilibrium as r 2 were calculated using the GenABEL package, in order to assess the ability of the available SNPs to capture the genetic variation of the traits analyzed. LD was calculated for all adjacent marker pairs using all of the parents in the population to avoid LD inflation by extremely related individuals present in the full-sib groups of the progeny. The extent and decay of LD with distance was analyzed based on the methodology described by Heifetz et al. [48]. Briefly, the equation of Sved [49], which relates LD caused by drift to inter-marker distance and effective population size, was used to summarize the extent and decay of LD with distance:where LDi is the observed LD for marker pair i, di is the distance in cM for marker pair i, b is the coefficient that describes the decay of LD with distance, and ei is a random residual. Parameter b was estimated using nonlinear regression analysis.

The nucleotide sequences corresponding to the SNPs that showed a significant association with growth or age at sexual maturation were compared by BLAST against the first assembly of the Atlantic salmon genome sequencing project [50], which is publicly available at ASalBase ( and NCBI ( SNP markers were then assigned to a specific whole genome shotgun (WGS) contig by sequence similarity searches. WGS contigs were annotated using an in-house annotation pipeline (

A total of 466 samples and 3,908 markers passed the QC and consequently a GWAS was carried out to identify markers significantly associated with growth (in terms of days to 5 kg) and late sexual maturation in Atlantic salmon. In the case of the grilsing analysis, 3,873 markers and 626 samples passed the QC.

Growth association analysis (days to 5 kg)

Only one marker (GCR_cBin15343_Ctg1_36) located on chromosome 13 (Ssa13) was found to be significantly associated (according to the empirical method) with growth (see Fig. 1), but only at the chromosome-wide level of significance (p < 2.55e-4 for Ssa13) according to the Bonferroni threshold. Marker GCR_cBin15343_Ctg1_36, localized in Ssa13 was assigned to the Atlantic salmon genome contig AGKD01005773 and annotation of this contig showed that the marker is located in a hypothetical protein in the vicinity of a membrane-associated guanylate kinase protein as shown in Table 1.

Fig 1. Results from GWAS for days to 5 kg.

Horizontal dotted line represents the genome-wide significant threshold.

Table 1. Days to 5 kg association detected in the analysis.

Sexual maturation association analysis

Analysis for grilsing identified five markers with a genome-wide significant association (p < 1.29e-5 according to the Bonferroni threshold and p < 0.05 for the permutation method) with the trait as shown in Table 2. These markers (ESTNV_20578_482, ESTNV_36582_634, ESTNV_34243_316, GCR_cBin47052_Ctg1_234, ESTNV_15175_311) are located in Ssa10, Ssa02, Ssa13, Ssa25 and Ssa12, respectively (Fig. 2). The most significantly associated marker ESTNV_20578_482 is located within an E2F Transcription Factor (E2F) and nearby the CCR4-NOT transcription complex gene. The next most significant marker ESTNV_36582_634 is located within a malate dehydrogenase gene (MDH) and close to a UDP-glucose pyrophosphorylase gene (UGP). The next marker showing a significant association was ESTNV_34243_316, a SNP located within a PQ Loop Repeat Containing gene. The other two markers that showed a genome-wide significant association with grilsing were located in uncharacterized genes, as shown in Table 2.

Fig 2. Results from GWAS for grilsing.

Horizontal dotted line represents the genome-wide significant threshold.

Table 2. Grilsing association detected in the analysis.

Late maturation analysis detected association with five markers (ESTNV_22894_922, ESTNV_35192_247, GCR_cBin47084_Ctg1_67, ESTNV_31055_861 and ESTNV_27268_490), which showed a genome-wide significant association with the trait according to the Bonferroni thresholds (only ESTNV_22894_922 and ESTNV_35192_247 reached genome-wide significance (p<0.05) using the permutation method), as shown in Table 3. The two most significant markers are identified as ESTNV_22894_922 and ESTNV_35192_247, and are located on chromosomes Ssa28 and Ssa01, respectively (see Fig. 3). ESTNV_22894_922 is a SNP that was assigned to sequence contig AGKD01068032 and by annotation was positioned in the coding region of FRA10AC1. ESTNV_35192_247 was assigned to sequence contig AGKD01242239 and by annotation was located in the coding region of a CPEB-associated factor maskin putative protein and assigned to sequence contig AGKD01242239.

Fig 3. Results from GWAS for late sexual maturation.

Horizontal dotted line represents the genome-wide significant threshold.

Table 3. Late maturation association detected in the analysis.

Linkage disequilibrium (LD)

The levels of LD detected using the available markers were on average 0.22 with a median of 0.11. The estimated coefficient of the decay of LD when increasing distance (bj) resulting from model (1) was 6.03, indicating that, for example, the expected r 2 for markers separated by 1 cM it is equal to 0.04. Thus, our results show that the high level of LD at shorter distances declines rapidly as distance increases (S1 Fig.). These levels of LD might be insufficient to capture the effect of genomic regions affecting quantitative traits using LD between two markers. These results suggest that the density of the SNPs used here should be improved in order to increase the power to detect association between traits and markers.

GWAS was able to identify at least one marker with a chromosome-wide significant association for each trait of interest. Especially surprising was the low number of markers with a significant association with growth in Atlantic salmon. We only detected one marker (GCR_cBin15343_Ctg1_36, located on Ssa13) showing a chromosome-wide significance. On the other hand, GWAS results for grilsing found five markers in genome-wide significant association with the trait (ESTNV_20578_482, ESTNV_36582_634, ESTNV_34243_316, GCR_cBin47052_Ctg1_234, ESTNV_15175_311), with all of them located on different chromosomes (Table 2). Similarly, late maturation analysis found association with five markers (according to the Bonferroni threshold), as shown in Table 3. Four of these SNPs are located on two chromosomes ESTNV_22894_922 and ESTNV_31055_861 are located on Ssa28, but far apart from each other (36.5 cM), and the other pair of markers, located on Ssa01 (ESTNV_35192_247 and ESTNV_27268_490), are 18.4 cM apart according to the Atlantic salmon female linkage map [35].

Previous studies seeking to identify genomic regions associated with phenotypic traits have based their studies on the use of linkage map regression methods and most of them used a relatively low number of markers. Growth QTL for instance, have been localized to most of the 29 Atlantic salmon chromosomes [21&ndash23,39,51], and a similar situation has been observed in other salmonid species such as rainbow trout and Arctic charr [24,26,27,52]. As growth is a complex trait we expected to find numerous markers in association with it. Surprisingly our GWAS only identified one marker in association with growth and this marker located on Ssa13 only showed a chromosome-wide significant association with the trait (Table 1). This finding was certainly unexpected and is contrary to previous analyses in terms of the number of regions linked to this complex trait. However, it agrees in part with results from our previous analysis [39], in which we found a genome-wide significant QTL on Ssa13 associated with growth in Atlantic salmon. As growth is a polygenic trait in Atlantic salmon, the different results obtained for the previous QTL analysis [39] and this GWAS in this population could be explained by some specific loci having a much greater effect in some of the individual families, but these effects on the trait are diluted at the population level where multiple loci with relatively small effect on the trait are involved.

The analysis for grilsing revealed five chromosome-wide significant associations with the trait by markers located on Ssa10, Ssa02, Ssa13, Ssa25 and Ssa12. In agreement with this finding, our previous QTL work based on linkage mapping described a suggestive QTL located on Ssa10 [34], but located in a different region of the chromosome. Chromosome Ssa10 shares homology with linkage group RT-8 (chromosome 5 Omy05 [53]) in rainbow trout, where a genome-wide significant QTL for early male sexual maturation was mapped [26,52] and also with linkage group AC-16 in Arctic charr, which contains a QTL associated with age at sexual maturation [24]. Moreover, these chromosomes (and also Ssa13) share the location of Clock genes implicated in the circadian regulation of many physiological functions including sexual maturation in salmonids [54]. In Atlantic salmon, QTL located on Ssa10 and Ssa12 were recently associated with precocious parr maturation [25]. To our knowledge no QTL associated with sexual maturation have been identified in Ssa02 however, several QTL related to growth have been described on this chromosome [21&ndash23,39,51].

GWAS of late maturation allowed us to identify five markers (located on Ssa28, Ssa01 and Ssa21) showing a genome-wide significant association with the trait, along with five markers associated with the trait at a chromosome-wide level, as shown in Table 3. Previous studies analysing sexual maturation related traits in other salmonid species have identified QTL in similar regions. In rainbow trout for example, a genome-wide significant QTL associated with early maturation was found on linkage group RT-17 (Omy20 [53]), which shares homology with Ssa28 [26], and also chromosome-wide significant QTL linked to developmental rates [52]. In addition to QTL found on Ssa28, previous studies have described QTL on Ssa01 and its equivalent in other salmonid species. For instance, a QTL for age at sexual maturation in Arctic charr was described in linkage group AC-9, that shares homology with Ssa01 [32] and also a QTL for condition factor [24]. In the case of Ssa21, we recently described QTL for grilsing located in the same chromosome however, in this analysis the QTL was detected for late maturation, which suggests that these regions controlling sexual development contain genes that are involved in both processes. QTL on this chromosome have been described for Atlantic salmon related to adult maturation [25] and in rainbow trout for early maturation [26].

Evidence based on current literature suggests that regions controlling sexual maturation are at least partially conserved within rainbow trout and Arctic charr [24,26,29]. However, the genetic architecture of this trait in Atlantic salmon still seems unresolved. Our previous analysis in Atlantic salmon detected significant QTL located on Ssa21 and Ssa18 for grilsing and late maturation, respectively, but did not detect QTL on other chromosomes [34]. Pedersen et al. [25] recently identified QTL associated with adult maturation in Ssa21 and Ssa23, but not on Ssa28. Nevertheless, they identified additional QTL that share homology with previous findings in rainbow trout and Arctic charr. Difference in populations, number of samples, markers and detection methods could explain differences found within analyses for the same species. In addition, a recent study by Johnston et al. [55] showed that the regions controlling sexual maturation in a wild population of Atlantic salmon from Northern Europe are different from those previously described in farmed fish [25, 34]. Moreover, the authors speculate that differences are due to different selection pressures in the wild and domesticated Atlantic salmon [55].

Given the different results obtained using regular QTL analysis and GWAS, we believe that such discrepancies could be explained by the different approaches used in these analytical procedures. Standard QTL analyses make use of the amount of recombination or linkage between individuals to detect association between markers and trait, which makes it a powerful method when using a low number of markers (e.g., microsatellites). On the other hand, the statistical power of GWAS is a function of sample size, effect size and marker allele frequency [56], and depends on the level of LD between the genetic markers to detect association. That being said, the relatively low power of detection observed in our analysis of growth (days to 5 kg) could be attributed to the number of markers and samples analysed [57]. Recent studies [55,57] using the same SNP chip to analyse traits in Atlantic salmon showed that the levels of LD are not sufficiently high to capture all of the genetic variation within the dataset, even with a larger sample size. A similar result regarding low values of LD between adjacent markers was found in our analyzed data (S1 Fig.), suggesting that a higher density SNP panel will be needed to fully exploit LD in GWAS.

The only marker that showed a significant association with growth in terms of days to 5 kg (chromosome-wide) was GCR_cBin15343_Ctg1_36, which is situated on an uncharacterized gene on Ssa13. Within 10 kb upstream of the location of the SNP we found MAGI-1 (membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1), a protein related to cell-cell contact such as neuronal synaptic and epithelial tight junctions [58]. Another marker, which did not reach chromosome-wide level significance (p < 2.23e-4 for Ssa04), was ESTV_20925_1105, which is located in NPM1 (Nucleophosmin 1) on Ssa04. NPM1 encodes a multifunctional nucleolar phosphoprotein that plays a crucial role in the control of various aspects of cell growth and homeostasis and has been shown to be involved in the growth process of cattle [59].

Candidate genes linked to markers associated with grilsing are numerous due to the level of significance reached by the markers in association with the trait, as shown in Table 2. Marker ESTNV_20578_482 is located in an E2F transcription factor gene. E2F genes are downstream effectors of the retinoblastoma protein (pRB) pathway, and are required for the regulation of numerous genes essential for functions such as DNA replication, cell cycle progression, DNA damage repair, apoptosis, differentiation and development [60&ndash62]. Additionally, nearby the location of this marker we found a subunit of the CCR4-NOT transcription complex, which interacts with Nanos and is essential for male germ cell development in mouse [63]. The following marker is ESTNV_36582_634, and is located in a malate dehydrogenase (MDH) gene. Differences in the enzymatic activity of MDH have been associated with developmental rates in salmonids [64,65]. Marker ESTNV_34243_316 is located in a PQ loop repeat containing protein however, its function remains unknown.

Analysis of late maturation identified several significantly associated markers and consequently several gene candidates were identified. The most significant marker ESTNV_22894_922, assigned to Ssa28, is located in a novel gene called FRA10AC1. This gene is found within the rare FRA10A folate-sensitive fragile site, and even though its function remains unknown, it is known that in humans the 5' UTR of this gene is part of a CpG island that contains a tandem CGG repeat region (normally of 8&ndash14 repeats but over 200 repeats when expanded) [66]. The expansion of repeats in the fragile sites results in hyper-methylation, silencing the underlying gene leading to the fragile site expression. Fragile sites have frequently been linked to mental retardation in humans [67], but are also associated with a possible role in genome reorganization due to their presence in chromosomal regions involved in rearrangements during evolution in vertebrates [68]. The second most significant marker ESTNV_35192_247 assigned to Ssa01 is located in the coding region of maskin, a CPEB-interacting protein involved in the repression of translation by its interaction with CPEB (the protein in charge of polyadenylation of mRNAs) and mainly associated with the control of oocyte maturation in Xenopus by repression and de-repression of mRNAs [69]. Orthologs of both FRA10AC1 and maskin are present in the Atlantic salmon genome however, their function remains unknown. Hence further analyses of these genes are required in order to determine their possible roles in sexual maturation.

The other three SNPs that showed a genome-wide significant association with late maturation were located in uncharacterized genes (Table 3). However, the publicly available Atlantic salmon genome database ( allowed us to identify nearby genes as TTC31, DNAAF2 and INTS6 for markers ESTNV_27268_490, ESTNV_31055_861 and GCR_cBin47084_Ctg1_67, respectively.

Potential implications for selective breeding

Most economically important traits like growth and sexual maturation are quantitative, complex traits affected by several genes [35,70]. Our GWAS failed to uncover a large number of genomic regions significantly associated with growth. A different situation was observed in the analysis of late maturation and grilsing, for which we were able to detect genome-wide significant associations. An important point to consider is the fact that markers (and their genomic regions) found in association with growth were different from those related to age at sexual maturation. This is worth noting considering that significant phenotypic and genetic correlations between growth (as estimated by body weight) and early sexual maturation in Atlantic salmon have been reported previously [13,15,71], and also taking into account that an association between growth and sexual maturation has been described in other salmonid species [24,26,28]. In Atlantic salmon, however, the link between these two traits is not conclusive, and increasing evidence indicates that growth and sexual maturation are controlled by separate regions of the genome [34,39], which also agrees with the results of this study in which the SNPs controlling growth (days to 5 kg) and sexual maturation were independent of one another. Nevertheless, due to the complexity of these traits, it is possible that regions controlling growth and sexual maturation overlap at some point in the genome but have a small effect on the trait. Pedersen et al. [25] recently found QTL for precocious parr and adult maturation in several Atlantic salmon chromosomes that have been previously shown to contain growth QTL, suggesting an interaction of certain genes in both developmental processes.

The estimated levels of heritability calculated using a pedigree-based relationship matrix, for grilsing and late sexual maturation in this population are low: grilse h 2 = 0.09, late maturation h 2 = 0.11 (results not shown), but in agreement with current literature levels of heritability for age at sexual maturity traits in Atlantic salmon (0.04 to 0.17 as reviewed by Garcia de Leaniz et al. [11]). Heritability of growth (days to 5 kg) on the other hand, was found to be considerably higher (h 2 = 0.2), also in agreement with previous findings which estimated heritabilities for growth rate and body weight to range from 0.04&ndash0.26 and 0.05&ndash0.44, respectively [11]. Thus, selection for these traits is possible and it has been ongoing since the early 1970s [3]. Selective breeding programs have been effective in increasing body size while also controlling undesired early sexual maturity in farmed fish [2], suggesting that the main genomic regions controlling these traits are different. The use of molecular marker information could be a valuable tool to improve conventional broodstock selection programs by the identification of specific alleles that could then be screened for and introduced into selected lines.

This GWAS aimed to identify genomic regions associated with growth and sexual maturation. Our previous results using a standard QTL method based on linkage mapping provided us with numerous QTL however, the GWAS detected a lower number of markers with a highly significant association. From a biological point of view, most of the candidate genes linked to markers associated with a particular trait play a role in metabolic or developmental processes however, it would be premature to assign an important role in the control of the trait that they were associated. Further studies, using a SNP array with a higher density, are required in order to determine if these SNPs and their affected alleles could be a valuable addition to broodstock selection programs.

Supporting Information

S1 Fig. Extent and decay of linkage disequilibrium (LD) with distance.

Conceived and designed the experiments: APG SF BS WSD. Performed the experiments: APG SF BS. Analyzed the data: APG JMY. Contributed reagents/materials/analysis tools: SF BS. Wrote the paper: AGP JMY WSD.

Citation: Gutierrez AP, Yáñez JM, Fukui S, Swift B, Davidson WS (2015) Genome-Wide Association Study (GWAS) for Growth Rate and Age at Sexual Maturation in Atlantic Salmon (Salmo salar). PLoS ONE 10(3): e0119730.

1. Gjøen HM, Bentsen HB. Past, present, and future of genetic improvement in salmon aquaculture. ICES J Mar Sci. 199754: 1009&ndash1014.

2. Gjedrem T. Genetic improvement of cold-water fish species. Aquacult Res. 200031: 25&ndash33.

3. Gjedrem T. Genetic improvement for the development of efficient global aquaculture: A personal opinion review. Aquaculture. 2012344&ndash349: 12&ndash22.

4. Nævdal G. Genetic factors in connection with age at maturation. Aquaculture 198333: 97&ndash106.

5. Thorpe JE. Reproductive strategies in Atlantic salmon, Salmo salar L. Aquacult Res. 199425: 77&ndash87.

6. Hörstgen-Schwark G, Langholz HJ. Prospects of selecting for late maturity in tilapia (Oreochromis niloticus): III. A selection experiment under laboratory conditions. Aquaculture. 1998167: 123&ndash133.

7. Gjerde B. Growth and reproduction in fish and shellfish. Aquaculture. 198657: 37&ndash55.

8. Kause A, Ritola O, Paananen T, Mäntysaari E, Eskelinen U. Selection against early maturity in large rainbow trout Oncorhynchus mykiss: the quantitative genetics of sexual dimorphism and genotype-by-environment interactions. Aquaculture. 2003228: 53&ndash68.

9. Sumpter JP. Control of growth of rainbow trout (Oncorhynchus mykiss). Aquaculture. 1992100: 299&ndash320.

10. Thorpe JE, Metcalfe NB. Is smolting a positive or a negative developmental decision? Aquaculture. 1998168: 95&ndash103.

11. Garcia de Leaniz C, Fleming IA, Einum S, Verspoor E, Jordan WC, Consuegra S, et al. A critical review of adaptive genetic variation in Atlantic salmon: implications for conservation. Biol Rev Camb Philos Soc. 200782: 173&ndash211. pmid:17437557

12. Quinton CD, McMillan I, Glebe BD. Development of an Atlantic salmon (Salmo salar) genetic improvement program: Genetic parameters of harvest body weight and carcass quality traits estimated with animal models. Aquaculture. 2005247: 211&ndash217.

13. Gjerde B, Simianer H, Refstie T. Estimates of genetic and phenotypic parameters for body weight, growth rate and sexual maturity in Atlantic salmon. Livest Prod Sci. 1994 38: 133&ndash143.

14. Jónasson J, Gjedrem T. Genetic correlation for body weight of Atlantic salmon grilse between fish in sea ranching and land-based farming. Aquaculture. 1997157: 205&ndash214.

15. Gjerde B. Response to individual selection for age at sexual maturity in Atlantic salmon. Aquaculture. 198438: 229&ndash240.

16. Sonesson A. Within-family marker-assisted selection for aquaculture species. Genet Sel Evol. 200739: 1&ndash17.

17. Sonesson AK. Possibilities for marker-assisted selection in aquaculture breeding schemes. In: Guimarães E, Ruane J, Scherf B, Sonnino A, Dargie J, editors. Marker-assisted selection: current status and future perspectives in crops, livestock, forestry and fish. Rome: FAO.2007 pp. 309&ndash328.

18. Houston RD, Haley CS, Hamilton A, Guy DR, Tinch AE, Taggart JB, et al. Major quantitative trait loci affect resistance to infectious pancreatic necrosis in Atlantic salmon (Salmo salar). Genetics. 2008178: 1109&ndash1115. pmid:18245341

19. Houston RD, Davey JW, Bishop SC, Lowe NR, Mota-Velasco JC, Hamilton A, et al. Characterisation of QTL-linked and genome-wide restriction site-associated DNA (RAD) markers in farmed Atlantic salmon. BMC Genomics. 201213: 244. pmid:22702806

20. Moen T, Baranski M, Sonesson A, Kjoglum S. Confirmation and fine-mapping of a major QTL for resistance to infectious pancreatic necrosis in Atlantic salmon (Salmo salar): population-level associations between markers and trait. BMC Genomics. 2009 10: 368. pmid:19664221

21. Reid DP, Szanto A, Glebe B, Danzmann RG, Ferguson MM. QTL for body weight and condition factor in Atlantic salmon (Salmo salar): comparative analysis with rainbow trout (Oncorhynchus mykiss) and Arctic charr (Salvelinus alpinus). Heredity. 2005 94: 166&ndash172. pmid:15483654

22. Boulding EG, Culling M, Glebe B, Berg PR, Lien S, Moen T. Conservation genomics of Atlantic salmon: SNPs associated with QTLs for adaptive traits in parr from four trans-Atlantic backcrosses. Heredity. 2008101: 381&ndash391. pmid:18648388

23. Baranski M, Moen T, Vage D. Mapping of quantitative trait loci for flesh colour and growth traits in Atlantic salmon (Salmo salar). Genet Sel Evol.201042: 17. pmid:20525320

24. Moghadam HK, Poissant J, Fotherby H, Haidle L, Ferguson MM, Danzmann RG. Quantitative trait loci for body weight, condition factor and age at sexual maturation in Arctic charr (Salvelinus alpinus): comparative analysis with rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). Mol Genet Genomics.2007 277: 647&ndash661. pmid:17308931

25. Pedersen S, Berg PR, Culling M, Danzmann RG, Glebe B, Leadbeater S, et al. Quantitative trait loci for precocious parr maturation, early smoltification, and adult maturation in double-backcrossed trans-Atlantic salmon (Salmo salar). Aquaculture. 2013410&ndash411: 164&ndash171.

26. Haidle L, Janssen JE, Gharbi K, Moghadam HK, Ferguson MM, Danzmann RG. Determination of quantitative trait loci (QTL) for early maturation in rainbow trout (Oncorhynchus mykiss). Mar Biotechnol. 200810: 579&ndash592. pmid:18491191

27. Wringe BF, Devlin RH, Ferguson MM, Moghadam HK, Sakhrani D, Danzmann RG. Growth-related quantitative trait loci in domestic and wild rainbow trout (Oncorhynchus mykiss). BMC Genet. 201011: 63. pmid:20609225

28. Martyniuk CJ, Perry GML, Mogahadam HK, Ferguson MM, Danzmann RG. The genetic architecture of correlations among growth-related traits and male age at maturation in rainbow trout. J Fish Biol. 200363: 746&ndash764.

29. O'Malley KG, Sakamoto T, Danzmann RG, Ferguson MM. Quantitative trait loci for spawning date and body weight in rainbow trout: Testing for conserved effects across ancestrally duplicated chromosomes. J Hered. 200394: 273&ndash284. pmid:12920098

30. O'Malley KG, McClelland EK, Naish KA. Clock genes localize to quantitative trait loci for stage-specific growth in juvenile coho salmon, Oncorhynchus kisutch. J Hered. 2010101: 628&ndash632. pmid:20566470

31. McClelland EK, Naish KA. Quantitative trait locus analysis of hatch timing, weight, length and growth rate in coho salmon, Oncorhynchus kisutch. Heredity. 2010 105: 562&ndash573. pmid:20234386

32. Küttner E, Moghadam HK, Skúlason S, Danzmann RG, Ferguson MM. Genetic architecture of body weight, condition factor and age of sexual maturation in Icelandic Arctic charr (Salvelinus alpinus). Mol Genet Genomics. 2011286: 1&ndash13. pmid:21547562

33. Leder EH, Danzmann RG, Ferguson MM. The candidate gene, clock, localizes to a strong spawning time quantitative trait locus region in rainbow trout. J Hered. 2006 97: 74&ndash80. pmid:16407529

34. Gutierrez AP, Lubieniecki KP, Fukui S, Withler RE, Swift B, Davidson WS. Detection of quantitative trait loci (QTL) related to grilsing and late sexual maturation in Atlantic salmon (Salmo salar). Mar Biotechnol. 2013: 1&ndash8.

35. Goddard ME, Hayes BJ. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat Rev Genet. 200910: 381&ndash391. pmid:19448663

36. Kent MP, Hayes B, Xiang Q, Berg PR, Gibbs RA, Lien S. Development of 16.5K SNP-Chip for Atlantic salmon. Plant & Animal Genomes XVII Conference. San Diego, CA. 2009.

37. Gidskehaug L, Kent M, Hayes BJ, Lien S. Genotype calling and mapping of multisite variants using an Atlantic salmon iSelect SNP array. Bioinformatics. 201127: 303&ndash310. pmid:21149341

38. Lien S, Gidskehaug L, Moen T, Hayes B, Berg P, Davidson WS, et al. A dense SNP-based linkage map for Atlantic salmon (Salmo salar) reveals extended chromosome homeologies and striking differences in sex-specific recombination patterns. BMC Genomics. 201112: 615. pmid:22182215

39. Gutierrez AP, Lubieniecki KP, Davidson EA, Lien S, Kent MP, Fukui S, et al. Genetic mapping of quantitative trait loci (QTL) for body-weight in Atlantic salmon (Salmo salar) using a 6.5K SNP array. Aquaculture. 2012358&ndash359: 61&ndash70. pmid:24994942

40. Gjedrem T, Gjøen HM, Gjerde B. Genetic origin of Norwegian farmed Atlantic salmon. Aquaculture. 199198: 41&ndash50.

41. Withler R, Supernault J, Swift B, Peterson R, Fukui S. Microsatellite DNA assignment of progeny to parents enables communal freshwater rearing in an Atlantic salmon selective breeding program. Aquaculture. 2007272: S318.

42. Eisbrenner WD, Botwright N, Cook M, Davidson EA, Dominik S, Elliott NG, et al. Evidence for multiple sex-determining loci in Tasmanian Atlantic salmon (Salmo salar). Heredity. 2014113: 86&ndash92. pmid:23759729

43. Phillips RB, Keatley KA, Morasch MR, Ventura AB, Lubieniecki KP, Koop BF, et al. Assignment of Atlantic salmon (Salmo salar) linkage groups to specific chromosomes: Conservation of large syntenic blocks corresponding to whole chromosome arms in rainbow trout (Oncorhynchus mykiss). BMC Genet. 200910: 46. pmid:19689812

44. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 200723: 1294&ndash1296. pmid:17384015

45. Amin N, van Duijn CM, Aulchenko YS. A genomic background based method for association analysis in related individuals. PLoS ONE. 20072: e1274. pmid:18060068

46. Thompson EA, Shaw RG. Pedigree analysis for quantitative traits: variance components without matrix inversion. Biometrics. 199046: 399&ndash413. pmid:2364130

47. Aulchenko YS, de Koning D-J, Haley C. Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics. 2007177: 577&ndash585. pmid:17660554

48. Heifetz EM, Fulton JE, O'Sullivan N, Zhao H, Dekkers JCM, Soller M. Extent and consistency across generations of linkage disequilibrium in commercial layer chicken breeding populations. Genetics. 2005171: 1173&ndash1181. pmid:16118198

49. Sved JA. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor Popul Biol. 19712: 125&ndash141. pmid:5170716

50. Davidson WS, Koop BF, Jones SJM, Iturra P, Vidal R, Maass A, et al. Sequencing the genome of the Atlantic salmon (Salmo salar). Genome Biol. 201011: 403. pmid:20887641

51. Houston RD, Bishop SC, Hamilton A, Guy DR, Tinch AE, Taggart JB, et al. Detection of QTL affecting harvest traits in a commercial Atlantic salmon population. Anim Genet. 200940: 753&ndash755. pmid:19397515

52. Easton AA, Moghadam HK, Danzmann RG, Ferguson MM. The genetic architecture of embryonic developmental rate and genetic covariation with age at maturation in rainbow trout Oncorhynchus mykiss. J Fish Biol. 201178: 602&ndash623. pmid:21284638

53. Phillips RB, Nichols KM, DeKonning JJ, Morasch MR, Keatley KA, Rexroad C, et al. Assignment of rainbow trout linkage groups to specific chromosomes. Genetics.2006174: 1661&ndash1670. pmid:16951085

54. Paibomesai M, Moghadam H, Ferguson M, Danzmann R. Clock genes and their genomic distributions in three species of salmonid fishes: Associations with genes regulating sexual maturation and cell cycling. BMC Research Notes. 20103: 215. pmid:20670436

55. Johnston SE, Orell P, Pritchard VL, Kent MP, Lien S, Niemela E, et al. Genome-wide SNP analysis reveals a genetic basis for sea-age variation in a wild population of Atlantic salmon (Salmo salar). Mol Ecol. 201423: 3452&ndash3468. pmid:24931807

56. Stranger BE, Stahl EA, Raj T. Progress and promise of genome-wide association studies for human complex trait genetics. Genetics. 2011187: 367&ndash383. pmid:21115973

57. Sodeland M, Gaarder M, Moen T, Thomassen M, Kjøglum S, Kent M, et al. Genome-wide association testing reveals quantitative trait loci for fillet texture and fat content in Atlantic salmon. Aquaculture. 2013408&ndash409: 169&ndash174.

58. Hata Y, Nakanishi H, Takai Y. Synaptic PDZ domain-containing proteins. Neuroscience Research. 199832: 1&ndash7. pmid:9831248

59. Huang Y-Z, Zhang E-P, Chen H, Wang J, Li ZJ, Huai Y-T, et al. Novel 12-bp deletion in the coding region of the bovine NPM1 gene affects growth traits. J Appl Genet. 201051: 199&ndash202. pmid:20453307

60. Müller H, Bracken AP, Vernell R, Moroni MC, Christians F, Grassilli E, et al. E2Fs regulate the expression of genes involved in differentiation, development, proliferation, and apoptosis. Genes Dev. 200115: 267&ndash285. pmid:11159908

61. Bracken AP, Ciro M, Cocito A, Helin K. E2F target genes: unraveling the biology. Trends BiochemSci. 2004 29: 409&ndash417. pmid:15362224

62. Korenjak M, Brehm A. E2F&ndashRb complexes regulating transcription of genes important for differentiation and development. Curr Opin Genet Dev. 200515: 520&ndash527. pmid:16081278

63. Suzuki A, Saba R, Miyoshi K, Morita Y, Saga Y. Interaction between NANOS2 and the CCR4-NOT Deadenylation Complex Is Essential for Male Germ Cell Development in Mouse. PLoS ONE. 20127: e33558. pmid:22448252

64. Bailey GS, Wilson AC, Halver JE, Johnson CL. Multiple forms of supernatant malate dehydrogenase in salmonid fishes: Biochemical, immunological and genetic studies. J Biol Chem. 1970245: 5927&ndash5940. pmid:4991846

65. Ferguson MM, Danzmann RG, Allendorf FW. Developmental divergence among hatchery strains of rainbow trout (Salmo gairdneri). I. Pure strains. Can J Genet Cytol. 198527: 289&ndash297.

66. Sarafidou T, Kahl C, Martinez-Garay I, Mangelsdorf M, Gesk S, Baker E, et al. Folate-sensitive fragile site FRA10A is due to an expansion of a CGG repeat in a novel gene, FRA10AC1, encoding a nuclear protein. Genomics. 200484: 69&ndash81. pmid:15203205

67. Debacker K, Kooy RF. Fragile sites and human disease. Hum Mol Genet. 2007 16: R150&ndashR158. pmid:17567780

68. Ruiz-Herrera A, Castresana J, Robinson T. Is mammalian chromosomal evolution driven by regions of genome fragility? Genome Biol. 20067: R115. pmid:17156441

69. Stebbins-Boaz B, Cao Q, de Moor CH, Mendez R, Richter JD. Maskin is a CPEB-associated factor that transiently interacts with eIF-4E. Mol Cell. 19994: 1017&ndash1027. pmid:10635326

70. Mackay TFC, Stone EA, Ayroles JF. The genetics of quantitative traits: challenges and prospects. Nat Rev Genet. 200910: 565&ndash577. pmid:19584810

71. Wild V, Simianer H, Gjøen HM, Gjerde B. Genetic parameters and genotype × environment interaction for early sexual maturity in Atlantic salmon (Salmo salar). Aquaculture. 1994128: 51&ndash65.


Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, et al. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 200616(9):1182–90.

Miki Y, Swensen J, Shattuck-Eidens D, Futreal PA, Harshman K, Tavtigian S, et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science. 1994266(5182):66–71.

Collins FS, Drumm ML, Cole JL, Lockwood WK, Woude GV, Iannuzzi MC. Construction of a general human chromosome jumping library, with application to cystic fibrosis. Science. 1987235(4792):1046–9.

Roberts PS, Chung J, Jozwiak S, Dabora SL, Franz DN, Thiele EA, et al. SNP identification, haplotype analysis, and parental origin of mutations in TSC2. Hum Genet. 2002111(1):96–101.

Trappe R, Laccone F, Cobilanschi J, Meins M, Huppke P, Hanefeld F, et al. MECP2 mutations in sporadic cases of Rett syndrome are almost exclusively of paternal origin. Am J Hum Genet. 200168(5):1093–101.

Ketterling RP, Vielhaber EL, Lind TJ, Thorland EC, Sommer SS. The rates and patterns of deletions in the human factor IX gene. Am J Hum Genet. 199454(2):201.

Clark RM, Wagler TN, Quijada P, Doebley J. A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and inflorescent architecture. Nat Genet. 200638(5):594–7.

Ashikari M, Sakakibara H, Lin S, Yamamoto T, Takashi T, Nishimura A, et al. Cytokinin oxidase regulates rice grain production. Science. 2005309(5735):741–5.

Fu D, Uauy C, Distelfeld A, Blechl A, Epstein L, Chen X, et al. A kinase-START gene confers temperature-dependent resistance to wheat stripe rust. Science. 2009323(5919):1357–60.

Wu D-H, Wu H-P, Wang C-S, Tseng H-Y, Hwu K-K. Genome-wide InDel marker system for application in rice breeding and mapping studies. Euphytica. 2013192(1):131–43.

Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009326(5956):1112–5.

Xia Q, Guo Y, Zhang Z, Li D, Xuan Z, Li Z, et al. Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx). Science. 2009326(5951):433–6.

Lam H-M, Xu X, Liu X, Chen W, Yang G, Wong F-L, et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat Genet. 201042(12):1053–9.

Morris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc Natl Acad Sci. 2013110(2):453–8.

Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 201042(11):961–7.

Huang X, Zhao Y, Wei X, Li C, Wang A, Zhao Q, et al. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nat Genet. 201244(1):32–9.

Xu X, Liu X, Ge S, Jensen JD, Hu F, Li X, et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat Biotechnol. 201230(1):105–11.

Huang X, Kurata N, Wei X, Wang Z-X, Wang A, Zhao Q, et al. A map of rice genome variation reveals the origin of cultivated rice. Nature. 2012490(7421):497–501.

Lai J, Li R, Xu X, Jin W, Xu M, Zhao H, et al. Genome-wide patterns of genetic variation among elite maize inbred lines. Nat Genet. 201042(11):1027–30.

Jiao Y, Zhao H, Ren L, Song W, Zeng B, Guo J, et al. Genome-wide genetic changes during modern breeding of maize. Nat Genet. 201244(7):812–5.

Chia JM, Song C, Bradbury PJ, Costich D, de Leon N, Doebley J, et al. Maize HapMap2 identifies extant variation from a genome in flux. Nat Genet. 201244(7):803–7.

Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH, Durbin R. Dindel: accurate indel calls from short-read data. Genome Res. 201121(6):961–73.

Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 200925(17):2283–5.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 201020(9):1297–303.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 200925(16):2078–9.

Schuler GD. Sequence mapping by electronic PCR. Genome Res. 19977(5):541–50.

Neuman JA, Isakov O, Shomron N. Analysis of insertion–deletion from deep-sequencing data: software evaluation for optimal detection. Brief Bioinform. 201314(1):46–55.

Varshney RK, Mahendar T, Aggarwal RK, Börner A. Genic molecular markers in plants: development and applications. Genomics-assisted crop improvement. the Netherlands: Springer 2007. p. 13-29.

Andersen JR, Lübberstedt T. Functional markers in plants. Trends Plant Sci. 20038(11):554–60.

Yang X, Chockalingam SP, Aluru S. A survey of error-correction methods for next-generation sequencing. Brief Bioinform. 201314(1):56–66.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 200910(3):R25. doi:10.1186/gb-2009-10-3-r25.

Patel RK, Jain M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One. 20127(2):e30619.

Anderson JA, Churchill GA, Autrique JE, Tanksley SD, Sorrells ME. Optimizing parental selection for genetic linkage maps. Genome. 199336(1):181–6.

Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000132(3):365–86.

Allen G, Flores-Vergara M, Krasynanski S, Kumar S, Thompson W. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat Protoc. 20061(5):2320–5.

Results and discussion

Effects of environment (E), genotype (G), and G × E interaction (GEI)

The ANOVA for grain yield (Table 1) identified highly significant differences among genotypes at P < 0.0001. That high genetic variation existed among genotypes is very useful for wheat breeders to efficiently select the highest yielding genotypes, in each location or across locations, to be used in breeding programs. Genotype (G) × environment (E)interactions (GEI)were significant at P < 0.0001for grain yield. Significant GEI indicated that the genotypes performed differently in different environments and that genotypes should be selected for adaptation to specific environments [3, 4, 65]. Hence, the GEI confirmed that genotypes responded differently to the variation in environmental conditions at locations, which indicated the need to test wheat cultivars at multiple locations.

The phenotypic correlation for grain yield among the nine environments is presented in (Fig. 1). No significant or very low significant correlations at P < 0.05 were observed among the cultivar yield values in all environments. A moderately positive significant correlation between Lincoln and Mead (r = 0.42*) and between Grant and McCook (r = 0.43*) was expected as these pairs of locations are in similar ecogeographic zones. The results of weak or low correlations further support the diversity of the testing environments and the significant effect of GEI on the genotypes’ performances.

Correlation coefficient matrix of Grain yield in all nine environments

The performance of the genotypes in different environments

It is common for MEYTs data to represent a combination of crossover and non-crossover types of GEI. The minimum, maximum, mean of grain yield in each environment is presented in Table 2. The maximum grain yield ranged from 3503.53 (Kansas) to 8287.50 Kg/Ha (McCook). The lowest and highest average of grain yield were also accounted to the same two environments. This huge difference in grain yield for the same set of genotypes was due to the strong effect of environment and GEI.

The highest yielding genotypes differed by location NE17660 (Alliance), NE17626 (Clay Center), NE17528 (Grant), NE17588 (Kansas), NE17609 (Lincoln), NE17441 (McCook), NE17662 (Mead), NE17463 (North Platte), and NHH17447 (Sidney) (Supplementary Table S3). GEI can be caused by crossover interactions or by non-crossover interactions (e.g. changes in the magnitude of the differences among lines). As there was no common genotype that ranked as the highest yielding genotype in more than one environment, we chose the 50 highest yielding genotypes (

18.5% of the experimental genotypes) in each location to represent the high yielding genotypes at that environment. Then, a genotype was selected if it was among the 50 high yielding genotypes in at least two environments. As a result, 13 genotypes were marked and selected (Table 3). The same procedure was applied in selecting the high drought tolerant wheat genotypes, [52]. Those genotypes which were in the high yield group in multiple locations were considered as having the non-crossover GEI (Table 3). The genotype NE17625 was found among the highest 50 yielding genotypes in all the environments except Kansas. The genotype NE17626 was found among the highest 50 yielding genotypes at all the environments except Mead. Moreover, the genotype NE17443 was found to be among the highest 50 yielding in seven environments. Two genotypes NE17629 and NE17549 were found among highest 50 yielding in six environments. Remarkably the selected genotypes are heterogeneous in terms of their pedigree. For example, some of the selected genotypes shared the same parent such as NE17625, NE17626, NE17629 and NE17549 which were all reselections from NW03666 (Supplementary Table S1). Both NE17479 and NE17435 were half-sibs and shared the same parents (NE06545/NW07534). The other seven selected genotypes had different pedigrees. Pedigree information provides useful information for plant breeders to maintain diversity while making the next set of crosses using the selected genotypes.

As mentioned previously, the significant GEI often is interpreted as a specific breeding program for improving grain yield may be required optimal improvement for each environment [44, 52]. However, crossing these selected genotypes may be useful for a breeding program, with a full consideration to the pedigree information, that extends across more than one environment, especially when the environment at a location will change from year to year (e.g. may not be predictable).

Alliance and Grant had the highest number of selected top 50 genotypes in common with 11 in each. Kansas and Sidney, on the other hand, had the fewest common selected genotypes (six genotypes). This result may be due to the Kansas trial was considerably further south and in a different state while the other eight environments are in Nebraska and where the breeding program targets its new cultivars.

Analytical approaches to GEI analysis are important for enhancing the value of MEYTs and gaining an understanding of causes of GE interactions [61, 65, 68]. The methods used to understand GEI include the characterization of trial sites according to environmental factors, using either direct measurements, calculated indices, or variables derived from crop growth models [18]. The highly significant GEI were explained by the differences in the precipitation, snow cover, and temperature from one location to another location during the growing session (Supplementary Table S4). Although eight environments are geographically within Nebraska, the climate data differed by the environment.

GGE bi-plot analysis

The GGE-biplot approach, which was based on environment focused scaling, was used to estimate the relationships between the environments (Fig. 2). The lines that join the biplot origin and the markers of the environments are called environment vectors. The angle between the vectors of 2 environments is related to the correlation coefficient between them. The angles among most of our environments were only a little smaller than 90° therefore, the correlation between them should be close to 0 (See Fig. 1). This GGE biplot approach (Fig. 2) suggested that Alliance and North Platte were the most closely correlated environments with Grant and McCook closely behind. However, the largest correlation coefficients were between McCook and Grant and between Lincoln and Mead (Fig. 1). Some contradictions between the figures and actual correlations were predictable because the biplot did not estimate 100% of the GGE variation [33, 67].

GGE-biplot based on environment-focused scaling for environments. PC and E stand for principal component and environments, respectively. Details of environments are (Supplementary Table S4). The environments are represented in this figure as Alliance (AL), Clay Center (CC), Grant (G), Kansas (KAN), Lincoln (LN), McCook (MC), Mead (ME), North Platte (NP), Sidney (SD)

Most of our environments in this study were considered as PC2 environments except Kansas, Lincoln and Mead were included in PC1, which had positive and negative scores. PC1 represents proportional genotype yield differences across environments, which leads to a non-crossover GEI. Genotypes with superior PC1 scores can be easily identified in environments with larger PC1 scores. In contrast to environmental PC1, PC2 had both positive and negative scores (Fig. 2). Positive and negative scores are due to crossover GEI, leading to inconsistent genotype yield differences across environments [66]. A genotype may have large positive interactions with some environments but have large negative interactions with other environments.

In order to create a detailed climate factor (Supplementary Table S4 and Fig. 3), PCA evaluated the standardized values of the growing season mean temperature, average rainfall and average snowfall. Looking at (Fig. 3) we find that all the three climatic factors (average temperature, average rainfall, and average snowfall) were widely distributed throughout the PCA1 and PCA2. But there were several closely observed snowfall points among Lincoln, Mead, McCook and Kansas. The average temperature of Alliance, Sidney and Kansas were widely distributed across the PCA. All these informative data indicated that different climate factors caused strong GE interactions.

Principal component analysis for temperature, rainfall, and snowfall in the nine environments. The environments are represented in this figure as Alliance (AL), Clay Center (CC), Kansas (KAN), Lincoln (LN), McCook (MC), Mead (ME), North Platte (NP), Sidney (SD). Weather data in Grant location is not availbe

Genome-wide association study for grain yield

The GWAS analysis was performed using MLM model which takes population structure into consideration [6, 69]. Due to the highly significant interaction among the genotypes and environments, the GWAS was performed for each environment, separately. The GWAS found a total of 70 MTAs associated with GY in the nine environments (Table 4 and Figure 4 Supplementary Table S5). The lowest number of significant SNPs (three SNPs) for grain yield were detected in the Grant environment, while the highest number of significant SNPs (11) was observed in three environments: Lincoln, McCook, and Sidney. The phenotypic variation (R 2 ) ranged from 7.36% to 12.91%. All QTLs detected using GWAS can be considered as having minor effects on increasing grain yield. Grain yield is a complex trait controlled by many genes and affected by environment, and thus the identification of large number of associations is expected.At the genomic level, the highest number of significantly associated SNPs was observed in the D genome (30 SNPs) followed by A genome (21 SNPs) then B genome (19 SNPs) (Fig. 5). At the chromosomal level, the 71 significant SNPs associated with increased GY were distributed on all wheat chromosomes except 1A, 4B, 4D, 6A and 6B. The highest number of significant SNPs were located on the same chromosome (2D) and associated with high grain yield across environments (13 SNPs). The 13 SNPs were found in 3 environments (2 SNPs in Grant, 6 SNPs in Mead and 5 SNPs in Sidney). These valuable results reflected the importance of D genome in the GY traits. A broad comparison of marker-trait association results from the current study with two previous studies were made using a chromosome basis because of differences in marker type and marker positions on different genetic maps. Edae [21] detected a stable QTL for grain yield on chromosome 2DS both under irrigated and rain-fed conditions using DArT markers. Also, the DArT marker wpt6531 on the short arm of chromosome 2DS, which was associated with yield is about 8 cM away from the wpt4144 marker, which was associated with grain yield in a previous study by Burguen et al. [14]. Previous studies have emphasized the importance of the D genome for grain yield using different types of markers [14, 20, 21, 23, 34].

Manhattan plot displaying SNP markers-trait association identified for GY traits in GWAS using F3:6 Nebraska winter wheat. The blue line is significant threshold of 5% bonferroni correction (BC 5%)

The distribution of significant associated SNPs with GY across all environments

No common markers were found among environments due to the lack of or very low significant correlations among environments for grain yield. Marker-assisted selection (MAS) can be useful for specific environments. The MTAs found in this study should be validated in additional environments and germplasm before using them in MAS. Previous studies identified SNP markers associated with GY on various chromosomes (1D, 1B, 2A, 3B, 4A, 5A, 5B, 5D, 7A, and 7B) [2, 10, 21, 32, 38, 63]. Chromosomes 3B, 5A, 5B and 7A were identified as having important yield QTL using 567 loci including RFLP, SSR, and AFLP markers [58]. El-basyoni [22] identified QTLs associated with GY in different environments on chromosomes 1B, 2A, 3A, 4A, 5A, 5B, 6A, 6D and 7B using DArT markers with a previous duplicate nursery lines of Nebraska winter wheat. Moreover, significant markers associated with GY were found on chromosomes 1B, 2A, 3A, 4A, 5A, 5B, 6A, 6D, and 7B in European winter wheat [11, 17, 19, 26, 54, 55, 58, 73]. Neumann [46], detected significant markers for GY in winter wheat on the chromosomes, 3A, 3B, 7A, 5B and 7B. Markers responsible for GY were identified on chromosomes, 4B and 7D which were reported in bi-parental QTL analyses [2, 11, 17]. Significant markers associated with GY on chromosome 2A were reported in previous studies [1, 27,28,29, 38, 60, 62]. A new publication of Kan et al. [32] who revealed a major quantitative trait locus (QTL) QYld.osu-1BS for grain yield in 260 F2:4 winter wheat population of doubled haploid (DH) lines derived from the cross of Duster and Billings and they validated the QTLs using kompetitive allele specific PCR (KASP) markers for the unique sequences for QYld.osu-1BS allele.

Linkage disequilibrium (LD) and gene annotation

The significance of linkage equilibrium (r 2 ) was estimated between each pair of SNPs located on the same chromosome in each environment (Supplementary Table 5). All SNP pairs that had a high LD were considered a genomic region (GR). As a result, a set of 16 genomic regions associated with increased grain yield were identified across the nine environments. All SNPs located on the same chromosome had a high significant LD in Clay Center (two GRs), Kansas (three GRs), and North Platte (two GRs). In Alliance, there was no significant LD among SNPs located on 2A chr., while the SNPs located on 3A chr. were in significant LD. The five SNPs that are located on 2D chromosome in Mead were in a high significant LD. Among the SNPs located on the same chromosome, there were some cases which some SNP pairs had a significant LD, while the other did not have. For example, in McCook, there was a non-significant LD among SNPs on 1B chr., however, only S1B_427530781 and S1B_427530781 were in a complete LD. Likewise, in Sidney, the five SNPs located on 2D chromosomes had two groups in which SNPs were in high significant LD. The first group consisted of two SNPs (S2D_67556531 and S2D_69503850), while the second one consisted of three SNPs (S2D_76749441, S2D_77122291, and S2D_77122292). There was no significant LD between the two groups. Such information is very important to know which SNPs, located on the same chromosome, could be inherited together or individually. The significant SNPs located on the same chromosome, if the LD value among a group of target SNPs is high, then these SNPs could represent the same QTL and inherited together. If the LD is low, on the other hand, then the two significant SNPs represent two different QTLs [6, 51]. The gene annotation was identified for all genomic regions with significant LD. The candidate genes within the 16 GRs are listed in (Supplementary Table S6).

As aforementioned, 2D chromosome had the highest number of SNPs associated with high grain yield across environment. Therefore, gene annotation was described in detail on this chromosome. By studying LD pattern among these SNPs, we can identify how many genes within the physical position of these SNPs could represent. In Mead, the five SNPs represent one QTL, while, in Sidney, S2D_67556531 and S2D_69503850 coinherited tougher and the other three SNPs on the same chromosome co-inherited together. When estimating the LD between each two pairs of the 13 SNPs, the results indicated no significant LD between any pair of SNPs from different environments (Fig. 6). This is also a further indication of the strong G × E interaction which affects the expression of genes in response to the environment.

Heatmap of LD among SNPs located on 2D chr. Across the three environments. The red line indicates that high LD among all SNPs in Mead

By looking to the gene annotation in the LD genomic regions on 2D chromosome (Supplementary Table S6). We found that TraesCS2D01G506200 gene (genome region 8) is annotated as Zinc transporter and metal ion transport was associated with grain yield in Mead environment. There was many reports demonstrated the crucial role of Zn in improving grain yield in wheat [35, 40]. Recently, Alqudah et al. [5] found that deficiency of metal ion transport specially Cu and Zn may result in reducing grain yield through increasing spikelet and floret sterility/abortion. Interestingly, in Sidney environment, TraesCS2D01G118400 and TraesCS2D01G119200 (genomic region 15) genes are also annotated as potassium transporter and calcium exchanger respectively, demonstrating the role of the metals in grain yield. Potassium and calcium deficiency can significantly reduce crop yield but due to the complex relationship in absorption or transport among them, their mechanisms in grain improvement is still not well-understood [56]. Other interesting candidate genes in Mead environment are involved in phytohormone e.g. auxin (TraesCS2D01G506900) and Jasmonate (TraesCS2D01G507200). In wheat, phytohormones conttrol the spikelet’s and grain development and grain filling. Slow grain development and filling rates are highly linked with low contents of the cytokinin and auxin [64] that in turn reduce grain yield. Understanding the mechanisms of metal transport and phytohormones which are found to be highly associated with grain yield in the current study are needed to improving wheat grain yield.

The promising high yielding genotypes for future breeding program

For the 13 selected genotypes (Table 3), three criteria may be considered to determine the genotypes as candidate parents for a future cross to improve grain yield. These criteria are based on:

the presence of favorable alleles associated with increased grain yield was recorded in each genotype (Supplementary Table S7). NE17435 had the highest number of favorable alleles associated with increased grain yield at 46 sites, while, NE17550 had 39 favorable alleles. Four genotypes NE17545, NE17626, NE17524, and NE17629 had the same number of 43 alleles associated with increased grain yield. Although NE17435 had the highest number of target alleles for grain yield, but it was among the 50 highest yielding genotypes in only six environments. NE17625 and NE17626 which were among the 50 highest in eight environments had 40 and 43 target alleles for grain yield, respectively. It was useful to count the number of favorable alleles that each selected genotype carries as it helps in determining the target genotypes as parents for future crosses in the breeding program to improve grain yield in single or specific environments.

the genetic diversity among the 13 genotypes. The GD among all genotypes in this study was extensively described in [24]. This population was divided into three possible subpopulations [24]. The genetic distance among the 13 selected genotypes is presented in (Supplementary Table S8). Five genotypes were found to be assigned to subpopulation 1 (G1), five were assigned to subpopulation 3 (G3), and the remaining three genotypes were assigned to subpopulation (G2). The highest genetic distance was found between NE17624 and NE17661 (GD = 0.333), while the lowest GD was between NE17549 and NE17625 (GD = 0.007). Generally, all selected genotypes had a low level of genetic distance and they tend to be genetically similar. This is due to the fact that all genotypes represent Nebraska winter wheat breeding program [24] Moreover, many accessions were reselections from the same line (Supplementary Table S1).

the number of unique alleles associated with increased grain yield between each pair of the 13 selected genotypes (Supplementary Table S9). The highest number of different alleles (33) was found between NE17549 and both NE17479 and NE17435. Only three different alleles, on the other hand, were found between NE17549 and NE17625. The different alleles between each two genotypes depends on the genetic distance among them. There was a positive significant correlation between different alleles and genetic distance (r = 0.65**).

Each criterion identified different candidate parents. By considering all criteria together with the priority for the number of different alleles and genetic distance, a cross between NE17625 and NE17479 may be useful for developing a cultivar that may be high yielding in five environments (Alliance, Grant, Kansas, Lincoln, and North Platte). Both parents had 40 alleles associated with increased grain yield, 32 different alleles associated with increased grain yield, and genetic distance of 0.292. Moreover, NE17625 and NE17479 were among the highest 50 genotypes in eight and five environments, respectively. Although NE17625 and NE17626 were among the highest 50 yielding genotypes in eight environments, crossing between them is not useful because it will lead to very low genetic variation in grain yield as both genotypes tend to be genetically similar with only GD of 0.007 and nine different alleles. Therefore, using genomic tools (e.g. identifying number of target alleles, number of different alleles, genetic diversity analyses, etc.) in parallel with phenotypic selection is very fruitful to improve target traits. Moving forward such information on important MTAs can be considered along with genomic selection which is now routinely performed in the breeding program for shortlisting promising parental lines for new sets of crosses for improving grain yield [9].

Putative SNP markers for validation and future use in MAS

For the candidate SNPs detected by GWAS, a set of 10 SNP markers were found to be present in the 13 selected genotypes, while three SNPs were not shown in any of the selected genotypes. As it is recommended in each environment to validate the SNPs that were found in the that environment, it may be useful to convert these 13 SNP to kompetitive allele specific PCR (KASP) markers and validate them for high grain yield in a different genetic background. For example, S2A_718916923 marker was found in all selected genotypes and should be validated. Although the marker allele C was found to be significantly associated with increased grain yield in Alliance, this allele was also present in all selected genotypes in other environments. This marker allele has probably minor effects in increasing grain yield in the other environments but the GEI hinders this marker in the other environments to be significantly associated with grain yield.

Interestingly, two SNPs associated with GY in this study were also found to be associated with grain yield in a previous study [10]. This earlier study evaluated grain yield for synthetic winter wheat genotypes in Turkey for two seasons. The two SNPs (S3A_24993796 and S3A_24993797) are associated with GY in Alliance environment (Supplementary Table S5). Moreover, in their study, the allele A for S3A_24993796 and C for S3A_24993797 were associated with increased grain yield. The same two alleles were also associated with increased grain yield of Nebraska winter wheat. The amount of phenotypic variation (R 2 ) explained by these markers is similar (

11.5%) in both studies. High LD between the two SNPs has also been highlighted in both of our studies. Overall, the two SNPs are associated with grain yield in two independent genetic backgrounds (Nebraska wheat and synthetic winter wheat from Turkey) and two testing environments (Nebraska and Turkey) further validating the associations identified in this study. It is known that states in the Great Plain and Turkey share some of the same climatic features [7]. Hence, introgression of favorable alleles across germplasm which have not have been previously incorporated into their germplasm may be a possibility [7].

This research was financially supported by Production Development Corporation (CORFO project number 14EIAT-28667) a Chilean governmental organization. GMY is supported by Fondecyt/Conicyt Postdoctoral Grant n. 3190553, and JMY is supported by Núcleo Milenio INVASAL funded by Chile’s government program, Iniciativa Cientifica Milenio from Ministerio de Economia, Fomento y Turismo. The funders had no role in the design, data collection and analysis, decision to publish, or preparation of the manuscript.


Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile

Watch the video: R for GWAS: Day 3 (August 2022).