What is a sex-biased gene?

How do you define a male-biased gene and a female-biased gene, as the terms are used in the abstract of this article?

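In the linked paper, the authors use these terms for sex-biased gene expression that evolved through sex-specific selection. Expression is not restricted to one sex (genes expressed in only one sex are called sex-limited genes). Sex-biased genes are expressed in both sexes, but at different levels: a male-biased gene is expressed more highly in males, and a female-biased gene more highly in females.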

Evolutionary and developmental dynamics of sex-biased gene expression in common frogs with proto-Y chromosomes

The patterns of gene expression on highly differentiated sex chromosomes differ drastically from those on autosomes, due to sex-specific patterns of selection and inheritance. As a result, X chromosomes are often enriched in female-biased genes (feminization) and Z chromosomes in male-biased genes (masculinization). However, it is not known how quickly sexualization of gene expression and transcriptional degeneration evolve after sex-chromosome formation. Furthermore, little is known about how sex-biased gene expression varies throughout development.


We sample a population of common frogs (Rana temporaria) with limited sex-chromosome differentiation (proto-sex chromosome), leaky genetic sex determination evidenced by the occurrence of XX males, and delayed gonadal development, meaning that XY individuals may first develop ovaries before switching to testes. Using high-throughput RNA sequencing, we investigate the dynamics of gene expression throughout development, spanning from early embryo to froglet stages. Our results show that sex-biased expression affects different genes at different developmental stages and increases during development, reaching highest levels in XX female froglets. Additionally, sex-biased gene expression depends on phenotypic, rather than genotypic, sex, with similar expression in XX and XY males; correlates with gene evolutionary rates; and is not localized to the proto-sex chromosome nor near the candidate sex-determining gene Dmrt1.


The proto-sex chromosome of common frogs does not show evidence of sexualization of gene expression, nor evidence for a faster rate of evolution. This challenges the notion that sexually antagonistic genes play a central role in the initial stages of sex-chromosome evolution.

Sex-Biased Gene Expression

Methods of transcriptional profiling have made it possible to compare gene expression between females and males on a genome-wide scale. Such studies have revealed that sex-biased gene expression is abundant in many species, although its extent may vary greatly among tissues or developmental stages. In species with genetic sex determination, sex chromosome–specific processes, such as dosage compensation, also may influence sex-biased gene expression. Sex-biased genes, especially those with male-biased expression, often show elevated rates of both protein sequence and gene expression divergence between species, which could have a number of causes, including sexual selection, sexual antagonism, and relaxed selective constraint. Here, we review our current knowledge of sex-biased gene expression in both model and nonmodel organisms, as well as the biological and technical factors that should be considered when analyzing sex-biased expression. We also discuss current approaches to uncover the evolutionary forces that govern the evolution of sex-biased genes.

2. Dispensability

Genes do not all share the same degree of importance in terms of an organism's ability to survive and reproduce. Definitions of dispensability vary, but genetic studies ranging from yeast to mice have identified those genes that are indispensable to survival or fertility (Hirsh & Fraser 2001; Giaever et al. 2002) and those at the other end of the continuum, which show no obvious knock-out phenotype (Barbaric et al. 2007; Liao & Zhang 2007). Therefore, some genes are required, others appear to be superfluous, and most are intermediate between these extremes. This degree of importance is often referred to as dispensability, and while it is difficult to connect the measures of dispensability from unicellular eukaryotes to metazoans, we employ the term loosely here, intending it as a gauge for the phenotypic effects of a gene.

Critical genes generally show lower rates of functional protein change when compared with dispensable genes (Hirsh & Fraser 2001; Jordan et al. 2002; Pal et al. 2003; Wall et al. 2005; Liao & Zhang 2006). This theoretically results from narrow fitness optima for critical genes, manifesting in strong purifying selection against functional mutations, the vast majority of which are deleterious. Genes that are less critical are subject to less purifying pressure, and evolve more rapidly simply through neutral processes.

Dispensable genes share key expression characteristics with sex-biased genes, suggesting that sex-biased genes themselves may be dispensable. Indeed, it is possible that genes with higher levels of dispensability may respond more quickly to sexually antagonistic selection, thereby evolving sex-biased expression, as different female and male transcription levels would be less likely to have deleterious effects for less critical genes.

Materials and Methods

To identify genes with sex-limited and nonsex-limited functions, we searched FlyBase (Tweedie et al. 2009) for mutations within the following phenotypic categories: visible, lethal, semilethal, sterile, male sterile, and female sterile (searches used the TermLink section). We trimmed the data set to include only alleles associated with specific genes (although many alleles have been mapped to specific chromosomes and/or cytological bands, the mapping resolution for these cases was generally insufficient for inclusion in the final data set).

Individual genes can potentially have multiple alleles within the data set (though the majority of Drosophila genes were not associated with any alleles). We therefore classified each gene according to its range of mutant allele phenotypes, which fall between “entirely female-specific” and “entirely male-specific.” The genes were classified as follows:

Genes with female-limited fitness effects are those that contain female-sterile alleles and no other allele type

Genes with female-biased fitness effects contain female-sterile alleles and any combination of visible, lethal, and semilethal alleles

Genes with male-limited fitness effects contain male-sterile alleles and no other allele type

Genes with male-biased fitness effects contain male-sterile alleles and any combination of visible, lethal, and semilethal alleles

Genes without sex-biased fitness effects contain both male-sterile and female-sterile alleles, visible alleles, lethal alleles, and/or semilethal alleles.
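The classification rules above can be sketched in code. This is only an illustration, not the authors' implementation: the function name, the phenotype strings, and the returned labels are hypothetical, and the handling of genes with only visible, lethal, or semilethal alleles (treated here as having no sex bias) and of unspecified "sterile" alleles (treated as ambiguous only when no sex-specific sterile allele is present) is one reading of the text.

```python
# Phenotype labels follow the categories named in the text; all identifiers
# here are illustrative assumptions, not taken from the original study.
NON_STERILE = {"visible", "lethal", "semilethal"}


def classify_gene(allele_phenotypes):
    """Classify a gene from the phenotypes observed across its mutant alleles."""
    phenotypes = set(allele_phenotypes)
    has_female = "female sterile" in phenotypes
    has_male = "male sterile" in phenotypes
    has_other = bool(phenotypes & NON_STERILE)

    # Sterility with neither sex specified: ambiguous, excluded from analysis.
    if "sterile" in phenotypes and not (has_female or has_male):
        return "ambiguous"
    if has_female and has_male:
        return "no sex bias"
    if has_female:
        return "female-biased" if has_other else "female-limited"
    if has_male:
        return "male-biased" if has_other else "male-limited"
    # Assumption: genes with only visible/lethal/semilethal alleles affect
    # both sexes and are grouped with the "no sex bias" class.
    return "no sex bias" if has_other else "unclassified"
```

For example, a gene whose only known alleles are female sterile would be called female-limited, while one with both male-sterile and lethal alleles would be called male-biased.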

Genes associated with sterility, but with neither sex specified (the underlying allelic data did not provide information about sex), were considered ambiguous and excluded from the analysis. The sample of sterile alleles that were included in the analysis is potentially heterogeneous because some studies examine fertility in only one sex rather than both. Nevertheless, the proportion of sex-limited and nonsex-limited steriles in our data set is consistent with independent experimental results that explicitly test male and female fertility (alleles associated with sex-specific sterility are roughly three times as common as alleles associated with sterility in both sexes; see Lindsley and Lifschytz 1972; Ashburner et al. 2005). This suggests that most genes and alleles classified as sex-limited are in fact associated with sex-limited sterility.

Molecular expression profiles were obtained from the Sex Bias Database (SEBIDA version 2.0; Gnad and Parsch 2006). We downloaded male versus female expression ratios (M/F) from 15 different microarray studies (data were originally reported in: Parisi et al. 2003, 2004; Ranz et al. 2003; Gibson et al. 2004; Stolc et al. 2004; McIntyre et al. 2006; Goldman and Arbeitman 2007; Ayroles et al. 2009) and M/F ratios from a meta-analysis of several studies (details of the meta-analysis are described at SEBIDA). M/F ratios can potentially range from zero to infinity, with male-biased transcription for M/F > 1 and female-biased transcription for M/F < 1. To impose symmetry on sex-biased expression levels, we rescaled the data using an index of sex-biased expression: x = M/(M + F). This variable ranges between zero and one, with female-biased transcription for x < 0.5 and male-biased transcription for x > 0.5.
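Because only the ratio r = M/F is downloaded, the index can be computed as x = M/(M + F) = r/(1 + r). The following minimal sketch shows the conversion; the function name is hypothetical.

```python
def sex_bias_index(m_over_f):
    """Rescale an M/F expression ratio r to x = M/(M + F) = r/(1 + r)."""
    if m_over_f < 0:
        raise ValueError("an expression ratio cannot be negative")
    if m_over_f == float("inf"):
        # Expression detected in males only.
        return 1.0
    return m_over_f / (1.0 + m_over_f)
```

The rescaling is symmetric around 0.5: a twofold male bias (r = 2) gives x = 2/3, and a twofold female bias (r = 0.5) gives x = 1/3, so the two values are equally far from the unbiased value of 0.5.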

The final data set included 2,433 genes with M/F expression information from at least 1 of the 15 studies. Within the final data set, there were 1,955 genes with similar mutational effects on both sexes, 298 genes with female-biased fitness effects, 43 female-limited genes, 87 genes with male-biased fitness effects, and 50 male-limited genes (an additional 53 genes had ambiguous sex-specific sterility phenotypes). Supplementary table S1 (Supplementary Material online) provides a breakdown of the data set into phenotypic subcategories, including the mean and median number of alleles per gene, per phenotypic category.

Statistical Analysis

Two-tailed Mann–Whitney U tests (implemented in R; R Development Core Team 2005) were used to assess whether the distribution of sex-biased transcription levels differs between phenotypically defined gene categories. To examine whether different categories of sex-biased transcription have different compositions of phenotypes, we subdivided the data set into five expression categories, each with equal range: (1) 0 < x < 0.2; (2) 0.2 < x < 0.4; (3) 0.4 < x < 0.6; (4) 0.6 < x < 0.8; and (5) 0.8 < x < 1.0. Two-tailed Fisher’s exact tests were used to examine whether female-biased transcription categories (1, 2) were enriched for genes with female-specific phenotypes and whether male-biased transcription categories (4, 5) were enriched for genes with male-specific phenotypes.
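As a rough illustration of the binning and the enrichment test, the sketch below assigns an index value to one of the five categories and computes a two-tailed Fisher's exact p-value for a 2x2 table using only the standard library. The function names are hypothetical, values falling exactly on a bin boundary are placed in the upper bin (the text does not say how they were handled), and the two-tailed p-value uses the common convention of summing all tables no more probable than the observed one, which may differ in detail from other implementations.

```python
from math import comb


def expression_category(x):
    """Return the expression category (1-5) for an index value 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    # Five bins of width 0.2; boundary values go to the upper bin (assumption).
    return min(int(x * 5) + 1, 5)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)
```

In use, the rows of the table might be "categories 1 and 2" versus "all other categories" and the columns "female-specific phenotype" versus "other phenotype".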

The results presented below use meta-analysis expression profiles (from SEBIDA; see above) to transcriptionally categorize genes. The meta-analysis data set represents a composite of several independent microarray studies, which minimizes the likelihood of misclassifying sex-biased transcription for each gene (compared with classifications based on single studies). The meta-analysis also includes data for a high proportion of the 2,433 genes (compared with single studies), which maximizes statistical power. Nevertheless, each analysis was also performed using transcription classifications from individual microarray studies. The results are consistent across studies, though the statistical power is often lower, due to decreased gene representation. Results for each platform are presented in supplementary figs. S1 and S2 (Supplementary Material online).



Individuals for whole-body and tissue-specific samples were collected from the field as last instar juveniles in spring 2013 and 2014, respectively (collection locations for all samples are given in Supplementary Data 8). All individuals were raised in common garden conditions (23 °C, 12 h:12 h, 60% humidity, fed with Ceanothus cuttings) until 8 days following their final moult. Prior to RNA extraction, individuals were fed with an artificial medium for 2 days to avoid RNA contamination with gut content and then frozen at −80 °C. For leg samples, three legs were used from each individual (one foreleg, one midleg and one hindleg). Reproductive tracts were dissected to consist of ovaries, oviducts and spermatheca in females and testes and accessory glands in males. Note the same individuals were used for leg and reproductive tract samples. To ensure individuals were reproductively active at the time of sampling, all sexual individuals were allowed to mate, and asexual and sexual females were observed to lay eggs. When analyses were repeated using virgin sexual females, we obtained qualitatively similar results (Supplementary Fig. 13). Note only whole-body samples were available for this comparison. Ethical approvals or collection permits were not required for this research.

RNA extraction and sequencing

We generated three biological replicates per species and tissue type from pooled individuals (1–9 individuals per replicate; 516 individuals in 150 replicates in total, including the virgin sexual females; see Supplementary Data 8). To extract RNA, samples were flash-frozen in liquid nitrogen followed by addition of Trizol (Life Technologies) before being homogenised using mechanical beads (Sigmund Lindner). Chloroform and ethanol were then added to the samples and the aqueous layer transferred to RNeasy MinElute Columns (Qiagen). RNA extraction was then completed using an RNeasy Mini Kit following the manufacturer’s instructions. RNA quantity and quality were measured using NanoDrop (Thermo Scientific) and Bioanalyzer (Agilent). Strand-specific library preparation and single-end sequencing (100 bp, HiSeq2000) were performed at the Lausanne Genomic Technologies Facility.

The 150 libraries produced a total of just under 5 billion single-end reads. Six whole-body and six tissue-specific libraries produced significantly more reads than the average for the other samples. To reduce any influence of this on downstream analyses, these libraries were sampled down to approximately the average number of reads for whole-body or tissue-specific libraries, respectively, using seqtk (version 1.2-r94).

Transcriptome references

De novo reference transcriptome assemblies for each species were generated previously 16 . Our expression analyses were conducted using two sets of orthologs. Firstly, we identified orthologs between sexual and asexual sister species using reciprocal Blast as described in Parker et al. 24 . Secondly, we used the 3010 one-to-one orthologs present in all 10 Timema species as identified by Bast et al. 16 . The identified ortholog sequences varied in length among different species. Since length variation might influence estimates of gene expression, we aligned orthologous sequences using PRANK (v.100802, default options) 25 and trimmed them using 26 to remove overhanging gaps at the ends of the alignments. If an alignment contained a gap of greater than three bases then sequence preceding or following the alignment gap (whichever was shortest) was discarded. Any orthologous sequences that had a trimmed length of <300 bp were also discarded. Finally, before mapping, genes with significant Blast hits to rRNA sequences were removed from the trimmed transcriptome references.

Read trimming and mapping

Before mapping, adapter sequences were trimmed from raw reads with CutAdapt 27 . Reads were then quality trimmed using Trimmomatic v 0.36 28 , clipping leading or trailing bases with a phred score of <10 from the read, before using a sliding window from the 5′ end to clip the read if 4 consecutive bases had an average phred score of <20. Any reads with a sequence length of <80 after trimming were discarded. Reads from each library were then mapped to the transcriptome references using Kallisto (v. 0.43.1) 29 with the following options: -l 210 -s 25 --bias --rf-stranded for whole-body samples and -l 370 -s 25 --bias --rf-stranded for tissue-specific samples (the -l option differed between whole-body and tissue-specific samples because the fragment length of these libraries differed).
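The quality-trimming rules described above can be illustrated with a simplified sketch. It is not Trimmomatic's actual implementation and may differ from it in edge cases; the function name and the (start, end) return convention are assumptions made for illustration, while the default thresholds are those stated in the text.

```python
def quality_trim(quals, end_min=10, window=4, window_min=20, min_len=80):
    """Return the (start, end) slice of a read to keep, or None to discard it.

    quals is a list of per-base phred scores, ordered from the 5' end.
    """
    start, end = 0, len(quals)
    # Clip leading and trailing bases with phred score below end_min.
    while start < end and quals[start] < end_min:
        start += 1
    while end > start and quals[end - 1] < end_min:
        end -= 1
    # Slide a window from the 5' end; cut the read at the first window
    # whose average quality falls below window_min.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_min:
            end = i
            break
    # Discard reads that are too short after trimming.
    return (start, end) if end - start >= min_len else None
```

A read of 100 bases with uniformly high quality is kept whole, whereas a 50-base read is discarded regardless of quality because it falls below the 80-base minimum.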

Differential expression analysis

Expression analyses were performed using the Bioconductor package edgeR (v. 3.18.1) 30 in R (v. 3.4.1) 31 . Firstly, to identify sex-biased genes we compared male and female expression separately for each tissue type in each sexual species. Genes with counts per million <0.5 in 2 or more libraries per sex were excluded from expression analyses. Normalisation factors for each library were computed using the TMM method. To estimate dispersion, we then fit a generalised linear model (GLM) with a negative binomial distribution with sex as an explanatory variable and used a GLM likelihood ratio test to determine the significance of sex on gene expression for each gene. P-values were then corrected for multiple tests using Benjamini and Hochberg’s algorithm 32 . Sex-biased genes were then defined as genes that showed a greater than twofold difference in expression between males and females with an FDR < 0.05. All genes not classified as sex-biased were classified as unbiased. We chose this threshold in order to select a robust set of sex-biased genes, and to reduce the effect of sex-biased allometry 33 . Note that analyses using just an FDR threshold to define sex-biased genes, or using virgin sexual females to independently verify sex-biased genes in whole-body samples, produced qualitatively similar results (Supplementary Tables 18–19, Supplementary Fig. 14).
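To make the final classification step concrete, the sketch below applies the Benjamini and Hochberg adjustment to a list of p-values and then the twofold and FDR < 0.05 thresholds. It does not reproduce the edgeR model fitting, which is assumed to have been done already; the function names and the convention that a positive log2 fold change means higher expression in males are assumptions.

```python
from math import log2


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_sex_bias(log2_fc, fdr, min_fold=2.0, max_fdr=0.05):
    """Label a gene from its male-vs-female log2 fold change and its FDR."""
    if fdr < max_fdr and abs(log2_fc) > log2(min_fold):
        # Assumption: positive values mean higher expression in males.
        return "male-biased" if log2_fc > 0 else "female-biased"
    return "unbiased"
```

Under these thresholds a gene with a 1.4-fold difference is called unbiased even when its FDR is very small, which is the intended effect of combining both criteria.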

Clustering of expression values (log2 CPM) was performed using Ward’s hierarchical clustering of Euclidean distances with the R package pvclust (v. 2.0.0) 34 , with bootstrap resampling (method.hclust = “ward.D2”, method.dist = “euclidean”, nboot = 10000), and visualised using R package pheatmap (v. 1.0.8) 35 .

To quantify how sex-biased genes change in expression in asexual females we then compared gene expression in sexual and asexual females separately for each species pair and each tissue type. We also compared the change in expression in asexual females for male- and female-biased genes to unbiased genes using a Wilcoxon test, corrected for multiple tests using Benjamini and Hochberg’s algorithm 32 . To determine if changes in sex-biased gene expression in asexual females are larger for genes sex-biased in fewer species we fit a generalised linear mixed model with the number of species a gene is sex-biased in as a fixed effect and gene ID as a random effect in R. The significance of terms was determined using a likelihood ratio test. A separate model was fit for male- and female-biased genes in each tissue. P-values were corrected for multiple tests using Benjamini and Hochberg’s algorithm. We also examined gene expression changes in the T. cristinae–T. monikensis species pair when X-linked transcripts were excluded. X-linked transcripts were determined in these species by blasting (blastN) transcripts to the T. cristinae reference genome, for which linkage groups have been assigned 36 . The gene expression analyses were then repeated on only those transcripts that had a significant blast hit (e-value < 1 × 10⁻²⁰, query coverage > 60%) to a scaffold in an autosomal linkage group.

Shifts in sex-biased genes and asexual lineage age

The asexual species differ in age as estimated previously 2 . Since the age of asexuality varies we tested if changes in sex-biased gene expression altered with asexual species age using a permutation ANCOVA (number of permutations = 10,000) separately for male- and female-biased genes with the following terms: asexual lineage age, tissue type and their interaction.

Analysis of sex-limited genes

Sex-limited genes were classified as genes that had at least two fragments per kilobase per million mapped reads (FPKM) in each replicate of one sex and 0 FPKM in each replicate of the other sex. FPKM values were calculated using edgeR. The expression levels of female-limited genes in sexual and asexual females, and male-limited genes in sexual males and asexual females were compared using a Wilcoxon test, corrected for multiple tests using Benjamini and Hochberg’s algorithm 32 .
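The sex-limited criterion amounts to a simple per-gene check across replicates, sketched below. The function name and labels are hypothetical; the threshold of two and the requirement of zero expression in every replicate of the other sex follow the text.

```python
def sex_limited_status(female_values, male_values, min_expr=2.0):
    """Classify a gene from per-replicate expression values in each sex."""
    if not female_values or not male_values:
        raise ValueError("at least one replicate per sex is required")
    # Expressed at or above the threshold in every replicate of one sex
    # and exactly zero in every replicate of the other.
    if all(v >= min_expr for v in female_values) and all(v == 0 for v in male_values):
        return "female-limited"
    if all(v >= min_expr for v in male_values) and all(v == 0 for v in female_values):
        return "male-limited"
    return "not sex-limited"
```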

Sequence evolution of sex-biased genes

To test if sex-biased genes have a higher rate of divergence in asexuals, we examined if sex-biased genes have elevated dN/dS ratios in asexuals. To do this we firstly fit a binomial GLMM (dN/dS values were transformed to fall into two categories: zero or non-zero), with reproductive mode, sex-bias and their interaction as fixed effects and gene identity as a random effect. Secondly, we fit a GLMM with a gamma distribution to the dN/dS values that were greater than zero, with the same fixed and random effects as the binomial model. All GLMMs were fit using the lme4 package (v. 1.1.14) 37 in R, and significance of terms was determined using a log-likelihood ratio test. dN/dS values were calculated for each of the one-to-one orthologs using codeml implemented in the PAML package 38 to generate maximum likelihood estimates of dN/dS for each terminal branch in the phylogeny (using the “free model”) as described in Bast et al. 16 .

GO term analysis

Genes were functionally annotated using Blast2GO (version 4.1.9) 39 as described in Parker et al. 24 . Briefly, sequences from each sexual species were compared with BlastX to either NCBI’s nr-arthropod or Drosophila melanogaster (drosoph) databases, to produce two sets of functional annotations, one derived from all arthropods and one specifically from Drosophila melanogaster. The D. melanogaster GO term annotation generated around four times more annotations per sequence than NCBI’s nr-arthropod database. We therefore conducted all subsequent analyses using the GO terms derived from D. melanogaster, but note that results using the annotations from all arthropods were qualitatively the same (see Supplementary Fig. 15).

To identify overrepresented GO terms we conducted gene set enrichment analyses (GSEA) using the R package topGO (v. 2.28.0) 40 , using the elim algorithm to account for the GO topology. GO terms were considered to be significantly enriched when p < 0.05.

Since we defined sex-biased genes with both FDR and FC thresholds, we ranked sex-biased genes for the GSEA to take both FDR and FC into account. To identify overrepresented GO terms for female-biased genes, genes were ranked by FDR in four subsets: female-biased with FC > 2, female-biased with FC < 2, male-biased with FC < 2, and male-biased with FC > 2. Female-biased gene subsets were ranked so that small FDR values were ranked highly, male-biased gene subsets were ranked so that small FDR values were ranked low in the list. The four lists were then joined together in the order given above, and assigned a unique rank. This ranked list produces a list where strongly female-biased genes are at the top, followed by weakly female-biased genes, then weakly male-biased genes, and finally strongly male-biased genes at the bottom. To identify overrepresented GO terms for male-biased genes the ranked list for female-biased genes was simply inverted. Finally, to examine the GO terms overrepresented in sex-biased genes which changed expression in asexuals, female- and male-biased genes were ranked by fold-change between sexual and asexual females.
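The ranking scheme described above can be sketched as follows. The data layout (a list of dictionaries with hypothetical keys 'id', 'bias', 'fc', and 'fdr') is an assumption, as is treating 'fc' as the magnitude of the fold change in the direction of the bias and placing genes with a fold change of exactly 2 in the weaker subset.

```python
def rank_for_female_gsea(genes):
    """Order gene IDs from strongly female-biased to strongly male-biased.

    Each gene is a dict with keys 'id', 'bias' ('female' or 'male'),
    'fc' (fold-change magnitude) and 'fdr'.
    """
    def subset(bias, strong):
        return [g for g in genes if g["bias"] == bias and (g["fc"] > 2) == strong]

    def by_fdr(items, reverse=False):
        return sorted(items, key=lambda g: g["fdr"], reverse=reverse)

    ordered = (
        by_fdr(subset("female", True))                  # small FDR first
        + by_fdr(subset("female", False))
        + by_fdr(subset("male", False), reverse=True)   # small FDR last
        + by_fdr(subset("male", True), reverse=True)
    )
    return [g["id"] for g in ordered]


def rank_for_male_gsea(genes):
    """The male-biased ranking is the female-biased ranking inverted."""
    return rank_for_female_gsea(genes)[::-1]
```

Each gene's position in the returned list then serves as its unique rank for the enrichment analysis.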

To determine if the overlap of sets of sex-biased genes or GO terms was greater than expected by chance we used the SuperExactTest package (v. 0.99.4 41 ) in R, which calculates the probability of multi-set intersections. P-values were multiple test corrected using Benjamini and Hochberg’s algorithm implemented in R.

Evolutionary dynamics of sex-biased genes expressed in cricket brains and gonads

Sex-biased gene expression, particularly sex-biased expression in the gonad, has been linked to rates of protein sequence evolution (nonsynonymous to synonymous substitutions, dN/dS) in animals. However, in insects, sex-biased expression studies remain centered on a few holometabolous species, and moreover, other major tissue types such as the brain remain underexplored. Here, we studied sex-biased gene expression and protein evolution in a hemimetabolous insect, the cricket Gryllus bimaculatus. We generated novel male and female RNA-seq data for two sexual tissue types, the gonad and somatic reproductive system, and for two core components of the nervous system, the brain and ventral nerve cord. From a genome-wide analysis, we report several core findings. Firstly, testis-biased genes had accelerated evolution, as compared to ovary-biased and unbiased genes, which was associated with positive selection events. Secondly, while sex-biased brain genes were much less common than for the gonad, they exhibited a striking tendency for rapid protein evolution, an effect that was stronger for the female than male brain. Further, some sex-biased brain genes were linked to sexual functions and mating behaviors, which we suggest may have accelerated their evolution via sexual selection. Thirdly, a tendency for narrow cross-tissue expression breadth, suggesting low pleiotropy, was observed for sex-biased brain genes, suggesting relaxed purifying selection, which we speculate may allow enhanced freedom to evolve adaptive protein functional changes. The findings of rapid evolution of testis-biased genes and male and female-biased brain genes are discussed with respect to pleiotropy, positive selection, and the mating biology of this cricket.

Competing Interest Statement

The authors have declared no competing interest.

Materials and Methods

Worm strains and growth

C. elegans (N2), C. brenneri (PB2801), C. briggsae (AF16), C. remanei (PB4641), and P. pacificus (PS312) strains were maintained at 20° on NGM agar plates using standard C. elegans growth methods.


One female and two or more male replicates were collected per species (summarized in Supporting Information, Table S1). At least 50 young adult worms were hand picked per replicate. Worms were washed by settling three to five times with 1 ml of M9, starved overnight to eliminate gut bacterial contamination, and resuspended in 100 μl TE. A total of 400 μl of lysis buffer (0.1 M Tris–HCl, 0.1 M NaCl, 50 mM EDTA, 1.25% SDS) was added and worms were sonicated for 30 min using the Bioruptor at high setting, 30 sec on/off. Sonicated DNA was isolated and cleaned up using the Qiagen MinElute kit. Illumina DNA sequencing libraries between 250 and 500 bp were prepared from the purified DNA using the Illumina TruSeq DNA kit with the following modifications. Briefly, after end repair and A tailing, adapters were ligated and the resulting DNA was purified using AmpureXP beads. Ligated DNA was amplified by PCR and the DNA library between 300 and 500 bp was gel purified. Fifty base pair paired-end or single-end sequencing (see Table S1) was performed using the Illumina HiSeq-2000. The raw data can be found at Gene Expression Omnibus (GEO) under series number GSE53144. For paired-end data, quality scores of the reverse reads were much lower than those of the forward reads. As such, only forward reads were used for analysis.

Copy-number approach: X and autosomal gene assignments

For each species, forward reads were aligned to WS228 with Bowtie version 0.12.7 (Langmead et al. 2009). We supplied the parameter (m = 4) to suppress all alignments with more than four hits in the genome. The resulting alignment files (in BAM format, a binary file type containing sequence alignment data) were converted to SAM format (tab-delimited text version of a BAM file) using SAMtools v. 0.1.18 (Li et al. 2009). The SAM files were sorted and used to generate bedgraph files (BEDTools v. 2.15.0) (Quinlan and Hall 2010). For each species, the contigs were split into 5-kb windows and the bedgraph file was used to calculate the sequencing coverage for each window. The male-to-female coverage ratio was computed for each window by taking the log2 of male coverage divided by female coverage. Baseline was set as the mean male-to-female coverage ratio. Windows whose log2 ratio fell one standard deviation below the mean were initially assigned to the X chromosome. If the majority of 5-kb windows contained within a contig were assigned to the X, then the contig was assigned to the X. If there was no majority, or if the contig was <15 kb, we could not assign the contig. Assignments were given confidence scores ranging from 0 to 2 based on both length of sequencing contig and agreement of assignment between replicates. One point was given to those contigs with lengths >50 kb. A second point was given if the final contig assignment obtained by combining all replicates matched the assignment obtained when the replicates were analyzed separately. The gene assignments and confidence scores are given in Table S2.
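The window-based assignment can be illustrated with the sketch below, which starts from per-window male and female coverage values rather than from alignment files. The function name, the input layout, the use of the sample standard deviation, the labelling of non-X contigs as autosomal, and the skipping of windows with zero coverage are all assumptions; the confidence scoring is omitted.

```python
from math import log2
from statistics import mean, stdev


def assign_contigs(windows, contig_lengths, min_len=15_000):
    """Assign contigs to the X or to autosomes from windowed coverage.

    windows: list of (contig, male_coverage, female_coverage), one per window.
    contig_lengths: dict mapping contig name to its length in bp.
    """
    # log2 male-to-female coverage ratio per window (zero coverage skipped).
    ratios = [(c, log2(m / f)) for c, m, f in windows if m > 0 and f > 0]
    values = [r for _, r in ratios]
    # Windows more than one standard deviation below the mean are X-like.
    cutoff = mean(values) - stdev(values)

    votes = {}
    for contig, ratio in ratios:
        x_votes, total = votes.get(contig, (0, 0))
        votes[contig] = (x_votes + (ratio < cutoff), total + 1)

    calls = {}
    for contig, (x_votes, total) in votes.items():
        if contig_lengths[contig] < min_len or 2 * x_votes == total:
            calls[contig] = "unassigned"  # too short, or no majority
        else:
            calls[contig] = "X" if 2 * x_votes > total else "autosome"
    return calls
```

The logic relies on males carrying one X and females two, so X-linked windows show roughly half the male-to-female coverage of autosomal windows, that is, a log2 ratio about one unit lower.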


C. elegans (N2) and P. pacificus (PS312) worms were synchronized by bleaching gravid adults and allowing worms to hatch overnight. Larval worms were plated and grown at 20°. At least 750 young adult worms were hand picked for each of three biological replicates. Worms were washed three to five times in M9. Ten volumes of Trizol (Invitrogen) were added. Samples were freeze-cracked five times and RNA purification was performed according to the manufacturer's protocol. Isolated RNA was cleaned up using the Qiagen RNeasy kit. mRNA was purified using Sera-Mag oligo(dT) beads (ThermoScientific) from at least 1 μg of total RNA. Stranded mRNA-seq libraries were prepared based on incorporation of dUTPs during cDNA synthesis using a previously described protocol (Parkhomchuk et al. 2009). Single-end 50-bp sequencing was performed using the Illumina HiSeq-2000. Data for C. elegans (fog-2), C. brenneri, C. briggsae (she-1), and C. remanei were downloaded from GEO (accession no. GSE41367). Reads were aligned to genome version WS228 with tophat v. 2.0.0 (Trapnell et al. 2012). Default parameters allow up to 20 hits for each read. Read numbers and mapping percentages (which refer to the percentage of unique reads with at least one alignment) can be found in Table S1. Gene expression was estimated using Cufflinks v. 2.0.2 with default parameters and supplying gene annotations for WS228. Average male and female expression (FPKM, fragments per kilobase per million mapped reads) was determined (Table S6). The raw read data and the average Cufflinks FPKM data can be obtained from GEO accession no. GSE53144. Differential expression analysis between males and females was performed using the R package DESeq (Anders and Huber 2010; R Development Core Team 2012). These results are available in Table S6.

Defining sex-biased gene expression

To define sex bias, we first identified those genes that were differentially expressed between the two sexes (DESeq qvalue < 0.05). Genes with FPKM >1 in at least one sex were considered “expressed” and used for subsequent analyses. We categorized “sex-biased” genes as those having FPKM >1 in one sex and FPKM >0 in the other. For each sex-biased gene, the magnitude of sex bias was calculated as the log2 ratio of FPKM values between the two sexes. Those genes that have FPKM >1 in one sex and FPKM = 0 in the other were categorized as “sex-specific” genes. We categorized “nonbiased” genes as those genes not called significant by DESeq (qvalue > 0.05) with FPKM >1 in both sexes and with less than twofold expression difference between the sexes. We categorized genes with high and low sex-biased expression based on a log2 sex-expression ratio cutoff of 3 (Table S3). The cutoff was selected based on a breakpoint in the distribution of sex-biased expression ratios driven largely by the gonadal expression of highly sex-biased genes (discussed further in Results and Discussion).
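One way to read these definitions is as a decision procedure applied to each gene, sketched below. The function name and labels are hypothetical, and several details are interpretations rather than statements from the text: sex-specific genes are called without requiring significance, the log2 cutoff of 3 is treated as inclusive, and genes that fit none of the stated categories are returned as unclassified.

```python
from math import log2


def classify_expression(fpkm_female, fpkm_male, q_value, q_max=0.05, high_cutoff=3.0):
    """Classify a gene from average FPKM in each sex and a DESeq q-value."""
    high, low = max(fpkm_female, fpkm_male), min(fpkm_female, fpkm_male)
    if high <= 1:
        return "not expressed"
    if low == 0:
        # FPKM > 1 in one sex and exactly 0 in the other.
        return "sex-specific"
    if q_value < q_max:
        # Magnitude of sex bias: log2 ratio of the two FPKM values.
        strength = "high" if log2(high / low) >= high_cutoff else "low"
        sex = "male" if fpkm_male > fpkm_female else "female"
        return f"{strength} {sex}-biased"
    if low > 1 and high / low < 2:
        return "nonbiased"
    return "unclassified"
```

A log2 cutoff of 3 corresponds to an eightfold expression difference between the sexes.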


We have presented a comprehensive characterization of genetic variation in 234 NXH samples of age over 60, which is the first whole-genome sequencing study of this ethnic group. Considering the unique history of the Hui population in China, the whole-genome data generated in this study is of great significance for the genomic studies of East Asian populations and Muslim populations and serves as a useful control data set for genetic association studies of late-onset diseases.

With these unprecedented data, we comprehensively revealed the genetic origins, admixture history, and population structure of NXH. Our results showed that NXH was most closely related to East Asian populations compared with other global populations. Moreover, NXH shared the majority of its genetic makeup with the Han Chinese population. Interestingly, although the Hui and Han Chinese peoples were very similar in appearance (Zheng et al. 1997), they were genetically distinguishable from each other, which could be attributed to the western ancestry in NXH. Remarkable differences in the genetic makeup between NXH and HAN were observed. Specifically, four major ancestral components were identified in the NXH, which potentially were derived from ancestral populations in East Asia, Siberia, West Eurasia, and South Asia. In contrast, two major ancestral components were identified in HAN. Modeling admixture history indicated that these four ancestral components were derived from two earlier admixed populations. The eastern ancestry consisted of East Asian and Siberian ancestral components. The western ancestry consisted of West Eurasian, South Asian, and Siberian ancestral components. Population movement and gene flow between Siberia and West Eurasia across the Eurasian steppe have been reported by several studies (Pugach et al. 2016; Sikora et al. 2019), which suggests that part of the Siberian ancestral component may have arrived through the western ancestry. Moreover, our simplified modeling of isolation by distance showed that it is unlikely to explain the history of NXH (supplementary fig. S32, Supplementary Material online), thus supporting the admixture model we proposed for NXH (supplementary fig. S12, Supplementary Material online). In addition, the admixture between eastern and western ancestries was sex-biased, with more Eastern females and Western males. Merchants, emissaries, and soldiers migrated from Arabia, Persia, and Central Asia into China, and these migrants were mainly male. Moreover, the Hui people practice endogamy, with marriage occurring mainly within the Hui. Intermarriage generally involves a Han Chinese partner converting to Islam, and marriages between Hui men and Han women were more frequent than the reverse (Jia 2006).

The distribution of the Huis is a result of the genetic origin, migration, and admixture history of this group. We observed a south-to-north cline within NXH. Specifically, the samples in southern Ningxia regions had a higher western ancestry proportion, which can be attributed to a few factors. First, the old Silk Road, which was the main route of gene flow between East Asia and West Eurasia, went through southern Ningxia (Gladney 1997) (fig. 4A), which might have resulted in differentiated admixture between southern and northern NXH. Second, the Hui account for a higher percentage of the total population in southern Ningxia than in northern Ningxia, resulting in relatively fewer intermarriages between the Huis and Han Chinese in the south compared with the north. Indeed, intermarriage between Huis and other ethnic groups is much less frequent in southern than in northern Ningxia (Yang 2002).

Interestingly, among the Hui people in Northwestern China, the genetic differentiation between southern and northern NXH was even larger than that between NXH and XJH, though Ningxia and Xinjiang were further apart geographically. Moreover, the western ancestry contribution was lower in XJH (10%) than that in some regions of southern NXH (12%), although slightly higher than that in northern NXH (<9%). These results seem unexpected, because compared with Ningxia, Xinjiang is geographically closer to Central Asia and West Eurasia and was the gateway for western people from Central Asia to enter the interior. However, they become reasonable in light of historical documents, which record that the XJH were mainly immigrants from Northwestern China. For example, it is believed that the history of Hui settlement in Xinjiang began after the suppression of the Junggar rebellion in the twentieth year of the Qing Emperor Qianlong (1755) (Li 2010).

Also, our results suggested that Dungan was genetically more closely related to southern NXH. According to historical documents, the Dungan were Hui people who migrated from China into Central Asia in the year 1867. ADMIXTURE analysis suggested that there was no considerable gene flow between Dungan and surrounding populations after the Dungan people migrated from China into Central Asia, which could be because they speak a Sino-Tibetan language, whereas most of the surrounding populations speak Turkic languages.

We evaluated the effect of some confounding factors on the inference of fine-scale population structure. Allele sharing distance (ASD) is commonly used to measure genetic differentiation at the individual level. Unlike FST, it does not require separating individuals into groups with different genetic backgrounds in order to estimate allele frequencies. Pairwise distance within one population was expected to be less than that between populations. However, we found that this was not true for the admixed population (supplementary fig. S33, Supplementary Material online). We performed a simulation to investigate whether this was due to the effect of admixture. Genetic distance within groups having higher western ancestry proportion was larger than that between groups having lower western ancestry proportion (supplementary figs. S33 and S34, Supplementary Material online). This was consistent with the larger ASD among samples from southern Ningxia (supplementary fig. S35, Supplementary Material online). The genetic distance between regions in southern Ningxia was larger than that between regions in northern Ningxia. This could partially explain two patterns: in the estimated effective migration surfaces (EEMS) result, the lowest effective migration rate was among the southern regions in Ningxia (supplementary fig. S35, Supplementary Material online), and the genetic distance between NXH and HAN was less than the genetic distance within NXH (supplementary fig. S33, Supplementary Material online).
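To make the individual-level distance concrete, here is a minimal sketch of one common way to compute a pairwise allele sharing distance. It assumes biallelic markers coded as alternate-allele counts (0, 1, or 2) and takes the per-marker distance as the absolute difference in counts; the study's exact formula and scaling are not given in the text, so the function names and the toy genotypes below are illustrative only.

```python
def allele_sharing_distance(g1, g2):
    """Mean per-marker allele sharing distance between two individuals.

    Genotypes are alternate-allele counts (0, 1, or 2). None marks a missing
    call; markers missing in either individual are skipped.
    Assumed definition: per-marker distance = |g1 - g2| (0, 1, or 2).
    """
    diffs = [abs(a - b) for a, b in zip(g1, g2)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("no markers typed in both individuals")
    return sum(diffs) / len(diffs)


def pairwise_asd(genotypes):
    """Return {(name_a, name_b): ASD} for every unordered pair of individuals."""
    names = sorted(genotypes)
    return {
        (a, b): allele_sharing_distance(genotypes[a], genotypes[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy data: three individuals typed at five markers.
toy_genotypes = {
    "ind_A": [0, 1, 2, 0, None],
    "ind_B": [0, 2, 2, 1, 1],
    "ind_C": [2, 0, 0, 2, 1],
}

if __name__ == "__main__":
    for pair, dist in pairwise_asd(toy_genotypes).items():
        print(pair, round(dist, 3))
```

Because the distance is computed directly between pairs of individuals, no prior grouping into populations is needed, which is the contrast with FST drawn above.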

We also found that enough markers were needed to detect fine-scale population structure (supplementary fig. S36, Supplementary Material online). Additionally, estimating admixture time is important for understanding the history of an admixed population. We observed that the admixture times inferred by ALDER and GLOBETROTTER were inconsistent with those inferred by MultiWaver 2.0. Both ALDER and GLOBETROTTER estimated that the admixture of NXH occurred 20 generations ago, whereas MultiWaver 2.0 identified an additional ancient admixture event that occurred 41 generations ago. Indeed, according to the recorded history, the admixture of the Hui population may have started as early as 1,400 years ago during the Tang and Song dynasties (Chen 1999). Previous studies also suggested that MultiWaver 2.0 was more powerful than other methods in modeling ancient and complex admixture. For example, an ancient admixture of the Uyghurs that occurred in the Bronze Age was identified by MultiWaver 2.0 and has been confirmed by some recent ancient DNA studies (Feng et al. 2017; Ning et al. 2019). MultiWaver 2.0 leverages the length distribution of ancestral tracks and requires the ancestral segments from local ancestry inference as input. However, many local ancestry inference methods need an a priori admixture time. Our results showed that marker density affects the choice of the optimized parameter, which in turn indirectly affects results inferred from the length distribution of ancestral tracks (supplementary fig. S36, Supplementary Material online). These findings highlight the value of whole-genome sequencing data in exploring the history of admixed populations such as the Huis.

Our results suggested a complex scenario of genetic origin, admixture history, and population structure in the Hui population. The two-wave model we proposed here does not necessarily mean there were only two admixture events in history. Rather, it suggests that population admixture occurred more than once. Moreover, the first wave of admixture was estimated to have occurred 1,025 years (41 generations) ago, which might suggest that admixture began no later than 1,025 years ago. Furthermore, the admixture time could be underestimated (Leslie et al. 2015). The ancient admixture we identified, which occurred over 1,025 years ago, roughly corresponds to the late Tang Dynasty and the Five Dynasties and Ten Kingdoms period. However, intensive contact between western and eastern peoples might already have been common in the early Tang Dynasty according to historical records. Similarly, the time of the recent admixture that occurred nearly 500 years ago as we inferred in this study, corresponding to the Ming Dynasty, might also be underestimated. Again, according to the recorded history, west–east contacts were more frequent during the Mongolian Conquests in the 13th and 14th centuries. The concordance would be improved if we adopted a longer generation time, 29–30 years, as a previous study suggested (Fenner 2005). Nonetheless, we believe history could be more complex than the simplified models we presented in this study. We should point out that the Hui are one of the most widely distributed populations in China; the samples studied here were mainly descendants of the Hui people in Northwestern China, in which more than 51% of the Hui people are concentrated. Whether the Hui genomes in this study fully represent the diversity of the Hui in China has not been evaluated.
Further efforts in developing more sophisticated methods and recruiting more diverse population samples are expected to uncover a more comprehensive picture of the diversity, origins, and history of the Hui people.
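The dating argument above rests on simple arithmetic that can be sketched as follows. The 25-year generation time is implied by the text (41 generations giving 1,025 years), and the 29–30 year values are the longer generation times it mentions. Using 20 generations for the recent wave is an assumption based on the ALDER and GLOBETROTTER estimate and the "nearly 500 years" figure; the function and variable names are illustrative.

```python
def generations_to_years(generations, years_per_generation):
    """Convert an admixture time in generations to years before present."""
    return generations * years_per_generation


# Generation counts: 41 is stated in the text; 20 for the recent wave is an
# assumption based on the ALDER/GLOBETROTTER estimate.
WAVES = {"ancient wave": 41, "recent wave": 20}

# 25 years reproduces the dates quoted in the text; 29 and 30 are the longer
# generation times it mentions.
GENERATION_TIMES = (25, 29, 30)

if __name__ == "__main__":
    for label, gens in WAVES.items():
        dates = ", ".join(
            f"{g} y/gen: {generations_to_years(gens, g)} years ago"
            for g in GENERATION_TIMES
        )
        print(f"{label} ({gens} generations) -> {dates}")
```

With 29–30 years per generation, the ancient wave moves back from 1,025 to roughly 1,190–1,230 years ago and the recent wave from about 500 to about 580–600 years ago, both closer to the historical periods of intensive west–east contact discussed above.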

Genetic study takes research on sex differences to new heights

Throughout the animal kingdom, males and females frequently exhibit sexual dimorphism: differences in characteristic traits that often make it easy to tell them apart. In mammals, one of the most common sex-biased traits is size, with males typically being larger than females. This is true in humans: Men are, on average, taller than women. However, biological differences among males and females aren’t limited to physical traits like height. They’re also common in disease. For example, women are much more likely to develop autoimmune diseases, while men are more likely to develop cardiovascular diseases.

In spite of the widespread nature of these sex biases, and their significant implications for medical research and treatment, little is known about the underlying biology that causes sex differences in characteristic traits or disease. In order to address this gap in understanding, Whitehead Institute Director David Page has transformed the focus of his lab in recent years from studying the X and Y sex chromosomes to working to understand the broader biology of sex differences throughout the body. In a paper published in Science, Page, a professor of biology at MIT and a Howard Hughes Medical Institute investigator; Sahin Naqvi, first author and former MIT graduate student (now a postdoc at Stanford University); and colleagues present the results of a wide-ranging investigation into sex biases in gene expression, revealing differences in the levels at which particular genes are expressed in males versus females.

The researchers’ findings span 12 tissue types in five species of mammals, including humans, and led to the discovery that a combination of sex-biased genes accounts for approximately 12 percent of the average height difference between men and women. This finding demonstrates a functional role for sex-biased gene expression in contributing to sex differences. The researchers also found that the majority of sex biases in gene expression are not shared between mammalian species, suggesting that — in some cases — sex-biased gene expression that can contribute to disease may differ between humans and the animals used as models in medical research.

Having the same gene expressed at different levels in each sex is one way to perpetuate sex differences in traits in spite of the genetic similarity of males and females within a species — since with the exception of the 46th chromosome (the Y in males or the second X in females), the sexes share the same pool of genes. For example, if a tall parent passes on a gene associated with an increase in height to both a son and a daughter, but the gene has male-biased expression, then that gene will be more highly expressed in the son, and so may contribute more height to the son than the daughter.

The researchers searched for sex-biased genes in tissues across the body in humans, macaques, mice, rats, and dogs, and they found hundreds of examples in every tissue. They used height for their first demonstration of the contribution of sex-biased gene expression to sex differences in traits because height is an easy-to-measure and heavily studied trait in quantitative genetics.

“Discovering contributions of sex-biased gene expression to height is exciting because identifying the determinants of height is a classic, century-old problem, and yet by looking at sex differences in this new way we were able to provide new insights,” Page says. “My hope is that we and other researchers can repeat this model to similarly gain new insights into diseases that show sex bias.”

Because height is so well studied, the researchers had access to public data on the identity of hundreds of genes that affect height. Naqvi decided to see how many of those height genes appeared in the researchers’ new dataset of sex-biased genes, and whether the genes’ sex biases corresponded to the expected effects on height. He found that sex-biased gene expression contributed approximately 1.6 centimeters to the average height difference between men and women, or 12 percent of the overall observed difference.

The scope of the researchers’ findings goes beyond height, however. Their database contains thousands of sex-biased genes. Slightly less than a quarter of the sex-biased genes that they catalogued appear to have evolved that sex bias in an early mammalian ancestor, and to have maintained that sex bias today in at least four of the five species studied. The majority of the genes appear to have evolved their sex biases more recently, and are specific to either one species or a certain lineage, such as rodents or primates.

Whether or not a sex-biased gene is shared across species is a particularly important consideration for medical and pharmaceutical research using animal models. For example, previous research identified certain genetic variants that increase the risk of Type 2 diabetes specifically in women; however, the same variants increase the risk of Type 2 diabetes indiscriminately in male and female mice. Therefore, mice would not be a good model to study the genetic basis of this sex difference in humans. Even when the animal appears to have the same sex difference in disease as humans, the specific sex-biased genes involved might be different. Based on their finding that most sex bias is not shared between species, Page and colleagues urge researchers to use caution when picking an animal model to study sex differences at the level of gene expression.

“We’re not saying to avoid animal models in sex-differences research, only not to take for granted that the sex-biased gene expression behind a trait or disease observed in an animal will be the same as that in humans. Now that researchers have species and tissue-specific data available to them, we hope they will use it to inform their interpretation of results from animal models,” Naqvi says.

The researchers have also begun to explore what exactly causes sex-biased expression of genes not found on the sex chromosomes. Naqvi discovered a mechanism by which sex-biased expression may be enabled: through sex-biased transcription factors, proteins that help to regulate gene expression. Transcription factors bind to specific DNA sequences called motifs, and he found that certain sex-biased genes had the motif for a sex-biased transcription factor in their promoter regions, the sections of DNA that turn on gene expression. This means that, for example, a male-biased transcription factor was selectively binding to the promoter region for, and so increasing the expression of, male-biased genes — and likewise for female-biased transcription factors and female-biased genes. The question of what regulates the transcription factors remains for further study — but all sex differences are ultimately controlled by either the sex chromosomes or sex hormones.

The researchers see the collective findings of this paper as a foundation for future sex-differences research.

“We’re beginning to build the infrastructure for a systematic understanding of sex biases throughout the body,” Page says. “We hope these datasets are used for further research, and we hope this work gives people a greater appreciation of the need for, and value of, research into the molecular differences in male and female biology.”

This work was supported by Biogen, Whitehead Institute, National Institutes of Health, Howard Hughes Medical Institute, and generous gifts from Brit and Alexander d’Arbeloff and Arthur W. and Carol Tobin Brill.

Not all sex-biased genes are the same

In this issue of the New Phytologist, Sanderson et al. (pp. 527–539) characterize sexual dimorphism in gene expression in a long-lived dioecious tree (Populus balsamifera), adding a rare type of datapoint to the literature. The authors compared gene expression between the sexes from both reproductive (catkins) and somatic tissues (leaves). They found very limited sexual dimorphism in leaves, while reproductive tissues showed higher sexual dimorphism – c. 30% of the genes expressed in catkins were sex-biased. Sanderson et al. did not include genes that were only expressed in one sex, which should exclude most gene expression arising from sex-specific tissues.

Unlike most similar studies in other organisms, including dioecious willows (Darolti et al., 2017), there were twice as many female-biased genes compared to male-biased genes. This result demonstrates the usefulness of studying species from many domains of life, reproductive modes, and life histories. It nicely illustrates that very different genes can be sex-biased when different tissues or species are compared, and suggests that not all sex-biased genes evolve under the same constraints. In this context, the recent suggestion to split sexual dimorphism into primary and secondary, based on direct association with gamete production or not (Charlesworth, 2018), is a welcome step in the right direction. However, separate analysis of smaller subsets of sex-biased genes is likely to better inform independent biological processes that may produce results that cancel out when all sex-biased genes are analysed together.

Sex-biased genes are often assumed to evolve fast, because of their potential direct involvement in sexual antagonism (Zhang et al., 2004). In this context, the finding of Sanderson et al. that female-biased genes in catkins had atypically slow evolutionary rates is surprising. A similar result was obtained in the basket willow and may be attributed to haploid selection (the expression of many genes in haploid cells, as part of the normal life cycle of a plant), which would result in strong purifying selection (Darolti et al., 2017). To understand why these genes evolved slowly, the function of the sex-biased genes needs to be taken into account. Many of the female-biased genes identified by Sanderson et al. were associated with photosynthesis, which is an evolutionarily conserved process. Other explanations for slow evolutionary rates, such as the authors’ suggestion involving a combination of the old age of the sex chromosomes, prolonged dioecy and the assumption that rapid sequence evolution only occurs early in sex-chromosome evolution, may still be valid. However, these factors would be better suited to a subset of female-biased genes that excludes the evolutionarily conserved, female-biased, photosynthesis genes, such as those associated with immunity, which were also enriched in the female-biased genes identified by Sanderson et al.

The differentially expressed genes between the sexes will differ between species with different sexually dimorphic tissues. This is obvious to anyone who has held a P. balsamifera catkin and a Silene latifolia flower. However, when discussing sex-biased genes, cross-species comparisons in the numbers or rates of evolution of sex-biased genes are common. It is unlikely that all the genes differing in expression between sexes have the same evolutionary constraints, as illustrated in P. balsamifera, where many female-biased genes have resulted from a reduction in photosynthesis in male catkins. Treating sex-biased genes as a homogeneous group of genes, and investigating their evolution in different tissues and organisms, does not capture the complexity of differences in male and female biology.

The example of photosynthesis also illustrates the limitations of removing sex-specific genes and employing log2 fold change thresholds when defining sex bias, to avoid the effects of tissue composition, as Sanderson et al. did. While such methods are recommended to reduce the effects of allometric differences between samples (Montgomery & Mank, 2016), they do not help with comparisons between sexually dimorphic tissues, because fundamental differences between the sexes are due to sex-specific tissues. One solution is to avoid gene-expression comparisons between reproductive tissues, but this would remove tissues with genes showing high evolutionary rates, such as pollen in Arabidopsis thaliana (Gossmann et al., 2014) or accessory gland proteins in Drosophila (Swanson & Vacquier, 2002). Instead, it is more important to accept that the comparison of two samples arising from the two sexes will never capture the complexity of sexual antagonism in a whole organism, such as sex allocation in hermaphrodites, sex-specific growth rates, differences in stress and investment in immunity.
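The two filters discussed above (setting aside genes expressed in only one sex, then applying a log2 fold change threshold) can be sketched as follows. This is a simplified illustration, not the procedure of Sanderson et al.: the cut-off values, the function name, and the toy expression values are all hypothetical, and a real analysis would also test each gene for statistically significant differential expression across replicates.

```python
import math


def classify_sex_bias(male_expr, female_expr,
                      min_abs_log2fc=1.0, min_expr=1.0):
    """Classify one gene from its mean expression in males and females.

    min_expr       : hypothetical cut-off below which a gene counts as not
                     expressed in that sex.
    min_abs_log2fc : hypothetical threshold on |log2(male / female)|.
    """
    male_on = male_expr >= min_expr
    female_on = female_expr >= min_expr
    if not male_on and not female_on:
        return "not expressed"
    # Genes expressed in one sex only are sex-limited rather than sex-biased.
    if male_on and not female_on:
        return "male-limited"
    if female_on and not male_on:
        return "female-limited"
    log2fc = math.log2(male_expr / female_expr)
    if log2fc >= min_abs_log2fc:
        return "male-biased"
    if log2fc <= -min_abs_log2fc:
        return "female-biased"
    return "unbiased"


# Hypothetical expression values as (male, female) pairs.
toy_genes = {
    "gene_1": (8.0, 2.0),
    "gene_2": (2.0, 8.0),
    "gene_3": (5.0, 0.0),
    "gene_4": (3.0, 4.0),
}

if __name__ == "__main__":
    for gene, (m, f) in toy_genes.items():
        print(gene, classify_sex_bias(m, f))
```

The sketch also shows the limitation raised above: the classification depends only on the ratio of the two samples, so a difference driven by tissue composition (for example, reduced photosynthetic tissue in one sex) is labelled the same way as a regulatory difference within a shared tissue.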

One way to encompass these biological realities relating to sexual antagonism is to embrace the fact that the assumption that most parameters are unchanged when comparing two samples is not valid when two sexes are compared. Other than tissue composition, the two sexes often differ in the timing of gene expression. For example, developmental timing differs between male and female meiotic tissue in animals, with meiosis being ongoing in males and stalled in females. In addition, the sexes frequently differ in their overall pace of life, with males often adopting a ‘live fast, die young’ strategy (Immonen et al., 2018). Plants have the potential to inform our understanding of sexual dimorphism in many phenotypes associated with the pace of life, because they represent many independent data points of different life cycles, with trees representing one extreme, and annual plants the other. The availability of a plethora of sexually dimorphic traits makes plants particularly well suited to the study of sexual antagonism in biologically realistic conditions. These include some sexually antagonistic traits such as flowering time (Meagher & Delph, 2001) and other traits whose sexual antagonism interacts with environmental conditions (such as specific leaf area, which is sexually dimorphic only in sites with high water availability in S. latifolia; Delph et al., 2011).

It is time to incorporate more biology in the discussion of sex-biased genes. One such case is accounting for tissue specificity in gene expression when investigating the evolutionary rate of sex-biased genes: high tissue specificity in gene expression has been found to explain fast protein-coding evolution better than sex bias does (Meisel, 2011). The influence of genes associated with photosynthesis on the evolutionary rate of female-biased genes in P. balsamifera catkins is, essentially, such an example. Another parameter to consider involves the proportion of genes that are expressed in the haploid phase, which prevents degeneration of the allele restricted to the heterogametic sex (Chibalina & Filatov, 2011). The fact that such diverse cases of interesting biology can be studied through sex-biased gene expression illustrates that grouping of genes into male- and female-biased is too broad. Sex-biased genes can be very different, depending on the tissue and species studied, and not all of them evolve under the same constraints.

The rationale for studying non-model organisms is often to understand how the tweaks in textbook biology that our species of interest represents affect evolutionary processes. Realizing and understanding the underlying biology behind sex-biased genes is essential before understanding the evolutionary forces that have resulted in their observed state of expression and sequence evolution. Gene expression studies such as Sanderson et al. are only the first step towards understanding the contribution of sexual antagonism to the observed differences between species.
