9.1: Genomes and their organization - Biology


Genomes are characterized by two complementary metrics, the number of base pairs of DNA and the number of genes present within this DNA. In fact genomes are dynamic, something that we will return to shortly.

The genome of an organism (and generally the cells of which it is composed) consists of one or more DNA molecules. When we talk about genome size we are talking about the total number of base pairs present in all of these DNA molecules added together. The organism with one of the largest known genomes is the plant Paris japonica; its genome is estimated to be ~150,000 x 10^6 (that is, ~150,000 million) base pairs [255]. In contrast, the (haploid) human genome consists of ~3,200 x 10^6 base pairs of DNA. The relatively small genome size of birds (~1,450 x 10^6 base pairs) is thought to be due to the smaller genome size of their dinosaurian ancestors [256]. That said, there are interesting organisms that suggest that, in some cases, natural selection can act to dramatically increase or decrease genome size without changing gene number. For example, the carnivorous bladderwort Utricularia gibba has a genome of ~80 x 10^6 base pairs and ~28,000 genes: significantly fewer base pairs of DNA, but apparently more genes, than humans.

Much smaller genomes are found in prokaryotes; typically their genomes are a few million base pairs in length. The smallest genomes occur in organisms that are obligate parasites and endosymbionts. For example, the bacterium Mycoplasma genitalium, a cause of non-gonococcal urethritis, contains ~0.58 x 10^6 base pairs of DNA, which encode ~500 distinct genes. An even smaller genome is found in the obligate endosymbiont Carsonella ruddii; it has 159,662 (~0.16 x 10^6) base pairs of DNA encoding "182 ORFs (open reading frames or genes), 164 (90%) overlap with at least one of the two adjacent ORFs" [257]. Eukaryotic mitochondria and chloroplasts, derived from endosymbionts, have very small genomes. Animal mitochondrial genomes are typically ~16,000 base pairs in length and contain ~40 genes, while chloroplast genomes are larger, ~120,000–170,000 base pairs in length, and encode ~100 genes. Most of the genes present in the original endosymbionts appear to have been either lost or transferred to the host cell's nucleus. This illustrates a theme that we will return to, namely that genomes are not static. In fact, it is their dynamic nature that makes significant evolutionary change possible.
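The contrast between genome size and gene number can be made concrete by computing gene density. The following is a minimal sketch that uses only the approximate figures quoted above; the function name and the dictionary layout are illustrative assumptions, not part of any standard tool.

```python
# Approximate genome sizes (base pairs) and gene counts quoted in the text.
GENOMES = {
    "Utricularia gibba": (80e6, 28_000),
    "Mycoplasma genitalium": (0.58e6, 500),
    "Carsonella ruddii": (159_662, 182),
}


def genes_per_megabase(base_pairs: float, genes: int) -> float:
    """Return gene density as genes per million base pairs."""
    return genes / (base_pairs / 1e6)


if __name__ == "__main__":
    for name, (base_pairs, genes) in GENOMES.items():
        density = genes_per_megabase(base_pairs, genes)
        print(f"{name}: about {density:.0f} genes per million base pairs")
```

With these inputs the small bacterial genomes come out several times denser in genes than the bladderwort genome, which is consistent with the remark that many Carsonella ORFs overlap their neighbors.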

An interesting question is what minimal number of genes an organism needs. Here we have to look at free-living organisms, rather than parasites or endosymbionts, since the latter can rely on genes within their hosts. A common approach is to use mutagenesis to generate non-functioning (amorphic) versions of genes. One can then count the number of essential genes within a genome, that is, genes whose functioning is absolutely required for life. One complication is that different sets of genes may be essential in different environments, but we will ignore that for now. In one such lethal mutagenesis study, Glass et al. found that 382 of the genes in Mycoplasma genitalium are essential; of these, ~28% had no (as yet) known function [258].

Whole genome sequencing

Whole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. [2] This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.

Whole genome sequencing has largely been used as a research tool, but was being introduced to clinics in 2014. [3] [4] [5] In the future of personalized medicine, whole genome sequence data may be an important tool to guide therapeutic intervention. [6] The tool of gene sequencing at SNP level is also used to pinpoint functional variants from association studies and improve the knowledge available to researchers interested in evolutionary biology, and hence may lay the foundation for predicting disease susceptibility and drug response.

Whole genome sequencing should not be confused with DNA profiling, which only determines the likelihood that genetic material came from a particular individual or group, and does not contain additional information on genetic relationships, origin or susceptibility to specific diseases. [7] In addition, whole genome sequencing should not be confused with methods that sequence specific subsets of the genome; such methods include whole exome sequencing (1-2% of the genome) and SNP genotyping (<0.1% of the genome). As of 2017 there were no complete genomes for any mammals, including humans. Between 4% and 9% of the human genome, mostly satellite DNA, had not been sequenced. [8]
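To give a sense of scale, the percentages above can be converted into base pairs. This is a rough sketch only: it assumes the haploid human genome size of ~3,200 x 10^6 base pairs quoted earlier on this page, and the function and constant names are illustrative.

```python
# Assumption: haploid human genome size as quoted earlier on this page.
HAPLOID_HUMAN_GENOME_BP = 3_200e6


def covered_base_pairs(fraction: float,
                       genome_bp: float = HAPLOID_HUMAN_GENOME_BP) -> float:
    """Return the number of base pairs that a given fraction of the genome represents."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return fraction * genome_bp


if __name__ == "__main__":
    # Whole exome sequencing: 1-2% of the genome.
    print("exome:", covered_base_pairs(0.01), "to", covered_base_pairs(0.02))
    # SNP genotyping: less than 0.1% of the genome.
    print("SNP genotyping: fewer than", covered_base_pairs(0.001))
    # Unsequenced portion as of 2017: 4-9% of the genome.
    print("unsequenced:", covered_base_pairs(0.04), "to", covered_base_pairs(0.09))
```

On these assumptions, exome sequencing covers on the order of tens of millions of base pairs, while the portion left unsequenced in 2017 amounts to more than a hundred million.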

Physical Maps

A physical map provides detail of the actual physical distance between genetic markers, as well as the number of nucleotides. There are three methods used to create a physical map: cytogenetic mapping, radiation hybrid mapping, and sequence mapping. Cytogenetic mapping uses information obtained by microscopic analysis of stained sections of the chromosome (see figure). It is possible to determine the approximate distance between genetic markers using cytogenetic mapping, but not the exact distance (number of base pairs). Radiation hybrid mapping uses radiation, such as x-rays, to break the DNA into fragments. The amount of radiation can be adjusted to create smaller or larger fragments. This technique overcomes the limitation of genetic mapping and is not affected by increased or decreased recombination frequency. Sequence mapping resulted from DNA sequencing technology that allowed for the creation of detailed physical maps with distances measured in terms of the number of base pairs. The creation of genomic libraries and complementary DNA (cDNA) libraries (collections of cloned sequences or all DNA from a genome) has sped up the process of physical mapping. A genetic site used to generate a physical map with sequencing technology (a sequence-tagged site, or STS) is a unique sequence in the genome with a known exact chromosomal location. An expressed sequence tag (EST) and a simple sequence length polymorphism (SSLP) are common STSs. An EST is a short STS that is identified with cDNA libraries, while SSLPs are obtained from known genetic markers and provide a link between genetic maps and physical maps.

A cytogenetic map shows the appearance of a chromosome after it is stained and examined under a microscope. (credit: National Human Genome Research Institute)
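The idea of a sequence-based physical map, in which markers have known positions and distances are counted in base pairs, can be sketched as a small data structure. The marker names and coordinates below are hypothetical and chosen only for illustration; real maps are built from experimentally located STSs.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SequenceTaggedSite:
    """A unique sequence with a known chromosomal location (an STS)."""
    name: str          # hypothetical marker label
    chromosome: str
    position_bp: int   # position along the chromosome, in base pairs


def build_physical_map(sites: Iterable[SequenceTaggedSite]) -> List[SequenceTaggedSite]:
    """Order markers by chromosome and then by position."""
    return sorted(sites, key=lambda s: (s.chromosome, s.position_bp))


def physical_distance_bp(a: SequenceTaggedSite, b: SequenceTaggedSite) -> int:
    """Return the distance between two markers in base pairs."""
    if a.chromosome != b.chromosome:
        raise ValueError("markers lie on different chromosomes")
    return abs(a.position_bp - b.position_bp)


if __name__ == "__main__":
    # Hypothetical markers; the positions are invented for the example.
    est_1 = SequenceTaggedSite("EST_1", "chr1", 1_250_000)
    sslp_1 = SequenceTaggedSite("SSLP_1", "chr1", 400_000)
    for site in build_physical_map([est_1, sslp_1]):
        print(site.name, site.chromosome, site.position_bp)
    print("distance:", physical_distance_bp(est_1, sslp_1), "bp")
```

Unlike a genetic map, nothing here depends on recombination frequency: the distance is simply the difference between two known coordinates.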

Mitochondrial Genome Evolution

Françoise Budar, Sota Fujii, in Advances in Botanical Research, 2012

4 Conclusion and Perspectives for Further Research

Coadaptation between organelle and nuclear genomes at the species level is widely accepted. The contribution of cytonuclear epistasis to genetic isolation, hence its possible involvement in speciation, has been recognized (Alcázar et al., 2012; Chou & Leu, 2010; Greiner et al., 2011; Levin, 2003). The occurrence of cytonuclear epistasis within species has been documented by recent reports. However, the genetic diversification within a species of coadapted molecular partners encoded in different genetic compartments is probably underestimated so far, except for CMS appearing after intraspecific crosses. They reveal the contribution of the genomic conflict between the nuclear and maternally inherited organelle genomes in the raising of genetic barriers (Alcázar et al., 2012; Geddy & Brown, 2007; Kato et al., 2007). However, it might be difficult to discriminate between the disruption of cooperative coadaptation and reactivation of a genomic conflict when observing maternally inherited male sterility after a cross between distantly related genotypes. It is likely that both mechanisms can act together in the phenotype of a hybrid plant.

The issue remains whether the model proposed for animal genomes, under which coadaptation is driven by variations in the mt genome first, and the subsequent selection of nuclear coadapted variants, is also valid for plant cytonuclear cooperative coadaptation. Better knowledge of the genetic variation occurring in plant organelle genomes both at the species level and within species, and probably a reappraisal of theoretical models, are needed before clarifying this issue.

Our knowledge of the genetic diversity of plant organelles is, in most species, restricted to pt intergenic polymorphisms thought to be neutral and used to infer maternal phylogenies. Obviously the programs based on the use of new generation sequencing (NGS) technologies will provide precious data on the substitutions occurring in plant organelle genomes (e.g. the 1001 genomes project for A. thaliana). However, the peculiar mode of evolution of plant mt will probably also necessitate the de novo assembly of variant mt genomes, as a large amount of polymorphism in these plant organelles results from rearrangements (see Chapter 9) (Davila et al., 2011). In addition, evidence is accumulating that both mt substitution rates and constraints on mt genome size fluctuate among plant lineages (Sloan et al., 2012). The impact of these fluctuations on cytonuclear coevolution remains to be investigated.

Nevertheless, PPR proteins have been identified as major molecular actors in cytonuclear coadaptation, both in the frame of cooperative coadaptation and genomic conflict. Probably, this also reflects the peculiarities of cytonuclear coadaptation in plants, since the expansion of this protein family is characteristic of the green eukaryotic lineage.

Most of the known examples of molecular interactions underlying coadaptation between plant mt genomes and their nucleus involve PPR proteins and their target mt RNAs, as shown above. In yeast, two recently deciphered cytonuclear BDM incompatibilities involved nuclear-encoded factors that are required for the proper expression of specific mt genes (Chou, Hung, Lin, Lee, & Leu, 2010; Lee et al., 2008).

In animals, several reports suggested that cytonuclear incompatibilities resulted from impaired electron transport chain function, due to poorly matched mt- and nuclear-encoded subunits (Barrientos, Müller, Dey, Eienberg, & Moraes, 2000; Blier, Dufresne, & Burton, 2001; Sackton, Haney, & Rand, 2003; Wu, Schmidt, Goodman, & Grossman, 2000). In addition, in the case of the marine copepod, Tigriopus californicus, the decrease in complex IV (cytochrome oxidase) efficiency of unfit hybrids could be traced to single amino acid polymorphisms in the nuclear-encoded cytochrome c apoprotein and corresponding sequence variants of the mt-encoded subunit II of cytochrome oxidase (Harrison & Burton, 2006). In laboratory evolved populations of T. californicus, mt-nuclear negative epistasis was found to depend on environmental conditions, namely the temperature regime (Galloway & Fenster, 1999; Galloway & Fenster, 2001; Leinonen et al., 2011; Willett & Burton, 2003). However, in this species, in which populations evolve in almost strict isolation, hybrid breakdown in fitness appears to involve more complex BDM incompatibilities than simple two-factor cytonuclear epistasis (Willett, 2011).

Although evidence for the contribution of cytoplasmic variation in plant adaptation to the environment is accumulating, mainly from ecological studies, this contribution has been neglected in most studies reported so far on plant adaptation. Regarding this aspect, research on animal mt evolution is several steps ahead. Nevertheless, for plants also, a key issue for environmental adaptation is bioenergetics (Wallace, 2010). This should motivate us to pay more attention to organelle variation with respect to plant adaptation to new environments, particularly in the context of global climate change. In this context, a recent study on adaptation to climate in A. thaliana indicated the involvement of genes whose functions were related to photosynthesis and energy metabolism, among others (Hancock et al., 2011). Both cases of organelle GC content and PPR editing factors and of the rbcL gene mentioned above (Section 3.2) are demonstrative examples that the issues of cytonuclear adaptation and adaptive variation in organelle genes are entangled (Barrientos et al., 2000; Blier et al., 2001; Fujii & Small, 2011; Sackton et al., 2003; Savir et al., 2010; Wu et al., 2000). Therefore, coadaptation with nuclear genes will have to be carefully considered when addressing the contribution of organelle variants in plant adaptation.

Exploration of the adaptive features of mt–nuclear coevolution in plants will require a combination of approaches and collaborative efforts between scientific disciplines. In addition to exploration of the diversity in organelle and nuclear genes and thorough genetic analysis of their epistatic interactions, a comprehensive analysis of the physiological impact of poorly matched genetic combinations is highly desirable. The adaptive nature of the traced polymorphisms will also necessitate the evaluation of their impact on fitness in realistic ecological environments (Bergelson & Roux, 2010). In this respect, deciphering the contributions of mt– (or pt–) nuclear epistatic interactions to fitness-related traits in varying environments represents an exciting challenge.

In addition, such studies are likely to provide precious knowledge for breeders. The impact of nuclear–cytoplasm interactions has been reported to be significant in a wide range of traits of interest in several crops. For instance, cytonuclear interactions and cytoplasmic variation were found to influence yield and low-temperature tolerance in rice (Harrison & Burton, 2006; Tao et al., 2004). At the moment, the potential of organelle genetic variations and cytonuclear combination in breeding is mostly restricted to the use of CMS in hybrid seed production. It is most likely that exploitation of genetic resources for crop improvement and breeding strategies will benefit from increased knowledge of the cytonuclear component in the adaptive response of plants to their environment.


The ACE1 cluster is specific to few fungal species

A complete ACE1 cluster is present in only four of the 23 sequenced Pezizomycotina genomes (M. grisea, C. globosum, S. nodorum and A. clavatus). Such a sporadic distribution could be the result of either independent HGTs or frequent losses of the whole cluster in different lineages (Figure 3). We favor the latter explanation because - with the exception of A. clavatus - our phylogenetic trees of genes from the cluster have topologies that are in broad agreement with the expected species phylogeny [27]. We suggest that an ACE1-like cluster consisting of at least three genes (homologous to ACE1, RAP1 and ORF3) existed in the common ancestor of Pezizomycotina, but this cluster has been lost in many lineages subsequently. The scheme in Figure 3 identifies four independent lineages (shown by dashed lines) in which all copies of the cluster have been lost. We cannot tell, with current data, whether genes such as OXR1 that are present in the ACE1 clusters of Sordariomycetes and Dothideomycetes but not in the ACE1-like clusters of Eurotiomycetes correspond to lineage-specific additions or losses.

Inferred history of ACE1 and ACE1-like clusters in filamentous fungi. The gray rectangle corresponds to the ancient core cluster of three genes (ACE1, RAP1, ORF3) that is common to all ACE1 clusters (pink) and ACE1-like clusters (orange). The black arrow denotes the inferred HGT of part B of the cluster from a donor related to M. grisea to the A. clavatus recipient. Dashed branches and smaller fonts indicate euascomycetes that were included in our analysis but lack the clusters entirely. Phylogenetic relationships are based on [27] and N Fedorova and N Khaldi, unpublished data, for the topology within the genus Aspergillus. The tree is not drawn to scale.

Any tree showing apparent HGT of a gene can also be explained by an alternative scenario of gene duplications and losses. However, the situation reported here is rather different to typical cases of possible HGT of individual genes, because it involves multiple genes that are arranged as a large tandem duplication (in M. grisea). The fact that the A. clavatus ACE1 cluster forms a clade with the M. grisea part B genes (to the exclusion of the part A genes) means that the only alternative scenario to HGT is one where the part A/part B tandem duplication occurred right at the base of the tree in Figure 3. This scenario would then necessitate at least four events of precise loss of exactly one part of the tandemly duplicated set of genes: part B in C. globosum, part B in the ancestor of C. immitis and U. reesii, part B in S. nodorum, and part A in A. clavatus. Because of the precise nature of the deletion required (and choice of gene copy to delete), we do not regard this scenario as likely.

The discontinuous distribution of the ACE1 cluster among fungal species suggests that evolutionary constraints act to maintain this cluster in only a few species. As M. grisea, S. nodorum and C. globosum are plant or animal pathogens, it is tempting to speculate that the ACE1 cluster is involved in the infection process of these three species. The metabolite produced by this biosynthetic pathway may be an important pathogenicity factor, but such a role remains to be determined. A. clavatus is different as it is not pathogenic. The presence of the ACE1 cluster in A. clavatus may arise from selection involving an unknown biological role of this metabolite in this fungus. Identifying the molecules made by these different clusters will be necessary to understand the role of the ACE1 cluster in fungal biology and could give clues about the evolution of the ancestral biosynthetic pathway controlled by this cluster.

ACE1 cluster evolution in Sordariomycetes involved several duplication events

The ACE1 cluster has a complex history with multiple events of large-scale duplication and multiple losses. The scenario we infer is summarized in Figure 3. An ancient duplication produced the large ACE1 and smaller ACE1-like clusters. A second duplication event in an ancestral Sordariomycete gave rise to the two clusters (1 and 2) presently seen in C. globosum. This event occurred prior to the speciation between C. globosum and M. grisea, but M. grisea later lost its counterpart of cluster 2. Independently, cluster 1 underwent a tandem duplication event, generating parts A and B. This tandem duplication survived in M. grisea, but in C. globosum the addition (part B of cluster 1) was lost again. It might seem simpler to suggest that the part A/B tandem duplication was an event that occurred specifically in M. grisea after it diverged from C. globosum, but we know that this is incorrect because the part B genes from M. grisea form outgroups to a clade consisting of C. globosum and M. grisea part A genes. We can also be sure that the surviving duplications seen in M. grisea and C. globosum were separate events because of the topology of the phylogenetic trees: if the surviving genes were descended from the same duplication event we would expect that in the ACE1-SYN2 tree, for example, M. grisea ACE1 and SYN2 should each form a separate monophyletic group with one of the C. globosum genes, but that is not seen (Figure 2a). Instead we interpret the trees as indicative of two duplications of the whole cluster in a Sordariomycete ancestor of M. grisea and C. globosum, the first of which was non-tandem and the second of which was tandem. After this tandem duplication, the M. grisea lineage lost its ortholog of cluster 2 of C. globosum, and the C. globosum lineage lost its ortholog of part B of M. grisea (Figure 3). This pattern of frequent loss is consistent with the cluster's sporadic distribution in fungi.

ORF3 is unusual as it is inferred to have been present in the ancestor of all ACE1 and ACE1-like clusters, but in M. grisea it is not duplicated and it shows phylogenetic affinity to A. clavatus rather than to C. globosum or S. nodorum (Figure 2e). These properties suggest that a homolog of ORF3 was lost from part A of the M. grisea cluster, after the tandem duplication occurred. Furthermore, we speculate that the location of ORF3 on the boundary between parts A and B may indicate that the tandem duplication event visible in M. grisea involved a recombination between two copies of this gene.

Gene order and orientation are quite poorly conserved among the ACE1 clusters, as is typical of many secondary metabolism gene clusters [7, 8, 28]. This makes it all the more striking that the duplicated M. grisea genes each have one copy in part A and one copy in part B. Because the tandem duplication that is evident in the M. grisea genome is not particularly recent (it predates the M. grisea/C. globosum speciation), we suggest that some form of selection has acted on gene order in the cluster, preventing intermixing of the two parts. In this context it is notable that recombination seems to be inhibited in the M. grisea ACE1 cluster, because it displays a low frequency of targeted gene replacement, even in a KU80 null mutant background where homologous recombination rates are increased ([29]; Collemare et al, unpublished results). The way that part A and part B genes of the ACE1 cluster are distributed among species may indicate that they are involved in the biosynthesis of different molecules. Alternatively, parts A and B of the ACE1 cluster may each be involved in the biosynthesis of independent polyketide precursors that are fused into a final complex molecule, as observed for lovastatin [25, 30, 31]. The fact that all 15 genes in the M. grisea ACE1 cluster are co-expressed at a very specific stage of the infection process (Collemare et al, unpublished results) favors the hypothesis that both part A and part B genes are involved in the same biosynthetic pathway. However, gene knockout experiments have shown that two part B genes (RAP2 and SYN2) are not essential for the avirulence function supported up to now only by the part A gene ACE1 (Collemare et al, unpublished results). These latter results suggest that part A and part B genes could be involved in the biosynthesis of two different molecules, with only one (ACE1, part A pathway) being recognized by resistant rice cultivars. However, these two hypotheses are both plausible, and await the biochemical characterization of the Ace1 metabolite.

HGT of a fungal secondary metabolism gene cluster

Although the genomics era has uncovered evidence for widespread horizontal gene transfer among prokaryotes [32, 33], and from prokaryotes to eukaryotes [17, 34–37] or vice versa [38, 39], relatively few instances of horizontal gene transfer have been documented from one eukaryote to another [40–42]. Among fungi, the best documented is the transfer of a virulence gene from S. nodorum to Pyrenophora tritici-repens, which occurred only about 70 years ago [16]. In that case, the transferred DNA fragment was about 11 kb in size but contained only one gene. In this study we showed that part B of the ACE1 cluster (30 kb in size, containing 5-6 genes) was likely horizontally transferred from a close ancestor of M. grisea (a Sordariomycete) into an ancestor of A. clavatus (a Eurotiomycete). The mechanism by which HGT might have occurred remains a matter of speculation, but could perhaps have involved hyphal fusion between species, or endocytosis. Our inference of HGT is valid only if the Sordariomycete and Eurotiomycete clades are monophyletic as shown in Figure 1, but their monophyly is supported by several molecular and systematic analyses [27, 43–47].

To our knowledge, our study and the recent work of Patron et al [18] are the first reported instances of HGT of groups of linked genes involved in the same pathway between eukaryotic species. In both cases these secondary metabolite clusters show a punctate (sporadic) distribution among other species, with an ancestral cluster apparently having been lost by more species than the number that retain it. This pattern of frequent losses of genes and their occasional reacquisition by HGT resembles the pattern of evolution of "dispensable pathway" genes in ascomycete yeasts [48]. Hall and Dietrich [48] noted that genes whose products function in dispensable pathways are one of the few categories of genes in S. cerevisiae that are physically organized into gene clusters. They found that the pathway for biotin synthesis was lost in a yeast ancestor and then regained in the S. cerevisiae lineage by a combination of HGT from bacteria and gene duplication with neofunctionalization. One possible explanation for this strange pattern of evolution could be that an intermediate in the pathway is toxic [48], although there is no direct experimental evidence of this. If a pathway can confer a selective advantage in some circumstances but also involves the production of a toxic intermediate, there can be strong selection in favor of the pathway in some conditions and strong selection against it in others. The consequences of such a situation could include the formation of physical gene clusters (to reduce the chances of coding for only part of the pathway, or for strong repression of transcription mediated by chromatin remodelling), and occasional selection for re-gain of function by HGT. Further exploration of this hypothesis will require the discovery of more examples of similar sets of genes, and detailed characterization of the biochemical pathways involved.


The present analysis has revealed that tryptophan genes are rather frequent within the Sargasso Sea metagenome. All trp genes that were found have enough similarity to COGs to be recognized. This seems to indicate, but does not prove, that all have come from a common ancestor. However, additional genes for tryptophan biosynthesis may exist which we were unable to detect with the probes employed. In this regard, it has been reported [26] that some organisms indeed lack a recognizable trpF in their genomes but are capable of growing without external tryptophan. A gene whose sequence is not homologous to known trpFs but whose product catalyzes this reaction has in fact been found in Streptomyces coelicolor A3(2) and Mycobacterium tuberculosis H37Rv [26]. This trpF gene is an example of reticulate evolution because its product can catalyze reactions in both the histidine and tryptophan pathways [27, 28]. A BLAST search with the amino acid sequence encoded by the trpF gene (SCO2050) from Streptomyces coelicolor A3(2) against the Sargasso Sea metagenome data showed more than 500 hits that can be identified as hisA proteins. Thus, only a functional analysis of these environmental sequences can prove whether they can take part in both pathways or not. The fact that a group of marine trpB_1 sequences are similar to one another but quite distant from the major trpB_1 group supports the idea that there may be trp genes that are not recognized as such by those sequences presently known.

While trp operons, both complete and split, exist in marine bacteria, many trp genes are no longer found in that framework. In contrast to most terrestrial bacteria, some bacteria of marine origin do not use the operon structure for their trp genes. There are mini-operons of 2 genes in many cases (Table 5) and also an even more frequent occurrence of single trp genes. It is of course an open question whether what we observe is the result of the breakup of an original operon structure or whether present-day trp operons have arisen from these unlinked genes. Since the marine environment is very exacting and selective, it is certain that organisms lacking an operon structure for the trp genes have found an evolutionary advantage in the organization of the trp genes that they possess. It should be mentioned that in Escherichia coli and Salmonella, about 50% of the genes encoding polypeptides involved in amino acid synthesis are separate although their trp genes are not. On the basis of our results in which novel trp gene orders were found, it appears likely that further studies of the trp genes and their regulation and organization will provide many future surprises.

Eugene V. Koonin
Vol. 39, 2005


Abstract: Orthologs and paralogs are two fundamentally different types of homologous genes that evolved, respectively, by vertical descent from a single ancestral gene and by duplication. Orthology and paralogy are key concepts of evolutionary genomics.

Figure 1: The time dynamics of the usage of the terms “ortholog” and “paralog”. The PubMed database was searched using the Entrez search engine with the following queries: “ortholog or orthologs or or.

Figure 2: A hypothetical phylogenetic tree illustrating orthologous and paralogous relationships between three ancestral genes and their descendants in three species. LCA, last common ancestor (of the.

Figure 3: A hypothetical phylogenetic tree illustrating emergence of pseudoorthologs via lineage-specific gene loss.

Figure 4: Effect of horizontal gene transfer on orthology and paralogy. (a) A hypothetical evolutionary scenario with HGT leading to xenology. (b) A hypothetical evolutionary scenario with HGT leading.

Figure 5: Orthology and genome-specific best hits. (A) An evolutionary scheme illustrating the connection between orthology and symmetrical best hits (SymBets). X and Y represent two paralogous genes.

Figure 6: Coverage of selected genomes with clusters of orthologous groups of proteins (C/KOGs). (a) Prokaryotic genomes. (b) Eukaryotic genomes. The data are from (88). Filled volume, genes in C/KOGs.

Figure 7: Distribution of the number of paralogs in COGs for selected prokaryotic genomes. The data were extracted from the current COG version (88). The plot is shown in the double-logarithmic scale.

Figure 8: Xenologous displacement in situ of the ruvB gene in the mycoplasmas. (A) Organization of the Holliday junction resolvasome operon and surrounding genes in bacteria. COG0632, Holliday junctio.

Figure 9: Horizontal gene transfer leading to pseudoparalogy. The two pseudoparalogous peroxiredoxins from Aquifex aeolicus are shown in red, the three pseudoparalogs from the Thermoplasmas in blue, a.

Figure 10: Rearrangements of gene structure and orthology. (a) Domain architectures of bacterial and archaeal DnaG-like primases. (b) Independent fission of the DNA polymerase I gene in multiple bacte.
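The caption of Figure 5 connects orthology with symmetrical best hits (SymBets): two genes from different genomes are paired when each is the other's best hit. A minimal sketch of that idea follows. The gene labels and similarity scores are hypothetical, the score tables stand in for the output of any pairwise similarity search, and this is not the procedure of any particular published pipeline.

```python
from typing import Dict, List, Tuple

# A score table maps (query gene, subject gene) to a similarity score.
ScoreTable = Dict[Tuple[str, str], float]


def best_hits(scores: ScoreTable) -> Dict[str, str]:
    """For each query gene, return the subject gene with the highest score.

    If two subjects tie, the one encountered first is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def symmetrical_best_hits(a_vs_b: ScoreTable, b_vs_a: ScoreTable) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores for two paralogs (X, Y) in each of two genomes (1, 2).
    a_vs_b = {("X1", "X2"): 90.0, ("X1", "Y2"): 40.0,
              ("Y1", "Y2"): 85.0, ("Y1", "X2"): 45.0}
    b_vs_a = {("X2", "X1"): 90.0, ("X2", "Y1"): 45.0,
              ("Y2", "Y1"): 85.0, ("Y2", "X1"): 40.0}
    print(symmetrical_best_hits(a_vs_b, b_vs_a))
```

As Figure 3 suggests, such pairs are only candidate orthologs: if lineage-specific gene loss has removed the true ortholog, a paralog can become the symmetrical best hit.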


With the arrival of the Bos taurus genome assembly, bovine milk and lactation data can be linked to other mammalian genomes for the first time, allowing us to gain additional insight into the molecular evolution of milk and lactation. Mammals are warm-blooded vertebrate animals that nourish their young with milk produced by mammary glands. They first appeared approximately 166 million years ago, but their evolution can be traced back 310 million years to when synapsids first branched from amniotes [1]. Two subclasses of mammals evolved, the prototherians and therians. Prototheria are monotremes, mammals that lay eggs; extant species include the platypus and echidnas. Theria are mammals that bear live young; they are divided into the infraclasses Metatheria or marsupials - which include kangaroos and opossums - and the more common Eutheria or placental mammals - which include, for example, humans, dogs, mice, rats, and bovine species. Figure 1 shows the mammalian phylogenetic tree with approximate divergence times [2, 3]. Of the mammalian species listed, high coverage genomic data are available for the platypus (Ornithorhynchus anatinus), a prototherian, the opossum (Monodelphis domestica), a metatherian, and a number of placental mammals, including human (Homo sapiens), rat (Rattus norvegicus), mouse (Mus musculus), dog (Canis familiaris), and now bovine (Bos taurus).

Simplified phylogenetic tree illustrates relationships of representative extant Mammalian species. Estimates in millions of years ago (MYA) of origin of each major branch were derived from Bininda-Emonds et al. [2]. The two earliest splits established monotremes, (166.2 MYA), and marsupials and placentals (147.7 MYA). Approximately 50 million years pass before the origination of any extant groups, and then the four placental superorders (italicized capitals) arose within 2.4 million years of each other.

The reproductive strategy, developmental requirements of the young, and environment of the maternal-infant pair are thought to drive variation in milk composition among species. Platypus and opossum neonates are embryonic in appearance and dependent on milk for growth and immunological protection during the equivalent of the fetal period in placental mammals [4, 5]. In contrast, placental mammals have relatively longer gestation and shorter lactation periods. These reproductive strategies directly impact milk composition as the immature monotreme and marsupial young have different needs with regard to growth, development, and adaptive immunity. Other aspects of the reproductive strategy, such as the length of the lactation period and the maternal nutritional strategy, can also impact milk composition. For example, mammals that fast or feed little during lactation produce milks low in sugar but high in fat to minimize energy and water demands while sustaining nutrient transfer to the young [6]. The data in Table 1 illustrate that even the gross macronutrient composition of milk can be highly variable among species.

Because bovine milk is a major human food and agro-economical product, comparison of bovine milk with the milk of other species in the context of the bovine genome sequence is important not only to improve our understanding of mammary evolution but also of bovine milk production and human nutrition. The importance of bovine milk consumption to humans is underscored by the domestication of cattle and the convergent evolution of lactase persistency in diverse human populations [7]. The availability of the bovine genome sequence provides unique opportunities to investigate milk and lactation. Lactation has been studied more extensively in Bos taurus than in other species, resulting in extensive milk proteome data, milk production quantitative trait loci (QTL), and over 100,000 mammary-related bovine expressed sequence tags (ESTs).

In the present study, we identified the bovine lactation genome in silico and examined its content and organization. Utilizing the genomes of the seven mammals listed above and in Table 1, we investigated gene loss and duplication, phylogeny, sequence conservation, and evolution of milk and mammary genes. Given the conspicuous absence of some known abundant proteins, such as beta-lactoglobulin and whey acidic protein, in the milk of some species [8], we hypothesized that variation in milk composition resides in part in variation in the milk protein genome. We show that gene duplication and genomic rearrangement contribute to changes in the milk protein gene complement of Bos taurus and other species. Although the casein proteins are highly divergent across mammalian milks [9, 10], we report that milk and mammary genes are more highly conserved, on average, than other genes in the bovine genome. Our findings illustrate the importance of lactation for the survival of mammalian species and suggest that we must look more deeply, perhaps into the non-coding regions of the genome that regulate milk protein gene expression, to understand the species-specificity of milk composition. Among mammals, we find milk proteins that are most divergent have nutritional and immunological functions, whereas the least divergent milk protein genes have functions that are important for the formation and secretion of mammalian milk. High conservation of milk fat globule membrane protein genes among the mammalian genomes suggests that the secretory process for milk production was firmly established more than 160 million years ago.

Dinoflagellate Genome Structure Unlike Any Other Known

Amanda Heidt
May 10, 2021

An international team of researchers has generated the most robust genome to date of the dinoflagellate Symbiodinium microadriaticum, a species involved in a life-supporting symbiosis with corals. While the updated genome confirms some of what has been suggested by previous work, an unusual relationship between DNA transcription and the shape and organization of their chromosomes reveals that dinoflagellates harbor some of the strangest genomes in the eukaryotic world, according to findings published April 29 in Nature Genetics.

Rather than the flexible, X-shaped chromosomes familiar to humans, dinoflagellates organize their genetic material in orderly blocks along rigid, rod-shaped chromosomes. Genes within blocks are consistently transcribed in one direction and rarely interact with others outside their immediate vicinity. This odd arrangement, the authors found, influences the three-dimensional structure of the entire chromosome.

“This is definitely a breakthrough within the field. We’ve been generating assemblies for these microalgae for a few years now . . . but the quality of those genomes has made them very difficult to work with,” says Raúl González-Pech, a computational biologist and postdoc at the University of South Florida who was not involved in the work. “This genome assembly is particularly good because it has incorporated new sequencing technologies to get a higher resolution, which will allow us to go deeper into analyses of different aspects of [dinoflagellate] biology and evolution.”

Dinoflagellates are best known for their relationship to corals. In exchange for a safe home, the single-cell microalgae provide the coral with photosynthetic nutrients. When corals bleach, it’s because they’re expelling their symbionts in response to stress. But dinoflagellates as a group are diverse, with some nonsymbiotic species causing prolific red tides, while others are common parasites of crustaceans.

At least some of this diversity is tied to their strange genetic makeup. Their genomes, for one, are massive. S. microadriaticum’s genome is relatively small among dinoflagellates, but it’s still one-third the size of the human genome. And rather than regulate gene expression only through transcription, dinoflagellates also engage in rampant gene and chromosome duplication, making genome assembly a nightmarish effort for geneticists—putting the puzzle together is more difficult when many of the pieces look identical.

Until very recently, it was also thought that dinoflagellates lacked the histones that condense and package DNA and are present in all other eukaryotes. While recent studies have found that they do in fact have histones, they likely don’t serve the same purpose. Ordinarily, histones work like spools, allowing DNA to wind and unwind to become more or less accessible to transcriptional machinery as needed. In contrast, dinoflagellate chromosomes seem to be perpetually condensed into a crystalline structure, leaving unanswered questions about how their DNA is organized and how it can be accessed for transcription.

“They don’t fit with everything else we know about eukaryotes—how they structure their chromosomes, how they structure their genomes, how they regulate transcription,” says Manuel Aranda, a functional geneticist at King Abdullah University of Science and Technology in Saudi Arabia and an author of the new study.

The same day that Aranda’s paper was published, another study, led by a team of researchers from Stanford University, reported a similar analysis of the genome of the closely related dinoflagellate Breviolum minutum. Both teams relied on sequencing approaches that generate longer reads and used an analysis called Hi-C to assemble and study their genomes. Hi-C infers how often any two sequences interact with one another. In theory, the closer two loci are on a chromosome, the more likely they should be to interact, while sequences that are further away, or on different chromosomes altogether, might never interact at all. Based on these interaction frequency maps, researchers can piece together the genome and make educated guesses as to the shape and three-dimensional structure of the chromosomes.
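The distance dependence that Hi-C relies on can be illustrated with a toy contact matrix. The sketch below is not the pipeline used in either study: the matrix values and the function name `mean_contacts_by_distance` are invented for illustration. It averages the contact counts for every pair of bins at a given separation, so that bins lying closer together on a chromosome show a higher average contact frequency.

```python
def mean_contacts_by_distance(contacts):
    """Average contact count for each bin separation in a square contact matrix."""
    n = len(contacts)
    means = []
    for d in range(n):
        # All pairs of bins (i, i + d) that are exactly d bins apart.
        values = [contacts[i][i + d] for i in range(n - d)]
        means.append(sum(values) / len(values))
    return means


# Hypothetical symmetric 4-bin contact matrix; the numbers are made up.
toy_matrix = [
    [50, 20, 8, 2],
    [20, 50, 22, 6],
    [8, 22, 50, 18],
    [2, 6, 18, 50],
]

decay = mean_contacts_by_distance(toy_matrix)
print(decay)  # contact frequency falls as the separation between bins grows
```

In a real assembly the same idea runs in reverse: sequence fragments that contact each other often are inferred to sit close together on the same chromosome.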

Aranda’s Hi-C analysis concluded that S. microadriaticum has roughly 94 rigid, rod-shaped chromosomes that include more than 600 million base pairs. Gene density increased near the telomeres at the chromosomes’ tips, and some chromosomes were enriched for genes related to specific functions or pathways. This finding lends support to a longstanding idea that dinoflagellates may organize their genes like bacterial operons, clusters of related genes that are under the control of the same regulatory machinery and therefore expressed together.

The team also identified an unusual pattern within each chromosome of “alternating unidirectional blocks” of genes, the authors write in the paper. Two blocks sitting next to each other on a chromosome make up what the researchers called a domain, and genes within a domain frequently interact with one another and rarely with those in other domains. At the ends of each domain, the researchers surmised, are some sort of physical boundaries that act as bookends, although it’s not clear what creates these boundaries. While the orientation of genes on a chromosome is usually random, in the case of the dinoflagellate, one block in the domain was consistently transcribed in one direction while the other block was transcribed in the opposite direction.
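The block-and-domain pattern described above can be sketched as a simple grouping of genes by transcription direction. This is only an illustration under stated assumptions: the strand string is hypothetical, the helper names `unidirectional_blocks` and `pair_blocks` are not from the paper, and pairing blocks starting from the first one is an arbitrary choice, since the actual domain boundaries were inferred from Hi-C data rather than from gene order alone.

```python
from itertools import groupby


def unidirectional_blocks(strands):
    """Group consecutive genes on the same strand into (strand, gene_count) blocks."""
    return [(strand, len(list(run))) for strand, run in groupby(strands)]


def pair_blocks(blocks):
    """Pair adjacent blocks into candidate domains.

    Assumption: pairing starts at the first block; a trailing unpaired
    block is dropped.
    """
    return [tuple(blocks[i:i + 2]) for i in range(0, len(blocks) - 1, 2)]


# Hypothetical gene orientations along one chromosome ('+' or '-' strand).
strands = "++++----+++-----"

blocks = unidirectional_blocks(strands)
domains = pair_blocks(blocks)

print(blocks)
print(domains)
```

Each candidate domain here holds two blocks of opposite orientation, which mirrors the description in the text without claiming to reproduce the authors' analysis.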

What drove the evolution of this unique pattern remains unknown, although it isn’t an entirely novel finding, says Senjie Lin, a phytoplankton ecologist at the University of Connecticut who studies dinoflagellate genomics but was not involved in the current work. Previous research using microscopy to visualize dinoflagellate chromosomes noted that these blocks often appeared as dark, evenly spaced bars. “What’s good about this paper is now you see it from the sequence perspective, whereas previously it was more the structural perspective,” Lin tells The Scientist. The team studying the B. minutum genome noted the same organizational pattern, referring to paired blocks as dinoflagellate topologically associating domains, or dinoTADs.

Something that is new, Lin and González-Pech agree, is the correlational link between gene transcription and chromosomal structure and folding found in Aranda’s study. As the two blocks in each domain untwisted during transcription, the DNA outside the boundaries remained fixed. This caused a buildup of twisting at those boundaries. Imagine pulling apart strands of yarn or embroidery thread from the middle while holding the ends in place: the sides will twist tighter as the center is unwound. Consequently, Aranda says, “you end up with these two opposing twirls within the domain that create the structure, which then creates the domain boundaries.”

When the team treated dinoflagellates with an inhibitor to block transcription, the boundaries between domains disappeared, suggesting that for dinoflagellates, transcription and chromosomal structure are intimately linked. Whatever is happening at those boundaries “must be something really important in organizing the chromosome,” Lin says, and “may be important in regulating gene expression.”

The mystery of the domain boundaries is just one of many new questions researchers would like to answer using these new, high-quality genomes.

Lin previously sequenced the genome of Fugacium kawagutii, another coral symbiont that is closely related to Symbiodinium. Despite the ecological similarities between the two, when Lin used Hi-C to analyze the genome, he found only 30 chromosomes—far fewer than S. microadriaticum’s 94—and the chromosomes of F. kawagutii were much longer on average. The handful of dinoflagellates that have been sequenced show that massive restructuring is likely the rule, rather than the exception, says González-Pech, and as more genomes are analyzed, comparative genomics will become a valuable tool for understanding why.

“We normally think of genomes as something very static, but dinoflagellates have shown me that they are incredibly plastic, that they really represent the boundaries of that plasticity in eukaryotes,” González-Pech tells The Scientist. “We’re already pushing boundaries here inside the family, so now we can start going for larger, more-complex dinoflagellate genomes. I think that’s coming up.”

A. Nand et al., “Genetic and spatial organization of the unusual chromosomes of the dinoflagellate Symbiodinium microadriaticum,” Nat Genet, 53:618–29, 2021.

Jeffrey Lawrence

Our research is directed toward elucidating the evolution of bacterial genomes, including their size, composition, variability and organization. In other words, why do genomes have the genes that they do? An understanding of the evolutionary process that leads to differences in genomes will shed light on how species themselves differentiate. We take computational, theoretical and experimental approaches to understanding how genomes evolve.

Speciation. Bacterial speciation - the process by which lineages become genetically and ecologically distinct from one another - is quite different from its eukaryotic counterpart. The differences arise from both the manner by which bacteria adapt (by gene acquisition, rather than gene modification) and the constraints on their gene exchange. Our work has supported a "fragmented" model of speciation, whereby lineages become genetically isolated on a gene-by-gene basis over a period of tens of millions of years.

Ecological adaptation. Which are the first genes to become genetically isolated in nascent species? Among the earliest diverging genes in the Salmonella chromosome are those that encode the O-antigen biosynthetic machinery. We have been investigating the role of protozoan predation in driving this diversification. Here, different antigens allow the newly-diverging Salmonella to escape protozoan predators in different environments.

Genomic architecture. The fate of a newly arrived gene is a function of two factors. Its likelihood of retention increases as it provides an increasingly beneficial function. However, its insertion may also be detrimental by interfering with genome-wide patterns of information required to successfully manipulate the massive DNA polymer during growth and reproduction. We study this embedded information - here termed architecture - which differs between organisms and controls the flow of genes between taxa.

Dr. Lawrence is seeking graduate students in the 2020 or 2021 incoming classes, from either the MCDB or EE graduate programs, with interests in computational biology and genome evolution.

