5.12: Biosynthesis - Biology


Computational Biologist @ NSW

Brief position description: The Garvan Institute of Medical Research brings together world-leading clinicians and basic and translational researchers to break down barriers between traditional scientific disciplines and find solutions to disease. Founded in 1963, Garvan’s mission is to harness all the information encoded in our genome to better diagnose, treat, predict and prevent disease.

Our scientists work across four intersecting research themes: medical genomics, epigenetics, and cellular genomics; diseases of immunity and inflammation; cancer; and diseases of ageing affecting bone, brain and metabolism. In addition, Garvan has three major Centres: The Kinghorn Centre for Clinical Genomics, the Garvan-Weizmann Centre for Cellular Genomics, and the Centre for Population Genomics.

This position is based in the Centre for Population Genomics (CPG), a joint initiative of the Garvan Institute of Medical Research (Garvan) in Sydney and the Murdoch Children’s Research Institute (MCRI) in Melbourne.

CPG’s vision is a world in which genomic information enables comprehensive disease prediction, accurate diagnosis and effective therapeutics for all people. CPG’s purpose is: To establish respectful partnerships with diverse communities, collect and analyse genomic data at transformative scale and drive genomic discovery and equitable genomic medicine in Australia.

CPG is led by experts in community engagement, software development, genomic analysis and project management. Director Daniel MacArthur previously served as the co-director of Medical and Population Genetics at the Broad Institute of MIT and Harvard, where he led the development of the Genome Aggregation Database (gnomAD), the largest and most widely used collection of human DNA sequencing data in the world.

We are seeking a Computational Biologist to contribute to the development of novel analysis pipelines for large-scale genomic datasets, to apply these pipelines to increasingly large human genomic data sets, and to work closely with other Centre staff and external researchers to carry out and publish novel analyses of the data. The Computational Biologist will join a highly collaborative team including other analysis and methods development staff, software engineers, project managers, communities specialists, and trainees.

As part of the mission of the Centre for Population Genomics, this team will be responsible for handling data generated through new sequencing projects as well as the aggregation of public resources to create a new resource for human genetics that reflects the remarkable population diversity of Australia. The team will also manage genome, exome, RNA-seq and other genomic data sets from rare disease patients and their family members, with the goal of improving the diagnosis of severe genetic diseases. These analyses will be performed within a scalable cloud-based computational platform. The resulting data sets and software will be widely shared with the broader research community, maximising their impact on science and health outcomes.

While this position will involve direct exposure to cutting-edge genomic science, the measures of success and promotion will be on developing and deploying complex analysis methods at scale, and not on traditional academic metrics such as leading publications. These positions are centrally funded and are not dependent on fellowships or other fundraising. These roles will be best-suited to individuals with a strong track record of scientific output who are interested in continuing to contribute to impactful science but not in the traditional academic career track.

To adapt to the impact of COVID-19, we have launched the Centre under a completely remote model and are open to remote positions over the longer term.

This is a 3-year full-time position with an annual salary in the range of $92,000 to $120,000, taking into consideration skillset and experience, plus superannuation and salary packaging benefits, with the possibility of extension.

The Computational Biologist will be part of the Centre’s Analysis Team, which will be responsible for developing scalable pipelines for complex genomic analyses, for working closely with the Centre’s software development team to implement these at production scale across data from tens of thousands of individuals, and for performing quality control and analysis across the resulting data sets. This team will be composed of members with diverse skills across computational biology, statistical genetics, and population genetics, working collaboratively to create these pipelines, perform analyses, and contribute extensively to the publication of high-impact science.

The key responsibilities include:

Developing novel approaches, and rigorously benchmarking existing approaches, for the analysis of large-scale genome sequencing, RNA-seq, single-cell RNA-seq, and other genomic data types across tens of thousands of human samples

Developing prototype code for analysis approaches, and working closely with a software engineering team to ensure this code can be deployed at scale and shared as open-source software

Performing rigorous quality control and analysis of massive genomic data sets

Meeting regularly with Centre trainees and other scientific staff to identify major challenges in data handling and analysis, and work proactively with other team members to define solutions

Contributing actively to scientific discussions around the best approaches to make sense of the large data sets generated and handled by the Centre

Contributing actively to the development of programming and analysis best practices through regular code review meetings and other activities such as pair programming

The key skills and experience include:

Either a master's degree or PhD in computational biology, functional genomics, machine learning, statistics, or a related field, or an equivalent total amount of direct work experience in these fields

Considerable experience with Python and R, or similar

Demonstrated experience working in high-performance or cloud computing environments

Demonstrated experience with version control and software repositories

Direct experience in performing quality control and analysis of genomic data sets or other complex data types

Highly autonomous and self-motivated: able to define and manage the execution of novel strategies for analysis across their domain of expertise

Good written communication skills: able to contribute effectively to papers and technical reports

A genuine passion for open-source software development, and contributing code to the wider computational biology ecosystem

Highly collaborative: more focused on solving important biological and medical problems by working with others than on securing individual credit, and willing to engage with other team members from a diverse range of backgrounds to execute on complex scientific tasks

A problem-solving mentality: able to navigate a complex and dynamic series of technical obstacles, and to pivot rapidly when needed, to build a first-of-its-kind research project; someone who identifies problems even if they fall outside their immediate mandate, and works with other team members to solve them

Direct experience with cloud computing, and the analysis of very large genomic data sets, would be beneficial

Direct experience in complex data visualization or machine learning approaches would also be a plus

To apply for this position, please submit your application with a CV and cover letter as one document, stating why you are interested in this role. We are reviewing applications as they are received. If you think you’re the right person for this role, we’d love to hear how your capabilities, achievements and experience set you apart. Only applicants with full working rights in Australia are eligible to apply for this role.


Background

Extreme environments, generally characterized by abnormal temperature, pH, pressure, salinity, toxicity and radiation levels, are inhabited by various organisms - extremophiles - that are specifically adapted to these particular conditions. Studies on these microorganisms have led to the development of important molecular biology techniques such as the polymerase chain reaction (PCR) [1, 2], and further research has been largely stimulated by industry's interest in the possibility that the survival mechanisms of these microorganisms could be transformed into valuable applications ranging from wastewater treatment to the diagnosis of infectious and genetic diseases [3].

Halophilic microorganisms are extremophiles that are able to survive high osmolarity in hypersaline conditions either by maintaining high salinity in their cytoplasm or by intracellular accumulation of osmoprotectants such as ectoine and betaine [4]. Chromohalobacter salexigens (C. salexigens) is a halophilic Gammaproteobacterium of the family Halomonadaceae with a versatile metabolism that allows not only fast growth on a large variety of simple carbon compounds as its sole carbon and energy source but also resistance to saturated and aromatic hydrocarbons and heavy metals [5, 6]. With the ability to grow over a wide range of salinities (0.5-4 M NaCl), C. salexigens has been reported as the most euryhaline of the bacteria [7], and it has been used as a model organism for understanding the osmoregulatory mechanisms of halophilic bacteria [5, 7–9]. Moreover, C. salexigens also has many promising biotechnological applications as a source of compatible solutes, salt-tolerant and recombinant enzymes, biosurfactants and exopolysaccharides [10].

Genome sequences of extremophiles, such as the sulphate-reducing archaeon Archaeoglobus fulgidus [11], the halophilic archaeon Halobacterium species NRC-1 [12] and the acidophilic bacterium Acidithiobacillus ferrooxidans [13], have been reported earlier. Since the publication of the genome of C. salexigens DSM 3043 [14], biological knowledge about this strain has increased significantly, and various methods that allow its genomic analysis and genetic manipulation have been developed [15, 16]. On the other hand, a systematic analysis of its metabolic and biotechnological capacities has not been performed yet. This is, at some level, due to the lack of a comprehensive in silico metabolic model that enables the integration of canonical experimental data in a coherent fashion.

Metabolic reconstruction is a non-automated and iterative decision-making process through which the genes, enzymes, reactions and metabolites that participate in the metabolic activity of a biological system are identified, categorized and interconnected to form a network [17]. The reconstruction process has been reviewed conceptually in the literature [17–22] and, recently, a standard operating protocol giving a detailed overview of the necessary data and steps has been published [23]. To date, genome-scale metabolic reconstructions for more than 50 organisms have been published and this number is expected to increase rapidly. Therefore, the need for developing automated, or at least semi-automated, ways to reconstruct metabolic networks is growing. A limited number of software tools that aim at assisting and facilitating the reconstruction process are available, such as Pathway Tools [24], metaSHARK [25] and Simpheny (Genomatica). However, recent reviews [18, 26] highlight current problems with genome annotations and databases, which make automated reconstructions challenging and mean that they still require manual evaluation. Genome-scale metabolic reconstructions have been successfully applied to several organisms across eukaryotic (e.g., Saccharomyces cerevisiae [21, 27–29], human [30], Arabidopsis thaliana [31]), prokaryotic (e.g., Escherichia coli [32–34], Bacillus subtilis [35], Helicobacter pylori [36, 37], Lactococcus lactis [38], Staphylococcus aureus [39, 40], Clostridium acetobutylicum [41], Pseudomonas putida [42], Pseudomonas aeruginosa [43], Geobacter metallireducens [44], Corynebacterium glutamicum [45]), and archaeal (e.g., Methanosarcina barkeri [46], Halobacterium salinarum [47]) species.
As a useful guide for identifying and filling knowledge gaps, these metabolic networks have been used for simulation of cellular behavior under different genetic and physiological conditions, contextualization of high-throughput data, hypothesis-driven discovery, interrogation of multi-species relationships and topological analysis (see [17] for an extensive review).

Here, a genome-scale reconstruction of C. salexigens DSM 3043's metabolism was established based on genomic, biochemical and physiological information. Being the first comprehensive metabolic model of a halophilic bacterium, it was labeled iOA584, following the naming convention proposed in [33]. The predictive potential of the model was validated not only against literature data on the in vivo phenotypic features of C. salexigens and on the transport and use of different substrates, but also against experimental observations on the choline-betaine and ectoine synthesis pathways, which are important parts of the osmoadaptation mechanism.


Results

We used the FOCI network model to estimate a coexpression network for 5,007 yeast open reading frames (ORFs). The data for this analysis are drawn from publicly available microarray measurements of gene expression under a variety of physiological conditions. The FOCI method assumes a linear model of association between variables and computes dependence and independence relationships for pairs of variables up to a first-order (that is, single) conditioning variable. More detailed descriptions of the data and the network estimation algorithm are provided in the Materials and methods section.
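The first-order conditioning idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is described in their Materials and methods): the function names, the toy data and the fixed `cutoff` (a hypothetical stand-in for a proper significance test) are all assumptions made for illustration. An edge is kept only if the marginal correlation and every partial correlation given a single third variable stay clearly away from zero.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def first_order_edges(data, cutoff=0.3):
    """Keep edge (a, b) only if |r_ab| and every |r_ab.c| exceed `cutoff`.

    `cutoff` is a hypothetical threshold used instead of a statistical test.
    """
    names = sorted(data)
    r = {(a, b): pearson(data[a], data[b]) for a in names for b in names if a != b}
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(r[a, b]) <= cutoff:
                continue  # marginally (nearly) uncorrelated: no edge
            if all(abs(partial_corr(r[a, b], r[a, c], r[b, c])) > cutoff
                   for c in names if c not in (a, b)):
                edges.append((a, b))
    return edges

# Toy data (made up): x and y are related only through z.
z = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
toy = {
    "x": [2 * a + b for a, b in zip(z, e1)],
    "y": [2 * a + b for a, b in zip(z, e2)],
    "z": z,
}
toy_edges = first_order_edges(toy)
```

In the toy data, x and y are strongly correlated marginally, but the correlation vanishes once z is conditioned on, so only the x-z and y-z edges survive.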

On the basis of an edge-wise false-positive rate of 0.001 (see Materials and methods), the estimated network for the yeast expression data has 11,450 edges. It is possible for the FOCI network estimation procedure to yield disconnected subgraphs - that is, groups of genes that are related to each other but not connected to any other genes. However, the yeast coexpression network we estimated includes a single giant connected component (GCC, the largest subgraph such that there is a path between every pair of vertices) with 4,686 vertices and 11,416 edges. The next largest connected component includes only four vertices; thus, the GCC represents the relationships among the majority of the genes in the genome. In Figure 1 we show a simplification of the FOCI network constructed by retaining the 4,000 strongest edges. We used this edge-thresholding procedure to provide a comprehensible two-dimensional visualization of the graph; all the results discussed below were derived from analyses of the entire GCC of the FOCI network.
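As an illustration of how a giant connected component can be identified from an edge list, the following Python sketch uses a breadth-first search. The function name and the toy gene labels are assumptions for illustration, not part of the original analysis.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from `start`.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Toy edge list with made-up gene labels.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g5", "g6")]
components = connected_components(toy_edges)
gcc = components[0]  # the largest component plays the role of the GCC
```

Vertices without any edge do not appear in an edge list, so isolated genes would have to be added separately if they matter for the analysis.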

Figure 1. Simplification of the yeast FOCI coexpression network constructed by retaining the 4,000 strongest edges (= 1,729 vertices). The colored vertices represent a subset of the locally distinct subgraphs of the FOCI network; letters are as in Table 2, and further details can be found there. Some of the locally distinct subgraphs of Table 2 are not represented in this figure because they involve subgraphs whose edge weights are not in the top 4,000 edges.

The mean, median and modal values for vertex degree in the GCC are 4.87, 4 and 2 respectively. That is, each gene shows significant expression relationships to approximately five other genes on average, and the most common form of relationship is to two other genes. Most genes have five or fewer neighbors, but there is a small number of genes (349) with more than 10 neighbors in the FOCI network; the maximum degree in the graph is 28 (Figure 2a). Thus, approximately 7% of genes show significant expression relationships to a fairly large number of other genes. The connectivity of the FOCI network is not consistent with a power-law distribution (see Additional data file 1 for a log-log plot of this distribution). We estimated the distribution of path distances between pairs of genes (defined as the smallest number of graph edges separating the pair) by randomly choosing 1,000 source vertices in the GCC, and calculating the path distance from each source vertex to every other gene in the network (Figure 2b). The mean path distance is 6.46 steps, and the median is 6.0 (mode = 7). The maximum path distance is 16 steps. Therefore, in the GCC of the FOCI network, random pairs of genes are typically separated by six or seven edges.

Figure 2. Topological properties of the yeast FOCI coexpression network. Distribution of (a) vertex degrees and (b) path lengths for the network.
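The degree and path-distance summaries reported above can be computed along the following lines. This is a minimal sketch on a made-up five-vertex graph; the function names are illustrative, and in a real analysis the source vertices would be drawn at random (for example with `random.sample`) rather than listed by hand.

```python
from collections import deque
from statistics import mean, median, mode

def degree_summary(adj):
    """Mean, median and modal vertex degree of an adjacency dict."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    return mean(degrees), median(degrees), mode(degrees)

def bfs_distances(adj, source):
    """Shortest path length (in edges) from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def sampled_path_lengths(adj, sources):
    """Pool the path lengths from each sampled source to all other vertices."""
    lengths = []
    for s in sources:
        lengths.extend(d for v, d in bfs_distances(adj, s).items() if v != s)
    return lengths

# Toy graph with made-up labels: a-b, b-c, b-d, d-e.
toy_adj = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b"},
    "d": {"b", "e"},
    "e": {"d"},
}
toy_degrees = degree_summary(toy_adj)
toy_lengths = sampled_path_lengths(toy_adj, ["a"])
```

Because breadth-first search visits vertices in order of distance, each call yields exact shortest path lengths for an unweighted graph; sampling sources only reduces the number of searches.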

Coherence of the FOCI network with known metabolic pathways

To assess the biological relevance of our estimated coexpression network we compared the composition of 38 known metabolic pathways (Table 1) to our yeast coexpression FOCI network. In a biologically informative network, genes that are involved in the same pathway(s) should be represented as coherent pieces of the larger graph. That is, under the assumption that pathway interactions require co-regulation and coexpression, the genes in a given pathway should be relatively close to each other in the estimated global network.

We used a pathway query approach to examine 38 metabolic pathways relative to our FOCI network. For each pathway, we computed a quantity called the 'coherence value' that measures how well the pathway is recovered in a given network model (see Materials and methods). Of the 38 pathways tested, 19 have coherence values that are significant when compared to the distribution of random pathways of the same size (p < 0.05; see Materials and methods). Most of the pathways of carbohydrate and amino-acid metabolism that we examined are coherently represented in the FOCI network. Of each of the major categories of metabolic pathways listed in Table 1, only lipid metabolism and metabolism of cofactors and vitamins are not well represented in the FOCI network.
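The comparison against random pathways of the same size can be sketched as a simple resampling test. The coherence value itself is defined in the Materials and methods, which are not reproduced here, so the sketch below substitutes a simplified, hypothetical measure: the fraction of pathway genes that fall in the largest connected piece of the subgraph induced by the pathway. The function names, the toy network and the number of random draws are all assumptions.

```python
import random

def largest_component_fraction(adj, genes):
    """Fraction of `genes` in the largest connected piece of the induced subgraph.

    This is a simplified stand-in for the paper's coherence value.
    """
    genes = set(genes)
    seen, best = set(), 0
    for g in genes:
        if g in seen:
            continue
        stack, size = [g], 0
        seen.add(g)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj.get(node, ()):
                if nb in genes and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(genes)

def coherence_p_value(adj, pathway, n_random=1000, seed=0):
    """Empirical p-value against random gene sets of the same size."""
    rng = random.Random(seed)
    universe = sorted(adj)
    observed = largest_component_fraction(adj, pathway)
    hits = sum(
        largest_component_fraction(adj, rng.sample(universe, len(pathway))) >= observed
        for _ in range(n_random)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)

# Toy network (made up): a 5-gene chain plus many unrelated gene pairs.
toy_adj = {f"n{i}": set() for i in range(60)}
for i in range(4):                      # chain n0-n1-n2-n3-n4
    toy_adj[f"n{i}"].add(f"n{i + 1}")
    toy_adj[f"n{i + 1}"].add(f"n{i}")
for i in range(6, 60, 2):               # isolated pairs n6-n7, n8-n9, ...
    toy_adj[f"n{i}"].add(f"n{i + 1}")
    toy_adj[f"n{i + 1}"].add(f"n{i}")
toy_pathway = [f"n{i}" for i in range(5)]
toy_observed, toy_p = coherence_p_value(toy_adj, toy_pathway)
```

Fixing the seed makes the random draws reproducible; with more random pathways the empirical p-value can resolve smaller significance levels.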

The five largest coherent pathways are glycolysis/gluconeogenesis, the TCA cycle, oxidative phosphorylation, purine metabolism and synthesis of N-glycans. Other pathways that are distinctive in our analysis include the glyoxylate cycle (6 of 12 genes in largest coherent subnetwork), valine, leucine, and isoleucine biosynthesis (10 of 15 genes), methionine metabolism (6 of 13 genes), and phenylalanine, tyrosine, and tryptophan metabolism (two subnetworks each of 6 genes). Several coherent subsets of the FOCI network generated by these pathway queries are illustrated in Additional data file 1.

Combined analysis of core carbohydrate metabolism

In addition to being consistent with individual pathways, a useful network model should capture interactions between pathways. To explore this issue we queried the FOCI network on combined pathways and again measured its coherence. We illustrate one such combined query based on four related pathways involved in carbohydrate metabolism: glycolysis/gluconeogenesis, pyruvate metabolism, the TCA cycle and the glyoxylate cycle.

Figure 3 illustrates the largest subgraph extracted in this combined analysis. The combined query results in a subset of the FOCI network that is larger than the sum of the subgraphs estimated separately from individual pathways because it also admits non-query genes that are connected to multiple pathways. The nodes of the graph are colored according to their membership in each of the four pathways as defined by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Many gene products are assigned to multiple pathways. This is particularly evident with respect to the glyoxylate cycle; the only genes uniquely assigned to this pathway are ICL1 (encoding an isocitrate lyase) and ICL2 (a 2-methylisocitrate lyase).

Figure 3. Largest connected subgraph resulting from combined query on four pathways involved in carbohydrate metabolism: glycolysis/gluconeogenesis (red), pyruvate metabolism (yellow), TCA cycle (green) and the glyoxylate cycle (pink). Genes encoding proteins involved in more than one pathway are highlighted with multiple colors. Uncolored vertices represent non-pathway genes that were recovered in the combined pathway query. See text for further details.

In this combined pathway query the TCA cycle, glycolysis/gluconeogenesis, and glyoxylate cycle are each represented primarily by a single two-step connected subgraph (see Materials and methods). Pyruvate metabolism, on the other hand, is represented by at least two distinct subgraphs, one including <PCK1, DAL7, MDH2, MLS1, ACS1, ACH1, LPD1, MDH1> and the other including <GLO1, GLO2, DLD1, CYB2>. This second set of genes encodes enzymes that participate in a branch of the pyruvate metabolism pathway that leads to the degradation of methylglyoxal (methylglyoxal → L-lactaldehyde → L-lactate → pyruvate and methylglyoxal → (R)-S-lactoyl-glutathione → D-lactaldehyde → D-lactate → pyruvate) [12, 13]. In the branch of methylglyoxal metabolism that involves S-lactoyl-glutathione, methylglyoxal is condensed with glutathione [12]. Interestingly, two neighboring non-query genes, GRX1 (a neighbor of GLO2) and TTR1 (a neighbor of CYB2), encode proteins with glutathione transferase activity.

The position of FBP1 in the combined query is also interesting. The product of FBP1 is fructose-1,6-bisphosphatase, an enzyme that catalyzes the conversion of beta-D-fructose 1,6-bisphosphate to beta-D-fructose 6-phosphate, a reaction associated with glycolysis. However, in our network it is most closely associated with genes assigned to pyruvate metabolism and the glyoxylate cycle. The neighbors of FBP1 in this query include ICL1, MLS1, SFC1, PCK1 and IDP3. With the exception of IDP3, the promoters of all of these genes (including FBP1) have at least one upstream activation sequence that can be classified as a carbon source-response element (CSRE), and that responds to the transcriptional activator Cat8p [14]. This set of genes is expressed under non-fermentative growth conditions in the absence of glucose, conditions characteristic of the diauxic shift [15]. Considering other genes in the vicinity of FBP1 in the combined pathway query, we find that ACS1, IDP2, SIP4, MDH2, ACH1 and YJL045w have all been shown to have CSRE-like activation sequences and/or to be at least partially Cat8p dependent [14]. The association among these Cat8p-activated genes persists when we estimate the FOCI network without including the data of DeRisi et al. [15], suggesting that this set of interactions is not merely a consequence of the inclusion of data collected from cultures undergoing diauxic shift.

The inclusion of a number of other genes in the carbohydrate metabolism subnetwork is consistent with independent evidence from the literature. For example, McCammon et al. [16] identified YER053c as among the set of genes whose expression levels changed in TCA cycle mutants.

Although many of the associations among groups of genes revealed in these subgraphs can be interpreted either in terms of the query pathways used to construct them or with respect to related pathways, a number of associations have no obvious biological interpretation. For example, the tail on the left of the graph in Figure 3, composed of LSC1, PTR2, PAD1, OPT2, ARO10 and PSP1, has no clear known relationship.

Locally distinct subgraphs

The analysis of metabolic pathways described above provides a test of the extent to which known pathways are represented in the FOCI graph. That is, we assumed some prior knowledge about network structure of subsets of genes and asked whether our estimated network is coherent vis-à-vis this prior knowledge. Conversely, one might want to find interesting and distinct subgraphs within the FOCI network without the injection of any prior knowledge and ask whether such subgraphs correspond to particular biological processes or functions. To address this second issue we developed an algorithm to compute 'locally distinct subgraphs' of the yeast FOCI coexpression network as detailed in the Materials and methods section. Briefly, this is an unsupervised graph-search algorithm that defines 'interestingness' in terms of local edge topology and the distribution of local edge weights on the graph. The goal of this algorithm is to find connected subgraphs whose edge-weight distribution is distinct from that of the edges that surround the subgraph; thus, these locally distinct subgraphs can be thought of as those vertices and associated edges that 'stand out' from the background of the larger graph as a whole.
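The notion of a subgraph whose edge weights stand out from the surrounding edges can be made concrete with a very rough sketch. The actual search algorithm is given in the Materials and methods and is not reproduced here; the score below (mean weight of internal edges minus mean weight of boundary edges), the function name and the toy weights are hypothetical and serve only to illustrate the inside-versus-surroundings comparison.

```python
def edge_weight_contrast(weights, members):
    """Mean internal edge weight minus mean boundary edge weight.

    `weights` maps frozenset({u, v}) to an edge weight; `members` is the
    candidate vertex set. Returns None if either edge class is empty.
    This score is an illustrative assumption, not the published criterion.
    """
    members = set(members)
    inside, boundary = [], []
    for edge, w in weights.items():
        shared = len(edge & members)
        if shared == 2:
            inside.append(w)      # both endpoints in the candidate subgraph
        elif shared == 1:
            boundary.append(w)    # edge leaves the candidate subgraph
    if not inside or not boundary:
        return None
    return sum(inside) / len(inside) - sum(boundary) / len(boundary)

# Toy weights (made up), e.g. squared correlations on each edge.
toy_weights = {
    frozenset({"a", "b"}): 0.9,
    frozenset({"b", "c"}): 0.8,
    frozenset({"c", "d"}): 0.2,
    frozenset({"a", "e"}): 0.3,
    frozenset({"d", "e"}): 0.5,
}
toy_contrast = edge_weight_contrast(toy_weights, {"a", "b", "c"})
```

A large positive value indicates that the candidate vertices are tied together more strongly than they are tied to their surroundings; a full method would also need a search strategy and a way to compare whole distributions rather than means.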

We constrained the size of the subgraphs to be between seven and 150 genes, and used squared marginal correlation coefficients as the weighting function on the edges of the FOCI graph. We found 32 locally distinct subgraphs, containing a total of 830 genes (Table 2). Twenty-four out of the 32 subgraphs have consistent Gene Ontology (GO) annotation terms [17] with p-values less than 10⁻⁵ (see Materials and methods). This indicates that most locally distinct subgraphs are highly enriched with respect to genes involved in particular biological processes or functions. Members of the 21 largest locally distinct subgraphs are highlighted in Figure 1. The complete list of subgraphs and the genes assigned to them is given in Additional data file 2.
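Enrichment p-values of this kind are commonly computed with a hypergeometric tail probability. The text does not state which test was used (see the Materials and methods), so the following Python sketch is an assumption about a typical approach rather than the authors' procedure; the function name and the toy counts are illustrative.

```python
from math import comb

def hypergeom_tail(total, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled without replacement
    from `total` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, drawn)
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return numerator / comb(total, drawn)

# Toy counts (made up): 10 genes, 4 annotated, a subgraph of 3 genes
# of which at least 2 carry the annotation.
toy_p = hypergeom_tail(total=10, annotated=4, drawn=3, hits=2)
```

Exact integer arithmetic via `math.comb` avoids rounding problems for very small tail probabilities; when many GO terms are tested, a multiple-testing correction would normally be applied as well.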

The five largest locally distinct subgraphs have the following primary GO annotations: protein biosynthesis (subgraphs A and B), ribosome biogenesis and assembly (subgraph C), response to stress and carbohydrate metabolism (subgraph K), and sporulation (subgraph N). Several of these subgraphs show very high specificity for genes with particular GO annotations. For example, in subgraphs A and B approximately 97% (32 out of 33) and 95.5% (64 out of 67) of the genes, respectively, are assigned the GO term 'protein biosynthesis'.

Subgraph P is also relatively large and contains many genes with roles in DNA replication and repair. Similarly, 21 of the 34 annotated genes in Subgraph F have a role in protein catabolism. Three medium-sized subgraphs (S, T, U) are strongly associated with the mitotic cell cycle and cytokinesis. Other examples of subgraphs with very clear biological roles are subgraph R (histones) and subgraph Z (genes involved in conjugation and sexual reproduction). Subgraph X contains genes with roles in methionine metabolism or transport.

Some locally distinct subgraphs can be further decomposed. For example, subgraph K contains at least two subgroups. One of these is composed primarily of genes encoding chaperone proteins: STI1, SIS1, HSC82, HSP82, AHA1, SSA1, SSA2, SSA4, KAR2, YPR158w, YLR247c. The other group contains genes primarily involved in carbohydrate metabolism. These two subgroups are connected to each other exclusively through HSP42 and HSP104.

Three of the locally distinct subgraphs - Q, W and CC - are composed primarily of genes for which there are no GO biological process annotations. Interestingly, the majority of genes assigned to these three groups are found in subtelomeric regions. These three subgraphs are not themselves directly connected in the FOCI graph, so their regulation is not likely to be simply an instance of a regulation of subtelomeric silencing [18]. Subgraph Q includes 26 genes, five of which (YRF1-2, YRF1-3, YRF1-4, YRF1-5, YRF1-6) correspond to ORFs encoding copies of Y'-helicase protein 1 [19]. Eight additional genes (YBL113c, YEL077c, YHL050c, YIL177c, YJL225c, YLL066c, YLL067c, YPR204w) assigned to this subgraph also encode helicases. This helicase subgraph is closely associated with subgraph P, which contains numerous genes involved in DNA replication and repair (see Figure 1). Subgraph W contains 10 genes, only one of which is assigned a GO process, function or component term. However, nine of the 10 genes in the subgraph (PAU1, PAU2, PAU4, PAU5, PAU6, YGR294w, YLR046c, YIR041w, YLL064c) are members of the seripauperin gene family [20], which are primarily found subtelomerically and which encode cell-wall mannoproteins and may play a role in maintaining cell-wall integrity [18]. Another example of a subgraph corresponding to a multigene family is subgraph CC, which includes nine subtelomeric ORFs, six of which encode proteins of the COS family. Cos proteins are associated with the nuclear membrane and/or the endoplasmic reticulum and have been implicated in the unfolded protein response [21].

As a final example, we consider subgraph FF, which is composed of seven ORFs (YAR010c, YBL005w-A, YJR026w, YJR028w, YML040w, YMR046c, YMR051c) all of which are parts of Ty elements, encoding structural components of the retrotransposon machinery [22, 23]. This set of genes nicely illustrates the fact that delineating locally distinct groups can lead to the discovery of many interesting interactions. There are only six edges among these seven genes in the estimated FOCI graph, and the marginal correlations among the correlation measures of these genes are relatively weak (mean r ≈ 0.62). Despite this, the local distribution of edge weights in the FOCI graph is such that this group is highlighted as a subgraph of interest. Locally strong subgraphs such as these can also be used as the starting point for further graph search procedures. For example, querying the FOCI network for immediate neighbors of the genes in subgraph FF yields three additional ORFs - YBL101w-A, YBR012w-B, and RAD10. Both YBL101w-A and YBR012w-B are Ty elements, whereas RAD10 encodes an exonuclease with a role in recombination.
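A query for the immediate neighbors of a set of genes amounts to a one-step expansion in the graph. The sketch below shows the idea on a made-up adjacency dict; the function name and the labels are illustrative and do not correspond to the actual FOCI network.

```python
def immediate_neighbors(adj, query):
    """Genes directly connected to any query gene, excluding the query itself."""
    query = set(query)
    found = set()
    for gene in query:
        found.update(adj.get(gene, ()))
    return found - query

# Toy adjacency dict with made-up labels.
toy_adj = {
    "q1": {"q2", "n1"},
    "q2": {"q1", "n2"},
    "n1": {"q1", "n3"},
    "n2": {"q2"},
    "n3": {"n1"},
}
toy_neighbors = immediate_neighbors(toy_adj, ["q1", "q2"])
```

Repeating the expansion on the returned set would give a two-step neighborhood, which is one simple way to grow a locally distinct subgraph into a larger candidate module.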


Graduate Subjects

MIT-WHOI Joint Program in Oceanography

7.410 Applied Statistics

Prereq: Permission of instructor
G (Spring)
3-0-9 units
Can be repeated for credit.

Provides an introduction to modern applied statistics. Topics include likelihood-based methods for estimation, confidence intervals, and hypothesis-testing; bootstrapping; time series modeling; linear models; nonparametric regression; and model selection. Organized around examples drawn from the recent literature.

7.411 Seminars in Biological Oceanography

Prereq: Permission of instructor
G (Fall, Spring)
Units arranged [P/D/F]
Can be repeated for credit.

Selected topics in biological oceanography.

7.421 Problems in Biological Oceanography

Prereq: Permission of instructor
G (Fall, Spring)
Units arranged [P/D/F]
Can be repeated for credit.

Advanced problems in biological oceanography with assigned reading and consultation.

Information: M. Neubert (WHOI)

7.430 Topics in Quantitative Marine Science

Prereq: Permission of instructor
G (Fall, Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussions on quantitative marine ecology. Topics vary from year to year.

7.431 Topics in Marine Ecology

Prereq: Permission of instructor
G (Fall)
2-0-4 units
Can be repeated for credit.

Lectures and discussions on ecological principles and processes in marine populations, communities, and ecosystems. Topics vary from year to year.

7.432 Topics in Marine Physiology and Biochemistry

Prereq: Permission of instructor
G (Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussions on physiological and biochemical processes in marine organisms. Topics vary from year to year.

7.433 Topics in Biological Oceanography

Prereq: Permission of instructor
G (Fall, Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussions on biological oceanography. Topics vary from year to year.

7.434 Topics in Zooplankton Biology

Prereq: Permission of instructor
G (Fall, Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussions on the biology of marine zooplankton. Topics vary from year to year.

7.435 Topics in Benthic Biology

Prereq: Permission of instructor
G (Fall, Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussions on the biology of marine benthos. Topics vary from year to year.

7.436 Topics in Phytoplankton Biology

Prereq: Permission of instructor
G (Fall, Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussion on the biology of marine phytoplankton. Topics vary from year to year.

7.437 Topics in Molecular Biological Oceanography

Prereq: Permission of instructor
G (Fall, Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussion on molecular biological oceanography. Topics vary from year to year.

7.438 Topics in the Behavior of Marine Animals

Prereq: Permission of instructor
G (Fall, Spring)
2-0-4 units
Can be repeated for credit.

Lectures and discussion on the behavioral biology of marine animals. Topics vary from year to year.

7.439 Topics in Marine Microbiology

Prereq: Permission of instructor
G (Fall)
2-0-4 units
Can be repeated for credit.

Lectures and discussion on the biology of marine prokaryotes. Topics vary from year to year.

7.440 An Introduction to Mathematical Ecology

Prereq: Calculus I (GIR), 1.018[J], or permission of instructor
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Spring)
3-0-9 units

Covers the basic models of population growth, demography, population interaction (competition, predation, mutualism), food webs, harvesting, and infectious disease, and the mathematical tools required for their analysis. Because these tools are also basic to the analysis of models in biochemistry, physiology, and behavior, subject also broadly relevant to students whose interests are not limited to ecological problems.

7.470 Biological Oceanography

Prereq: Permission of instructor
G (Spring)
3-0-9 units

Intended for students with advanced training in biology. Intensive overview of biological oceanography. Major paradigms discussed, and dependence of biological processes in the ocean on physical and chemical aspects of the environment examined. Surveys the diversity of marine habitats, major groups of taxa inhabiting those habitats, and the general biology of the various taxa: the production and consumption of organic material in the ocean, as well as factors controlling those processes. Species diversity, structure of marine food webs, and the flow of energy within different marine habitats are detailed and contrasted.

7.491 Research in Biological Oceanography

Prereq: Permission of instructor
G (Fall, Spring, Summer)
Units arranged [P/D/F]
Can be repeated for credit.

Directed research in biological oceanography not leading to graduate thesis and initiated prior to the qualifying exam.

Microbiology (MICRO)

7.492[J] Methods and Problems in Microbiology

Same subject as 1.86[J], 20.445[J]
Prereq: None
G (Fall)
3-0-9 units

Students will read and discuss primary literature covering key areas of microbial research with emphasis on methods and approaches used to understand and manipulate microbes. Preference to first-year Microbiology and Biology students.

7.493[J] Microbial Genetics and Evolution

Same subject as 1.87[J], 12.493[J], 20.446[J]
Prereq: 7.03, 7.05, or permission of instructor
G (Fall)
4-0-8 units

Covers aspects of microbial genetic and genomic analyses, central dogma, horizontal gene transfer, and evolution.

A. D. Grossman, O. Cordero

7.494 Research Problems in Microbiology

Prereq: Permission of instructor
G (Fall, Spring, Summer)
Units arranged [P/D/F]
Can be repeated for credit.

Directed research in the fields of microbial science and engineering.

7.498 Teaching Experience in Microbiology

Prereq: Permission of instructor
G (Fall, Spring)
Units arranged [P/D/F]
Can be repeated for credit.

For qualified graduate students in the Microbiology graduate program interested in teaching. Classroom or laboratory teaching under the supervision of a faculty member.

7.499 Research Rotations in Microbiology

Prereq: None. Coreq: 7.492[J] or 7.493[J]; permission of instructor
G (Fall, Spring)
Units arranged [P/D/F]
Can be repeated for credit.

Introduces students to faculty participating in the interdepartmental Microbiology graduate program through a series of three lab rotations, which provide broad exposure to microbiology research at MIT. Students select a lab for thesis research by the end of their first year. Given the interdisciplinary nature of the program and the many research programs available, students may be able to work jointly with more than one research supervisor. Limited to students in the Microbiology graduate program.

7.MTHG Microbiology Graduate Thesis

Prereq: Permission of instructor
G (Fall, IAP, Spring, Summer)
Units arranged
Can be repeated for credit.

Program of research leading to the writing of a PhD thesis. To be arranged by the student and the appropriate MIT faculty member.

Biology

7.50 Method and Logic in Molecular Biology

Prereq: None. Coreq: 7.51 and 7.52 or permission of instructor
G (Fall)
4-0-8 units

Logic, experimental design and methods in biology, using discussions of the primary literature to discern the principles of biological investigation in making discoveries and testing hypotheses. In collaboration with faculty, students also apply those principles to generate a potential research project, presented in both written and oral form. Limited to Course 7 graduate students.

I. Cheeseman, M. Hemann, J. Lees, D. Sabatini, F. Solomon, S. Vos

7.51 Principles of Biochemical Analysis

Prereq: Permission of instructor
G (Fall)
6-0-6 units

Principles of biochemistry, emphasizing structure, equilibrium studies, kinetics, informatics, single-molecule studies, and experimental design. Topics include macromolecular binding and specificity, protein folding and unfolding, allosteric systems, transcription factors, kinases, membrane channels and transporters, and molecular machines.

7.52 Genetics for Graduate Students

Prereq: Permission of instructor
G (Fall)
4-0-8 units

Principles and approaches of genetic analysis, including Mendelian inheritance and prokaryotic genetics, yeast genetics, developmental genetics, neurogenetics, and human genetics.

H. R. Horvitz, C. Kaiser, E. Lander

7.540[J] Frontiers in Chemical Biology

Same subject as 5.54[J], 20.554[J]
Prereq: 5.07[J], 5.13, 7.06, and permission of instructor
G (Fall)
3-0-9 units

See description under subject 5.54[J].

L. Kiessling, M. Shoulders

7.546[J] Science and Business of Biotechnology

Same subject as 15.480[J], 20.586[J]
Prereq: None. Coreq: 15.401; permission of instructor
G (Spring)
3-0-6 units

See description under subject 15.480[J].

7.548[J] Advances in Biomanufacturing

Same subject as 10.53[J]
Subject meets with 7.458[J], 10.03[J]
Prereq: None
G (Spring; second half of term)
1-0-2 units

Seminar examines how biopharmaceuticals, an increasingly important class of pharmaceuticals, are manufactured. Topics range from fundamental bioprocesses to new technologies to the economics of biomanufacturing. Also covers the impact of globalization on regulation and quality approaches as well as supply chain integrity. Students taking graduate version complete additional assignments.

J. C. Love, A. Sinskey, S. Springs

7.549[J] Case Studies and Strategies in Drug Discovery and Development

Same subject as 15.137[J], 20.486[J], HST.916[J]
Prereq: None
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Spring)
2-0-4 units

See description under subject 20.486[J].

7.55 Case Studies in Modern Experimental Design

Prereq: Permission of instructor
G (Spring)
2-0-7 units

Focuses on enhancing students' ability to analyze, design and present experiments, emphasizing modern techniques. Class discussions begin with papers that developed or utilized contemporary approaches (e.g., quantitative microscopy, biophysical and molecular genetic methods) to address important problems in biology. Each student prepares one specific aim of a standard research proposal for a project that emphasizes research strategy, experimental design, and writing.

L. Guarente, S. Spranger

7.571 Quantitative Analysis of Biological Data (New)

Prereq: None
G (Spring; first half of term)
2-0-4 units

Application of probability theory and statistical methods to analyze biological data. Topics include: descriptive and inferential statistics, an introduction to Bayesian statistics, design of quantitative experiments, and methods to analyze high-dimensional datasets. A conceptual understanding of topics is emphasized, and methods are illustrated using the Python programming language. Although a basic understanding of Python is encouraged, no programming experience is required. Students taking the graduate version are expected to explore the subject in greater depth.

7.572 Quantitative Measurements and Modeling of Biological Systems (New)

Prereq: None
G (Spring; second half of term)
2-0-4 units

Quantitative experimental design, data analysis, and modeling for biological systems. Topics include absolute/relative quantification, noise and reproducibility, regression and correlation, and modeling of population growth, gene expression, cellular dynamics, feedback regulation, oscillation. Students taking the graduate version are expected to explore the subject in greater depth.

7.573 Modern Biostatistics (New)

Subject meets with 7.093
Prereq: 7.03 and 7.05
G (Spring; first half of term)
2-0-4 units

Provides an introduction to probability and statistics used in modern biology. Discrete and continuous probability distributions, statistical modeling, hypothesis testing, Bayesian statistics, independence, conditional probability, Markov chains, methods for data visualization, clustering, principal components analysis, nonparametric methods, Monte Carlo simulations, false discovery rate. Applications to DNA, RNA, and protein sequence analysis; genetics; genomics. Homework involves the R programming language, but prior programming experience is not required. Students registered for the graduate version complete an additional project, applying biostatistical methods to data from their research.

7.574 Modern Computational Biology (New)

Subject meets with 7.094
Prereq: 7.03 and 7.05
G (Spring; second half of term)
2-0-4 units

Introduces modern methods in computational biology, focusing on DNA/RNA/protein sequence analysis. Topics include next-generation DNA sequencing and sequencing data analysis, RNA-seq (bulk and single-cell), ribosome profiling, and proteomics. Students registered for the graduate version complete an additional project, applying bioinformatic methods to data from their research.

7.58 Molecular Biology

Subject meets with 7.28
Prereq: 7.03, 7.05, and permission of instructor
G (Spring)
5-0-7 units

Detailed analysis of the biochemical mechanisms that control the maintenance, expression, and evolution of prokaryotic and eukaryotic genomes. Topics covered in lecture and readings of relevant literature include: gene regulation, DNA replication, genetic recombination, and mRNA translation. Logic of experimental design and data analysis emphasized. Presentations include both lectures and group discussions of representative papers from the literature. Students taking the graduate version are expected to explore the subject in greater depth.

7.59[J] Teaching College-Level Science and Engineering

Same subject as 1.95[J], 5.95[J], 8.395[J], 18.094[J]
Subject meets with 2.978
Prereq: None
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Fall)
2-0-2 units

See description under subject 5.95[J].

7.60 Cell Biology: Structure and Functions of the Nucleus

Prereq: 7.06 or permission of instructor
G (Spring)
3-0-9 units

Eukaryotic genome structure, function, and expression, processing of RNA, and regulation of the cell cycle. Emphasis on the techniques and logic used to address important problems in nuclear cell biology. Lectures on broad topic areas in nuclear cell biology and discussions on representative recent papers.

7.61[J] Eukaryotic Cell Biology: Principles and Practice

Same subject as 20.561[J]
Prereq: Permission of instructor
G (Fall)
4-0-8 units

Emphasizes methods and logic used to analyze structure and function of eukaryotic cells in diverse systems (e.g., yeast, fly, worm, mouse, human development, stem cells, neurons). Combines lectures and in-depth roundtable discussions of literature readings with the active participation of faculty experts. Focuses on membranes (structure, function, traffic), organelles, the cell surface, signal transduction, cytoskeleton, cell motility and extracellular matrix. Ranges from basic studies to applications to human disease, while stressing critical analysis of experimental approaches. Enrollment limited.

7.62 Microbial Physiology

Subject meets with 7.21
Prereq: 7.03, 7.05, and permission of instructor
G (Fall)
4-0-8 units

Biochemical properties of bacteria and other microorganisms that enable them to grow under a variety of conditions. Interaction between bacteria and bacteriophages. Genetic and metabolic regulation of enzyme action and enzyme formation. Structure and function of components of the bacterial cell envelope. Protein secretion with a special emphasis on its various roles in pathogenesis. Additional topics include bioenergetics, symbiosis, quorum sensing, global responses to DNA damage, and biofilms. Students taking the graduate version are expected to explore the subject in greater depth.

G. C. Walker, A. J. Sinskey

7.63[J] Immunology

Same subject as 20.630[J]
Subject meets with 7.23[J], 20.230[J]
Prereq: 7.06 and permission of instructor
G (Spring)
5-0-7 units

Comprehensive survey of molecular, genetic, and cellular aspects of the immune system. Topics include innate and adaptive immunity; cells and organs of the immune system; hematopoiesis; immunoglobulin, T cell receptor, and major histocompatibility complex (MHC) proteins and genes; development and functions of B and T lymphocytes; immune responses to infections and tumors; hypersensitivity, autoimmunity, and immunodeficiencies. Particular attention to the development and function of the immune system as a whole, as studied by modern methods and techniques. Students taking graduate version explore the subject in greater depth, including study of recent primary literature.

S. Spranger, M. Birnbaum

7.64 Molecular Mechanisms, Pathology and Therapy of Human Neuromuscular Disorders

Prereq: Permission of instructor
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Spring)
3-0-9 units

Investigates the molecular and clinical basis of central nervous system and neuromuscular disorders with particular emphasis on strategies for therapeutic intervention. Considers the in-depth analysis of clinical features, pathological mechanisms, and responses to current therapeutic interventions. Covers neurodegenerative diseases, such as Huntington's disease, Parkinson's disease, Alzheimer's disease, Amyotrophic Lateral Sclerosis, and Frontotemporal Dementia, and neuromuscular disorders, such as Myotonic Dystrophy, Facioscapulohumeral Dystrophy, and Duchenne Muscular Dystrophy.

7.65[J] Molecular and Cellular Neuroscience Core I

Same subject as 9.015[J]
Prereq: None
G (Fall)
3-0-9 units

See description under subject 9.015[J].

7.66 Molecular Basis of Infectious Disease

Subject meets with 7.26
Prereq: 7.06 and permission of instructor
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Spring)
4-0-8 units

Focuses on the principles of host-pathogen interactions with an emphasis on infectious diseases of humans. Presents key concepts of pathogenesis through the study of various human pathogens. Includes critical analysis and discussion of assigned readings. Students taking the graduate version are expected to explore the subject in greater depth.

7.68[J] Molecular and Cellular Neuroscience Core II

Same subject as 9.013[J]
Prereq: Permission of instructor
G (Spring)
3-0-9 units

See description under subject 9.013[J].

7.69[J] Developmental Neurobiology

Same subject as 9.181[J]
Subject meets with 7.49[J], 9.18[J]
Prereq: 9.011 or permission of instructor
G (Spring)
3-0-9 units

See description under subject 9.181[J].

7.70 Regulation of Gene Expression

Prereq: Permission of instructor
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Spring)
4-0-8 units

Seminar examines basic principles of biological regulation of gene expression. Focuses on examples that underpin these principles, as well as those that challenge certain long-held views. Topics covered may include the role of transcription factors, enhancers, DNA modifications, non-coding RNAs, and chromatin structure in the regulation of gene expression and mechanisms for epigenetic inheritance of transcriptional states. Limited to 40.

7.71 Biophysical Technique

Subject meets with 5.78
Prereq: 5.13, 5.60, (5.07[J] or 7.05), and permission of instructor
G (Spring)
5-0-7 units

Introduces students to modern biophysical methods to study biological systems from atomic to molecular and cellular scales. Includes an in-depth discussion on the techniques that cover the full resolution range, including X-ray crystallography and electron and light microscopy. Discusses other common biophysical techniques for macromolecular characterizations. Lectures cover theoretical principles behind the techniques, and students are given practical laboratory exercises using instrumentation available at MIT. Meets with 5.78 when offered concurrently.

7.72 Stem Cells, Regeneration, and Development

Prereq: Permission of instructor
G (Spring)
4-0-8 units

Topics include diverse stem cells, such as muscle, intestine, skin, hair and hematopoietic stem cells, as well as pluripotent stem cells. Topics address cell polarity and cell fate; positional information and patterning of development and regeneration; limb, heart and whole body regeneration; stem cell renewal; progenitor cells in development; responses to wounding; and applications of stem cells in development of therapies. Discussions of papers supplement lectures.

7.73 Principles of Chemical Biology

Prereq: 7.05 and permission of instructor
G (Spring)
Not offered regularly; consult department
3-0-9 units

Spanning the fields of biology, chemistry and engineering, class addresses the principles of chemical biology and its application of chemical and physical methods and reagents to the study and manipulation of biological systems. Topics include bioorthogonal reactions and activity-based protein profiling, small molecule inhibitors and chemical genetics, fluorescent probes for biological studies, and unnatural amino acid mutagenesis. Also covers chemical biology approaches for studying dynamic post-translational modification reactions, natural product biosynthesis and mutasynthesis, and high-throughput drug screening. Students taking the graduate version are expected to explore the subject in greater depth.

7.74[J] Topics in Biophysics and Physical Biology

Same subject as 8.590[J], 20.416[J]
Prereq: None
G (Fall)
Not offered regularly; consult department
2-0-4 units

See description under subject 20.416[J].

I. Cisse, N. Fakhri, M. Guo

7.76 Topics in Macromolecular Structure and Function

Prereq: Permission of instructor
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Spring)
3-0-6 units

In-depth analysis and discussion of classic and current literature, with an emphasis on the structure, function, and mechanisms of proteins and other biological macromolecules.

7.77 Nucleic Acids, Structure, Function, Evolution and Their Interactions with Proteins

Prereq: 7.05, 7.51, or permission of instructor
G (Spring)
3-0-9 units

Surveys primary literature, focusing on biochemical, biophysical, genetic, and combinatorial approaches for understanding nucleic acids. Topics include the general properties, functions, and structural motifs of DNA and RNA; RNAs as catalysts and as regulators of gene expression; RNA editing and surveillance; and the interaction of nucleic acids with proteins, such as zinc-finger proteins, modification enzymes, aminoacyl-tRNA synthetases and other proteins of the translational machinery. Includes some lectures but is mostly analysis and discussion of current literature in the context of student presentations.

D. Bartel, U. RajBhandary

7.80 Fundamentals of Chemical Biology

Subject meets with 5.08[J], 7.08[J]
Prereq: 5.13 and (5.07[J] or 7.05)
G (Spring)
4-0-8 units

Spanning the fields of biology, chemistry, and engineering, this class introduces students to the principles of chemical biology and the application of chemical and physical methods and reagents to the study and manipulation of biological systems. Topics include nucleic acid structure, recognition, and manipulation; protein folding and stability, and proteostasis; bioorthogonal reactions and activity-based protein profiling; chemical genetics and small-molecule inhibitor screening; fluorescent probes for biological analysis and imaging; and unnatural amino acid mutagenesis. The class will also discuss the logic of dynamic post-translational modification reactions with an emphasis on chemical biology approaches for studying complex processes including glycosylation, phosphorylation, and lipidation. Students taking the graduate version are expected to explore the subject in greater depth.

B. Imperiali, L. Kiessling, R. Raines

7.81[J] Systems Biology

Same subject as 8.591[J]
Subject meets with 7.32
Prereq: (18.03 and 18.05) or permission of instructor
G (Fall)
3-0-9 units

See description under subject 8.591[J].

7.82 Topics of Mammalian Development and Genetics

Prereq: Permission of instructor
Acad Year 2020-2021: Not offered
Acad Year 2021-2022: G (Spring)
3-0-9 units

Seminar covering embryologic, molecular, and genetic approaches to development in mice and humans. Topics include preimplantation development; gastrulation; embryonic stem cells, gene targeting and nuclear reprogramming of somatic cells; genomic imprinting; X-inactivation; sex determination; and germ cells.

7.85 The Hallmarks of Cancer

Subject meets with 7.45
Prereq: None. Coreq: 7.06; permission of instructor
G (Fall)
4-0-8 units

Provides a comprehensive introduction to the fundamentals of cancer biology and cancer treatment. Topics include cancer genetics, genomics, and epigenetics; familial cancer syndromes; signal transduction, cell cycle control, and apoptosis; cancer metabolism; stem cells and cancer; metastasis; cancer immunology and immunotherapy; conventional and molecularly-targeted therapies; and early detection and prevention. Students taking graduate version complete additional assignments.

T. Jacks, M. Vander Heiden

7.86 Building with Cells

Subject meets with 7.46
Prereq: 7.03 and 7.05
G (Fall)
4-0-8 units

Focuses on fundamental principles of developmental biology by which cells build organs and organisms. Analyzes the pivotal role of stem cells in tissue maintenance or repair, and in treatment of disease. Explores how to integrate this knowledge with engineering tools to construct functional tissue structures. Students taking graduate version complete additional assignments.

7.89[J] Topics in Computational and Systems Biology

Same subject as CSB.100[J]
Prereq: Permission of instructor
G (Fall)
2-0-10 units

See description under subject CSB.100[J]. Preference to first-year CSB PhD students.

7.930[J] Research Experience in Biopharma

Same subject as 20.930[J]
Prereq: None
G (Spring)
2-10-0 units

See description under subject 20.930[J].

7.931 Independent Study in Biology

Prereq: Permission of instructor
G (Fall, Spring)
Units arranged [P/D/F]
Can be repeated for credit.

Program of study or research to be arranged with a department faculty member.

7.932 Independent Study in Biology

Prereq: Permission of instructor
G (Fall, Spring)
Units arranged
Can be repeated for credit.

Program of study or research to be arranged with a department faculty member.

7.933 Research Rotations in Biology

Prereq: Permission of instructor
G (Fall, Spring)
Units arranged [P/D/F]
Can be repeated for credit.

Introduces students to faculty participating in the Biology graduate program through a series of lab rotations, which provide broad exposure to biology research at MIT. Students select a lab for thesis research by the end of their first year. Limited to students in the Biology graduate program.

7.934 Teaching Experience in Biology

Prereq: Permission of instructor
G (Fall, Spring)
Units arranged [P/D/F]
Can be repeated for credit.

For qualified graduate students in the Biology graduate program interested in teaching. Classroom or laboratory teaching under the supervision of a faculty member.

7.935 Responsible Conduct in Biology

Prereq: Permission of instructor
G (Fall)
Units arranged [P/D/F]

Sessions focus on the responsible conduct of science. Considers recordkeeping and reporting; roles of mentor and mentee; authorship, review, and confidentiality; resolving conflicts; misfeasance and malfeasance; collaborations, competing interests, and intellectual property; and proper practices in the use of animal and human subjects. Limited to second-year graduate students in Biology.

7.936 Professional Development in Biology

Prereq: None
G (Fall, Spring)
0-2-0 units

Required for course 7 doctoral students to gain professional perspective in career development activities such as internships, scientific meetings, and career and networking events. Written report required upon completion of activities.

7.941 Research Problems

Prereq: Permission of instructor
G (Fall, Summer)
Units arranged [P/D/F]
Can be repeated for credit.

Directed research in a field of biological science, but not contributory to graduate thesis.

Consult Biology Education Office

7.942 Research Problems

Prereq: Permission of instructor
G (Spring)
Units arranged [P/D/F]
Can be repeated for credit.

Directed research in a field of biological science, but not contributory to graduate thesis.

Consult Biology Education Office

7.95 Cancer Biology

Prereq: 7.85 and permission of instructor
G (Spring)
3-0-9 units

Advanced seminar involving intensive analysis of historical and current developments in cancer biology. Topics address principles of apoptosis, principles of cancer biology, cancer genetics, cancer cell metabolism, tumor immunology, and therapy. Detailed analysis of research literature, including important reports published in recent years. Enrollment limited.

7.98[J] Neural Plasticity in Learning and Memory

Same subject as 9.301[J]
Prereq: Permission of instructor
G (Spring)
3-0-6 units

See description under subject 9.301[J]. Juniors and seniors require instructor's permission.

7.S930 Special Subject in Biology

Prereq: Permission of instructor
G (Fall, Spring, Summer)
Units arranged [P/D/F]
Can be repeated for credit.

Covers material in various fields of biology not offered by the regular subjects of instruction.

7.S931 Special Subject in Biology

Prereq: Permission of instructor
G (Fall, Spring, Summer)
Units arranged [P/D/F]
Can be repeated for credit.

Covers material in various fields of biology not offered by the regular subjects of instruction.

7.S932 Special Subject in Biology

Prereq: Permission of instructor
G (Fall, IAP, Spring)
Not offered regularly; consult department
Units arranged [P/D/F]
Can be repeated for credit.

Covers material in various fields of biology not offered by the regular subjects of instruction.

7.S939 Special Subject in Biology

Prereq: Permission of instructor
G (Fall, IAP, Spring)
Not offered regularly; consult department
Units arranged
Can be repeated for credit.

Covers material in various fields of biology not offered by the regular subjects of instruction.

7.THG Graduate Biology Thesis

Prereq: Permission of instructor
G (Fall, IAP, Spring, Summer)
Units arranged
Can be repeated for credit.


Materials and methods

Strains, media and growth conditions

S. cerevisiae strain BY4741 (MATa, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0) was used and grown in synthetic complete medium at 30°C. Cells were grown to a density of 1 × 10⁷ cells per ml. Cultures were split into two; NaAsO2 (100 μM and 1 mM in two biological repeats) was added to one culture, and both were incubated at 30°C for 0.5, 2 or 4 h. Cells were pelleted and washed in distilled water before RNA extraction. Deletion strains (yap1Δ, cad1Δ, arr1Δ and rpn4Δ) of the same background were obtained from Research Genetics, confirmed, and treated the same way with 100 μM NaAsO2 for 2 h.

RNA extraction

For the cDNA hybridization experiments, total RNA was isolated using an acid-phenol method. Pellets were resuspended in 4 ml lysis buffer (10 mM Tris-HCl pH 7.5, 10 mM EDTA, 0.5% SDS). Four milliliters of acid (water-saturated, low pH) phenol was added, followed by vortexing. The lysing cell solutions were incubated at 65°C for 1 h with occasional vigorous vortexing and then placed on ice for 10 min before centrifuging at 4°C for 10 min. The aqueous layers were re-extracted with phenol (room temperature, no incubation) and extracted once with chloroform. Sodium acetate was then added to 0.3 M with 2 volumes of absolute ethanol, and the samples were placed at -20°C for 30 min and then spun. Pellets were washed two or three times with 70% ethanol, followed by Qiagen poly(A)+ RNA purification with the Oligotex oligo(dT) selection step. Total RNA for the specific knockout strains and parent experiment was isolated by enzymatic reaction, following the RNeasy yeast protocol (Qiagen).

Microarray hybridizations and analyses

A cDNA yeast chip, developed in-house at the National Institute of Environmental Health Sciences (NIEHS), was used for gene-expression profiling experiments. A complete listing of the ORFs on this chip is available at [70]. cDNA microarray chips were prepared as previously described [71, 72]. The cDNA was spotted as described [73]. Each poly(A) RNA sample (2 μg) was labeled with Cy3- or Cy5-conjugated dUTP (Amersham) by a reverse transcription reaction using the reverse transcriptase SuperScript (Invitrogen) and the primer oligo(dT) (Amersham). The hybridizations and analysis were performed as described by Hewitt et al. [74], except that genes having normalized ratio intensity values outside of a 95% confidence interval were considered significantly differentially expressed. Lists of differentially expressed genes were deposited into the NIEHS MAPS database [75]. Genes that were differentially expressed in at least three of the four replicate experiments were compiled and subsequently clustered using the Cluster/Treeview software [76]. GeneSpring (Silicon Genetics) and Cytoscape [28] were used to further analyze and visualize the data.
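The replicate filter described above (keeping genes called differentially expressed in at least three of the four replicate experiments) can be sketched in a few lines of Python. This is only an illustration, not the NIEHS MAPS implementation: the function name `consistent_genes` and the placeholder gene identifiers are hypothetical.

```python
from collections import Counter


def consistent_genes(replicate_calls, min_replicates=3):
    """Return genes called differentially expressed in >= min_replicates lists."""
    # set() guards against a gene being listed twice within one replicate.
    counts = Counter(gene for calls in replicate_calls for gene in set(calls))
    return sorted(gene for gene, n in counts.items() if n >= min_replicates)


# Hypothetical calls from four replicate hybridizations (placeholder names).
replicates = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_C", "GENE_D"],
    ["GENE_A", "GENE_B", "GENE_D"],
]
kept = consistent_genes(replicates)
print(kept)
```

Only genes present in three or more of the four lists survive; the threshold is exposed as a parameter so that a stricter cutoff can be tried.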

The knockout experiments were conducted on an Agilent yeast oligo array platform. Samples of 10 μg total RNA were labeled using the Agilent fluorescent direct label kit protocol and hybridizations were performed for 16 h in a rotating hybridization oven using the Agilent 60-mer oligo microarray-processing protocol. Slides were washed as indicated and scanned with an Agilent scanner. Data were gathered using the Agilent feature extraction software, using defaults for all parameters, save the ratio terms. To account for the use of the direct label protocol, error terms were changed to: Cy5 multiplicative error = 0.15; Cy3 multiplicative error = 0.25; Cy5 additive error = 20; Cy3 additive error = 20.

GEML files and images were exported from the Agilent feature extraction software and deposited into Rosetta Resolver (version 3.2, build 3.2.2.0.33) (Rosetta Biosoftware). Two arrays for each sample pair, including a fluor reversal, were combined into ratio experiments in Rosetta Resolver. Intensity plots were generated for each ratio experiment and genes were considered 'signature genes' if the p-value was less than 0.001. p-values were calculated using the Rosetta Resolver error model with Agilent error terms. The signature genes were analyzed with GeneSpring. The entire in-house and Agilent-based dataset is available in the Additional data files.

Ontology enrichment

Genes have previously been categorized into various ontologies and pathways. If a particular pathway is enriched for genes that are significantly expressed in response to a process, we conclude that the pathway is likely to be involved in this process. In total, 829 genes out of 6,240 had a significant alteration in expression in at least one experimental condition. Along with the size of each functional category, a statistical measure for the significance of the enrichment was calculated by using a hypergeometric test. The level of significance for this test was determined using the Bonferroni correction, where the α value was set at 0.05 and the number of tests conducted for KEGG pathway and Simplified Gene Ontology (biological process) were 27 and 11, respectively.
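The enrichment test described above can be illustrated with a minimal sketch. The gene totals (829 of 6,240), the α value and the number of KEGG tests come from the text; the example category (40 genes, 15 of them significant) is hypothetical, and the one-sided upper-tail form of the hypergeometric test is an assumption, since the text does not state which tail was used.

```python
from math import comb

N_GENES = 6240        # genes on the array (from the text)
N_SIGNIFICANT = 829   # genes significant in at least one condition (from the text)
ALPHA = 0.05          # family-wise significance level (from the text)
N_TESTS_KEGG = 27     # number of KEGG pathway tests (from the text)


def hypergeom_pvalue(k, K, n, N):
    """Upper-tail probability P(X >= k) of drawing at least k category
    members when n genes are drawn without replacement from N genes,
    K of which belong to the category."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def is_enriched(k, K, n_tests, alpha=ALPHA):
    """Return the p-value and whether it passes the Bonferroni threshold."""
    p = hypergeom_pvalue(k, K, N_SIGNIFICANT, N_GENES)
    return p, p < alpha / n_tests


# Hypothetical category: 40 annotated genes, 15 of them significant.
p_value, enriched = is_enriched(15, 40, N_TESTS_KEGG)
print(f"p = {p_value:.3g}, Bonferroni threshold = {ALPHA / N_TESTS_KEGG:.3g}, enriched = {enriched}")
```

With 27 tests the per-test threshold becomes 0.05/27, so a category is reported only if its p-value falls below roughly 0.0019.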

Network searches

The ActiveModules algorithm was used to identify neighborhoods in the regulatory network corresponding to significant levels of differential expression. In this search, if a protein has many neighbors, it is likely that at random a few will show significant changes in expression and these could be selected as a significant sub-network. Neighborhood scoring is a method we used to correct for this bias. In this scheme, a significant sub-network must contain either all or none of the neighbors of each protein. The significance then represents an aggregate over all neighbors of a protein. This prevents the biased selection of a few top-scoring proteins out of a large neighborhood in the search for significant sub-networks. For an in-depth description of this algorithm see Ideker et al. [1].

In defining the network used in the metabolic analysis, edges corresponding to metabolites linking more than 175 reactions were eliminated. This excludes metabolic cofactors such as ATP, NADH and H2O from the search. Scores for each ORF were generated by mapping the fitness significance value to a Z-score. To assign scores to the individual reactions, Förster's mapping from ORF to reaction was used to generate a list of ORFs for each reaction. The Z-scores of these ORFs were then aggregated into a single score for that reaction using the following equation:

We used a dynamic programming algorithm adapted from Kelley et al. [77] to identify high-scoring paths in this network. Briefly, the highest-scoring path of length (n) ending at each node is determined by combining the scores of the individual node and the highest-scoring path of length (n - 1) ending at a neighbor node using the following formula:

Since a node with many neighbors is more likely to belong to a high-scoring path by random chance, the score of the neighboring path is corrected against the extreme-value statistic with the number of observations equal to the number of neighbors.

The significance of the top-scoring networks was determined by comparison to a distribution of the top-scoring networks from random data (reaction scores randomized with respect to the nodes of the network). After running the path finding/scoring algorithm on each randomized dataset, the score of the single highest-scoring path was added to the null distribution. This process was repeated for 10,000 iterations. This null distribution was then used to determine an empirical p-value for the null hypothesis that there is no significant correlation between the topology of the metabolic network and the assignment of significance values to nodes in that network.
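The randomization procedure can be sketched as follows. This is an illustration only: `best_edge_score` is a toy stand-in for the path search (it scores only pairs of adjacent nodes), the node names and scores are hypothetical, and the +1 correction in the empirical p-value is a common convention assumed here rather than something stated in the text.

```python
import random


def best_edge_score(scores, edges):
    """Toy stand-in for the path search: the best summed score over any
    pair of adjacent nodes (a path of length two)."""
    return max(scores[a] + scores[b] for a, b in edges)


def null_distribution(scores, edges, n_iter, rng):
    """Shuffle scores across nodes, rerun the search, and keep the single
    highest score from each randomized network."""
    nodes = list(scores)
    values = [scores[node] for node in nodes]
    null = []
    for _ in range(n_iter):
        shuffled = values[:]
        rng.shuffle(shuffled)
        null.append(best_edge_score(dict(zip(nodes, shuffled)), edges))
    return null


def empirical_pvalue(observed, null_scores):
    """Fraction of null scores at least as large as the observed score,
    with a +1 correction so the estimate is never exactly zero."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


# Hypothetical reaction scores on a small chain-like network.
scores = {"r1": 3.1, "r2": 2.8, "r3": 0.2, "r4": -0.5, "r5": 0.1}
edges = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r4", "r5")]

observed = best_edge_score(scores, edges)
null = null_distribution(scores, edges, n_iter=10_000, rng=random.Random(0))
print(empirical_pvalue(observed, null))
```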

Specific deletion experiment filter on fold-change comparisons

The intensity plots were generated from each experiment in Rosetta Resolver. A gene was considered a signature gene if the p-value was less than 0.001 and if the fold-change value was greater than or equal to twofold. Signature genes were then broadcast on the intensity plot and exported as text files. Lists were imported into GeneSpring. The 'Filter on Fold Change' function was used to compare the parent control vs. parent AsIII experiment with each deletion (AsIII) experiment. The gene list selected for each filter on fold change analysis was a combination of the parent signature gene list and the signature gene list of the AsIII-treated deletion being analyzed at the time. For example, if the comparison was being done between parent (AsIII-treated) and Yap1 (AsIII-treated), the list used in the analysis was the combination of the parent signature genes and the Yap1 signature genes. The filter on fold change function reports genes from one condition (parent) whose normalized data values were greater or less than those in the other condition (the deletion under investigation) by a factor of two. Each filter on fold change result was saved as an annotated gene list. All the resulting gene lists were combined and an annotated gene list was exported for use in Eisen's Cluster/Treeview package (described earlier). The exported data were in natural log format. The gene tree generated for the paper was generated in GeneSpring.
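The selection logic in this section can be summarized in a short sketch. It is not the Rosetta Resolver or GeneSpring implementation: the gene names and values are hypothetical, and treating a twofold change in either direction as meeting the fold-change criterion is an assumption.

```python
def signature_genes(results, p_cut=0.001, fold_cut=2.0):
    """Select genes with p < p_cut and a ratio changed at least fold_cut
    in either direction. `results` maps gene -> (p_value, ratio)."""
    selected = set()
    for gene, (p_value, ratio) in results.items():
        changed = ratio >= fold_cut or ratio <= 1.0 / fold_cut
        if p_value < p_cut and changed:
            selected.add(gene)
    return selected


def filter_on_fold_change(cond_a, cond_b, genes, factor=2.0):
    """Keep genes whose (positive) normalized values differ between the
    two conditions by at least `factor`, in either direction."""
    kept = set()
    for gene in genes:
        a, b = cond_a[gene], cond_b[gene]
        if a >= factor * b or b >= factor * a:
            kept.add(gene)
    return kept


# Hypothetical (p-value, ratio) results for AsIII-treated parent and deletion.
parent = {"GENE_A": (0.0001, 4.0), "GENE_B": (0.0001, 1.5), "GENE_C": (0.01, 8.0)}
deletion = {"GENE_A": (0.0002, 1.8), "GENE_B": (0.0001, 3.5), "GENE_C": (0.0001, 7.0)}

parent_sig = signature_genes(parent)
deletion_sig = signature_genes(deletion)
combined = parent_sig | deletion_sig  # list used for the comparison

parent_values = {gene: ratio for gene, (_, ratio) in parent.items()}
deletion_values = {gene: ratio for gene, (_, ratio) in deletion.items()}
differing = filter_on_fold_change(parent_values, deletion_values, combined)
print(sorted(differing))
```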

Generation of specific deletion experiment 'minus' lists

Signature gene lists were generated in Rosetta Resolver from intensity plots as described above. Each signature gene list was saved as a 'Bioset' in Resolver. The parent Bioset was compared to each deletion Bioset using the 'Minus' function. This function finds those members in Bioset group 1 (parent) that do not exist in Bioset group 2 (deletion). Each of the resulting lists was saved as a new Bioset. The new 'minus' Bioset was broadcast on its corresponding intensity plot and exported as a text file. This was repeated for each experiment with fine-tuning of the data using GeneSpring.
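The 'Minus' operation as described is a set difference. A minimal sketch with hypothetical gene identifiers (this is not the Resolver function itself):

```python
def bioset_minus(group1, group2):
    """Members of group 1 (parent) that are absent from group 2 (deletion)."""
    return sorted(set(group1) - set(group2))


# Hypothetical signature gene lists.
parent_bioset = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
deletion_bioset = ["GENE_B", "GENE_D", "GENE_E"]

minus_list = bioset_minus(parent_bioset, deletion_bioset)
print(minus_list)
```

Genes in the resulting list respond in the parent but no longer appear as signature genes once the deleted factor is absent.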

Phenotypic profiling

Homozygous diploid deletion strains and pooling of the strains were done as described [66]. Aliquots were grown until logarithmic phase, diluted to OD600 0.05-0.1, split into tubes and treated with arsenic for 1-2 h at 1 mM, 2 mM and 5 mM concentrations. Similar responses were observed at each concentration, so the results were pooled. These cultures and a mock-treated sample were maintained in logarithmic phase growth by periodic dilution for 16-18 h. UPTAG and DOWNTAG sequences were separately amplified from genomic DNA of the drug and mock-treated samples by PCR using biotin-labeled primers as described previously [66]. The amplification products were combined and hybridized to Tags3 arrays (Affymetrix). Procedures for PCR amplification, hybridization and scanning were done as described [66], and according to the manufacturer's recommendation when applicable. The images were quantified by using the Affymetrix Microarray Suite software. UPTAG and DOWNTAG values were separately normalized, ratioed (treated sample signal/control) and filtered for intensities above background [78].


Large-scale discovery and characterization of protein regulatory motifs in eukaryotes

The increasing ability to generate large-scale, quantitative proteomic data has brought with it the challenge of analyzing such data to discover the sequence elements that underlie systems-level protein behavior. Here we show that short, linear protein motifs can be efficiently recovered from proteome-scale datasets such as sub-cellular localization, molecular function, half-life, and protein abundance data using an information theoretic approach. Using this approach, we have identified many known protein motifs, such as phosphorylation sites and localization signals, and discovered a large number of candidate elements. We estimate that 80% of these are novel predictions in that they do not match a known motif in both sequence and biological context, suggesting that post-translational regulation of protein behavior is still largely unexplored. These predicted motifs, many of which display preferential association with specific biological pathways and non-random positioning in the linear protein sequence, provide focused hypotheses for experimental validation.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Introduction

In eukaryotic evolution, the relative importance of horizontal gene transfer (HGT) compared with other sources of genetic novelty (i.e., gene duplication, modification, and de novo origination) is an unsettled topic. This is in contrast to Bacteria and Archaea, among which HGT is established as a major driver of genetic innovation (Ochman et al. 2000; Pál et al. 2005; Treangen and Rocha 2011). Although the impact of HGT on eukaryotic evolution remains poorly characterized, HGT has been implicated in the exploitation of new niches by several microbial groups, including apicomplexans (Striepen et al. 2004), ciliates (Ricard et al. 2006), diplomonads (Andersson et al. 2003), and fungi (Slot and Hibbett 2007). Thus, HGT may be a significant driver of gene innovation in at least some eukaryotic lineages (Keeling and Palmer 2008; Andersson 2009). However, poor resolution at the base of the eukaryotic tree of life as well as the dearth of next-generation sequence data from microbial eukaryotes complicates the interpretation of gene phylogenies otherwise suggestive of HGT (reviewed in Stiller 2011; see, e.g., Chan et al. 2012; Curtis et al. 2012; Deschamps and Moreira 2012).

Although challenges remain in measuring and interpreting HGT in eukaryotes, gene transfer during the evolution of mitochondria and plastids (i.e., endosymbiosis) is a well-established mechanism, a process referred to as endosymbiotic gene transfer (EGT). The primary plastid of the Archaeplastida (i.e., red, green, and glaucophyte algae; Adl et al. 2005) arose through an endosymbiotic association between cyanobacteria and a heterotrophic, eukaryotic host (Douglas 1998). The resulting photosynthetic organelle, the plastid, maintains a reduced genome of ≤200 genes, but the majority of genes required for photosynthesis have been transferred to the nuclear genome of the algal host (Martin and Herrmann 1998). Many of these transferred genes are targeted back to the plastid, but some have been co-opted to function in novel processes and thereby increase the genetic potential of the host genome (Martin and Herrmann 1998). Plastid endosymbioses involving eukaryotic algal endosymbionts, rather than cyanobacteria, further distributed plastids, and the genes necessary to maintain them, across the eukaryotic tree of life (Archibald 2009).

Another possible mode of HGT in microbial eukaryotes is the acquisition of genetic material during prey ingestion (Doolittle 1998), which is supported by accounts of HGT in phagotrophic lineages such as ciliates (Ricard et al. 2006), euglenids (Maruyama et al. 2011), and amoebae (Eichinger et al. 2005). Further support for the prey ingestion model comes from algae (a nonmonophyletic group of photosynthetic, plastid-containing eukaryotes). Although primary plastid-containing organisms are strict autotrophs, with few exceptions, many algae with plastids derived via additional endosymbioses (e.g., euglenids and dinoflagellates) are mixotrophs that supplement photosynthesis with consumption of food particles (Stoecker 1998). In the case of these mixotrophic algae, the prey ingestion hypothesis predicts that these organisms will have genes acquired both from their plastid and from their prey (Doolittle 1998).

The dinoflagellates are protists (i.e., microbial eukaryotes) common in many aquatic environments and are ideal organisms for investigating the impact of HGT on eukaryotic evolution. Many dinoflagellate species are mixotrophs, having the ability to obtain carbon from photosynthesis as well as from ingestion of other phytoplankton and bacteria (Hackett, Anderson et al. 2004). In support of the prey ingestion model, previous work suggests that dinoflagellate nuclear genomes contain a large number of genes acquired via plastid endosymbiosis as well as genes horizontally acquired from other sources (Hackett et al. 2005, 2013; Nosenko et al. 2006; Nosenko and Bhattacharya 2007; Janouskovec et al. 2010; Minge et al. 2010; Wisecaver and Hackett 2010; Stuken et al. 2011; Chan et al. 2012; Orr et al. 2013). The placement of dinoflagellates is resolved and well supported in phylogenetic analyses, which is essential for inferring HGT based on phylogenetic incongruence between gene and species trees. Dinoflagellates are sister to the Perkinsidae, a parasitic group that includes the oyster pathogen Perkinsus marinus (Reece et al. 1997). Dinoflagellates and Perkinsidae together are sister to the apicomplexans, an exclusively parasitic group responsible for many human diseases including malaria and toxoplasmosis (Fast et al. 2002). The nearest major protist group to dinoflagellates and apicomplexans is the ciliates, and these three lineages together are the primary members of the superphylum Alveolata (Adl et al. 2005). Phylogenetic studies suggest that alveolates are related to stramenopiles (e.g., diatoms and giant kelp) and rhizarians (e.g., foraminifera and radiolarians), an association abbreviated as the stramenopile–alveolate–rhizaria (SAR) supergroup (Burki et al. 2007; Parfrey et al. 2010). However, despite this phylogenetic resolution as well as published cases of gene transfer, the full extent, timing, and consequences of HGT in dinoflagellates remain unknown because example genomes have not yet been sequenced.

Dinoflagellate genomes range in size from 1.5 to 185 Gbp (0.8 to over 60 times the size of the human genome) and are rife with noncoding sequence, tandem gene repeats, and other unusual features that make genome sequencing and assembly highly impractical with current technology (Wisecaver and Hackett 2011). Fortunately, transcriptome sequencing is an alternative approach for questions requiring comprehensive gene discovery in nonmodel organisms with complex genomes. Here, we analyze a comprehensive de novo transcriptome assembly for the dinoflagellate Alexandrium tamarense strain CCMP1598, a member of the “Group IV” clade within the A. tamarense species complex. This species complex comprises five such clade groups, each of which likely represents a distinct cryptic species (Lilly et al. 2007). We cross-reference our A. tamarense Group IV gene set to transcriptomic and expressed sequence tag (EST) data from 21 additional dinoflagellate species (including transcriptome assemblies from A. tamarense Group I and Group III strains) to derive a final dinoflagellate unigene set that we then compare with 16 other algal and protist genomes. Using ancestral gene content reconstruction, we map gene acquisitions on the alveolate evolutionary tree and validate the results using a phylogenomic pipeline. This combined approach offers a robust, comparative exploration of the pattern of HGT in dinoflagellates relative to other eukaryotes.


Discussion

For the example of central and amino-acid metabolism in B. subtilis, we show that fluxome profiling by multivariate statistics from mass isotopomer distribution analysis is meaningful for the discrimination of mutants or conditions on the basis of their metabolic behavior, and applicable to conditions that are inaccessible to previous flux analysis. In sharp contrast to metabolome concentration data [24, 25], fluxome profiles contain functional information on the operation of fully assembled networks [1, 4]. As shown here by ICA, this approach enables us to distill the essential signatures of independent metabolic activities, and supports the identification of the underlying biochemical causality. Because no model or a priori knowledge on the investigated system is required, the metabolic imprints of any tracer atom and molecule can be followed in virtually any biological system, including multicellular organisms in complex multisubstrate media.

Similarly, a priori knowledge of the number of ICs to be computed is not a prerequisite. As a matter of fact, the optimal number depends primarily on the labeling patterns and can hardly be estimated from the dataset dimensions. An underestimate will generally leave some relevant signatures unrecognized, whereas an overestimate will lead to an increased fraction of components reflecting measurement or biological noise. Although statistical significance can be assessed with duplicates, this becomes prohibitive with large datasets (that is, hundreds of mutants or analytes) or reduced availability of replicas. The bottleneck resides in the stochastic approach of most ICA algorithms, for which independent runs result in different ICs or ordering thereof. Instead, algorithmic and statistical reliability of the ICs can be evaluated by repeating the estimation several times either with randomly chosen initial guesses or by slightly varying the dataset (bootstrapping [28]), respectively, and then clustering all results to identify robust ICs [29].

Two factors directly affect the results that can be obtained by comparative fluxome profiling: the detected analytes and the choice of isotopic tracer. As well as polymer-based analytes such as the proteinogenic amino acids monitored here, fluxome profiles can be detected in any set of intra- or extracellular metabolites, thereby widening the observable metabolic processes. The choice of tracer depends, to some extent, on the metabolic subsystem of interest. Uniformly labeled substrates provide a more global perspective because they allow assessment of the scrambling of any carbon backbone and, in the case of experiments performed in rich media, also allow quantification of the fraction of de novo biosynthesis from the tracer relative to the uptake of a medium component. Similarly, uniformly deuterated substrates or ²H2O are valuable for simultaneously capturing a wide number of ICs that are affected by the release, binding and exchange of water or protons. Substrates that are labeled at specific positions, in contrast, enable deeper interrogation of particular sub-networks, for example, [1-¹³C]hexoses for the initial catabolic reactions [8, 19] or [1-¹³C]aspartate to assess urea cycle activity.

The results also revealed new biological information on pathway activity, function or regulation. First, both glycolysis and the pentose phosphate pathway actively catabolized glucose in the presence of CAA, because the pgi and yqjI mutant signatures were different from the wild type and from each other. On sorbitol, in contrast, the same mutants were very similar to the wild type, suggesting that both reactions are only marginally involved in catabolism of this sugar. Second, the Krebs cycle flux was similar on glucose and sorbitol (with and without CAA), as deduced from the similarly pronounced signatures of the sdhC mutant. Third, absence of the sdhC signatures in the Krebs cycle-derived amino acids aspartate and glutamate of the mdh mutant when grown with ammonium (but not CAA) indicates activity of the malic enzyme-based pyruvate bypass [30]. Fourth, activity of the NADP-dependent malic enzyme appears to be independent of catabolite repression because pronounced signatures of the ytsJ mutant were seen on all substrates. The gluconeogenic phosphoenolpyruvate synthetase Pps, in contrast, was inactive in the presence of the repressing glucose but active on pyruvate or sorbitol. Fifth, as discussed above the data reveal a Krebs cycle-promoting effect of the repressor CggR on sorbitol but not on glucose, most likely through the repression of glycolytic genes [22].

The comparative fluxome profiling presented here complements traditional flux analysis because it enables potentially rapid and automated identification of relevant mutants or conditions from large-scale datasets, for example from entire mutant libraries. The approach is quantitative in terms of the relative difference between variants, but qualitative with respect to the in vivo flux. Interesting variants are then subjected to deeper interrogation of the specific metabolic phenomenon identified. Besides mere data mining, fluxome profiling also has the potential to identify complex functional traits in higher cells where current flux methods fail, and possibly even identify the underlying biochemical mechanism of discriminant mass isotope signatures.


Methods

Strains and plasmids

E. coli DH10B was used for cloning. E. coli BLR (DE3) and DH1 were used for expression studies with BglBrick vectors. Plasmids and BglBrick parts used in this study are listed in Table 1. Media were supplemented with 100 μg/mL ampicillin, 35 μg/mL chloramphenicol, or 50 μg/mL kanamycin to select for plasmid maintenance. All strains were grown at 30°C unless described otherwise.

Construction of BglBrick vector parts

The template plasmids or parts for the BglBrick vectors constructed here are listed in Table 1 and the primers for PCR amplification are listed in Table 2. Each gene component was either PCR amplified from a template using Phusion™ High-Fidelity DNA polymerase (New England BioLabs, F-530) or digested from template plasmids and incorporated into the BglBrick vector plasmid by standard restriction digestion and ligation.

Replication origins

The p15A origin was obtained from plasmid pZA31-luc, the ColE1 origin from plasmid pZE12-luc, and the pSC101* origin from plasmid pZS*24-MCS1 [39]. A BglII site in the pSC101* origin was eliminated by site-directed mutagenesis. The oligonucleotides used to remove the BglII site in the pSC101* origin were pSC101QC F1 and pSC101QC R1 creating pSC101**. Each origin of replication and terminator sequence module was cloned in using the AvrII and SacI sites. Plasmid pMBIS was used as template for the pBBR1 origin. The BBR1 region was amplified in two parts, and primers were designed to make a C to T point mutation in the overlapping region of the two PCR products to increase the copy number as reported [22]. Forward primer pBBR1 F1 (5'- gatcaCCTAGGctacagccgatagtctggaacagcgc -3') and reverse primer pBBR1 mut R1 (5'- ccggcaccgtgtTggcctacgtggtc -3') were used to generate the first product with a 5'-AvrII site, and forward primer pBBR1 mut F1 (5'- gaccacgtaggccAacacggtgccgg -3') and reverse primer pBBR1 R2 (5'- agatcaACTAGTgcctccggcctgcggcctgcgcgcttcg -3') were used to generate the second product with a 3'- SpeI site. These two parts were then combined in a splice overlap extension-PCR (SOE-PCR) reaction with primers pBBR1 F1 and pBBR1 R2 to create the product containing the entire pBBR1 origin of replication. The PCR product was digested with AvrII and SpeI and ligated with existing intermediate vectors to generate three additional intermediate vectors containing pBBR1 and each antibiotic resistance module.

Antibiotic resistance

All antibiotic resistance segments (SacI to AatII) were digested from the parent plasmids listed in Table 1. The BglBrick restriction sites found in the Cm and Km resistance gene components were removed by site-specific mutagenesis. The oligonucleotides used to remove the EcoRI site in the Cm resistance gene were the forward CmQC F1 (5'-ctttcattgccatacgAaattccggatgagcattc-3') and reverse CmQC R1 (5'-gaatgctcatccggaattTcgtatggcaatgaaag-3') (point mutation is capitalized). The oligonucleotides used to remove the BglII site in the Km resistance gene promoter were KanQC F1 (5'-cctgtctcttgatcagatcAtgatcccctgc-3') and KanQC R1 (5'-gcaggggatcaTgatctgatcaagagacagg-3').

Rfp (or gfp) and terminator

The rfp-terminator (rfp-term) module was constructed by splice overlap extension-PCR (SOE-PCR) [47]. First, SOE-PCR was performed to generate rfp with BglBrick restriction sites EcoRI and BglII and RBS (TTTAAGAAGGAGATATACAT) on the 5'-end, and with BglBrick restriction sites BamHI and XhoI and a double terminator sequence followed by an AatII site on the 3'-end. Two PCRs were performed to amplify rfp and the terminator separately, using primers to introduce the restriction sites, RBS, and overlapping sequence for SOE-PCR. Forward primer RFP F1 and reverse primer RFP R1 were used to generate the product containing EcoRI, BglII, RBS, and rfp. Forward primer Term F1 and reverse primer Term R1 were used to generate the product containing BamHI, XhoI, the double terminator sequence and AvrII. The products were then combined and a second PCR was performed with primers RFP F1 and Term R1. The resulting SOE-PCR product (rfp-term) was in turn used in additional SOE-PCRs to generate complete modules containing the 8 different promoter systems followed by rfp-term.

Promoters and repressors

The primers for each promoter system (containing repressor and promoter) were engineered to include a 5' AatII site for later cloning steps and an rfp overlapping sequence on the 3' end to facilitate the addition of the rfp-terminator module via SOE-PCR. When the promoter system contained any of the 4 BglBrick restriction sites, an additional set of primers to remove the restriction site was prepared for SOE-PCR. Primers for each promoter system are listed in Table 2.

Final pBb vector assembly

To construct the promoter system with the rfp-terminator module, each of the eight promoter system modules was combined with rfp-terminator by SOE-PCR using the F1 primer from each promoter system construction and the reverse primer Term R1. These eight products were then digested with AatII and AvrII and individually ligated with the AatII and AvrII digested fragment from the intermediate plasmid containing amp R and ColE1. The eleven remaining intermediate plasmids were then digested with AvrII and AatII to isolate the antibiotic resistance-replication origin (AR-ori) modules. In total, each of the twelve AR-ori modules was ligated with each of the eight AvrII and AatII digested promoter-rfp-terminator modules to produce 96 unique pBb vectors.
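The arithmetic behind the 96 vectors (four origins × three resistance markers = twelve AR-ori modules, each paired with eight promoter modules) can be checked with a short enumeration. The single-letter codes below are an assumption inferred from plasmid names that appear in this section (for example pBbE5a, pBbA8a, pBbE5c, pBbS2k), and numbering the promoter systems 1 to 8 is likewise illustrative.

```python
from itertools import product

# Letter codes are inferred from plasmid names in the text (assumption).
ORIGINS = {"A": "p15A", "E": "ColE1", "S": "pSC101**", "B": "pBBR1"}
RESISTANCES = {"a": "ampicillin", "c": "chloramphenicol", "k": "kanamycin"}
PROMOTERS = range(1, 9)  # eight promoter systems, numbered for illustration


def vector_names():
    """Enumerate every origin/promoter/resistance combination."""
    return [
        f"pBb{origin}{promoter}{resistance}"
        for origin, promoter, resistance in product(ORIGINS, PROMOTERS, RESISTANCES)
    ]


names = vector_names()
ar_ori_modules = len(ORIGINS) * len(RESISTANCES)
print(ar_ori_modules, len(names))
```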

Data sheet experiments

General

Ampicillin-resistant pBb plasmids were transformed into E. coli BLR(DE3) electrocompetent cells and/or E. coli DH1 electrocompetent cells and plated on LB-agar with 50 μg/ml Carbenicillin (Cb) for overnight incubation at 37°C. A single colony was picked and used to prepare the seed culture in LB broth containing 50 μg/ml Cb. Fresh culture tubes with 3 ml LB broth containing 50 μg/ml Cb were inoculated with 60 μl overnight seed culture and grown at 37°C, 200 rpm until the OD600 reached about 0.55. All experiments were replicated in triplicate.

Inducer dose response

The outer wells of a 96-well clear-bottom plate with lid (Corning no: 3631) were filled with 200 μl sterile water and the plate was sterilized by using the optimal crosslink setting on the UV crosslinker (Spectronics, Corp.). 10 × serial dilutions were made of inducers appropriate for each plasmid being tested and 20 μl was pipetted into each well so that the final volume of 200 μl would give 1x inducer concentration. Each plate included 3 control wells containing pBbE5a-RFP (or GFP) in BLR(DE3) induced with 12.5 μM IPTG. Appropriate volumes of culture and LB/Cb were added to the 96-well plate with lid and grown in a Safire (Tecan) microplate reader at 30°C for 20.5 hours. OD600 and RFP fluorescence were measured every 570 seconds using an excitation wavelength of 584 nm and an emission wavelength of 607 nm. For the constructs containing GFP (pBbB plasmids), an excitation wavelength of 400 nm and an emission wavelength of 510 nm were used for fluorescence measurement.

Strain and medium dependence

E. coli BLR(DE3) and DH1 transformed with pBb plasmid were streaked on LB-agar with 50 μg/ml Cb and grown at 37°C overnight. Seed cultures were prepared in LB broth containing 50 μg/mL Cb inoculated with a single colony and grown at 37°C, 200 rpm overnight. Each experiment with a pBb plasmid-harboring strain was replicated in triplicate, and each set of experiments included 6 control tubes containing pBbE5a-RFP in BLR(DE3) in LB (3 uninduced and 3 induced with 100 μM IPTG). For the M9 minimal medium (MM) experiment, three rounds of adaptation were performed in minimal medium. After adaptation, fresh tubes with 3 mL fresh MM were inoculated with adapted seed culture to OD600 approximately 0.15 and grown at 37°C to OD600 of approximately 0.5. One set of tubes was induced at different inducer concentrations and all cultures were grown at 30°C, 200 rpm for 66 hours post induction. Samples were taken at 18 h, 42 h and 66 h post induction. 25 μL of culture was transferred to a 96-well plate and diluted to 200 μL with fresh medium, and OD600 and fluorescence were measured. For LB and TB media experiments, overnight seed cultures were used directly for inoculation without adaptation.

Catabolite repression and inducer crosstalk

Seed cultures were prepared as described in strain and medium dependence experiments. Three different media (MM, phosphate buffered LB, and phosphate buffered TB) containing 1% glucose were used for catabolite repression experiments. Inoculated cultures were grown at 37°C to OD600 of approximately 0.5, and induced to achieve maximum expression (100 μM IPTG, 20 mM arabinose, 400 nM aTc, or 20 mM propionate). Cultures were grown at 30°C, 200 rpm for 66 hours post induction, and OD600 and fluorescence were measured at each sampling. For the inducer crosstalk experiment, LB broth containing 50 μg/ml Cb was inoculated with seed cultures containing E. coli BLR(DE3) harboring the ampicillin-resistant pBb. Cultures were induced at OD600 of approximately 0.5 with the appropriate inducer, and one of the non-cognate inducers was also added to the individually induced culture during induction. Cultures were grown at 30°C, 200 rpm for 18 hours post-induction, and OD600 and fluorescence were measured using the Tecan.

Bacterial DNA isolation to quantify plasmid copy number

E. coli DH1 and BLR were grown overnight at 30°C, 200 rpm shaking after inoculating 5 mL cultures of LB medium (supplemented with 50 μg/mL kanamycin) with single colonies from freshly streaked plates. After sub-culturing (1:50) into shake flasks containing 50 mL of LB medium (supplemented with 50 μg/mL kanamycin), cells were grown at 30°C, 200 rpm shaking until an OD600 of 0.3-0.4 was reached. At this time, 1 mL of cells was spun down and the supernatant subsequently removed. The cell pellets were then frozen. Total DNA was isolated from these pellets for use at a future date. The DNA isolation method reported in previous publications [33, 48] was adopted. Bacterial cell pellets were resuspended in 400 μL of 50 mM Tris/50 mM EDTA, pH 8, by vortexing. Cell membranes were permeabilized by the addition of 8 μL of 50 mg/mL lysozyme (Sigma) in 10 mM Tris/1 mM EDTA, pH 8, followed by incubation at 37°C for 30 min. To complete cell lysis, 4 μL of 10% SDS and 8 μL of 20 mg/mL Proteinase K solution (Invitrogen) were added to each tube, mixed with a syringe with 21-gauge, 1.5-inch needle, and incubated at 50°C for 30 min. Proteinase K was subsequently heat inactivated at 75°C for 10 min, and RNA was digested with the addition of 2 μL of 100 mg/mL RNase A solution (Qiagen) followed by incubation at 37°C for 30 min. Total DNA extraction then proceeded by adding 425 μL of 25:24:1 phenol:chloroform:isoamyl alcohol, vortexing vigorously for 1 min, allowing the tubes to sit at room temperature for a few minutes, and then centrifugation for 5 min at 14,000 × g, 4°C. Next, 300 μL of the upper aqueous phase was transferred to a new tube using a wide-opening pipet tip. DNA extraction continued by adding 400 μL of chloroform to each tube, vigorous vortexing for 1 min, allowing the tubes to sit at room temperature for a few minutes, and centrifugation for 5 min at 14,000 × g, 4°C. Next, 200 μL of the upper aqueous phase was transferred to a new tube using a wide-opening pipet tip. Following chloroform extraction, total DNA was ethanol precipitated overnight, washed with 70% ethanol, and finally resuspended in 40 μL of nuclease-free water. DNA concentration and purity were assayed using a Nanodrop spectrophotometer, and integrity examined on 1% agarose gels.

Real-time qPCR quantification of plasmid copy number

Primer sets specific to the neomycin phosphotransferase II (nptII) gene (forward: GCGTTGGCTACCCGTGATAT, reverse: AGGAAGCGGTCAGCCCAT) [49] and the 16S rDNA gene (forward: CCGGATTGGAGTCTGCAACT, reverse: GTGGCATTCTGATCCACGATTAC) [33] were used for real-time qPCR. Each primer set amplified a single product of the expected size, as confirmed by the melting temperatures of the amplicons. nptII is present as a single copy on the plasmids characterized in this study, while the 16S rDNA gene is present in multiple copies on the E. coli chromosome [36] and was used for normalization [22, 33, 35]. To determine plasmid copy number (i.e. the number of plasmids per genomic equivalent), transgenic E. coli DH1 and BLR strains carrying a single nptII integration (data not shown) were used for calibration. Total DNA isolated from each strain was first digested overnight with EcoRI (New England Biolabs) at 37°C.

Real-time qPCR was conducted on a BioRad iCycler with 96-well reaction blocks in the presence of SYBR Green under the following conditions: 1X iQ SYBR Green Supermix (BioRad) and 150 nM nptII primers or 500 nM 16S primers in a 25 μL reaction. Cycling was 95°C for 3 min, followed by 40 cycles of 30 sec at 95°C, 30 sec at 60°C, and 30 sec at 72°C. Threshold cycles (Ct) were determined with iCycler (BioRad) software for all samples. For quantification, a standard curve was prepared from a four-fold dilution series (seven dilutions in total) of a digested total DNA sample, and each dilution was analyzed by qPCR at least in duplicate with either the nptII- or 16S-specific primers. The iCycler software used the resulting Ct values to plot a standard curve, which allowed quantification of nptII or 16S in the digested total DNA samples (i.e. unknowns) relative to the DNA sample used to prepare the standard curve.
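The quantification described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the iCycler software's algorithm: it assumes a simple least-squares fit of Ct against log10 of relative quantity, and it assumes that copy number is obtained by dividing the nptII/16S ratio of a sample by the same ratio in the single-integration calibrator strain. All function names and all Ct values are hypothetical and were made up for the example.

```python
import math

def fit_standard_curve(log10_quantities, cts):
    """Least-squares line Ct = slope * log10(quantity) + intercept."""
    n = len(cts)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def relative_quantity(ct, slope, intercept):
    """Invert the standard curve: quantity relative to the undiluted standard."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0

def plasmid_copy_number(sample_nptii, sample_16s, calib_nptii, calib_16s):
    """Assumed normalization: sample nptII/16S ratio over calibrator ratio."""
    return (sample_nptii / sample_16s) / (calib_nptii / calib_16s)

# Four-fold dilution series with seven dilutions, as in the text.
log10_q = [math.log10(4.0 ** -k) for k in range(7)]

# Made-up Ct values: with perfect doubling, each four-fold dilution adds 2 cycles.
nptii_curve = fit_standard_curve(log10_q, [18.0 + 2 * k for k in range(7)])
s16_curve = fit_standard_curve(log10_q, [12.0 + 2 * k for k in range(7)])

# Made-up Ct values for one plasmid-bearing sample and the calibrator strain.
sample_nptii = relative_quantity(14.0, *nptii_curve)
sample_16s = relative_quantity(12.0, *s16_curve)
calib_nptii = relative_quantity(18.0, *nptii_curve)
calib_16s = relative_quantity(12.0, *s16_curve)

copy_number = plasmid_copy_number(sample_nptii, sample_16s,
                                  calib_nptii, calib_16s)
print(f"efficiency = {amplification_efficiency(nptii_curve[0]):.2f}")
print(f"plasmid copy number = {copy_number:.1f}")
```

Because the calibrator carries one nptII copy per genome, dividing by its nptII/16S ratio cancels the unknown number of 16S copies per chromosome, so the result can be read as plasmids per genomic equivalent under the assumptions stated above.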

Expression control in the three-plasmid system

BLR (DE3) cells were transformed with three plasmids: pBbA8a-CFP, pBbE5c-YFP and pBbS2k-RFP. A single colony was used to inoculate LB medium, and the overnight cultures were grown at 37°C in minimal medium (M9 medium supplied with 75 mM MOPS, 2 mM MgSO4, 1 mg/L thiamine, 10 nM FeSO4, 0.1 mM CaCl2 and micronutrients) supplemented with 2% glucose. Cells were induced at OD 0.6 with combinations of different amounts of arabinose, IPTG and aTc. In detail, the arabinose concentrations used were 0, 5 mM, and 20 mM; the IPTG concentrations used were 0, 30 μM, and 100 μM; and the aTc concentrations used were 0, 12.5 nM, 25 nM, and 40 nM. After induction, cells were grown at 30°C for 12 hours, after which cell culture fluorescence was measured. Fluorescence was recorded on a SpectraMax M2 plate reader (Molecular Devices) using 96-well Costar plates, with each well containing 150 μL of cell culture. For CFP, λex = 433 nm and λem = 474 nm were used; for YFP, λex = 500 nm and λem = 530 nm; and for RFP, λex = 584 nm and λem = 615 nm. Cell density was estimated by measuring the absorbance at 610 nm. Cell culture fluorescence from each well was normalized by its cell density. All data were averaged from at least two independent measurements.
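The induction scheme and the normalization step can be illustrated with a short Python sketch. It assumes a full factorial design, i.e. every arabinose level crossed with every IPTG and aTc level; the text only says "combinations", so the actual set of conditions may have been smaller. It also assumes that normalization means dividing raw fluorescence by the absorbance at 610 nm. The function names and the example readings are hypothetical.

```python
from itertools import product
from statistics import mean

# Inducer levels taken from the text.
ARABINOSE_MM = [0, 5, 20]
IPTG_UM = [0, 30, 100]
ATC_NM = [0, 12.5, 25, 40]

# Assumed full factorial design: 3 x 3 x 4 = 36 induction conditions.
conditions = list(product(ARABINOSE_MM, IPTG_UM, ATC_NM))

def normalized_fluorescence(fluorescence, od610):
    """Raw well fluorescence divided by cell density (absorbance at 610 nm)."""
    if od610 <= 0:
        raise ValueError("absorbance must be positive")
    return fluorescence / od610

def average_replicates(replicates):
    """Mean normalized fluorescence over (fluorescence, od610) pairs."""
    return mean(normalized_fluorescence(f, od) for f, od in replicates)

# Made-up readings for one condition, two independent measurements.
example = average_replicates([(100.0, 0.5), (300.0, 1.0)])
print(len(conditions), "conditions; example mean =", example)
```

Normalizing each well before averaging, rather than averaging raw fluorescence and absorbance separately, keeps a dense replicate from dominating the mean.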

