Where to find E.coli gene expression data?

I am looking for E. coli whole-genome expression data measured under different conditions; any suggestion is appreciated. Conditions could be, for example, different growth temperatures or different media. I have tried GEO, but it only returns a list, and one has to read through all the pages and samples one by one to see which conditions were used.

There are several databases in which you can search for E. coli gene expression data:

  • GenExpDB: E. coli Gene Expression Database
  • Many Microbe Microarrays Database (M3D): A resource of microbial gene expression data
  • Stanford MicroArray Database (use the search tool to find relevant organisms)
  • Colombos (COLlection Of Microarrays for Bacterial OrganismS)
  • ArrayExpress

To find out about the purpose and conditions of the experiments behind these data sets, you will have to read the respective publications.
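If paging through GEO by hand is the main pain point, one option is to query it programmatically through the NCBI E-utilities. The snippet below is only a sketch that builds the search URL and does not fetch anything. The helper names `build_geo_query` and `build_esearch_url` are made up for illustration, and the field tags used in the search term (`[Organism]`, `gse[Entry Type]`) are assumptions worth checking against the GEO query documentation before relying on them.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; "gds" is the GEO DataSets database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(condition_keywords, organism="Escherichia coli"):
    """Return an Entrez search term restricted to GEO series for one organism."""
    # Keywords are OR-ed so that a match on any condition description counts.
    keywords = " OR ".join(f'"{kw}"' for kw in condition_keywords)
    # Assumed field tags: [Organism] and gse[Entry Type].
    return f"{organism}[Organism] AND gse[Entry Type] AND ({keywords})"


def build_esearch_url(term, retmax=20):
    """Return the full esearch URL; fetching and parsing is left to the reader."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_geo_query(["heat shock", "minimal medium"])
url = build_esearch_url(term)
print(term)
print(url)
```

The returned identifiers still have to be resolved to series records and their sample descriptions, so this narrows the list rather than replacing the reading.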

Gene expression analysis of E. coli strains provides insights into the role of gene regulation in diversification

Escherichia coli spans a genetic continuum from enteric strains to several phylogenetically distinct, atypical lineages that are rare in humans, but more common in extra-intestinal environments. To investigate the link between gene regulation, phylogeny and diversification in this species, we analyzed global gene expression profiles of four strains representing distinct evolutionary lineages, including a well-studied laboratory strain, a typical commensal (enteric) strain and two environmental strains. RNA-Seq was employed to compare the whole transcriptomes of strains grown under batch, chemostat and starvation conditions. Highly differentially expressed genes showed a significantly lower nucleotide sequence identity compared with other genes, indicating that gene regulation and coding sequence conservation are directly connected. Overall, distances between the strains based on gene expression profiles were largely dependent on the culture condition and did not reflect phylogenetic relatedness. Expression differences of commonly shared genes (all four strains) and E. coli core genes were consistently smaller between strains characterized by more similar primary habitats. For instance, environmental strains exhibited increased expression of stress defense genes under carbon-limited growth and entered a more pronounced survival-like phenotype during starvation compared with other strains, which stayed more alert for substrate scavenging and catabolism during no-growth conditions. Since those environmental strains show similar genetic distance to each other and to the other two strains, these findings cannot be simply attributed to genetic relatedness but suggest physiological adaptations. Our study provides new insights into ecologically relevant gene-expression and underscores the role of (differential) gene regulation for the diversification of the model bacterial species.

Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models

*Corresponding author. Department of Bioengineering, University of California San Diego, 417 Powell-Focht Bioengineering Hall, 9500 Gilman Drive, La Jolla, CA 92093-0412, USA. Tel.: +1 858 534 5668; Fax: +1 858 822 3120

After hundreds of generations of adaptive evolution at exponential growth, Escherichia coli grows as predicted using flux balance analysis (FBA) on genome-scale metabolic models (GEMs). However, it is not known whether the predicted pathway usage in FBA solutions is consistent with gene and protein expression in the wild-type and evolved strains. Here, we report that >98% of active reactions from FBA optimal growth solutions are supported by transcriptomic and proteomic data. Moreover, when E. coli adapts to growth rate selective pressure, the evolved strains upregulate genes within the optimal growth predictions, and downregulate genes outside of the optimal growth solutions. In addition, bottlenecks from dosage limitations of computationally predicted essential genes are overcome in the evolved strains. We also identify regulatory processes that may contribute to the development of the optimal growth phenotype in the evolved strains, such as the downregulation of known regulons and stringent response suppression. Thus, differential gene and protein expression from wild-type and adaptively evolved strains supports observed growth phenotype changes, and is consistent with GEM-computed optimal growth states.


When prokaryotes are maintained at early- to mid-log phase growth through serial passaging for hundreds of generations, the strains improve fitness and evolve a higher growth rate (Lenski and Travisano, 1994; Ibarra et al, 2002). This increased growth rate is the result of the appearance of a few causal mutations (Herring et al, 2006; Conrad et al, 2009). In Escherichia coli, these altered growth phenotypes are consistent with predictions from genome-scale models of metabolism (GEMs) (Ibarra et al, 2002; Fong and Palsson, 2004). However, it is still not known (1) whether absolute gene and protein expression levels and expression changes are consistent with optimal growth predictions from in silico GEMs or (2) whether measured expression changes can be linked to physiological changes that are based on known mechanisms or pathways. In this study, we begin to address these questions using constraint-based modeling of E. coli K-12 metabolism (Feist and Palsson, 2008) to analyze omic data that document the expression changes in E. coli under adaptive evolution in three different growth conditions.

Mapping high-throughput data to a network can be useful for interpretation. However, it does not account for upstream and downstream effects of gene and protein expression changes. The analysis of data in the context of GEMs can suggest if predicted activity is consistent with the data. For this work, we used a variant of flux balance analysis (FBA), called Parsimonious enzyme usage FBA (pFBA) (Figure 1), to classify all genes according to whether they are used in the optimal growth solutions. Results from these models were compared with the data to assess whether the data were consistent with genes and proteins within the predicted optimal solutions, and whether the expression changes were consistent with measured physiology. Through this analysis, we find that the data provide a high coverage of genes that contribute to the optimal growth solutions (Figure 1B). In fact, the union of the proteomic and transcriptomic data for non-essential genes provides support for 97.7% of all non-essential gene-associated reactions within the optimal growth predictions. Thus, the spectrum of expressed genes and proteins is consistent with the pathway utilization that is predicted for these optimal growth phenotypes.
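The coverage figure quoted above is, at its core, a set calculation: a reaction in the optimal solution counts as supported when at least one of its associated genes is detected in either omic data set. The sketch below illustrates that idea on invented toy data. The function name `support_fraction`, the reaction and gene labels, and the rule that any single detected gene supports a reaction are all assumptions for illustration; the study's actual gene-to-reaction logic is not described in this excerpt.

```python
def support_fraction(active_reactions, reaction_genes, transcript_hits, protein_hits):
    """Fraction of gene-associated active reactions with a detected gene."""
    detected = transcript_hits | protein_hits  # union of the two omic data sets
    # Reactions without any associated gene cannot be supported by expression data.
    gene_associated = [r for r in active_reactions if reaction_genes.get(r)]
    supported = [r for r in gene_associated if reaction_genes[r] & detected]
    return len(supported) / len(gene_associated)


# Hypothetical toy model: reaction -> set of associated genes.
reaction_genes = {
    "R1": {"g1"},
    "R2": {"g2a", "g2b"},
    "R3": {"g3"},
    "R4": {"g4"},
    "R5": set(),  # no gene association, excluded from the denominator
}
transcript_hits = {"g1", "g2a"}  # genes detected by transcriptomics
protein_hits = {"g3"}            # genes detected by proteomics

fraction = support_fraction(reaction_genes.keys(), reaction_genes,
                            transcript_hits, protein_hits)
print(f"{fraction:.1%} of gene-associated active reactions are supported")
```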

Laboratory-evolved strains attain a higher growth rate. This higher growth rate is usually associated with an increased substrate uptake rate (Ibarra et al, 2002; Fong et al, 2005) and in some cases more efficient metabolism (Ibarra et al, 2002). Both of these properties are also witnessed in the strains studied here. It has been reported that in most cases, evolved strain growth phenotype is consistent with GEM predictions (Ibarra et al, 2002; Teusink et al, 2009). Here, we evaluate whether the laboratory-evolved strains adjust the gene and protein expression levels in accordance with pathway usage in the optimal growth predictions. Essential and non-essential genes and proteins within the optimal growth solutions are significantly upregulated (Figure 1B). This suggests that these proteins may be acting as bottlenecks that are relieved through the adaptive process, thereby allowing for a higher substrate uptake rate and growth rate. However, genes and proteins associated with reactions that cannot carry a flux in the given growth conditions are downregulated in the evolved strains (Figure 1B). Furthermore, there is downregulation of genes associated with less efficient pathways (Figure 5C). Thus, the omic data support the emergence of the predicted optimal growth states, consistent with the increased substrate uptake upstream and the increased biomass production downstream of these internal pathways.

Regulatory mechanisms, both known and unknown, are responsible for the changes seen here. Across all data sets, several metabolic regulons are significantly downregulated. However, no known regulons were enriched among upregulated genes or proteins for all but one data set. Aside from just regulating the metabolic pathways directly, these mechanisms lead to additional physiological changes. For example, in the minimal media growth conditions used here, the stringent response normally represses growth while upregulating amino-acid biosynthetic processes. However, evolved strain gene expression shows a suppression of the stringent response, as evolved strain gene expression shows either no expression change or changes opposite to the normal stringent response.

The implications of this work are as follows: (1) genome-scale gene and protein expression data are consistent with FBA computed optimal growth states, and evolved strains reinforce these optimal states; (2) genome-scale models will have an important function bridging the gap between genotype and phenotype; and (3) the development of additional genome-scale models of other growth-related processes such as transcription and translation (Thiele et al, 2009) will have an important function in elucidating the mechanisms that contribute the most to altered phenotypes (Lewis et al, 2009a). In addition, reconstruction of the transcriptional regulation network will aid in identifying the control of expression changes seen in the other systems.


When prokaryotes are grown at low- to mid-log phase for hundreds of generations through periodic serial passaging, they acquire an increased growth rate (Lenski and Travisano, 1994; Ibarra et al, 2002; Fong et al, 2003; Barrick et al, 2009; Conrad et al, 2009; Teusink et al, 2009). This example of laboratory adaptive evolution is expected, as faster growing mutants quickly outgrow slower growing cells, even if the initial fitness difference is small (Applebee et al, 2008). Molecular changes that confer the growth improvement have been previously studied using fluxomics (Fong et al, 2006; Hua et al, 2007), transcriptomics (Fong et al, 2005; Becker and Palsson, 2008; Le Gac et al, 2008; Kinnersley et al, 2009), and whole-genome resequencing (Herring et al, 2006; Barrick et al, 2009; Conrad et al, 2009; Charusanti et al, submitted for publication). For example, whole-genome resequencing of adapted strains showed that only a small number of mutations arise after hundreds of generations (Herring et al, 2006; Conrad et al, 2009). Although each evolved strain acquired a different set of mutations, each set of mutations yielded a similar growth phenotype. When these mutations were introduced into the wild-type strain by allelic replacement, the wild-type cells acquired the evolved-strain growth rates (Herring et al, 2006). However, the mechanism linking the mutations to the improved growth rate in most evolved strains has yet to be clearly identified, except for cases in which strains had a mutation in RNA polymerase (RNAP) or glpK (Herring et al, 2006), which altered activity of transcription and glycerol uptake.

Although the genetic changes have been identified and characterized, the resulting coordination of cellular processes that leads to the altered phenotypes has only been studied briefly from a network perspective. Such studies of adaptively evolved strains have shown an activation of normally latent metabolic pathways (Fong et al, 2006), expression improvements to the strains that make them more consistent with a high-growth rate for various minimal media conditions (Becker and Palsson, 2008), improved respiration (Ferea et al, 1999), optimization of a small growth-coupled circuit (Dekel and Alon, 2005), and optimization of yield on a poor carbon source (Teusink et al, 2009). In addition, the measured growth rates of evolved strains were shown to be consistent with most growth rate predictions from an in silico genome-scale metabolic model (GEM) of Escherichia coli (Ibarra et al, 2002; Fong and Palsson, 2004).

Although all of these studies have elucidated some characteristics of the complex adaptation process, it is not known (1) whether absolute genome-scale gene and protein expression levels and expression changes are consistent with optimal growth predictions from in silico GEMs or (2) whether measured expression changes can be linked to physiological changes that are based on known mechanisms or pathways. To begin to address these questions, we use constraint-based modeling of E. coli K-12 metabolism (Feist and Palsson, 2008; Lewis et al, 2009b) to analyze a compendium of ‘omics’ data obtained from adaptive evolution experiments. First, we show that the data are consistent with pathway usage from the computationally predicted optimal growth states. We next show that expression changes during the adaptation process relative to wild type further converge to predicted enzyme usage from the optimal growth rate predictions (Figure 1). Finally, we show that changes in known regulatory processes acting on the metabolic network, but not accounted for in the GEMs, are consistent with the improved-growth phenotypes of the adapted strains.

Table of contents (27 chapters)

Recombinant Protein Expression in E. coli : A Historical Perspective

N- and C-Terminal Truncations to Enhance Protein Solubility and Crystallization: Predicting Protein Domain Boundaries with Bioinformatics Tools

Cooper, Christopher D. O. (et al.)

Harnessing the Profinity eXact™ System for Expression and Purification of Heterologous Proteins in E. coli

ESPRIT: A Method for Defining Soluble Expression Constructs in Poorly Understood Gene Sequences

Optimizing Expression and Solubility of Proteins in E. coli Using Modified Media and Induction Parameters

Optimization of Membrane Protein Production Using Titratable Strains of E. coli

Optimizing E. coli-Based Membrane Protein Production Using Lemo21(DE3) or pReX and GFP-Fusions

High Yield of Recombinant Protein in Shaken E. coli Cultures with Enzymatic Glucose Release Medium EnPresso B

A Generic Protocol for Purifying Disulfide-Bonded Domains and Random Protein Fragments Using Fusion Proteins with SUMO3 and Cleavage by SenP2 Protease

A Strategy for Production of Correctly Folded Disulfide-Rich Peptides in the Periplasm of E. coli

Split GFP Complementation as Reporter of Membrane Protein Expression and Stability in E. coli: A Tool to Engineer Stability in a LAT Transporter

Errasti-Murugarren, Ekaitz (et al.)

Acting on Folding Effectors to Improve Recombinant Protein Yields and Functional Quality

Protein Folding Using a Vortex Fluidic Device

Removal of Affinity Tags with TEV Protease

Raran-Kurussi, Sreejith (et al.)

Generation of Recombinant N-Linked Glycoproteins in E. coli

Production of Protein Kinases in E. coli

Expression of Prokaryotic Integral Membrane Proteins in E. coli

Multiprotein Complex Production in E. coli: The SecYEG-SecDFYajC-YidC Holotranslocon

Membrane Protein Production in E. coli Lysates in Presence of Preassembled Nanodiscs

Not Limited to E. coli: Versatile Expression Vectors for Mammalian Protein Expression

A Generic Protocol for Intracellular Expression of Recombinant Proteins in Bacillus subtilis

In Vivo Biotinylation of Antigens in E. coli

Cold-Shock Expression System in E. coli for Protein NMR Studies

High-Throughput Production of Proteins in E. coli for Structural Studies

Mass Spectrometric Analysis of Proteins

How to Determine Interdependencies of Glucose and Lactose Uptake Rates for Heterologous Protein Production with E. coli

Materials and Methods

High-Confidence Regulatory Network Reconstruction.

To reconstruct the high-confidence TRN (hiTRN), we combined strong evidence interactions from RegulonDB 9.4 (7) according to the RegulonDB Evidence Classification (38), with TF KO-validated ChIP-based interactions for 15 regulons from literature: arcA and fnr (10, 39, 40), argR (41, 42), trpR, lrp (42), fur (13), gadEWX (22), oxyR, soxRS (43), purR (11), crp (44), and cra (45). The regulatory direction ( + or − ) was preserved from the original study. Both directions were added if the direction was uncertain.
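The merging rule described here (keep the reported sign, add both signs when the direction is uncertain) can be sketched as a small set operation. The tuple layout, the function name `merge_interactions`, and the toy TF and target labels below are illustrative assumptions, not the data structures used in the study.

```python
def merge_interactions(curated, chip_validated):
    """Merge two lists of (tf, target, sign) tuples into a set of signed edges.

    sign is "+", "-", or None when the source study left the direction uncertain.
    """
    edges = set()
    for tf, target, sign in list(curated) + list(chip_validated):
        if sign in ("+", "-"):
            edges.add((tf, target, sign))  # keep the direction reported by the study
        else:
            edges.add((tf, target, "+"))   # uncertain direction: add both signs
            edges.add((tf, target, "-"))
    return edges


# Hypothetical toy inputs; duplicates across sources collapse in the set.
curated = [("tfA", "gene1", "+"), ("tfB", "gene2", "-")]
chip_validated = [("tfC", "gene3", "-"), ("tfD", "gene4", None), ("tfA", "gene1", "+")]

network = merge_interactions(curated, chip_validated)
for edge in sorted(network):
    print(edge)
```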

Expression Compendium Preparation.

Experimental conditions from EcoMAC (14) were filtered to exclude nonrelevant conditions as described in Yang et al. (46), resulting in expression profiles for 4,189 genes × 444 samples. Three of these samples (wild-type E. coli MG1655 grown aerobically in M9 medium with glucose) were used as a reference.

Nonnegative Matrix Factorization.

We performed NMF using sklearn with “nndsvd” initialization (47). The top genes accounting for 15% of each metagene’s weight were used for regulon enrichment. We compared NMF with singular value decomposition to support our choice of 40 metagenes (20) (SI Appendix, Fig. S20). We also used nonsmooth NMF (nsNMF) to identify sparse metagenes (48) and removed genes from each metagene having coefficients <0.001 (attributable to numerical error). Since NMF solves a nonconvex optimization problem and requires multiple runs to ensure global optimality, we used two methods, by Kim and Tidor (20) and Wu et al. (21), to confirm that our NMF decomposition was stable (SI Appendix, SI Methods). Metagenes are defined in Dataset S4.
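One plausible reading of "the top genes accounting for 15% of each metagene's weight" is a cumulative cutoff: rank genes by coefficient and keep them until their summed weight reaches 15% of the metagene total. The sketch below implements that reading on a hypothetical metagene; the function name `top_genes_by_weight` and the cumulative interpretation itself are assumptions.

```python
def top_genes_by_weight(metagene, fraction=0.15):
    """Return the highest-weight genes that together reach `fraction` of the total."""
    total = sum(metagene.values())
    ranked = sorted(metagene.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for gene, weight in ranked:
        if running >= fraction * total:
            break  # the genes selected so far already cover the requested share
        selected.append(gene)
        running += weight
    return selected


# Hypothetical metagene: gene -> nonnegative NMF coefficient.
metagene = {"geneA": 5.0, "geneB": 3.0, "geneC": 1.0, "geneD": 1.0}

print(top_genes_by_weight(metagene))                # default 15% cutoff
print(top_genes_by_weight(metagene, fraction=0.6))  # looser cutoff for comparison
```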

Regulatory Module Identification.

We compiled a network of TFs that were coenriched in a metagene, from 100 runs each of NMF and nsNMF (48). We kept only 522 TF pairs that were strongly coenriched ( Jaccard index > 0.18 ) and significant (permutation test, P < 0.05 , from 100,000 random networks sampled from the observed frequency of coenriched TFs). We then identified modules using multilevel modularity optimization (49). The modularity coefficient of 0.483 was above the recommended cutoff of 0.3 to indicate community structure by Clauset et al. (50). The functional labels of the modules were assigned by using DAVID (51) functional annotation, followed by manual curation.
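The co-enrichment filter can be illustrated with a plain Jaccard calculation. In this sketch each TF is represented by the set of metagenes in which its regulon was enriched, which is an assumed representation; the TF labels and metagene identifiers are invented, and the permutation test for significance is left out.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coenriched_pairs(tf_to_metagenes, threshold=0.18):
    """Return {(tf1, tf2): score} for TF pairs whose Jaccard index exceeds threshold."""
    pairs = {}
    for tf1, tf2 in combinations(sorted(tf_to_metagenes), 2):
        score = jaccard(tf_to_metagenes[tf1], tf_to_metagenes[tf2])
        if score > threshold:
            pairs[(tf1, tf2)] = score
    return pairs


# Hypothetical enrichment results: TF -> metagene identifiers.
tf_to_metagenes = {
    "tfA": {1, 2, 3, 4},
    "tfB": {2, 3, 4, 5},
    "tfC": {1, 9, 10, 11, 12, 13},
    "tfD": {9},
}

pairs = coenriched_pairs(tf_to_metagenes)
print(pairs)
```

The surviving pairs would then form the edges of the TF network passed to the modularity optimization step.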

DEG Identification.

DEGs were identified by using the R package limma in Bioconductor (52), with thresholds of |log2(fold change)| > 1 and FDR-adjusted P < 0.05. Three samples were used as the reference: wild-type MG1655 grown in M9 with glucose as carbon source under aerobic conditions. The resulting 441 samples of expression profiles relative to the reference corresponded to 174 experimental conditions. Of these conditions, 166 showed significant differential expression. In 162 of these 166 conditions, at least one regulon was enriched for DEGs.
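The two thresholds act as a joint filter on the statistics that limma produces. As a sketch only, the Python snippet below applies the same cutoffs to a hypothetical results table; it does not reproduce limma's model fitting or FDR adjustment, and the gene labels and values are invented.

```python
def is_deg(log2_fc, adj_p, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """True when a gene passes both the fold-change and the FDR threshold."""
    return abs(log2_fc) > lfc_cutoff and adj_p < fdr_cutoff


# Hypothetical rows: (gene, log2 fold change vs. reference, FDR-adjusted P).
results = [
    ("geneA", 2.3, 0.001),
    ("geneB", -1.4, 0.020),
    ("geneC", 0.8, 0.001),   # significant, but below the fold-change cutoff
    ("geneD", -3.0, 0.200),  # large change, but not significant
]

degs = [gene for gene, lfc, p in results if is_deg(lfc, p)]
print(degs)
```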

Network-Expression Consistency Analysis.

We determined the consistency of DEGs with the hiTRN for 21 TF KO experiments. Network reachability was performed by using igraph in Python (49) and sign consistency by using SigNetTrainer in Matlab (17).
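Network reachability here amounts to asking which genes lie downstream of the knocked-out TF in the directed regulatory network. The study used igraph; the sketch below substitutes a plain breadth-first search from the standard library so that it runs without dependencies, and the toy network and DEG list are hypothetical.

```python
from collections import deque


def reachable(network, source):
    """Return all nodes reachable from `source` by following directed edges."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for target in network.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(source)  # report downstream nodes only, not the source itself
    return seen


# Hypothetical network: regulator -> list of direct targets.
network = {"tfA": ["tfB", "gene1"], "tfB": ["gene2"], "tfC": ["gene3"]}
degs = {"gene1", "gene2", "gene3"}  # DEGs observed in a tfA knockout (invented)

downstream = reachable(network, "tfA")
explained = degs & downstream
print(f"{len(explained)} of {len(degs)} DEGs are reachable from tfA")
```

Sign consistency, which the study checked separately, would additionally require tracking the product of edge signs along each path.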

Expression Profile Regression.

We used supervised machine learning (multiple linear regression and support vector regression) to predict log-fold change in expression of 1,364 TUs having at least one known regulator. We compared eight model structures with features including known regulators of each TU, cooperation/competition terms for all pairs of TFs, and known sigma factors. Models were evaluated by using a stratified 10-fold cross-validation to reduce overfitting (Materials and Methods). We determined whether our models captured condition-specific effects by comparing them against models trained on 1,000 randomly shuffled TU profiles, while maintaining the order of regulator expression profiles. We further determined the significance of the TRN for predicting expression by comparing models trained on the known TRN against those trained on 1,000 random TRNs having random TFs assigned to each TU, preserving the distribution of regulators per TU. TFs having high MI with known TFs were not randomly assigned to the TU.

Information Analysis.

We computed MI between TFs and target genes using the NPEET Python package (53). As described in Faith et al. (9), we compared this MI to a background distribution of MI scores using the Wilcoxon rank-sum test ( α = 0.05 ).


The implementation of comprehensive analysis tools from systems biology into bioprocess development concepts enables the change from empirical to rational knowledge-based approaches in host engineering and process design. DNA microarrays are powerful, state-of-the-art tools for monitoring cellular systems at the transcriptome level, providing insight into the cellular response to defined changes in cultivation conditions, e.g., induction of recombinant protein production [1]. The successful application of microarrays as a monitoring tool in bioprocess development strongly depends on the concerted design of cultivation experiments and array experiments, as well as on systematic data analysis. To enable interpretation of results, the most significant information must be extracted from the acquired microarray data by using optimally suited methods of statistics and bioinformatics. Comparative analysis of data sets from independent experiments provides additional information and contributes to the optimal exploitation of microarray data. Cluster analysis is frequently used in gene expression data analysis to find groups of co-expressed genes, which can finally suggest functional pathways and interactions between genes. Clusters of co-expressed genes can help to discover potentially co-regulated genes or genes associated with the conditions under investigation, i.e., the induction strategies. Usually cluster analysis provides a good initial investigation of microarray data before actually focusing on smaller gene groups of interest. In the literature numerous cluster algorithms for clustering gene expression data have been proposed. Besides traditional methods like hierarchical clustering, K-means, partitioning around medoids (PAM, K-medoids) or self-organizing maps, there are several algorithms dealing with time-course gene expression data (e.g., [2–5]).
Clustering is commonly used to reduce the complexity of the data from multidimensional space to a single nominal variable, the cluster membership. In the analysis of microarray data clustering is used as vector quantization because no clear density clusters exist in the data. Genetic interactions are so complex that the definition of gene clusters is not clear. Additionally microarray data are very noisy and co-expressed genes can end up in different clusters. Therefore the set of genes is divided into artificial subsets where relationships between clusters play an important role. Depending on the purpose of the cluster analysis different numbers of clusters can be appropriate. Few large clusters are typically used for a broad overview of a data set and many small clusters are more suitable to detect co-regulated genes (e.g., over 25 clusters in [2]).

The display of cluster solutions particularly for a large number of clusters is very important in exploratory data analysis. Visualization methods are necessary in order to make cluster analysis useful for practitioners. They give an understanding of the relationships between segments of a partition and make it easier to interpret the cluster results. In this work neighborhood graphs [6] are used for visual assessment of the cluster structure of partitioning cluster solutions.

All cluster algorithms and visualization methods used are implemented in the statistical computing environment R [7]. The R package flexclust [6] contains extensible implementations of the K-centroids and QT-Clust algorithms. The new interactive visualization toolbox gcExplorer [8] uses the non-linear graph layout algorithms implemented in the open-source graph visualization software Graphviz for the arrangement of nodes. Bioconductor packages graph and Rgraphviz [9] provide tools for creating, manipulating, and visualizing graphs in R as well as an interface to Graphviz. The gcExplorer contains several possibilities to investigate gene clusters. A detailed view of single clusters is given by clicking on the nodes of the graph where various panel functions can be used to show the corresponding genes, e.g., matrix plots for gene expression profiles over time or HTML tables giving detailed information about differential expression as well as links to databases. Properties of the clusters can be included in the display of the neighborhood graph, e.g., cluster size or cluster tightness. Additionally external knowledge from differential expression analysis or functional grouping is used to investigate the data. Finally different experiments can easily be compared by visualizing groups of genes with common expression pattern in one experiment and potentially different expression pattern in the other experiment. The latest release of gcExplorer is always available at the Comprehensive R Archive Network (CRAN).

In this paper the utility of the interactive visualization toolbox gcExplorer is demonstrated for the interpretation of E. coli microarray data. The data sets used derive from two independent fed-batch experiments conducted in order to investigate the impact of different induction strategies on the host metabolism and product yield. The goal of the comparison is to identify genes and pathways that act similarly in both settings and, more importantly, to identify groups of genes with differential reaction to the two induction strategies. For this reason cluster analysis followed by comparative graphical investigation of the different groups of genes is performed. The graphical exploration of clusterings is applicable to arbitrary partitioning cluster solutions. In this case the stochastic quality cluster algorithm QT-Clust [10] is used. In the Methods Section this cluster algorithm and the concept of neighborhood graphs are reviewed for completeness. The data sets used are described in the Data Section. In the Results Section several steps of the analysis of the given data sets are presented, including the visualization of the cluster structure and the direct graphical comparison of these two experiments. Further, a method is presented for including external knowledge about gene function in the display of cluster solutions. It is shown that the identification of potentially interesting gene candidates or functional groups is substantially accelerated and eased.


Similar to the role that the elucidation of the structure of DNA had in the foundation of modern genetics, the concepts more recently revealed about transcription factor binding sites (TFBSs) and their effects on the activity of promoters that transcribe transcription units, operons, and regulons serve as the foundation for how we think about gene regulation in microbial organisms, and with some modifications, in higher organisms as well. These concepts were the product of research in Escherichia coli K-12 during the second half of the twentieth century. They underlie the computational infrastructures for electronic databases on microbes, such as RegulonDB, to encode and populate all knowledge that molecular biologists have generated, from the time of the seminal works by Jacob and Monod to today. Over 20 years of continued curation have resulted in the placement of every binding site, promoter, transcription factor (TF) and its active conformation, or any other piece of published knowledge on gene regulation, in their corresponding coordinates of the updated complete genome sequence of this bacterium.

However, the emergence of “postgenomic methodologies” has changed the game. We now have whole-genome expression profiles for thousands of different conditions (e.g., the COLOMBOS and M3D databases [1, 2]) and whole-genome identification of binding sites for around 65 TFs; these numbers continue to increase. During the last decade, we have seen a sharp increase in the number of studies on transcriptional regulation in E. coli K-12 involving different high-throughput (HT) approaches (Fig. 1), and it is likely that we are transitioning to HT approaches dominating research, as opposed to the more directed molecular biology experiments already deposited in RegulonDB. See the variety of novel HT methodologies shown in Table 1.

Fig. 1. Number of publications studying transcriptional regulation in E. coli K-12, using either classic molecular biology or HT technologies through the years

In the midst of the accelerated pace of generation of data and experimental information in the genomic era, databases and other electronic resources are the major instruments with which to integrate and facilitate access to the tsunami of data otherwise only incompletely captured by individual investigators. Table 2 lists the major databases and repositories with information about the biology of E. coli K-12. The two up-to-date manually curated databases are RegulonDB [3] and EcoCyc [4]. Our team is in charge of curating transcriptional regulation for these two databases. On the other hand, COLOMBOS is the only database with microarray data specific for E. coli, and it also contains similar data for a few other microorganisms [1]. Otherwise, HT data are found in the general repositories GEO and ArrayExpress (Table 1).

Years ago, there were efforts in the USA to organize HT data for E. coli. These included EcoliHub and its subsequent PortEco version, in addition to EcoliWiki; none of these is currently actively maintained [5]. Therefore, an investigator interested in gathering what is currently known about a particular regulatory system in E. coli has to spend time searching these different resources.

Given that HT methodologies enrich our knowledge on gene regulation and gene expression, expanding the current model beyond RegulonDB is a natural next step. However, this is not a straightforward task. HT data sometimes challenge the Jacob and Monod paradigm, such as when there is supporting evidence for a binding site far from any promoter, or when a promoter site is found in a non-coding region between two convergent ends of genes, where no transcription initiation is expected to occur. HT methodologies generate large amounts of what sometimes appears as disconnected pieces of data. For instance, a single study might reveal ≈ 14,000 candidate transcription start sites (TSSs), of which more than 11,000 occur within the coding regions (≈ 5500 in the sense strand and ≈ 5400 in the antisense strand) [6]. Similarly, it is no longer surprising to find binding sites within the coding regions in HT binding experiments. The number of these TSSs or binding sites that are either non-functional or that participate in roles not directly related to gene regulation is still an open question.

As a result, we need a mixed model that can accommodate both the complete picture of a transcription unit with its promoter and binding sites where objects and their interactions make sense, as well as plausible but disconnected objects. First, the data should be available in a structured way when possible, but with enough flexibility to allow users to make their own decisions. Second, we need to implement tools and criteria to identify experiments performed under similar conditions. An ontology and its corresponding controlled vocabulary for precisely defining growth conditions are part of our efforts in this direction [7]. This is the basis for merging our classic curation with the one presented here for HT binding experiments, together with the expression profiles to identify the effects of binding, to construct a regulatory interaction. Third, we need to define additional evidence codes for different types of HT experiments, together with the limits that define when there is sufficient information to include a new regulatory interaction or any other piece of evidence that contributes to plausible regulatory processes, as opposed to scattered elements without enough support for their interpretation as functional elements of gene regulation. Finally, we have to define the features of and how to display HT-generated binding sites and regulatory interactions in a way consistent with those that already exist. Altogether, this constitutes the basis for adequately gathering and enabling the comparisons and integration needed to manage the vast current knowledge about transcriptional regulation in E. coli. We present here the first version of a more complete integration of HT binding experimental results (from chromatin immunoprecipitation [ChIP] experiments and genomic systematic evolution of ligands by exponential enrichment [gSELEX] data) with the previously curated literature.

Table of contents (23 chapters)

Cold-Inducible Promoters for Heterologous Protein Expression

Dual-Expression Vectors for Efficient Protein Expression in Both E. coli and Mammalian Cells

A Dual-Expression Vector Allowing Expression in E. coli and P. pastoris, Including New Modifications

Purification of Recombinant Proteins from E. coli by Engineered Inteins

Calmodulin as an Affinity Purification Tag

Calmodulin-Binding Peptide as a Removable Affinity Tag for Protein Purification

Maltose-Binding Protein as a Solubility Enhancer

Thioredoxin and Related Proteins as Multifunctional Fusion Tags for Soluble Expression in E. coli

Discovery of New Fusion Protein Systems Designed to Enhance Solubility in E. coli

Assessment of Protein Folding/Solubility in Live Cells

Improving Heterologous Protein Folding via Molecular Chaperone and Foldase Co-Expression

High-Throughput Purification of PolyHis-Tagged Recombinant Fusion Proteins

Co-Expression of Proteins in E. coli Using Dual Expression Vectors

Small-Molecule Affinity-Based Matrices for Rapid Protein Purification

Use of tRNA-Supplemented Host Strains for Expression of Heterologous Genes in E. coli

Screening Peptide/Protein Libraries Fused to the λ Repressor DNA-Binding Domain in E. coli Cells

Mariño-Ramírez, Leonardo (et al.)

Studying Protein-Protein Interactions Using a Bacterial Two-Hybrid System

Using Bio-Panning of FLITRX Peptide Libraries Displayed on E. coli Cell Surface to Study Protein-Protein Interactions

Use of Inteins for the In Vivo Production of Stable Cyclic Peptide Libraries in E. coli

Combinatorial Biosynthesis of Novel Carotenoids in E. coli

Using Transcriptional-Based Systems for In Vivo Enzyme Screening

Identification of Genes Encoding Secreted Proteins Using Mini-OphoA Mutagenesis


As we saw above, rapid lysis (r) mutants were found that mapped to three different regions of the T4 genome: rI, rII, and rIII. This meant that those in different regions were not alleles of the same gene and more than one gene product participated in the lysis function. Even within one "locus", rII, there turned out to be two different stretches of DNA both of which were needed intact for the lysis function. This was revealed by the complementation test that Benzer used. In this test,

  • E. coli strain K (which rII mutants can infect but in which they cannot complete their life cycle), growing in liquid culture, was
  • coinfected with two different rII mutants (shown in the figure as "1" and "2").

Note that this procedure differs from the earlier one (recombination) in that the nonpermissive E. coli K is used for the initial infection (not strain B as before). Neither strain rII"1" nor strain rII"2" is able to grow in E. coli K. But if the lost function in rII"1" is NOT the same as the lost function in rII"2", then

  • each should be able to produce the gene product missing in the other (complementation) and
  • living phages will be produced. (Again, there is no need to count plaques; simply see whether they are formed or not.)

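| Mutant strains | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0 | 0 | + | 0 | + |
| 2 |   | 0 | + | 0 | + |
| 3 |   |   | 0 | + | 0 |
| 4 |   |   |   | 0 | + |
| 5 |   |   |   |   | 0 |

(+ = plaques formed, that is, complementation; 0 = no plaques.)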

From these results, you can deduce that these 5 rII mutants fall into two different complementation groups, which Benzer designated A (containing strains 1, 2, and 4) and B (containing strains 3 and 5). Later work showed that the function of rII depended on the polypeptide products encoded by two adjacent regions (A and B) of rII (perhaps acting as a heterodimer). In terms of function, then, both A and B qualify as independent genes. In coinfections by two mutant strains,

  • If either A or B is mutated on the same DNA molecule ("cis"), there is no function while
  • if A is mutated in one DNA molecule and B in the other ("trans"), function is restored.

Complementation, then, is the ability of two different mutations to restore wild-type function when they are in "trans" (on different DNA molecules), but not when they are in "cis" (on the same DNA molecule). Benzer coined the term cistron for these genetic units of function. But today, we simply modify earlier concepts of the "gene" to fit this operational definition.
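
The deduction of complementation groups from the coinfection table above can be sketched in a few lines of Python. This is only an illustration, not Benzer's procedure: the function name `complementation_groups` and the dictionary encoding of the table are assumptions made for this example. It also assumes that failure to complement is transitive within a group, so that mutants can simply be merged whenever a pair gives no plaques.

```python
from itertools import combinations

# Pairwise coinfection results copied from the table above:
# True = plaques formed (complementation), False = no plaques.
results = {
    (1, 2): False, (1, 3): True,  (1, 4): False, (1, 5): True,
    (2, 3): True,  (2, 4): False, (2, 5): True,
    (3, 4): True,  (3, 5): False,
    (4, 5): True,
}

def complementation_groups(strains, results):
    """Merge mutants that fail to complement each other (union-find)."""
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the group representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(strains, 2):
        if not results[(a, b)]:
            # No plaques: both mutations knock out the same function.
            parent[find(b)] = find(a)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())

groups = complementation_groups([1, 2, 3, 4, 5], results)
print(groups)  # [[1, 2, 4], [3, 5]]
```

The two groups returned correspond to the A group (strains 1, 2, and 4) and the B group (strains 3 and 5) described above.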
