
6.14: Connections to Other Pathways - Biology


There are several connections between the metabolism of fats and fatty acids and other metabolic pathways. As noted, phosphatidic acid is an intermediate in the synthesis of triacylglycerols, as well as of other lipids, including phosphoglycerides. Diacylglycerol (DAG), which is an intermediate in fat synthesis, also acts as a messenger in some signaling systems. Fatty acids twenty carbons long that are based on arachidonic acid (also called eicosanoids) are precursors of the classes of molecules known as leukotrienes and prostaglandins. The latter, in turn, are precursors of the class of molecules known as thromboxanes. The ultimate products of beta oxidation are acetyl-CoA molecules, and these can be assembled by the enzyme thiolase to make acetoacetyl-CoA, which is a precursor of both ketone bodies and the isoprenoids, a broad category of compounds that includes steroid hormones, cholesterol, bile acids, and the fat-soluble vitamins, among others.



Glycogen, a polymer of glucose, is a short-term energy storage molecule in animals. When there is adequate ATP present, excess glucose is converted into glycogen for storage. Glycogen is made and stored in the liver and muscle. Glycogen will be taken out of storage if blood sugar levels drop. The presence of glycogen in muscle cells as a source of glucose allows ATP to be produced for a longer time during exercise.

Sucrose is a disaccharide made from glucose and fructose bonded together. Sucrose is broken down in the small intestine, and the glucose and fructose are absorbed separately. Fructose is one of the three dietary monosaccharides, along with glucose and galactose (which is part of milk sugar, the disaccharide lactose), that are absorbed directly into the bloodstream during digestion. The catabolism of both fructose and galactose produces the same number of ATP molecules as glucose.


Connections of Other Sugars to Glucose Metabolism

Glycogen, a polymer of glucose, is an energy storage molecule in animals. When there is adequate ATP present, excess glucose is shunted into glycogen for storage. Glycogen is made and stored in both liver and muscle. The glycogen will be hydrolyzed into glucose monomers (G-1-P) if blood sugar levels drop. The presence of glycogen as a source of glucose allows ATP to be produced for a longer period of time during exercise. Glycogen is broken down into G-1-P and converted into G-6-P in both muscle and liver cells, and this product enters the glycolytic pathway.

Sucrose is a disaccharide with a molecule of glucose and a molecule of fructose bonded together with a glycosidic linkage. Fructose is one of the three dietary monosaccharides, along with glucose and galactose (which is part of the milk sugar, the disaccharide lactose), which are absorbed directly into the bloodstream during digestion. The catabolism of both fructose and galactose produces the same number of ATP molecules as glucose.


Biology 171

Learning Objectives

By the end of this section, you will be able to do the following:

  • Discuss the ways in which carbohydrate metabolic pathways, glycolysis, and the citric acid cycle interrelate with protein and lipid metabolic pathways
  • Explain why metabolic pathways are not considered closed systems

You have learned about the catabolism of glucose, which provides energy to living cells. But living things consume organic compounds other than glucose for food. How does a turkey sandwich end up as ATP in your cells? This happens because all of the catabolic pathways for carbohydrates, proteins, and lipids eventually connect into glycolysis and the citric acid cycle pathways (see Figure). Metabolic pathways should be thought of as porous and interconnecting; that is, substances enter from other pathways, and intermediates leave for other pathways. These pathways are not closed systems! Many of the substrates, intermediates, and products in a particular pathway are reactants in other pathways.

Connections of Other Sugars to Glucose Metabolism

Glycogen, a polymer of glucose, is an energy storage molecule in animals. When there is adequate ATP present, excess glucose is stored as glycogen in both liver and muscle cells. The glycogen will be hydrolyzed into glucose-1-phosphate (G-1-P) monomers if blood sugar levels drop. The presence of glycogen as a source of glucose allows ATP to be produced for a longer period of time during exercise. In both muscle and liver cells, the G-1-P released from glycogen is converted into glucose-6-phosphate (G-6-P), and this product enters the glycolytic pathway.

Sucrose is a disaccharide with a molecule of glucose and a molecule of fructose bonded together with a glycosidic linkage. Fructose is one of the three “dietary” monosaccharides, along with glucose and galactose (part of the milk sugar disaccharide lactose), which are absorbed directly into the bloodstream during digestion. The catabolism of both fructose and galactose produces the same number of ATP molecules as glucose.

Connections of Proteins to Glucose Metabolism

Proteins are hydrolyzed by a variety of enzymes in cells. Most of the time, the amino acids are recycled into the synthesis of new proteins. If there are excess amino acids, however, or if the body is in a state of starvation, some amino acids will be shunted into the pathways of glucose catabolism (Figure). It is very important to note that each amino acid must have its amino group removed prior to entry into these pathways. The amino group is converted into ammonia. In mammals, the liver synthesizes urea from two ammonia molecules and a carbon dioxide molecule. Thus, urea is the principal waste product in mammals, produced from the nitrogen originating in amino acids, and it leaves the body in urine. It should be noted that amino acids can be synthesized from the intermediates and reactants in the cellular respiration cycle.


Connections of Lipid and Glucose Metabolisms

The lipids connected to the glucose pathway include cholesterol and triglycerides. Cholesterol is a lipid that contributes to cell membrane flexibility and is a precursor of steroid hormones. The synthesis of cholesterol starts with acetyl groups and proceeds in only one direction. The process cannot be reversed.

Triglycerides—made from the bonding of glycerol and three fatty acids—are a form of long-term energy storage in animals. Animals can make most of the fatty acids they need. Triglycerides can be both made and broken down through parts of the glucose catabolism pathways. Glycerol can be phosphorylated to glycerol-3-phosphate, which continues through glycolysis. Fatty acids are catabolized in a process called beta-oxidation, which takes place in the matrix of the mitochondria and converts their fatty acid chains into two-carbon units of acetyl groups. The acetyl groups are picked up by CoA to form acetyl CoA that proceeds into the citric acid cycle.


Pathways of Photosynthesis and Cellular Metabolism

The processes of photosynthesis and cellular metabolism consist of several very complex pathways. It is generally thought that the first cells arose in an aqueous environment (a “soup” of nutrients), possibly on the surface of some porous clays, perhaps in warm marine environments. If these cells reproduced successfully and their numbers climbed steadily, it follows that the cells would begin to deplete the nutrients from the medium in which they lived as they shifted the nutrients into the components of their own bodies. This hypothetical situation would have resulted in natural selection favoring those organisms that could exist by using the nutrients that remained in their environment and by manipulating these nutrients into materials upon which they could survive. Selection would favor those organisms that could extract maximal value from the nutrients to which they had access.

An early form of photosynthesis developed that harnessed the sun’s energy using water as a source of hydrogen atoms, but this pathway did not produce free oxygen (anoxygenic photosynthesis). (Another type of anoxygenic photosynthesis did not produce free oxygen because it did not use water as the source of hydrogen ions; instead, it used materials such as hydrogen sulfide and consequently produced sulfur.) It is thought that glycolysis developed at this time and could take advantage of the simple sugars being produced but that these reactions were unable to fully extract the energy stored in the carbohydrates. The development of glycolysis probably predated the evolution of photosynthesis, as it was well suited to extract energy from materials spontaneously accumulating in the “primeval soup.” A later form of photosynthesis used water as a source of electrons and hydrogen and generated free oxygen. Over time, the atmosphere became oxygenated, but not before the released oxygen oxidized metals in the ocean and created a “rust” layer in the sediment, which permits dating the rise of the first oxygenic photosynthesizers. Living things adapted to exploit this new atmosphere, which allowed aerobic respiration as we know it to evolve. When the full process of oxygenic photosynthesis developed and the atmosphere became oxygenated, cells were finally able to use the oxygen expelled by photosynthesis to extract considerably more energy from the sugar molecules using the citric acid cycle and oxidative phosphorylation.

Section Summary

The breakdown and synthesis of carbohydrates, proteins, and lipids connect with the pathways of glucose catabolism. Other carbohydrates, including the sugars galactose and fructose, pentoses, and the glucose polymer glycogen, feed into glycolysis, where they are catabolized. The amino acids from proteins connect with glucose catabolism through pyruvate, acetyl CoA, and components of the citric acid cycle. Cholesterol synthesis starts with acetyl groups, and the components of triglycerides come from glycerol-3-phosphate from glycolysis and acetyl groups produced in the mitochondria from pyruvate.

Free Response

Would you describe metabolic pathways as inherently wasteful or inherently economical? Why?

They are very economical. The substrates, intermediates, and products move between pathways and do so in response to finely tuned feedback inhibition loops that keep metabolism balanced overall. Intermediates in one pathway may occur in another, and they can move from one pathway to another fluidly in response to the needs of the cell.




Methods

Pathway parameter advising

Pathway parameter advising is based on the parameter advising framework 22 . A parameter advisor consists of two parts: a set of candidate parameter settings S and an accuracy estimator E. The parameter advisor evaluates each candidate parameter setting in S using E to estimate the optimal parameter setting. To adapt parameter advising to the pathway reconstruction domain, we must choose a function E that can estimate the quality of a reconstructed pathway. While we have no direct way to define the criteria an optimal solution satisfies, we do have access to pathways that match biologists’ intuition of what a biological pathway should look like. Curated pathway databases, such as KEGG 35 , Reactome 36 , and NetPath 37 , contain pathways that have been compiled by biologists. Therefore, we can construct our estimator around these curated pathways. This leads to the key assumption of pathway parameter advising: reconstructed pathways that are more topologically similar to manually curated pathways are more useful to biologists.

Our parameter tuning approach requires the inputs to the pathway reconstruction algorithm, a set of candidate parameter settings, and a set of pathways from a reference pathway database. Pathway reconstruction algorithms’ input typically consists of an interactome, such as STRING 38 , and a set of biological entities of interest, such as genes or proteins. We refer to the pathways created by the algorithm as “reconstructed” pathways and the curated database pathways as “reference” pathways. Pathway parameter advising uses a graphlet distance-based estimator E to score each reconstructed pathway’s similarity to the reference pathways. It uses these scores to return a ranking of the reconstructed pathways (or their respective parameter settings).

Pathway parameter advising is designed to be method agnostic. It can be run with any pathway reconstruction algorithm that outputs pathways and has user-specified parameters. Currently, pathway parameter advising is designed to examine undirected graphs, and directed graphs are converted to be undirected.

In order to topologically compare reconstructed and reference pathways, we first decompose all pathways into their graphlet distributions. A graphlet is a subgraph of a particular size within a network. The concept of graphlets is similar to that of network motifs 39 . However, network motifs typically refer to graphlets that appear in a network significantly more often than expected by chance.

Original work with graphlets only considered connected graphlets to better capture local topology 24 . However, we use both connected and disconnected subgraphs, thus allowing all possible combinations of nodes in a pathway to be considered a graphlet. This allows our parameter ranking to capture global topological properties such as pathway size in addition to local topology. One disadvantage of disconnected graphlet counts is that the counts of disconnected graphlets, such as the graphlet containing four unconnected nodes, grow at a much faster rate than those of connected graphlets in sparse networks. However, this does not adversely affect our ranking metric (see “Evaluating the ranking metric”).

Pathway parameter advising uses the parallel graphlet decomposition library 40 to calculate counts of all graphlets up to size 4 in a pathway. This constitutes 17 possible graphlets (Fig. 3). We convert these counts into frequencies and represent each pathway by a vector of 17 values between 0 and 1. This vector, referred to as the graphlet frequency distribution, summarizes the topological properties of a pathway, which allows us to quantify topological similarity.

To calculate the topological distance between two pathways, we take the pairwise distance of their graphlet frequency distributions. For pathways G and H, we denote their frequencies of graphlet i as F_i(G) and F_i(H), scalars between 0 and 1. We then define the GFD D(G, H) as follows:

D(G, H) = \sum_{i=1}^{17} |F_i(G) - F_i(H)|

This differs from relative GFD, which log transforms and scales the raw graphlet counts 24 . We considered other graphlet-based metrics such as a variation of relative GFD and GCD 41 but found that they performed worse in our preliminary analyses (Supplementary Fig. 11).
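The frequency vectors and the distance between them can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that frequencies are raw counts divided by their total and that the distance is the sum of absolute differences between the two vectors. The function names are hypothetical, and the toy example uses three graphlet types rather than 17.

```python
from typing import List, Sequence


def graphlet_frequencies(counts: Sequence[int]) -> List[float]:
    """Convert raw graphlet counts into frequencies that sum to 1.

    Assumption: each frequency is the count divided by the total count.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def gfd(freq_g: Sequence[float], freq_h: Sequence[float]) -> float:
    """Distance between two graphlet frequency vectors.

    Assumption: the distance is the sum of absolute differences.
    """
    if len(freq_g) != len(freq_h):
        raise ValueError("frequency vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(freq_g, freq_h))


# Toy example with three graphlet types instead of 17.
pathway_g = graphlet_frequencies([1, 1, 2])
pathway_h = graphlet_frequencies([2, 1, 1])
distance = gfd(pathway_g, pathway_h)
```

Because both vectors sum to 1 under this normalization, the distance is bounded between 0 and 2 regardless of pathway size.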

After calculating the graphlet frequency distribution for each reconstructed and reference pathway, we calculate each reconstructed pathway’s mean GFD to the reference pathways to obtain E. When calculating this aggregate distance, we consider only the 20% of reference pathways closest to the reconstructed pathway. This threshold is motivated by requiring a reconstructed pathway to be similar to at least some reference pathways rather than to every reference pathway, and its exact value has little impact on the parameter ranking (Supplementary Fig. 12). Thus, a pathway G’s score E(G) is calculated as:

E(G) = \frac{1}{|R_{top}|} \sum_{R \in R_{top}} D(G, R)

where Rtop is the set of the 20% closest reference pathways to G. The pathways, or equivalently the parameters used to generate those pathways, are sorted by E(G) in ascending order. Once the final ranking is created, the top reconstructed pathway can be used for downstream analysis. Alternatively, the top n pathways can be merged into an ensemble pathway.
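The scoring and ranking step can be sketched as follows. This is an illustrative sketch under stated assumptions: the distances from each reconstructed pathway to every reference pathway are already computed, the 20% cutoff is rounded up to a whole number of pathways (the text does not say how it is rounded), and `advisor_score` and `rank_settings` are hypothetical names.

```python
import math
from typing import Dict, Hashable, List, Sequence


def advisor_score(distances: Sequence[float], top_fraction: float = 0.2) -> float:
    """Mean distance to the closest `top_fraction` of reference pathways."""
    if not distances:
        raise ValueError("at least one reference distance is required")
    # Assumption: round the cutoff up and always keep at least one pathway.
    k = max(1, math.ceil(top_fraction * len(distances)))
    closest = sorted(distances)[:k]
    return sum(closest) / k


def rank_settings(
    distances_by_setting: Dict[Hashable, Sequence[float]],
    top_fraction: float = 0.2,
) -> List[Hashable]:
    """Sort parameter settings by score in ascending order (best first)."""
    scores = {
        setting: advisor_score(dists, top_fraction)
        for setting, dists in distances_by_setting.items()
    }
    return sorted(scores, key=scores.get)


# Toy example: two parameter settings, five reference pathways each.
ranking = rank_settings({
    "k=500": [0.4, 0.5, 0.6, 0.7, 0.8],
    "k=50": [0.1, 0.2, 0.9, 0.9, 0.9],
})
```

In the toy example, the setting with one very close reference pathway ranks first even though its other distances are large, which mirrors the motivation for using only the closest reference pathways.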

Pathway reconstruction methods

Pathway reconstruction algorithms were chosen to have a wide range of methodologies, from NetBox’s statistical test to PathLinker’s weighted shortest paths algorithm. We used the following four methods for our implausible pathway detection and NetPath reconstruction experiments. These methods and the parameters tested are summarized in Table 1.

PathLinker

PathLinker 9 reconstructs pathways based on a weighted k-shortest paths algorithm. It finds paths between sets of receptors and transcriptional regulators, similar to the source and target nodes in minimum-cost flow. It is controlled by the parameter k, which defines how many paths to return in the final network. We varied k from 1 to 1000 in increments of 1. We used PathLinker version 1.1 for all analyses.

NetBox

NetBox 10 hierarchically constructs networks from a set of input nodes. At each iteration, it searches for linker nodes that connect two nodes in the current network. It then chooses to add these linker nodes to the network based on the results of a hypergeometric statistical test comparing the degree of the linker node to how many nodes in the pathway it connects. NetBox is controlled by the parameter p, a p value threshold, which sets the threshold for whether or not linker nodes should be included. We varied p on a log scale from 1 × 10^−30 to 1 in increments of half an order of magnitude, giving a total of 60 steps. We used NetBox version 1.0 for all analyses.

Prize-Collecting Steiner Forest

In PCSF 6,14,42 , nodes are assigned prizes and edges are given costs. The optimal subnetwork, which is found via a message-passing algorithm 43 , is the pathway F consisting of nodes NF and edges EF that best balances collected prizes versus cumulative edge costs according to the following function:

where p() is the positive prize for each node, d() is a node’s degree, c() is the cost of each edge, and κ is the number of connected components in the pathway. The optimal subnetwork is always a tree- or forest-structured graph. We varied three PCSF parameters: β, which controls the relative weight of the node prizes versus edge costs, was varied from 0 to 5 in increments of 0.5; μ, which affects the penalty for high-degree nodes, was varied from 0 to 1 in increments of 0.1; and ω, which controls the cost of adding an additional tree to the solution network, was varied from 0 to 10 in increments of 1. We used version 1.3 of the msgsteiner message-passing algorithm and version 0.3.1 of OmicsIntegrator for all analyses.

Minimum-cost flow

The minimum-cost flow problem assigns certain nodes in the network to be “sources” and others to be “targets.” Edges, which transport the flow from node to node, have a cost associated with using them and a capacity of how much flow they can hold. The solution is the network that satisfies the flow requirements of the source and target nodes while using the lowest cost in edges 12 . We implemented a version of min-cost flow using the solver provided in Google’s OR-Tools (https://developers.google.com/optimization/flow/mincostflow), which solves the min-cost flow problem using the algorithm outlined in (Bünnagel et al., 1998) 44 . This is a generic version of the algorithm used in ResponseNet 11 . Two parameters control the min-cost flow solution: the total flow through the network, which we vary from 1 to 50 in increments of 1, and the edge flow capacity, which we vary from 1 to 25 in increments of 1. We used Google’s OR-Tools version 7.1.6720 for all analyses.
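The candidate parameter settings described for the four methods can be enumerated as simple grids. The sketch below assumes that every range includes both endpoints and that the PCSF and min-cost flow parameters are combined as full Cartesian products. The variable names are illustrative and are not taken from any of the tools.

```python
from itertools import product

# PathLinker: k from 1 to 1000 in increments of 1.
pathlinker_k = list(range(1, 1001))

# NetBox: p on a log scale from 1e-30 to 1 in half-order-of-magnitude steps.
# Assumption: both endpoints are included, giving 61 values (60 steps).
netbox_p = [10.0 ** (-30 + 0.5 * i) for i in range(61)]

# PCSF: beta in [0, 5] step 0.5, mu in [0, 1] step 0.1, omega in [0, 10] step 1.
pcsf_beta = [0.5 * i for i in range(11)]
pcsf_mu = [round(0.1 * i, 1) for i in range(11)]
pcsf_omega = list(range(11))
pcsf_grid = list(product(pcsf_beta, pcsf_mu, pcsf_omega))

# Min-cost flow: total flow 1..50 and edge capacity 1..25, both in steps of 1.
mincost_grid = list(product(range(1, 51), range(1, 26)))

for name, grid in [
    ("PathLinker", pathlinker_k),
    ("NetBox", netbox_p),
    ("PCSF", pcsf_grid),
    ("min-cost flow", mincost_grid),
]:
    print(f"{name}: {len(grid)} candidate settings")
```

Under these assumptions the multi-parameter methods produce over a thousand candidate settings each, which is why an automated ranking of the resulting pathways is useful.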

Alternate parameter selection methods

We compare our pathway parameter advising approach to the following parameter selection strategies from the literature.

Cross-validation

CV involves splitting the input data, the biological nodes of interest provided to the pathway reconstruction algorithm, into training and testing sets multiple times for each parameter setting. For example, the input data could be sampled nodes from a NetPath pathway. A pathway reconstruction method is then run on each training set and evaluated on each respective testing set. In this problem setting, we do not have external ground truth with which to evaluate the predictions on test set data. Instead, we perform fivefold CV on subsets of the input data, producing a pathway from the training set nodes. The parameter values that produce pathways that recover the highest proportion of the test set nodes are chosen.
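A sketch of this fivefold procedure for a single parameter setting is shown below. The `reconstruct` argument is a hypothetical callable that stands in for running a pathway reconstruction method on the training nodes and returning the nodes of the resulting pathway. The fold assignment and the averaging across folds are assumptions made for illustration.

```python
import random
from typing import Callable, Hashable, Iterable, List, Sequence


def cv_recovery(
    input_nodes: Sequence[Hashable],
    reconstruct: Callable[[List[Hashable]], Iterable[Hashable]],
    n_folds: int = 5,
    seed: int = 0,
) -> float:
    """Mean proportion of held-out input nodes recovered across folds."""
    nodes = list(input_nodes)
    if len(nodes) < n_folds:
        raise ValueError("need at least one node per fold")
    random.Random(seed).shuffle(nodes)
    # Assumption: folds are formed by striding through the shuffled nodes.
    folds = [nodes[i::n_folds] for i in range(n_folds)]

    recoveries = []
    for i, test_nodes in enumerate(folds):
        train_nodes = [n for j, fold in enumerate(folds) if j != i for n in fold]
        pathway_nodes = set(reconstruct(train_nodes))
        recovered = sum(1 for n in test_nodes if n in pathway_nodes)
        recoveries.append(recovered / len(test_nodes))
    return sum(recoveries) / n_folds
```

Calling `cv_recovery` once per candidate parameter setting and keeping the setting with the highest value corresponds to the selection rule described above.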

ResponseNet ranking

We also tested a parameter selection heuristic used by ResponseNet 20 . The criterion is to select parameters that result in a pathway whose nodes include at least 30% of the input data, while having the lowest proportion of low confidence edges. We extend this to rank the pathways that do include 30% of the inputs by their proportion of low confidence edges, followed by the pathways that include <30% of the inputs to form a full ranking.
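The extended ranking can be expressed as a two-level sort. In this sketch each pathway is summarized by two numbers: the fraction of inputs it includes and its proportion of low confidence edges. The text does not say how pathways below the 30% cutoff are ordered among themselves, so ordering them by their proportion of low confidence edges as well is an assumption. The function name is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def responsenet_rank(
    pathways: Dict[Hashable, Tuple[float, float]],
    min_input_fraction: float = 0.3,
) -> List[Hashable]:
    """Rank settings given (input fraction, low-confidence edge fraction).

    Pathways that include at least `min_input_fraction` of the inputs come
    first. Within each group, fewer low confidence edges ranks higher.
    """
    def sort_key(setting: Hashable) -> Tuple[int, float]:
        input_fraction, low_conf_fraction = pathways[setting]
        group = 0 if input_fraction >= min_input_fraction else 1
        return (group, low_conf_fraction)

    return sorted(pathways, key=sort_key)


# Toy example: setting "c" has the fewest low confidence edges but covers
# too few inputs, so it ranks last.
example_ranking = responsenet_rank({
    "a": (0.50, 0.40),
    "b": (0.35, 0.10),
    "c": (0.10, 0.00),
})
```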

PCSF robustness ranking

As suggested by (Kedaigle & Fraenkel, 2018) 14 , for PCSF we can also rank pathways by their robustness. Robustness is measured by how often nodes appeared in multiple runs with small random perturbations to the scores on the input nodes. We only applied this strategy to reconstructed pathways from PCSF. Although it could be adapted to other pathway reconstruction methods, we decided to use it only in the method for which it was directly implemented.

Evaluating reconstructed pathway plausibility

In order to examine the ability of pathway parameter advising to avoid parameter settings that lead to impractical pathways, we created topological criteria that we use to define pathways as plausible or implausible. These criteria are based on the literature where possible and were created without considering the topology of pathways from pathway databases. However, given that curated pathway databases are also based on information from the literature, these plausibility criteria should not be considered completely independent from the pathway databases. We use these criteria as a heuristic to label pathways as positive (plausible) or negative (implausible). The labels enable us to evaluate pathway rankings as a classification problem, determining if a method can correctly rank plausible reconstructed pathways before implausible pathways. These criteria are based on previous analyses of biological networks and are as follows.

Pathway size

We allowed pathways that had between 10 and 1000 nodes. Pathways whose size is outside this range are not practical for hypothesis generation and downstream analysis.

Hub node dependence

A common issue with pathway reconstruction is an over-reliance on high-degree or hub nodes. Dominant hub nodes can create pathways consisting almost entirely of a single node and its neighbors with few to no connections between those neighbors 14 . We score hub node dependence using the ratio of the degree of the highest degree node to the average node degree of the entire pathway. If the maximum degree is more than 20 times greater than the average degree, we consider the pathway implausible.
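The hub node dependence ratio is straightforward to compute from node degrees. The sketch below represents an undirected pathway as a dictionary that maps each node to the set of its neighbors. That representation and the function name are choices made for illustration.

```python
from typing import Dict, Hashable, Set


def hub_dependence(adjacency: Dict[Hashable, Set[Hashable]]) -> float:
    """Ratio of the maximum node degree to the mean node degree."""
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    if not degrees or sum(degrees) == 0:
        raise ValueError("pathway must contain at least one edge")
    mean_degree = sum(degrees) / len(degrees)
    return max(degrees) / mean_degree


# Toy example: a star with one hub and five leaves.
star = {"hub": {"a", "b", "c", "d", "e"}}
for leaf in "abcde":
    star[leaf] = {"hub"}

ratio = hub_dependence(star)
is_plausible = ratio <= 20  # threshold stated in the text
```

A small star stays well below the threshold. The ratio only exceeds 20 when a single node dominates a much larger pathway.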

Clustering coefficient

Biological networks have been found to have clustering or community structure that is hierarchical 45 : communities within the network exist at multiple scales and are often nested within each other. Thus, it would be reasonable to expect a plausible biological pathway to have at least a moderate level of community structure. We calculate the average clustering coefficient of all nodes in the pathway, a common metric for measuring community structure 46 . The clustering coefficient of a node is the proportion of its neighbors that are also neighbors of each other. This can be averaged over all nodes in the pathway as a measure of the overall level of clustering. We require pathways to have a mean clustering coefficient of at least 0.05, as we expect at least some small level of clustering to exist. Because this requirement eliminated all PCSF pathways in 25% of parameter tuning tasks, when evaluating PCSF we excluded this metric in all cases.
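The average clustering coefficient can be sketched without any graph library. The code below assumes an undirected pathway stored as a dictionary from each node to the set of its neighbors. It follows the common convention of assigning a coefficient of 0 to nodes with fewer than two neighbors. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Set


def average_clustering(adjacency: Dict[Hashable, Set[Hashable]]) -> float:
    """Mean over all nodes of the fraction of neighbor pairs that are linked."""
    if not adjacency:
        raise ValueError("pathway has no nodes")
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue  # convention: coefficient is 0 for degree 0 or 1
        linked_pairs = sum(
            1 for u, v in combinations(neighbors, 2) if v in adjacency[u]
        )
        total += linked_pairs / (k * (k - 1) / 2)
    return total / len(adjacency)


# Toy example: a triangle (a, b, c) with one pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
mean_cc = average_clustering(toy)
is_plausible = mean_cc >= 0.05  # threshold stated in the text
```

Tree-structured pathways contain no triangles and therefore always score 0, which is consistent with the note above that this criterion was excluded when evaluating PCSF.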

Assortativity

A network’s level of assortative mixing is defined as the tendency of high-degree nodes to be connected to other high-degree nodes. Biological networks have been found to be generally disassortative, meaning that high-degree nodes tend to be connected to low-degree nodes 47,48 . Assortativity is measured between −1 and 1, where assortative networks have positive values and disassortative networks have negative values. This value can be viewed as the correlation between a node’s degree and its neighbors’ degrees within the pathway. We consider pathways with assortativity between −1 and 0.1 plausible to allow for some leeway in pathways being slightly assortative.
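The correlation view of assortativity can be illustrated directly. The sketch below computes the Pearson correlation between the degrees at the two ends of every edge, counting each undirected edge in both directions. It assumes a simple graph with no duplicate edges or self-loops. The function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def degree_assortativity(edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Pearson correlation of node degrees across the ends of each edge."""
    edge_list = list(edges)
    degree: Dict[Hashable, int] = {}
    for u, v in edge_list:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count each undirected edge in both directions so the measure is symmetric.
    xs: List[int] = []
    ys: List[int] = []
    for u, v in edge_list:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    if n == 0:
        raise ValueError("pathway has no edges")
    mean = sum(xs) / n  # xs and ys hold the same values, so the means match
    variance = sum((x - mean) ** 2 for x in xs) / n
    if variance == 0:
        return float("nan")  # undefined when all nodes have the same degree
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return covariance / variance


# Toy example: a star is maximally disassortative.
star_value = degree_assortativity([("hub", "a"), ("hub", "b"), ("hub", "c")])
is_plausible = -1 <= star_value <= 0.1  # range stated in the text
```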

We selected these criteria based on attributes it would be reasonable to expect a biological pathway to have, with values supported by the literature where possible. If any of these criteria are not met, we consider the pathway to be implausible. These thresholds were not influenced by the graph topologies in the reference pathway database in order to minimize circularity between the reference pathway-based rankings and the plausibility criteria used to evaluate those rankings. However, most of the reference pathways we considered happen to be plausible. Seventy-seven percent of the Reactome pathways we used as reference pathways are plausible, though it should be noted that these Reactome pathways have already been filtered by size (see “Data sets”).

While the criteria for defining a plausible network are useful for comparing networks created by the same method with different parameter settings, they should not be considered as a metric for comparing pathways across pathway reconstruction methods. Different pathway reconstruction methods are able to use different sources of information and have complex strengths and weaknesses beyond the local topologies they return. For instance, NetBox, which had the highest proportion of plausible pathways, cannot take into account information such as edge confidence or scores on proteins of interest that other methods such as PCSF can. Furthermore, the four plausibility properties are a binary way to determine if a pathway is reasonable or unreasonable. They cannot be used to rank pathway reconstruction parameters.

In order to make sure that our experimental results are not overly sensitive to the specific choice of thresholds for pathway plausibility, we tested other thresholds in a grid search. We varied the maximum network size threshold from 200 to 2000, the hub node dependence measure from 5 to 50, the clustering coefficient threshold from 0.0 to 0.1, and the assortativity threshold from −0.5 to 0.5. Each range was divided into ten intervals, for a total of 10,000 sets of plausibility thresholds. Figure 5 and Supplementary Fig. 1 evaluate the pathway reconstruction methods across these different thresholds.
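The threshold grid can be enumerated with a Cartesian product. The sketch assumes ten evenly spaced values per threshold, with both endpoints included, since that yields the stated total of 10,000 combinations. The exact spacing used in the study is not given in the text.

```python
from itertools import product
from typing import List


def evenly_spaced(low: float, high: float, n: int = 10) -> List[float]:
    """Return n evenly spaced values from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


max_size_values = evenly_spaced(200, 2000)
hub_ratio_values = evenly_spaced(5, 50)
clustering_values = evenly_spaced(0.0, 0.1)
assortativity_values = evenly_spaced(-0.5, 0.5)

threshold_sets = list(product(
    max_size_values,
    hub_ratio_values,
    clustering_values,
    assortativity_values,
))
print(len(threshold_sets))
```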

Evaluating reconstructed NetPath pathways

When comparing reconstructed pathways to the original NetPath pathways in section “Quality of NetPath pathway reconstruction,” we used the Matthews correlation coefficient (MCC) to quantify the reconstruction quality 49 . MCC is a metric used in binary classification that ranges between −1 and 1, where 1 indicates a perfect binary classification and −1 indicates a completely incorrect classification. It can be viewed as the correlation between the predicted and true labels in a classification task. MCC has been shown to be well suited to evaluate classification in imbalanced settings 50 . In order to treat comparing a reconstructed and a NetPath pathway as a classification task, we consider all edges in a NetPath pathway as the positive set and all other edges as the negative set. MCC is then defined as follows:

MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}

where TP is the number of true positives (edges that appear in both the NetPath and reconstructed pathway), FP is the number of false positives (edges that do not appear in the NetPath pathway but do in the reconstructed pathway), TN is the number of true negatives (edges that do not appear in either the NetPath pathway or the reconstructed pathway), and FN is the number of false negatives (edges that appear in the NetPath pathway but not in the reconstructed pathway). When comparing MCC values of multiple pathways and reconstruction methods, we normalized MCC values by the best possible MCC among all tested parameter values for that pathway and method. We refer to this value as the adjusted MCC.
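The edge-level evaluation can be sketched with set operations and the standard MCC formula. Several details below are assumptions rather than statements from the text: "all other edges" is taken to mean the remaining edges of a supplied edge universe (for example, the interactome), edges are treated as undirected, a zero denominator yields 0, and the adjusted MCC is obtained by dividing by the best MCC. Function names are hypothetical.

```python
import math
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Standard MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


def edge_mcc(
    reference_edges: Iterable[Edge],
    reconstructed_edges: Iterable[Edge],
    all_edges: Iterable[Edge],
) -> float:
    """MCC of a reconstructed pathway against a reference pathway."""
    # frozenset makes (u, v) and (v, u) the same undirected edge.
    reference = {frozenset(e) for e in reference_edges}
    reconstructed = {frozenset(e) for e in reconstructed_edges}
    universe = {frozenset(e) for e in all_edges} | reference | reconstructed

    tp = len(reference & reconstructed)
    fp = len(reconstructed - reference)
    fn = len(reference - reconstructed)
    tn = len(universe - reference - reconstructed)
    return mcc(tp, fp, tn, fn)


def adjusted_mcc(value: float, best_value: float) -> float:
    """Assumption: normalize by dividing by the best MCC for that pathway."""
    return value / best_value
```

For instance, `edge_mcc([("a", "b"), ("b", "c")], [("b", "a"), ("c", "d")], [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])` has one edge in each of the four categories, so the numerator is zero.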

Data sets

For both PathLinker and NetBox, we used the interactome included as a part of their software packages. For PCSF and min-cost flow, we used an interactome from (Köksal et al., 2018) 3 that merged protein interactions from the iRefIndex database v13 51 and kinase–substrate interactions from PhosphoSitePlus 52 . The interactions from the iRefIndex database include confidence scores, while confidence scores for kinase–substrate interactions were inferred from the number of interactions for each kinase–substrate pair and the type of experiment that detected the interaction. If an interaction was included in both databases, the PhosphoSitePlus interaction was used. This resulted in a network with 161,901 weighted edges.

All parameter tuning was performed with Reactome as the set of reference pathways. Reactome 36 is a database of manually curated pathways, including 2287 human pathways. Reactome is open source; all contributions must provide literature evidence and are reviewed by an external domain expert before being added. Pathways with fewer than 15 nodes were excluded, as they were too small for meaningful interpretation. Reactome pathways were retrieved using Pathway Commons 53, which converts the Reactome data model to a binary interaction model using a set of rules for each interaction type (https://www.pathwaycommons.org/pc/sif_interaction_rules.do).

The implausible pathway detection and NetPath reconstruction experiments were performed on pathways from the NetPath database. NetPath is a collection of 36 manually curated human signal transduction pathways 37 . We used 15 NetPath pathways that contain at least 1 receptor and transcriptional regulator and are sufficiently connected, as described by (Ritz et al., 2016) 9 . We designated three of these NetPath pathways as validation pathways: Wnt, TGF beta, and TNF alpha. Validation pathways were used to guide the choice of distance measure. The remaining 12 pathways were reserved as test pathways for quantitative evaluations. We sampled the NetPath pathways in different ways for each pathway reconstruction algorithm to provide inputs in their expected formats, generally following the node sampling protocol that PathLinker 9 used to reconstruct NetPath pathways. PCSF and NetBox do not require sources and targets, so we randomly sampled 30% of the pathway nodes as input. We also assigned each input a random prize sampled uniformly between 0 and 5 for PCSF. For PathLinker and min-cost flow, which require sources and targets, we selected all transcription factors and receptors for each pathway as outlined by (Ritz et al., 2016) 9 .
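The random sampling step for PCSF and NetBox can be sketched as follows. This is an illustrative assumption about how such sampling might be written, not the code used in the study; the function name, the rounding rule for the sample size, and the seed handling are hypothetical.

```python
import random


def sample_pcsf_inputs(pathway_nodes, fraction=0.3, max_prize=5.0, seed=None):
    """Sample a fraction of pathway nodes and give each a uniform random prize.

    Returns a dict mapping each sampled node to a prize in [0, max_prize].
    The keys alone could serve as input for a method that takes no prizes.
    """
    rng = random.Random(seed)
    nodes = sorted(pathway_nodes)  # fixed order so a seed gives repeatable output
    if not nodes:
        return {}
    # Assumed rounding rule: nearest integer, but never fewer than one node.
    sample_size = max(1, round(fraction * len(nodes)))
    sampled = rng.sample(nodes, sample_size)
    return {node: rng.uniform(0.0, max_prize) for node in sampled}
```

Passing the same `seed` twice yields the same sample, which makes repeated runs comparable across parameter settings.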

Influenza host factors were gathered from a meta-analysis of eight genome-wide and targeted RNAi screens 25. These screens used RNAi to measure the effect of each screened gene’s knockdown on influenza infection. For instance, (Watanabe et al., 2014) 54 assessed influenza viral replication 48 h after infection. They identified host genes whose knockdown substantially impacted viral titers relative to a negative control. (Tripathi et al., 2015) 25 merged the hits from the eight RNAi screens to obtain 1257 pro-viral host genes and separately calculated a consolidated Z-score from the normalized activity scores in the four genome-wide screens. We selected these 1257 host factors as inputs for pathway reconstruction and set the PCSF node prize to the absolute value of the consolidated Z-score.

GO 55 and KEGG pathway 35 enrichment was carried out with DAVID v6.7 26 . Enrichment was performed using GO biological process terms and all KEGG pathways. Thresholds for term inclusion were set to a count of 2 and an EASE score of 0.1.
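As a rough illustration of how the two thresholds act together, the sketch below filters a list of enrichment results. It assumes the thresholds mean a minimum gene count of 2 and a maximum EASE score of 0.1; the record layout (`term`, `count`, `ease`) is hypothetical and does not reflect DAVID's actual export format.

```python
def filter_enriched_terms(rows, min_count=2, max_ease=0.1):
    """Keep terms with at least min_count genes and an EASE score at or below max_ease.

    Each row is assumed to be a dict with the hypothetical keys
    'term', 'count', and 'ease'.
    """
    return [
        row for row in rows
        if row["count"] >= min_count and row["ease"] <= max_ease
    ]


# Made-up records, used only to show the filter's behavior.
example_rows = [
    {"term": "term_a", "count": 5, "ease": 0.01},
    {"term": "term_b", "count": 1, "ease": 0.01},  # too few genes
    {"term": "term_c", "count": 4, "ease": 0.25},  # EASE score too high
]
kept_terms = [row["term"] for row in filter_enriched_terms(example_rows)]
```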

Pathway parameter advising implementation

A Python implementation of pathway parameter advising is available at https://github.com/gitter-lab/pathway-parameter-advising under the MIT license. While v0.1.0 and later versions of the pathway parameter advising software support Python v3.6, the results here used Python v2.7.16 and Anaconda v2019.03. The following package versions were used: pandas v0.24.2, networkx v2.2, numpy v1.16.2, matplotlib v2.2.3, and seaborn v0.9.0. The Parallel Graphlet Decomposition library was pulled from GitHub on April 30, 2019.

Preprint

The article was previously published as a preprint 56 .

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.


Abstract

Small molecules are important not only as therapeutics to treat disease but also as chemical tools to probe complex biological processes. The discovery of novel bioactive small molecules has largely been catalyzed by screening diverse chemical libraries for alterations in specific activities in pure protein assays or for the generation of cell-based phenotypes. New approaches are needed to close the vast gap between the ability to study either single proteins or whole cellular processes. This Review focuses on the growing number of studies aimed at understanding in more detail how small molecules perturb particular signaling pathways and larger networks to yield distinct cellular phenotypes. This type of pathway-level analysis and phenotypic profiling provides valuable insight into the mechanistic action of small molecules and can reveal off-target effects and improve our understanding of how proteins within a pathway regulate signaling.


CONCLUSION

The WikiPathways project is thriving. The FAIR and open science approach and the extensive community support continue to drive growth of the project and the database content. In the coming years, our growth will be supported by recently renewed funding. The updates presented here, for biologists, chemists, authors, curators, and data scientists, demonstrate the success of our approach and open up new ways in which biological complexity can be represented and reused by others. Examples of such complexity include post-translational modifications affecting protein activity and the temporal dynamics of processes. Our recent efforts around the content and curation of metabolic pathways show a high potential for adoption by the metabolism and metabolomics communities and those applying these technologies. In fact, the demonstrated continued growth in content and features since the 2016 update shows we are getting closer to reaching our commitment to capture every pathway of interest and share them in as many useful ways as possible.



The primary results of the pentose phosphate pathway (PPP) are:

  • The generation of reducing equivalents, in the form of NADPH, used in reductive biosynthesis reactions within cells (e.g. fatty acid synthesis).
  • Production of ribose 5-phosphate (R5P), used in the synthesis of nucleotides and nucleic acids.
  • Production of erythrose 4-phosphate (E4P) used in the synthesis of aromatic amino acids.

Aromatic amino acids, in turn, are precursors for many biosynthetic pathways, including that of the lignin in wood.

Dietary pentose sugars derived from the digestion of nucleic acids may be metabolized through the pentose phosphate pathway, and the carbon skeletons of dietary carbohydrates may be converted into glycolytic/gluconeogenic intermediates.

In mammals, the PPP occurs exclusively in the cytoplasm. In humans, it is most active in the liver, mammary glands, and adrenal cortex. The PPP is one of the three main ways the body creates molecules with reducing power, accounting for approximately 60% of NADPH production in humans.

One of the uses of NADPH in the cell is to prevent oxidative stress. NADPH reduces glutathione via glutathione reductase, and the reduced glutathione is then used by glutathione peroxidase to convert reactive H2O2 into H2O. Without this system, the H2O2 would be converted to hydroxyl free radicals by Fenton chemistry, and these radicals can attack the cell. Erythrocytes, for example, generate a large amount of NADPH through the pentose phosphate pathway to use in the reduction of glutathione.

Hydrogen peroxide is also generated for phagocytes in a process often referred to as a respiratory burst. [5]

Oxidative phase

In this phase, two molecules of NADP+ are reduced to NADPH, utilizing the energy from the conversion of glucose-6-phosphate into ribulose 5-phosphate.

The entire set of reactions can be summarized as follows:

| Reactants | Products | Enzyme | Description |
|---|---|---|---|
| Glucose 6-phosphate + NADP+ | 6-phosphoglucono-δ-lactone + NADPH | glucose 6-phosphate dehydrogenase | Dehydrogenation. The hydroxyl on carbon 1 of glucose 6-phosphate turns into a carbonyl, generating a lactone, and, in the process, NADPH is generated. |
| 6-phosphoglucono-δ-lactone + H2O | 6-phosphogluconate + H+ | 6-phosphogluconolactonase | Hydrolysis |
| 6-phosphogluconate + NADP+ | ribulose 5-phosphate + NADPH + CO2 | 6-phosphogluconate dehydrogenase | Oxidative decarboxylation. NADP+ is the electron acceptor, generating another molecule of NADPH, a CO2, and ribulose 5-phosphate. |

The overall reaction for this process is:

Glucose 6-phosphate + 2 NADP+ + H2O → ribulose 5-phosphate + 2 NADPH + 2 H+ + CO2

Non-oxidative phase

Net reaction: 3 ribulose-5-phosphate → 1 ribose-5-phosphate + 2 xylulose-5-phosphate → 2 fructose-6-phosphate + glyceraldehyde-3-phosphate
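One way to sanity-check these stoichiometries is to count carbon atoms on each side. The short Python sketch below does this for the three stages of the non-oxidative net reaction and for the overall oxidative reaction. The carbon counts are taken from the sugar names (hexose = 6, pentose = 5, triose = 3). The dictionary and helper function are illustrative names, and cofactors that carry no sugar carbon (NADP+, NADPH, H2O, H+) are left out.

```python
# Carbon atoms per molecule, read off the compound names.
CARBONS = {
    "glucose-6-phosphate": 6,
    "ribulose-5-phosphate": 5,
    "ribose-5-phosphate": 5,
    "xylulose-5-phosphate": 5,
    "fructose-6-phosphate": 6,
    "glyceraldehyde-3-phosphate": 3,
    "CO2": 1,
}


def carbon_total(side):
    """Sum carbon atoms for one side of a reaction given as {compound: coefficient}."""
    return sum(coeff * CARBONS[name] for name, coeff in side.items())


# Non-oxidative phase: the three stages of the net reaction.
non_oxidative_stages = [
    {"ribulose-5-phosphate": 3},
    {"ribose-5-phosphate": 1, "xylulose-5-phosphate": 2},
    {"fructose-6-phosphate": 2, "glyceraldehyde-3-phosphate": 1},
]
non_oxidative_totals = [carbon_total(stage) for stage in non_oxidative_stages]

# Oxidative phase: carbon-containing species of the overall reaction.
oxidative_in = carbon_total({"glucose-6-phosphate": 1})
oxidative_out = carbon_total({"ribulose-5-phosphate": 1, "CO2": 1})
```

Each non-oxidative stage carries 15 carbons, and the oxidative phase carries 6 carbons on both sides, so carbon is conserved in both.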

Regulation

Glucose-6-phosphate dehydrogenase is the rate-controlling enzyme of this pathway. It is allosterically stimulated by NADP+ and strongly inhibited by NADPH. [6] The ratio of NADPH:NADP+ is normally about 100:1 in liver cytosol, which makes the cytosol a highly reducing environment. An NADPH-utilizing pathway forms NADP+, which stimulates glucose-6-phosphate dehydrogenase to produce more NADPH. This step is also inhibited by acetyl CoA.

G6PD activity is also post-translationally regulated by the cytoplasmic deacetylase SIRT2. SIRT2-mediated deacetylation and activation of G6PD stimulates the oxidative branch of the PPP to supply cytosolic NADPH, which counteracts oxidative damage or supports de novo lipogenesis. [7] [8]

Several deficiencies in the level of activity (not function) of glucose-6-phosphate dehydrogenase have been observed to be associated with resistance to the malarial parasite Plasmodium falciparum among individuals of Mediterranean and African descent. The basis for this resistance may be a weakening of the red cell membrane (the erythrocyte is the host cell for the parasite) such that it cannot sustain the parasitic life cycle long enough for productive growth. [9]


When Does Apoptosis Occur?

Apoptosis occurs when a cell’s existence is no longer useful to the organism. This can occur for a few reasons.

If a cell has become badly stressed or damaged, it may commit apoptosis to prevent itself from becoming dangerous to the organism as a whole. Cells with DNA damage, for example, may become cancerous, so it is better for them to commit apoptosis before that can happen.

Other cellular stresses, such as oxygen deprivation, can also cause a cell to “decide” that it is dangerous or costly to the host. Cells that can’t function properly may initiate apoptosis, just like cells that have experienced DNA damage.

In a third scenario, cells may commit apoptosis because the organism doesn’t need them anymore due to its natural development.

One famous example is that of the tadpole, whose gill, fin, and tail cells commit apoptosis as the tadpole metamorphoses into a frog. These structures are needed when the tadpole lives in water – but become costly and harmful when it moves onto dry land.

1. Which of the following would you NOT expect to trigger apoptosis?
A. Damage to a cell’s DNA
B. Long-term oxygen deprivation
C. An organism moving to a new stage of its life cycle, rendering some cells obsolete
D. None of the above

2. Which of the following might occur if a mutation made apoptosis impossible?
A. The nervous system might not develop properly
B. Cancer might become much more likely
C. An insect might not be able to undergo metamorphosis
D. All of the above

3. What is the difference between the extrinsic and intrinsic pathways of apoptosis?
A. The extrinsic pathway is triggered by a signal from outside the cell, while the intrinsic pathway is triggered by events inside the cell.
B. The extrinsic pathway has more steps because the signal must be relayed from the cell membrane.
C. The extrinsic pathway activates BAK and BAX, while the intrinsic pathway does not.
D. A and B