
Which server to use for volume and accessible surface area calculation of proteins


I want to find the accessible surface area and the volume of a protein, giving a PDB file as input. I used two servers, 3vee and Vadar. For PDB ID 1LTM, the two servers give different volume and surface area values:
3vee (vol: 50951 Å^3; surface area: 10837 Å^2)
Vadar (vol: 40937.8 Å^3; surface area: 14680.6 Å^2)
Can anyone tell me which server is more accurate? If you know of any other server that gives accurate results, please let me know that too.
Note: Both tools were published in Nucleic Acids Research, so I guess I do not need to validate them. 3vee was published in 2010 and has 164 citations; Vadar was published in 2003 and has 416 citations. Is it okay if I use 3vee since it is the more recent tool?


You can use the PDBsum and PDBe-PISA servers to calculate accessible and buried surface area, and the CASTp server to find pockets within your protein. If your protein is globular, you can measure its diameter in PyMOL and estimate its volume by treating it as a sphere. Hope it helps.
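As a rough illustration of the sphere estimate suggested above, here is a minimal Python sketch. The function name and the 46 Å diameter are hypothetical placeholders, not values measured for any particular protein, and a real protein is only approximately spherical, so this is a crude estimate at best.

```python
import math

def sphere_volume_from_diameter(diameter_angstrom: float) -> float:
    """Volume in cubic angstroms of a sphere with the given diameter in angstroms."""
    radius = diameter_angstrom / 2.0
    return (4.0 / 3.0) * math.pi * radius ** 3

# Hypothetical diameter, e.g. read off a distance measurement in PyMOL.
example_diameter = 46.0
print(f"Estimated volume: {sphere_volume_from_diameter(example_diameter):.1f} A^3")
```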


The results depend on the atomic radii and on the probe radius. Without them, it is impossible to judge the volume and surface values.

Using radii (in Å) of C=1.75, N=1.55, O=1.4, S=1.8, and adding 1.4 for the probe, the Monte Carlo method available interactively in ASV gives, for 10000000 random points (only ATOM records were input): V = 57335.548996 +/- 1.96*34.439 Å^3; S = 14887.818374 +/- 1.96*20.345 Å^2
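To show why the radii matter and where the "+/- 1.96*sigma" interval comes from, here is a generic hit-or-miss Monte Carlo sketch in Python for the volume of a union of spheres. It is not ASV's actual routine; the function name, the two-atom test system and the point count are assumptions chosen for illustration.

```python
import math
import random

def mc_union_volume(centers, radii, n_points=100_000, seed=0):
    """Estimate the volume of a union of spheres by uniform sampling in its bounding box.

    Returns (volume, standard_error); a 95% interval is volume +/- 1.96 * standard_error.
    """
    rng = random.Random(seed)
    lo = [min(c[i] - r for c, r in zip(centers, radii)) for i in range(3)]
    hi = [max(c[i] + r for c, r in zip(centers, radii)) for i in range(3)]
    box_volume = math.prod(h - l for l, h in zip(lo, hi))
    radii_sq = [r * r for r in radii]
    hits = 0
    for _ in range(n_points):
        p = [rng.uniform(lo[i], hi[i]) for i in range(3)]
        # A point counts as a hit if it lies inside at least one sphere.
        if any(sum((p[i] - c[i]) ** 2 for i in range(3)) <= rr
               for c, rr in zip(centers, radii_sq)):
            hits += 1
    frac = hits / n_points
    volume = box_volume * frac
    std_err = box_volume * math.sqrt(frac * (1.0 - frac) / n_points)
    return volume, std_err

# Hypothetical two-atom system: carbons 1.5 A apart, radius 1.75 A plus a 1.4 A probe.
probe = 1.4
centers = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
radii = [1.75 + probe, 1.75 + probe]
v, se = mc_union_volume(centers, radii)
print(f"V = {v:.2f} +/- 1.96*{se:.3f} A^3")
```

Changing any atomic radius or the probe radius changes every sphere in the union, which is one reason two servers can report different values for the same PDB entry.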

You can download ASV from: http://petitjeanmichel.free.fr/itoweb.petitjean.freeware.html ; No installation procedure is required, just flag the binary as executable with the chmod command. Monte-Carlo f77 source routines are supplied too.

References:
M. Petitjean, J. Comput. Chem. 1994, 15[5], 507-523;
M. Petitjean, Spheres Unions and Intersections and Some of their Applications in Molecular Modeling. In: Distance Geometry: Theory, Methods, and Applications, chap.4, pp.61-83, Springer, 2013.



Area/Volume from Web uses the StrucTools server to calculate atomic surface areas and Voronoi volumes from molecular coordinates. The calculated values are assigned as atom attributes. See also: Measure Volume and Area, CASTp Data, surface, measure buriedArea

Note that when a molecular surface is generated in Chimera, the total analytical solvent-accessible and solvent-excluded areas are automatically reported in the Reply Log, and the values per atom and residue are assigned as attributes named areaSAS and areaSES. These values correspond directly to the molecular surface in Chimera and depend on the same parameters (VDW radii and various molecular surface settings). By contrast, only the settings in the Area/Volume from Web dialog are sent to the StrucTools server, and other parameters such as VDW radii are determined solely by the process underlying the server.

There are several ways to start Area/Volume from Web, a tool in the Surface/Binding Analysis category. In the resulting dialog, one or more molecule models can be chosen from the Molecules list, or chains from the Chains list.

    Accessible Surface (Gerstein) [details at StrucTools]

  • Surface probe size (1.3/1.4/1.5/1.6)
  • Atoms to use (No Hetatms/Atoms + Hetatms/All atoms except waters)
    *hydrogens are ignored
  • Surface probe size (1.3/1.4/1.5/1.6)
  • Atoms to use (No Hetatms/Atoms + Hetatms/All atoms except waters)
  • Method (Normal Voronoi/Method B/Radical Plane/Modified Method B)
  • Radii (Chothia Radii/Richards Radii)
  • Atoms to use (No Hetatms/Atoms + Hetatms/All atoms except waters)
    *hydrogens are ignored
  • Open Render/Select by Attribute - whether to start Render/Select by Attribute, which shows a histogram of attribute values and allows mapping them to color or using them as selection criteria. The atomic values and totals per residue will be listed automatically in the dialog as atom and residue attributes, respectively. Even if this option is not used, the attribute(s) will still be available in Render/Select by Attribute if it is started later in the session.
  • Save server output to file - whether to save the HTML output from the server as a file
  • Show server output in browser - whether to show the HTML output in a browser window

The transformed coordinates of atoms in the chosen model(s) or chain(s) are sent to the StrucTools server, where they are handled collectively. For example, if proteins in two different models are positioned to form a complex, the results obtained for each protein when both models are chosen will differ from the results obtained for each protein separately. The coordinates will include modifications made in Chimera, such as bond rotations. However, other inputs such as atomic and probe radii are determined only by the server and the settings in the Area/Volume from Web dialog. The atomic radii and molecular surface settings in Chimera are not used.

For each attribute, the atoms are assigned the values calculated by the server (surface area in Å², volume in Å³). The total area or volume and the number of atoms assigned values are reported in the Reply Log and status line. When the output contains data for fewer atoms than were sent to the server, a warning message will report the number of “missing” values. This is quite common and rarely a cause for concern; for example, if the No Hetatms option is chosen, there will be no output for such atoms. Atoms missing from the server output will not be assigned attribute values, and atoms with uncalculatable Voronoi volumes will not be assigned a volume value. Attribute values can be summed over sets of atoms with the Attribute Calculator.


Findings

Background

Protein folding is a process by which a polypeptide transitions from an unfolded state to a native state. While native states are well studied, unfolded states are more difficult to characterize. The hydrophobic effect is the driving force in protein folding wherein hydrophobic groups move away from water into a solvent-shielded hydrophobic core. When folded, solvent accessible surface area (ASA) is lost between the native (folded) and unfolded state. While we can readily compute the ASA for the native state (for example using the algorithm of Lee and Richards [1] or equivalent ones in programs like Chimera [2]), the calculation for ASA for the unfolded protein is more difficult. Several papers have been published claiming to have the best models for calculating the ASA of the unfolded protein or have compared such models or adapted proposed models [3–10]. We want to consider these methods and determine one that is most appropriate for use in a new database of proteins (ACPro, the Amherst College Protein Folding Kinetics Database, available at: https://www.ats.amherst.edu/protein/) organized based on those with folding and unfolding information. Briefly, we explore the literature in which these ASA calculation methods for unfolded states are proposed.

Robertson and Murphy published a review that focused on the relationship between protein stability and structure that was established with the thermodynamic parameters derived from calorimetric and spectroscopic studies and the structural models derived from X-ray crystallography and NMR spectroscopy [11]. As part of their analysis, accessible surface area changes between native and unfolded state for a set of proteins are examined, where the unfolded ASA is based on an Ala-Xaa-Ala extended tripeptide for each amino acid type, where Xaa is a placeholder for that amino acid. Corrections are made for termini effects. The Ala-Xaa-Ala tripeptide method is one of the simplest methods of determining an estimate for unfolded ASA, and was originally proposed by Zielenkiewicz and Saenger [3].

There are many more methods for determining unfolded estimates of ASA. We examine a few methods that are computationally fast and easy to understand because we want to select one for use in a database that will be available to the public. Creamer, Srinivasan, and Rose propose alternatives to the tripeptide model [5, 6]. As alternatives, they propose two models that bracket the expected behavior of an unfolded protein. The first model provides upper bound values on the unfolded ASA. These upper bound values are based on simulated flexible peptides modeled from hard-sphere approximation for ASA and chain dimensions. The use of the hard-sphere approximation results in expanded peptides that explore available conformational space freely, as compared to actual peptides, which experience intramolecular attractive forces, leading to further chain collapse. Consequently, these simulated peptides exclude volume effects and are more expanded than actual unfolded peptides. The second model is for lower bound values of ASA. The lower bound values are modeled from protein fragments excised from fully folded structures. Due to being determined from fragments excised from folded proteins, ASA values in this model will provide a lower bound for unfolded ASA. The conformational behavior of unfolded peptides, thus, lies between the two limits. In their analysis, by comparing the upper and lower bounds to tripeptide models, Creamer et al. argue that the tripeptide models overestimate the area loss [5]. For example, they show that the alanine side chain in the center of an 11-residue, unfolded polyalanyl peptide loses little to no area upon helix formation and a valine side chain gains area in the helix, on average. A tripeptide model would conclude that both alanine and valine side chains lose surface area with helix formation. In 1997, Creamer et al. adjusted the upper bound model when extending it to the backbone case, using the approach from Spolar et al. [4], and stating that this yielded similar values to their previous approach and is less computationally intensive [6]. To compromise between the upper and lower bound models proposed by Creamer et al., other researchers used the average of the two bounds, effectively providing a third model for unfolded ASA [7, 8].

More recently, Gong and Rose proposed a new method to calculate solvent-dependent ASAs of amino acid residues in unfolded proteins [10], which they contrast primarily with that of Creamer et al. [6]. They argue that the method of averaging the ASA residues of the unfolded states between the upper limit and lower limit is unsatisfying because it lacks a rigorous physical basis. Gong and Rose’s own method, on the other hand, is physically based to calculate backbone and side-chain residue surface areas by using data from peptides generated by varying the possible dihedral angles to coincide with allowed regions of conformational space. They use intramolecular hydrogen bond strengths to model solvent-dependent effects by a Boltzmann-weighted distribution of solvent quality through a “hydrogen-bond dial”. When plotted as a function of hydrogen bond strength, the Boltzmann-weighted distribution of conformers describes a sigmoidal curve, with a transition midpoint near -1.5 kcal/mol per hydrogen bond. For the backbone, these midpoint ASA values are similar to Creamer’s upper bounds and in some cases, even exceed the upper bounds set in [6]. The authors argue that this is due to increased flexibility in this new model. Gong and Rose do admit that their model is imperfect because the hydrogen bond dial does not use all possible energetic terms [10]. Because of the “dial”, Gong and Rose provide ASA values when the “dial” is “off” and at the transition midpoint. For terminology, we call the “off” values the upper bounds and we call the transition midpoint values the lower bounds for this method, to be analogous to [6]. Averaging the values at the two bounds yields an average value for Gong and Rose’s proposal.

Finally, we examine a more computationally intensive method, ProtSA, proposed by Bernadó et al. [9]. The method is made available by a web application [12] (available at: http://webapps.bifi.es/protsa/) and calculates sequence-specific protein solvent accessibilities in the unfolded ensemble by simulating the unfolded protein many times and combining the results. In the simulations, the structural model to describe the unfolded conformations representative of the unfolded protein is generated by the Flexible-Meccano algorithm. The analytical software ALPHASURF is applied to calculate atom solvent accessibilities. The researchers report the average ASA for each amino acid over many examples (and simulations) in [9], but the web application allows for non-static values to be generated as well [12].

While this list of methods is not complete (the reader is directed to [9] for a more complete review), we believe it is a representative sample of methods to compare. In this note, we use statistical analysis to compare the ASA values generated by these methods to find significant differences between the methods, if present. We compare the tripeptide method (Ala-Xaa-Ala), Creamer et al. upper bound, lower bound, and average methods, Gong and Rose average and lower bound (transition midpoint) methods, ProtSA static (based on average values) and web server values. For details on computations, please see the Methods section. We also compare the resulting changes in solvent accessible surface area and their relationships with established variables in the literature from [11].

Unfolded ASA results

To demonstrate the differences between the seven unfolded ASA methods (not including the tripeptide model), we examine the values they assign to individual amino acids in Table 1. It is fairly evident that the individual amino acid values vary a great deal between methods, but we do not know if that variety results in significantly different total unfolded ASA values for proteins. To attain a total unfolded ASA value for each protein, as described in detail in Methods, we assign the values from Table 1 to the corresponding amino acids in each protein or we attain the values from the ProtSA web server, depending on the method, and sum them (after accounting for termini effects).
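The summation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-residue values in the dictionary are made-up placeholders rather than the entries of Table 1, and the termini correction is passed in as a single number because the source defers its exact form to the Methods section.

```python
def total_unfolded_asa(sequence, residue_asa, termini_correction=0.0):
    """Sum per-residue unfolded ASA values over a sequence, then add a termini correction."""
    missing = sorted(set(sequence) - set(residue_asa))
    if missing:
        raise KeyError(f"No unfolded ASA value for residue type(s): {missing}")
    return sum(residue_asa[aa] for aa in sequence) + termini_correction

def change_in_asa(unfolded_asa, folded_asa):
    """Change in ASA on unfolding: unfolded total minus folded (native) total."""
    return unfolded_asa - folded_asa

# Placeholder values in square angstroms, NOT taken from Table 1.
placeholder_table = {"A": 100.0, "G": 80.0, "V": 150.0}
unfolded = total_unfolded_asa("AGVVA", placeholder_table, termini_correction=10.0)
print(unfolded, change_in_asa(unfolded, folded_asa=400.0))
```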

Next we examine a series of boxplots showing the total unfolded ASA values across a set of 51 proteins (Figure 1) chosen to align with the data set of [11]. This data set is moderate in size, and naturally, we would like as much data as possible to aid in our selection of an unfolded ASA calculation method. While the data set size may impose some limitations on conclusions, we have not been able to find a larger set with the necessary information in order to expand our analysis. Our analysis is still able to demonstrate method differences and assist us in a decision about method selection for our database.

Figure 1. Boxplots of unfolded solvent accessible surface area. Unfolded ASA values are provided for comparison across methods for the set of 51 proteins.

We note several key characteristics in Figure 1. First, there are a few outliers which are the same proteins under each method (protein databank files (PDBs): 1ABE, 2CAB, 5PEP, 3PSG, 3SIC, and 2ST1). Next, the ProtSA static and ProtSA (web server) distributions look to be very similar, but the ProtSA static values are shifted up a bit relative to the web server values. We do see evidence to confirm what was stated in [10], that the lower bound (transition midpoint) method of [10] results in similar values to those obtained from the Creamer et al. upper bound method from [6]. The lower bound Creamer et al. method seems to give values most similar to those from ProtSA (static and web server).

Next, we examine boxplots of change in ASA across the eight methods (includes previous seven methods and tripeptide values from [11]) as shown in Figure 2. Please see Methods for computational details. The tripeptide (Ala-Xaa-Ala) values do appear to be a little higher than those of upper limit Creamer et al. method (as proposed in [5]), but not by much. This leads to a natural question. Are the differences observed in the boxplots significant? Hence we turn to our statistical analysis.

Figure 2. Boxplots of change in solvent accessible surface area. Changes in ASA values are provided for comparison across methods including tripeptide results for the subset of 44 proteins.

Change in ASA results

We computed change in ASA values from the unfolded to folded state after acquiring folded ASA estimates using Chimera [2]. For details on computations, please see Methods. Paired t-tests to look for differences in mean change in ASA values were performed to address whether or not the differences observed in the boxplots are significant (similar results are obtained if such an analysis is performed on just the mean unfolded ASA values due to the only difference in the values being a distinct constant shift for each protein) with adjustments on determining significance due to multiple testing. This analysis and all subsequent analyses are performed on the subset of 44 proteins where the protein size matched the number of residues value from the review work of [11] so that comparable values were being compared. The paired t-tests indicated that only four contrasts (pairs of methods) resulted in an insignificant result. The insignificant contrasts were between the tripeptide and average Gong/Rose methods, the tripeptide and lower Gong/Rose methods, the tripeptide and upper bound Creamer et al. methods, and the lower Gong/Rose and upper bound Creamer et al. methods. All other pairs of methods resulted in statistically significantly different mean change in ASA values for the proteins examined. The ProtSA static change in ASA values were about 225 units above the ProtSA web server change in ASA values per protein, on average, so this was a significant difference despite their similarity in the boxplots. As many of the methods result in significantly different changes in ASA per protein, we need to determine what method we want to use in the database, ACPro. Next we consider which method (if any) is “best” relative to the performance of the tripeptide method in the relationships with changes in ASA examined in [11], as the newer methods have a stronger physical basis.
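For readers who want to reproduce this kind of comparison, the following Python sketch computes a paired t statistic for one contrast and a significance threshold adjusted for the number of contrasts. The text does not say which multiple-testing adjustment was used, so the Bonferroni correction here is an assumption, as are the method labels and the toy data.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for a paired t-test on equal-length samples."""
    if len(x) != len(y):
        raise ValueError("Paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Labels for the eight methods compared in the text (the naming is ours).
methods = ["tripeptide", "creamer_upper", "creamer_lower", "creamer_avg",
           "gong_rose_avg", "gong_rose_lower", "protsa_static", "protsa_web"]
n_contrasts = len(list(combinations(methods, 2)))
bonferroni_alpha = 0.05 / n_contrasts  # assumed adjustment, not confirmed by the source

# Toy change-in-ASA values for three proteins under two methods.
t, df = paired_t_statistic([2.0, 4.0, 7.0], [1.0, 2.0, 4.0])
print(f"t = {t:.3f}, df = {df}, contrasts = {n_contrasts}, alpha = {bonferroni_alpha:.5f}")
```

The p-value for each contrast would then be read from a t distribution with the reported degrees of freedom and compared against the adjusted threshold.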

To set a baseline threshold of performance, we compute the R-squared value (from a simple linear regression) between the tripeptide change in ASA value on the 44 protein subset and each variable examined in a relationship with change in ASA in [11]: number of residues (Nres), heat capacity change upon unfolding (ΔCp), enthalpy of unfolding at 60 degrees C (ΔH(60)) and at 100 degrees C (ΔH*), and entropy of unfolding at 60 degrees C (ΔS(60)) and at 112 degrees C (ΔS*). Then, we compute R-squared values from regressions using the other method’s change in ASA values and the same variables. The resulting R-squared values are provided in Table 2.
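The R-squared of a simple linear regression equals the squared Pearson correlation between the two variables, which is why small differences in R-squared correspond to small differences in correlation. A minimal Python sketch, with made-up numbers standing in for change in ASA and a response such as Nres:

```python
def r_squared(x, y):
    """R-squared of the simple linear regression of y on x (squared Pearson correlation)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two equal-length samples with at least two points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical values, for illustration only.
delta_asa = [5200.0, 7900.0, 10400.0, 15100.0]
n_res = [56, 85, 110, 162]
print(f"R^2 = {r_squared(delta_asa, n_res):.3f}")
```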

Based on the results in Table 2, we note that between methods where the change in ASA is identified as being in the best three predictors for each response variable, the differences in R-squared values (which are equivalent to slight differences in correlations), are not large enough to be statistically significant. Indeed, across most of the methods (not even restricting ourselves to the three strongest relationships), this would be the case. The method that most closely matches the performance of the tripeptide method in terms of these relationships is the lower bound (transition midpoint) Gong/Rose method.

Based on our results, the freely available ACPro database containing protein folding kinetics information makes use of the lower bound (transition midpoint) Gong/Rose method for computing unfolded ASA for the proteins reported. The rationale is as follows: the method has a strong physical basis as provided in [10], is not computationally intensive, and termini effects are easily dealt with. While this method yields significantly different estimates of ASA than some of the other methods, it does not suffer in terms of its performance in key relationships with thermodynamic variables previously studied.


CASTp

The CASTp (Computed Atlas of Surface Topography of proteins) web server is an online tool that locates, measures and characterizes the pockets on protein surfaces and the voids in the interior of proteins. These surface pockets and voids are the concave regions of proteins that are usually correlated with binding activities. CASTp uses the alpha shape and the pocket algorithm developed in computational geometry (Jie Liang et al., 2003) to delineate and measure the surface pockets and voids in proteins. In CASTp, surface pockets are defined as concave regions of proteins with binding sites at the opening. These pockets also allow easy access of water molecules from the exterior. Voids are defined as hidden vacant spaces in the interior of proteins, which are inaccessible to water molecules from the exterior once all hetero atoms have been removed.

Figure 1: The binding pocket (green) of HIV-1 protease (1hte). The ligand Gr12397(yellow) occupies the binding site.

The CASTp online tool analyzes all surface pockets and interior voids in the three-dimensional structure of a protein and gives a detailed characterization of all the atoms involved in forming these voids and pockets. CASTp uses both the solvent accessible surface model (Richards' surface) and the molecular surface model (Connolly's surface) to analytically measure the area and volume of each void and pocket. The solvent accessible surface, also known as Richards' surface, is the surface of a biomolecule that is accessible to a solvent. CASTp also measures the size of each pocket and of its mouth openings. This makes it possible to assess the accessibility of binding sites to different ligands and substrates. Surface analysis of proteins using CASTp has a number of advantages in biological studies.

The annotated functional information of proteins is also included in the new version of CASTp. These annotations are derived from the Protein Data Bank (PDB), Swiss-Prot, as well as Online Mendelian Inheritance in Man (OMIM), and the latter contains information on the variant single nucleotide polymorphisms (SNPs) that are known to cause disease.

This experiment uses the CASTp online resource, available through http://cast.engr.uic.edu

Joe Dundas, Zheng Ouyang, Jeffery Tseng, Andrew Binkowski, Yaron Turpaz, and Jie Liang. 2006. CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucl. Acids Res., 34:W116-W118.


Abstract

It is of great interest in modern drug design to accurately calculate the free energies of protein–ligand or nucleic acid–ligand binding. MM-PBSA (molecular mechanics Poisson–Boltzmann surface area) and MM-GBSA (molecular mechanics generalized Born surface area) have gained popularity in this field. For both methods, the conformational entropy, which is usually calculated through normal-mode analysis (NMA), is needed to calculate the absolute binding free energies. Unfortunately, NMA is computationally demanding and becomes a bottleneck of the MM-PB/GBSA-NMA methods. In this work, we have developed a fast approach to estimate the conformational entropy based upon solvent accessible surface area calculations. In our approach, the conformational entropy of a molecule, S, can be obtained by summing up the contributions of all atoms, whether they are buried or exposed. Each atom has two types of surface areas, solvent accessible surface area (SAS) and buried SAS (BSAS). The two types of surface areas are weighted to estimate the contribution of an atom to S. Atoms having the same atom type share the same weight and a general parameter k is applied to balance the contributions of the two types of surface areas. This entropy model was parametrized using a large set of small molecules for which their conformational entropies were calculated at the B3LYP/6-31G* level taking the solvent effect into account. The weighted solvent accessible surface area (WSAS) model was extensively evaluated in three tests. For convenience, TS values, the product of temperature T and conformational entropy S, were calculated in those tests. T was always set to 298.15 K throughout the text. First of all, good correlations were achieved between WSAS TS and NMA TS for 44 protein or nucleic acid systems sampled with molecular dynamics simulations (10 snapshots were collected for post-entropy calculations): the mean squared correlation coefficient (R²) was 0.56. As to the 20 complexes, the TS changes upon binding (TΔS values) were also calculated, and the mean R² was 0.67 between NMA and WSAS. In the second test, TS values were calculated for 12 protein decoy sets (each set has 31 conformations) generated by the Rosetta software package. Again, good correlations were achieved for all decoy sets: the mean, maximum, and minimum of R² were 0.73, 0.89, and 0.55, respectively. Finally, binding free energies were calculated for 6 protein systems (the numbers of inhibitors range from 4 to 18) using four scoring functions. Compared to the measured binding free energies, the mean R² of the six protein systems were 0.51, 0.47, 0.40, and 0.43 for MM-GBSA-WSAS, MM-GBSA-NMA, MM-PBSA-WSAS, and MM-PBSA-NMA, respectively. The mean rms errors of prediction were 1.19, 1.24, 1.41, and 1.29 kcal/mol for the four scoring functions, correspondingly. Therefore, the two scoring functions employing WSAS achieved a comparable prediction performance to that of the scoring functions using NMA. It should be emphasized that no minimization was performed prior to the WSAS calculation in the last test. Although WSAS is not as rigorous as physical models such as quasi-harmonic analysis and thermodynamic integration (TI), it is computationally very efficient as only surface area calculation is involved and no structural minimization is required. Moreover, WSAS has achieved a comparable performance to normal-mode analysis. We expect that this model could find applications in fields like high throughput screening (HTS), molecular docking, and rational protein design. In those fields, efficiency is crucial since there are a large number of compounds, docking poses, or protein models to be evaluated. A list of acronyms and abbreviations used in this work is provided for quick reference.
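The abstract describes the WSAS model only in words and does not give its functional form. One plausible reading, shown below as a Python sketch, is that each atom contributes its type-specific weight times (SAS + k × BSAS). That form, the atom-type weights and the value of k are all assumptions made for illustration; they are not the published parameters.

```python
def wsas_entropy(atoms, weights, k):
    """Sum per-atom contributions w[type] * (SAS + k * BSAS).

    atoms   : iterable of (atom_type, sas, bsas) tuples, areas in square angstroms
    weights : dict mapping atom type to its weight (hypothetical values here)
    k       : single parameter balancing buried against exposed area (hypothetical)
    """
    return sum(weights[atom_type] * (sas + k * bsas) for atom_type, sas, bsas in atoms)

# Made-up atoms and parameters, only to show the bookkeeping.
example_atoms = [("C", 10.0, 5.0), ("O", 4.0, 0.0)]
example_weights = {"C": 0.1, "O": 0.2}
print(wsas_entropy(example_atoms, example_weights, k=0.5))
```

Because the sum runs over surface areas only, the cost scales with the surface calculation itself, which is consistent with the efficiency argument made in the abstract.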



In: Bioinformatics , Vol. 23, No. 24, 12.2007, p. 3397-3399.

Research output : Contribution to journal › Article › peer-review

InterProSurf: a web server for predicting interacting sites on protein surfaces

Funding Information: This project is supported by NIH grants R21 AI055746 and R01 AI064913 to W.B. and NIH grant (5UO1-AI053858-03 Johnny Peterson, PI) to C.H.S.

A new web server, InterProSurf, predicts interacting amino acid residues in proteins that are most likely to interact with other proteins, given the 3D structures of subunits of a protein complex. The prediction method is based on solvent accessible surface area of residues in the isolated subunits, a propensity scale for interface residues and a clustering algorithm to identify surface regions with residues of high interface propensities. Here we illustrate the application of InterProSurf to determine which areas of Bacillus anthracis toxins and measles virus hemagglutinin protein interact with their respective cell surface receptors. The computationally predicted regions overlap with those regions previously identified as interface regions by sequence analysis and mutagenesis experiments.




Method

The POPScomp server invokes the POPS program to compute the Solvent Accessible Surface Area (SASA) of a given PDB structure. For protein or RNA/DNA complexes, the POPScomp server internally creates all pair combinations of chains to compute the buried SASA upon complexation. Details of those functionalities are explained in the published papers on implicit solvent, POPS and POPSCOMP; see the 'About' tab for the list of publications.

SASA tables are initialised without any values; therefore, before 'run POPScomp' is executed, the user sees only the table header and, below it, the notice 'Showing 0 to 0 of 0 entries'. After selecting a PDB file and pressing 'run POPScomp', the server runs the POPS program on the selected PDB file. The output is SASA tables, which are automatically loaded into the respective tabs. The success of the computation is returned as an exit code and shown below the 'run POPScomp' button: 'Exit code: 0' means success and that is what you should expect to see; otherwise consult the 'Exit Codes' tab.

Results

The SASA result tabs are 'Atom', 'Residue', 'Chain' and 'Molecule'. Those tabs contain a second layer of tabs to accommodate the POPSCOMP functionality, as follows. 'Input Structure': SASA values of the PDB structure as input. 'DeltaSASA': The SASA difference between isolated chains and chain pair complexes. 'Isolated Chains': SASA values of isolated chains. Only structures containing multiple chains will yield values for 'DeltaSASA' and 'Isolated Chains' tabs.
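The 'DeltaSASA' idea can be illustrated with a short Python sketch. The sign convention used here (isolated chains minus complex, so that buried area is positive) is an assumption, as are the function name and the numbers; the server's own tables may define the difference differently.

```python
def delta_sasa(sasa_chain_a, sasa_chain_b, sasa_complex):
    """SASA buried on complexation: sum of the isolated chains minus the chain-pair complex."""
    return (sasa_chain_a + sasa_chain_b) - sasa_complex

# Hypothetical SASA values in square angstroms for a two-chain structure.
buried = delta_sasa(sasa_chain_a=5000.0, sasa_chain_b=4000.0, sasa_complex=7800.0)
print(f"Buried SASA: {buried:.1f} A^2")
```

For a single-chain structure there is no pair to compare, which matches the note above that only multi-chain structures yield 'DeltaSASA' values.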

Results will be stored on the server until midnight GMT and then automatically removed. Please use the 'Download . ' buttons under the tables to save your results in 'csv' format. The 'Download All Results' button on the side panel returns the zipped content of the entire output directory, i.e. all results produced for a given POPScomp job.

In case the server does not work as expected or server-related issues need clarification, please email the maintainers: Jens Kleinjung ([email protected]) and Franca Fraternali ([email protected]). For software and output errors, feature suggestions and similar topics, please add an entry to the Issues tab on the POPScomp GitHub page.

Shiny App

This is version 3.1.7 of the POPScomp Shiny App.

For detailed information about the software, visit the Fraternali Lab's POPScomp GitHub repository; the Wiki pages contain detailed installation and usage instructions.

References

Users publishing results obtained with the program and its applications should acknowledge its use by citation.

Implicit solvent

Fraternali, F. and van Gunsteren, W.F. An efficient mean solvation force model for use in molecular dynamics simulations of proteins in aqueous solution. Journal of Molecular Biology 256 (1996) 939-948. DOI Pubmed

Kleinjung, J. and Fraternali, F. Design and Application of Implicit Solvent Models in Biomolecular Simulations. Current Opinion in Structural Biology 25 (2014) 126-134. DOI Pubmed

POPS method

Fraternali, F. and Cavallo, L. Parameter optimized surfaces (POPS): analysis of key interactions and conformational changes in the ribosome. Nucleic Acids Research 30 (2002) 2950-2960.

POPS server

Cavallo, L., Kleinjung, J. and Fraternali, F. POPS: A fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Research 31 (2003) 3364-3366.

POPSCOMP server

Kleinjung, J. and Fraternali, F. POPSCOMP: an automated interaction analysis of biomolecular complexes. Nucleic Acids Research 33 (2005) W342-W346.

License and Copyright

Usage of the software and server is free under the GNU General Public License v3.0.

Copyright Holders, Authors and Maintainers

2002-2020 Franca Fraternali (author, maintainer)

2008-2020 Jens Kleinjung (author, maintainer)

Contributors

2002 Kuang Lin and Valerie Hindie (translation to C)

2002 Luigi Cavallo (parametrisation)

Overview

POPScomp uses a combination of *Shell* (system) calls and R *Shiny* routines. Therefore, the return value shown as exit code may come from *Shell* or *Shiny*. A successful run will return 'Exit code: 0'. Any error will return an exit code different from '0'. A commented list of exit codes is given below together with troubleshooting tips. In case you get stuck, please contact the maintainers.

Shell command exit codes

* 1 - Catchall for general errors

* 2 - Misuse of shell builtins (according to Bash documentation)

* 126 - Command invoked cannot execute

* 128 - Invalid argument to exit

* 128+n - Fatal error signal 'n'

* 130 - Script terminated by Control-C

* 255 - Exit status out of range
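The list above can be turned into a small lookup helper. This is only a sketch: the function name `describe_exit_code` is hypothetical, the entry for 0 comes from the overview text, and the wording for codes not covered by the list is an assumption.

```python
# Shell exit codes as listed above; 0 is the success value from the overview.
SHELL_EXIT_CODES = {
    0: "Success",
    1: "Catchall for general errors",
    2: "Misuse of shell builtins",
    126: "Command invoked cannot execute",
    128: "Invalid argument to exit",
    130: "Script terminated by Control-C",
}


def describe_exit_code(code):
    """Return a short description for a shell exit code."""
    if code in SHELL_EXIT_CODES:
        return SHELL_EXIT_CODES[code]
    if code < 0 or code >= 255:
        return "Exit status out of range"
    if code > 128:
        # 128+n means the process was ended by fatal signal n.
        return f"Fatal error signal {code - 128}"
    return "Unlisted exit code"  # assumed wording, not from the list


for code in (0, 1, 130, 137):
    print(code, "-", describe_exit_code(code))
```

Specific entries are checked before the generic 128+n rule, so 130 is reported as Control-C rather than as "signal 2".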

Shiny exit codes

* No PDB source input! - Enter PDB identifier or upload PDB file from local file system at the top of the side panel.

* Two PDB sources input! - Only one PDB source is accepted per computation. Refresh the browser page and either specify a PDB identifier or upload a PDB file, not both.

Troubleshooting Errors

Exit code: 1 AND Error: Cannot open the connection

The PDB file could not be read, most likely because something went wrong during upload or download. If you used the 'Enter PDB entry' field, check your internet connection.


Abstract

We propose a pairwise and readily parallelizable SASA-based nonpolar solvation approach for protein simulations, inspired by our previous pairwise GB polar solvation model development. In this work, we developed a novel function to estimate the atomic and molecular SASAs of proteins, which achieves accuracy comparable to that of the LCPO algorithm in reproducing numerical icosahedral-based SASA values. Implemented in Amber software and tested on consumer GPUs, our pwSASA method reasonably reproduces LCPO simulation results, but accelerates MD simulations up to 30 times compared to the LCPO implementation, which is greatly desirable for protein simulations facing sampling challenges. The value of incorporating the nonpolar term in implicit solvent simulations is explored on a peptide fragment containing the hydrophobic core of HP36 and by evaluating thermal stability profiles of four small proteins.


PROTEIN TERTIARY STRUCTURE

Sites are offered for calculating and displaying the 3-D structure of oligosaccharides and proteins. With the two protein analysis sites the query protein is compared with existing protein structures as revealed through homology analysis.

Background: "Principles of Protein Structure, Comparative Protein Modelling and Visualization" by N. Guex & M.C. Peitsch (GlaxoWellcome, Switzerland). To obtain PDB coordinates for a protein of your interest, go to the Protein Data Bank, Molecules to Go, or NCBI.

PHYRE2 - Protein Homology/analogY Recognition Engine - this is my favourite site for the prediction of the 3D structure of proteins. In each case I have used this site it has provided me with a model. Phyre2 uses the alignment of hidden Markov models via HHsearch to significantly improve accuracy of alignment and detection rate. It also incorporates a new ab initio folding simulation called Poing to model regions of your proteins with no detectable homology. ( Reference: Kelley LA et al. Nature Protocols 10: 845-858 (2015)).

CPHModels (Center for Biological Sequence Analysis, Technical University of Denmark) - currently consists of the following tools: Sowhat: A neural network based method to predict contacts between C-alpha atoms from the amino acid sequence. RedHom: A tool to find a subset with low sequence similarity in a database. Databases: Subsets of the Brookhaven Protein Data Bank (PDB) database with low sequence similarity produced using the RedHom tool.

SWISS-MODEL - (Glaxo-Wellcome Experimental Research, Switzerland) An automated comparative protein modelling server. Choose "First Approach mode". N.B. results come by E-mail and require a viewer such as DeepView - Swiss-PdbViewer, Rasmol, or Cn3D.

ORION - is a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. ( Reference: Ghouzam Y et al. (2016) Scientific Reports 6: 28268).

I-TASSER ONLINE - 3D models are built based on multiple-threading alignments by LOMETS and iterative TASSER simulations; function insights are then derived by matching the predicted models with protein function databases. I-TASSER was ranked as the No 1 server for protein structure prediction in the CASP7 and CASP8 experiments. ( Reference: A. Roy et al. 2010. Nature Protocols 5: 725-738)

ESyPred3D - this automated homology modeling program derives benefit from a new alignment strategy using neural networks. Alignments are obtained by combining, weighting and screening the results of several multiple alignment programs. The final three dimensional structure is built using the modeling package MODELLER. ( Reference: C. Lambert et al. 2002. Bioinformatics 18: 1250-1256).

Robetta - is a protein structure prediction service that is continually evaluated through CAMEO. Its features include an interactive submission interface that allows custom sequence alignments for homology modeling, constraints, local fragments, and more. It can model multi-chain complexes and provides the option for large scale sampling. It uses the PDB100 template database, which is updated weekly, a co-evolution based model database (MDB), and also provides the option for custom templates. ( Reference: Kim DE et al. (2004) Nucleic Acids Res 32(Web Server issue): W526-531).

PEP-FOLD 3 is a de novo approach aimed at predicting peptide structures from amino acid sequences. This method, based on structural alphabet SA letters to describe the conformations of four consecutive residues, couples the predicted series of SA letters to a greedy algorithm and a coarse-grained force field. ( Reference: Lamiable A, et al. Nucleic Acids Res. 2016 44(W1): W449-54).

(PS)2: protein structure prediction server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure can be displayed with Java-based 3D graphics viewers, and the structural and evolutionary profiles are shown and compared chain-by-chain. ( Reference: T-T. Huang et al. 2015. Nucl. Acids Res. 43 (W1): W338-W342).

BetaCavityWeb: computing molecular voids and channels with their mass properties. The output consists of three components: 1) the number of cavities, 2) the atoms contributing to the boundary of each cavity, and 3) the geometric property of each cavity. Computational statistics are also reported. A probe radius of zero corresponds to the cavities existing in the van der Waals molecules. If the probe radius is nonzero, the cavities are those existing in the Lee-Richards (solvent accessible) model. ( Reference: J-K. Kim et al. 2015. Nucl. Acids Res. 43 (W1): W413-W418).

AS2TS system - offers a variety of resources for protein structural analysis using the LGA (local-global alignment) program to search for regions of local similarity and to evaluate the level of structural similarity between compared protein structures. To facilitate the homology-based protein structure modeling process, the AL2TS service translates given sequence-structure alignment data into the standard PDB coordinates ( Reference: A. Zemla et al. 2005. Nucl. Acids Res. 33: W111-W115).

3D-JIGSAW (Biomolecular Modelling Laboratory, Cancer Research UK, England) - homology modelling. Save email results as *.pdb and view with Rasmol etc.

RaptorX - its major modules include single-template threading, alignment quality assessment, and multiple-template threading. ( Reference: Källberg, M. et al. 2012. Nat Protoc. 7(8):1511-1522).

WHAT IF Web Interface (Centre for Molecular and Biomolecular Informatics, University of Nijmegen, Holland) offers one a large number of tools for examining PDB files.

Protein Peeling - an approach for splitting a 3D protein structure into compact fragments - a method to identify small compact units (protein units (PU)) that compose protein three-dimensional structures. ( Reference: J.-C. Gelly et al. 2006. Bioinformatics 22: 129-133)

InterProSurf - predicts interacting amino acid residues in proteins that are most likely to interact with other proteins, given the 3D structures of subunits of a protein complex. The prediction method is based on solvent accessible surface area of residues in the isolated subunits, a propensity scale for interface residues and a clustering algorithm to identify surface regions with residues of high interface propensities. ( Reference: S.S. Negi et al. 2007. Bioinformatics. 23: 3397-3399)

ProtSkin converts a protein sequence alignment in BLAST, CLUSTAL or MSF format to a property file used to map the sequence conservation onto the structure of a protein using the GRASP, MOLMOL or PyMOL. A pseudo-PDB file with the sequence conservation score in place of the temperature factor is also provided, to use with programs such as InsightII (accelrys). ( Reference: Deprez, C. et al. 2005. J. Mol. Biol. 346: 1047-1057).
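The "pseudo-PDB file" idea can be sketched as follows: in the fixed-column PDB format the temperature factor of an ATOM or HETATM record occupies columns 61-66, so a per-residue score can be written there and coloured by B-factor in a viewer. The function `set_b_factor`, the sample atom and the score value are illustrative assumptions; ProtSkin's actual scoring scale and output are not described here.

```python
def set_b_factor(pdb_line, score):
    """Return an ATOM/HETATM record with the temperature factor
    (columns 61-66, format 6.2f) replaced by `score`; other records pass through."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    padded = pdb_line.rstrip("\n").ljust(80)
    return padded[:60] + f"{score:6.2f}" + padded[66:]


# Build a sample ATOM record field by field so that the columns are exact.
sample = (
    "ATOM  " + f"{1:5d}" + " " + " N  " + " " + "MET" + " " + "A"
    + f"{1:4d}" + "    "
    + f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}"  # x, y, z
    + f"{1.00:6.2f}{9.67:6.2f}"                  # occupancy, temperature factor
    + " " * 10 + " N"                            # element symbol in columns 77-78
)

rescored = set_b_factor(sample, 0.85)  # 0.85 is a hypothetical conservation score
print(sample)
print(rescored)
```

Because the field is only six characters wide with two decimals, scores should be scaled to stay below 1000 before being written.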

PSIPRED Protein Sequence Analysis Workbench - includes PSIPRED v3.3 (Predict Secondary Structure); DISOPRED3 & DISOPRED2 (Disorder Prediction); pGenTHREADER (Profile Based Fold Recognition); MEMSAT3 & MEMSAT-SVM (Membrane Helix Prediction); BioSerf v2.0 (Automated Homology Modelling); DomPred (Protein Domain Prediction); FFPred 3 (Eukaryotic Function Prediction); GenTHREADER (Rapid Fold Recognition); MEMPACK (SVM Prediction of TM Topology and Helix Packing); pDomTHREADER (Fold Domain Recognition); and DomSerf v2.0 (Automated Domain Modelling by Homology). ( Reference: Buchan DWA et al. 2013. Nucl. Acids Res. 41 (W1): W340-W348).

MODELLER - comparative protein structure modelling by satisfaction of spatial restraints

Structures derived from NMR coordinates:

GeNMR (GEnerate NMR structure) - generates 3D protein structures using NOE-derived distance restraints and NMR chemical shifts. ( Reference: M. Berjanskii et al. 2009. Nucl. Acids Res. 37(Web Server issue):W670-W677)

Once you have a structure you may want to superimpose it on other molecules. To obtain PDB accession codes for a protein of your interest, go to the Protein Data Bank.

FATCAT (Flexible structure AlignmenT by Chaining Aligned fragment pairs allowing Twists) is an approach for flexible protein structure comparison. It simultaneously addresses the two major goals of flexible structure alignment: optimizing the alignment and minimizing the number of rigid-body movements (twists) around pivot points (hinges) introduced in the reference structure. ( Reference: Y. Ye & A. Godzik. (2003) Bioinformatics 19: suppl. 2. ii246-ii255). This website provides access to a wide range of protein tools.

Search for Similar Protein Structures by FATCAT - either upload local PDB files or simply provide PDB codes. For close homologs go here ( Reference: Y. Ye & A. Godzik. 2003. Bioinformatics 19(Suppl 2):II246-II255)

SuperPose - is a protein superposition server. It calculates protein superpositions using a modified quaternion approach. From a superposition of two or more structures, SuperPose generates sequence alignments, structure alignments, PDB coordinates, RMSD statistics, Difference Distance Plots, and interactive images of the superimposed structures. The SuperPose web server supports the submission of either PDB-formatted files or PDB accession numbers. ( Reference: Maiti, R. et al. 2004. Nucleic Acids Res. 32 (Web Server issue): W590-594).

MulPBA (multiple Protein Block Alignment) - is a tool for comparison of protein structures based on similarity in the local backbone conformation. The local backbone conformation is defined as pentapeptide dihedrals ( Reference: Léonard S et al. (2014) J Biomol Struct Dyn. 32(4): 661-668).

3D-Match - Comparing 3D structures of two proteins (Softberry)

iPBA - is a tool for comparison of protein structures based on similarity in the local backbone conformation. It presents an improved alignment approach using (i) specialized PB Substitution Matrices (SM) and (ii) anchor-based alignment methodology. ( Reference: Gelly, J.C. et al. 2011. Nucleic Acids Res. 39(Web Server issue):W18-23).

MAPSCI Multiple Alignment of Protein Structures and Consensus Identification. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then computes iteratively a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed protein. ( Reference: Ilinkin, I. et al. 2010. BMC Bioinformatics. 11:71).

Rclick - this web server is capable of superimposing RNA 3D structures by using clique matching and 3D least-squares fitting. Rclick has been benchmarked and compared with other popular servers and methods for RNA structural alignments. In most cases, Rclick alignments were better in terms of structure overlap. It also recognizes conformational changes between structures. ( Reference: Nguyen MN, & Verma C. 2015. Bioinformatics 31:966-968).

The NGL Viewer is a web application for the visualization of macromolecular structures. By fully adopting capabilities of modern web browsers, such as WebGL, for molecular graphics, the viewer can interactively display large molecular complexes and is also unaffected by the retirement of third-party plug-ins like Flash and Java Applets. Beautiful output. ( Reference: A.S. Rose & P.W. Hildebrand. 2015. Nucl. Acids Res. 43 (W1): W576-W579).

CEP: a conformational epitope prediction server - provides a web interface to conformational epitope prediction. The algorithm, apart from predicting conformational epitopes, also predicts antigenic determinants and sequential epitopes. The epitopes are predicted using 3D structure data of protein antigens, which can be visualized graphically. ( Reference: U. Kulkarni-Kale et al. 2005. Nucl. Acids Res. 33: W168-W171).

PDB2MultiGIF (Central Spectroscopy Department - Molecular Modeling, Deutsches Krebsforschungszentrum, Germany) - if you want to present your model on a webpage in a similar manner to the three pictures on the entry page use this program. It takes the 3D structure (PDB file) and generates an animated image which can be displayed using any browser. There is considerable choice on the image size, number of frames and direction of rotation.

MovieMaker - a web server that allows short (~10 sec), downloadable movies to be generated of protein dynamics. It accepts PDB files or PDB accession numbers as input and automatically outputs colorful animations covering a wide range of protein motions and other dynamic processes. Users have the option of animating 1) simple rotation 2) morphing between two end conformers 3) short-scale, picosecond vibrations 4) ligand docking 5) protein oligomerization 6) mid-scale nanosecond (ensemble) motions and 7) protein folding/unfolding. MovieMaker is not a molecular dynamics server and does not perform MD calculations. ( Reference: R. Maiti et al. 2005. Nucl. Acids Res. 33: W358-W362)

COMBOSA3D - Coloring Of Molecules Based On Sequence Alignment (Paul Stoddard) - accepts a group of pre-aligned sequences in FASTA format (one of the sequences must have a solved three-dimensional structure), and it uses the alignment information to highlight conserved residues on the molecule. Three-dimensional structures are shown using a Chime plugin or one may use RasMol to view and color a molecule.

ProFunc: a server for predicting protein function from 3D structure - this program takes PDB files and carries out an awesome array of analyses including scans against PDB and motif databases, determination of surface morphology and conserved residues, and potential ligand-binding sites. ( Reference: R. A. Laskowski et al. 2005. Nucl. Acids Res. 33: W89-W93).

Predict Function from Structure:

ProFunc - has been developed to help identify the likely biochemical function of a protein from its three-dimensional structure using a variety of sequence- and structure-based methods ( Reference: Laskowski R.A. et al. 2005. Nucleic Acids Res. 33: W89-W93).

PROVEAN (Protein Variation Effect Analyzer) is a software tool which predicts whether an amino acid substitution or indel has an impact on the biological function of a protein. ( Reference: Choi Y et al. 2012. PLoS One 7:e46688).

CUPSAT - Cologne University Protein Stability Analysis Tool - predicts changes in protein stability upon point mutations. The prediction model uses amino acid-atom potentials and torsion angle distribution to assess the amino acid environment of the mutation site. Additionally, the prediction model can distinguish the amino acid environment using its solvent accessibility and secondary structure specificity. ( Reference: Parthiban V, et al. (2006) Nucleic Acids Research, 34: W239-42).

Eris - is a protein stability prediction server. This server calculates the change of the protein stability induced by mutations (ΔΔG) utilizing the recently developed Medusa modeling suite. In our test study, the ΔΔG values of a large dataset (>500) were calculated and compared with the experimental data and significant correlations are found. The correlation coefficients vary from 0.5 to 0.8. Eris also allows refinement of the protein structure when high-resolution structures are not available. ( Reference: Yin, F. et al. Nature Methods 4: 466-467 2007). Requires registration.

AUTO-MUTE - AUTOmated server for predicting functional consequences of amino acid MUTations in protEins - is a suite of programs measuring stability changes (ΔΔG, ΔΔGH2O, and ΔTm). ( Reference: Masso M. & Vaisman I.I. (2010) Protein Eng. Des. Sel. 23: 683-687).

DUET - Protein Stability Change Upon Mutation - a web server for an integrated computational approach for studying missense mutations in proteins. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimised predictor using Support Vector Machines (SVM). ( Reference: D.E.V. Pires et al. 2015. Nucl. Acids Res. 42(1): W314-W319)

Pockets (active sites) in 3D structures of proteins:

metaPocket - Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction ( Reference: Zhang, Z. et al. 2011. Bioinformatics, 27: 2083-2088).

POCASA (POcket-CAvity Search Application) is an automatic program that implements the algorithm named Roll which can predict binding sites by detecting pockets and cavities of proteins of known 3D structure. ( Reference: Yu, J. et al. 2010. Bioinformatics 26: 46-52.)

Fpocket suite - three servers are available: (a) Fpocket: perform simple pocket detection; (b) MDpocket: track pockets in molecular dynamics; and (c) Hpocket: view conserved pockets within homologous proteins ( Reference: Schmidtke P et al. 2009. Nucleic Acids Res.10:168).

Mutation and crystallization:

SERp Server - Surface Entropy Reduction prediction (SERp) is an exploratory tool to aid identification of sites that are most suitable for mutation designed to enhance crystallizability by a Surface Entropy Reduction approach. ( Reference: Goldschmidt L et al. 2007. Protein Sci. 16:1569-76).

Scratch Protein Predictor - (Institute for Genomics and Bioinformatics, University California, Irvine) - programs include: ACCpro: the relative solvent accessibility of protein residues; CMAPpro: Prediction of amino acid contact maps; COBEpro: Prediction of continuous B-cell epitopes; CONpro: predicts whether the number of contacts of each residue in a protein is above or below the average for that residue; DIpro: Prediction of disulphide bridges; DISpro: Prediction of disordered regions; DOMpro: Prediction of domains; SSpro: Prediction of protein secondary structure; SVMcon: Prediction of amino acid contact maps using Support Vector Machines; and 3Dpro: Prediction of protein tertiary structure (Ab Initio).


Accessible surface area from NMR chemical shifts

Accessible surface area (ASA) is the surface area of an atom, amino acid or biomolecule that is exposed to solvent. The calculation of a molecule’s ASA requires three-dimensional coordinate data and the use of a “rolling ball” algorithm to both define and calculate the ASA. For polymers such as proteins, the ASA for individual amino acids is closely related to the hydrophobicity of the amino acid as well as its local secondary and tertiary structure. For proteins, ASA is a structural descriptor that can often be as informative as secondary structure. Consequently there has been considerable effort over the past two decades to try to predict ASA from protein sequence data and to use ASA information (derived from chemical modification studies) as a structure constraint. Recently it has become evident that protein chemical shifts are also sensitive to ASA. Given the potential utility of ASA estimates as structural constraints for NMR we decided to explore this relationship further. Using machine learning techniques (specifically a boosted tree regression model) we developed an algorithm called “ShiftASA” that combines chemical-shift and sequence derived features to accurately estimate per-residue fractional ASA values of water-soluble proteins. This method showed a correlation coefficient between predicted and experimental values of 0.79 when evaluated on a set of 65 independent test proteins, which was an 8.2 % improvement over the next best performing (sequence-only) method. On a separate test set of 92 proteins, ShiftASA reported a mean correlation coefficient of 0.82, which was 12.3 % better than the next best performing method. ShiftASA is available as a web server (http://shiftasa.wishartlab.com) for submitting input queries for fractional ASA calculation.
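As a rough illustration of the "rolling ball" idea mentioned above, the sketch below estimates per-atom ASA numerically in the Shrake-Rupley style: test points are placed on each atom's sphere expanded by the probe radius, and a point counts as accessible if it does not fall inside any other expanded sphere. This is not the ShiftASA method; the function names, the atomic radius of 1.7 Å, the probe radius of 1.4 Å and the number of points are assumptions chosen for the example.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return pts


def sasa(atoms, probe=1.4, n_points=960):
    """atoms: list of (x, y, z, radius) in Å. Returns per-atom ASA in Å^2."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the surface traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A point is buried if it lies inside any other expanded sphere.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


lone = sasa([(0.0, 0.0, 0.0, 1.7)])
pair = sasa([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)])
print(lone, pair)
```

Dividing each residue's summed ASA by a reference value for the fully exposed residue gives a fractional ASA; the choice of reference values is a separate assumption that this sketch leaves out. The result also depends on the atomic and probe radii used, so values from different tools are only comparable when those parameters match.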
