In my molecular dynamics lecture, our prof said that we always have to add hydrogen atoms to titratable groups before we start the force field simulations, and that this is especially important for histidine. But why is that? Why do we always have to add the hydrogen atoms? And do we only have to do that for titratable groups, or for all?
The RCSB Protein Data Bank (PDB) presently has 137,917 structures deposited in it, of which less than 10% have been solved by nuclear magnetic resonance (NMR). Most techniques in structural biology simply cannot resolve the positions of hydrogens in macromolecular structures; only NMR and neutron diffraction provide the positions of hydrogens "by default." Therefore, if the starting structure doesn't have hydrogens, all-atom molecular dynamics force fields (the most notable of which are the OPLS, CHARMM, and AMBER series) require that the positions of hydrogens be deduced from the starting macromolecular structure and from the geometric criteria and patterns that hydrogen bonds are known to generally follow. "Generally" is an important word here, as there is often no unique, unambiguous way to place the hydrogens in a structure. This is especially true of histidine, where the local environment of residues and/or cofactors may perturb its pKa in often unexpected ways. Simulating histidine in the wrong protonation state, when that histidine is in the active site and crucial to the mechanism and process under study, may lead to spurious, misleading, or outright nonsensical results. Hydrogens are required for a faithful, (relatively) realistic description of macromolecular structure simply because they are always present in macromolecules and are crucial for macromolecular structure and function: recall that hydrogen bonds stabilize the secondary structure of proteins, the DNA double helix, etc.; moreover, they often mediate intermacromolecular recognition and function. Thus, an explicit representation of hydrogens is required.
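The protonation ambiguity is easy to quantify with the textbook Henderson-Hasselbalch relation. The sketch below is illustrative only: the model pKa of 6.0 for the histidine imidazole is an assumption, and, as discussed above, the local environment can shift the real value in unexpected ways.

```python
def protonated_fraction(pH, pKa):
    """Henderson-Hasselbalch: fraction of a titratable group that is
    protonated, [HA]/([HA]+[A-]) = 1/(1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# Model pKa ~6.0 for the histidine imidazole (an assumption); environment
# effects can shift it by several units, which is exactly the ambiguity.
HIS_PKA = 6.0
for pH in (5.0, 6.0, 7.4):
    print(f"pH {pH}: {protonated_fraction(pH, HIS_PKA):.1%} protonated")
```

Near physiological pH, histidine sits close enough to its pKa that neither protonation state can be assumed blindly, which is why it gets singled out.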
The so-called coarse-grained force fields, the most popular of which are, I believe, the various versions of MARTINI, group hydrogens together with heavier atoms to increase the simulation time step and to reduce the number of particles in the system, but these force fields are a very poor choice when modeling specific recognition and interactions.
The ambiguities of protonation are not a matter of opinion, I'm afraid. One can go through all the structures in the PDB, only to find that the majority of them have no hydrogens at all. If one explores the link to the PDB I've provided, one will find that out of the 137,917 structures in the PDB, 123,461 have been resolved by X-ray crystallography, of which only 10,749 have a resolution below 1.5 Å. To resolve hydrogens, a resolution below 1 Å is required. Indeed, high-resolution crystal structures, NMR structures, and neutron diffraction structures (only 60 in the entire PDB!) together make up less than 20% of the entire structural archive. Techniques like electron microscopy have resolutions between 4 and 10 Å; in such cases, people are unsure about the positions of the heavy atoms; they don't even dare think about the hydrogens. So, it's not just the titratable groups that are problematic: in most structures one can't see ANY hydrogens. Of course, for backbones and alanine side chains, one probably has a pretty good guess where the hydrogens are, given the positions of the heavy atoms. But to answer the question: in MD, one most often needs to add hydrogens everywhere, not just to the titratable groups, because the starting PDB structure doesn't have any hydrogens. In fact, structures from the PDB oftentimes have missing heavy backbone or side chain atoms, because those couldn't be resolved; sometimes termini are missing, sometimes large stretches of protein; these often need to be modeled in.
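One can check the absence of hydrogens in a given entry directly. Here is a minimal sketch that counts hydrogens among a PDB file's ATOM/HETATM records; it assumes the fixed-column PDB format with the element symbol in columns 77-78, and the two example records are illustrative, not taken from any specific entry.

```python
def count_hydrogens(pdb_lines):
    """Count ATOM/HETATM records whose element field (PDB columns 77-78,
    i.e. Python slice [76:78]) is hydrogen; also count total atoms."""
    n_h = n_total = 0
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            n_total += 1
            if line[76:78].strip() == "H":
                n_h += 1
    return n_h, n_total

# Two illustrative records, as in a typical X-ray entry: heavy atoms only.
example = [
    "ATOM      1  N   HIS A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  HIS A   1      11.639   6.071  -5.147  1.00  0.00           C",
]
print(count_hydrogens(example))  # -> (0, 2)
```

Running this over a real entry typically returns zero hydrogens, which is exactly why MD preparation tools must add them everywhere before a simulation can start.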
Visualizing Protein Dynamics
00:00:07.06 I'm Dorothee Kern and today I want to expose you to the topic of protein dynamics in biology.
00:00:15.22 Protein motion, change of things over time, is a fundamental characteristic of biological
00:00:23.03 function and, actually, life.
00:00:25.00 Since I was an undergrad, my big vision has been to be able to watch proteins in
00:00:31.27 real time at atomic resolution, as I can show you right now, over here, arriving at the
00:00:39.09 real time movie of, for instance, an enzyme doing catalysis.
00:00:42.22 But why should you care about protein dynamics?
00:00:47.06 Let me use a macroscopic analogy to illustrate to you why we have to go beyond static
00:00:53.23 snapshots to understand biological function.
00:00:56.20 From this little picture here, you don't know my game.
00:01:02.06 So, you don't know whether the ball goes in.
00:01:05.02 But, macroscopically, to watch action is very simple -- you just buy a video camera.
00:01:11.08 However, that's of course not possible on a microscopic level.
00:01:16.21 In fact, what I want to tell you is that macroscopic protein dynamics is actually rooted in the
00:01:22.21 microscopic dancing of individual little atoms.
00:01:26.10 And what I would like to do today is show you how protein dynamics is linked to
00:01:30.24 biological function, and which biophysical techniques -- the most prominent biophysical techniques --
00:01:35.05 you can use to visualize it.
00:01:37.27 If you keep asking the question, how do proteins work?, you arrive at the atomic resolution level,
00:01:44.21 the smallest building blocks of biology.
00:01:49.04 Structural biology has revolutionized our understanding of proteins on this atomic level.
00:01:57.07 However, I would argue that these beautiful structures I show you will not tell you
00:02:01.26 how they work.
00:02:06.01 What my lab has done in the last years is to add the fourth dimension to structural biology
00:02:11.20 to directly watch proteins in action.
00:02:14.12 So, I just showed you a little overview, here, of the kind of work we have worked on.
00:02:20.24 For instance, on the first one, over there, you actually see a signaling protein.
00:02:26.13 And unlike the textbook picture that you think of, where the protein is in the inactive state
00:02:32.15 and then it gets, for instance, phosphorylated and switches over to an active structure,
00:02:36.25 what we found is that, actually, the protein constantly flips back and forth, jumps between
00:02:42.07 the two states, and so it's a very different picture of how signaling actually works.
00:02:49.08 Down at the bottom, I can show you another example, how we visualized how a
00:02:55.06 multidrug transporter transports drugs which you just put in the cell,
00:03:01.05 from the inside back to the outside, via a constant conformational change.
00:03:06.20 Even questions like neurobiology, for instance, long-term memory, which I'm showing
00:03:12.04 you over here, can be explained by the dynamic interactions between an enzyme we're
00:03:19.08 studying and a receptor in the brain.
00:03:22.06 This, for instance, was a quite interesting story where we worked on the cocaine dependence
00:03:26.28 in mice.
00:03:29.26 On the right side, over there, you see that for protein/protein interactions, again,
00:03:36.27 it's not rigid-body docking, but protein/protein interactions are mediated by a dynamic interplay
00:03:42.10 between the two players.
00:03:44.05 And lastly, for drug binding, where, you know, the dogma has been that
00:03:50.28 this small molecule just docks onto a protein and inhibits it, we have shown that the
00:03:55.15 crucial component for rational drug design is actually protein dynamics.
00:04:00.06 So, after I have probably already made you dizzy watching all these movies, I better
00:04:05.24 tell you how we're actually arriving at these four-dimensional movies.
00:04:13.12 I show you here a collection of biophysical methods which allow us to see the invisible.
00:04:19.11 And I call it invisible because of course you cannot watch single atoms under a microscope,
00:04:24.04 for sure not dancing in real time on their different timescales.
00:04:28.19 So, I will mention some of those methods in my talk.
00:04:33.24 What I show you, for instance, is NMR spectroscopy, which stands for nuclear magnetic resonance,
00:04:39.11 which is our bread and butter method, and I would argue one of the best methods to measure
00:04:43.26 motions on an atomic resolution.
00:04:45.18 We combine that with mass spectrometry, X-ray crystallography, very
00:04:53.23 old-fashioned kinetic experiments, for instance, stopped-flow and quenched-flow experiments,
00:04:59.11 and single-molecule fluorescence energy transfer experiments.
00:05:03.11 Very recently, we also have been using bioinformatics, which I show you down at the bottom, for analyzing
00:05:11.19 the evolution of protein function.
00:05:15.03 And finally, all these experimental methods are being combined with computation,
00:05:20.01 which of course plays a very big role to finally come up with our dancing molecules.
00:05:28.18 These dancing molecules are fun to watch, but we have to find a unifying language to
00:05:37.04 describe our biophysical systems, in this case, our proteins.
00:05:40.26 And the unifying language which is unambiguous, whether you are living in India or China or
00:05:46.22 America or Australia, is actually the free energy landscape.
00:05:51.14 I'm sure you've heard about it.
00:05:53.12 The free energy of the system.
00:05:56.02 And so what I show you here is probably one of the key things for the lecture.
00:06:00.19 And why you actually want to pay attention in your classes to physical chemistry.
00:06:06.13 If you have a protein which, for instance, interconverts between two different states,
00:06:11.24 for instance, state A and state B, which could be the inactive and active states, to describe
00:06:17.07 the entire system we want to do three things.
00:06:19.17 We want to figure out the populations of the states, and what you see, actually, is
00:06:25.12 that, of course, in the energy landscape, a state is defined as a minimum in the
00:06:31.24 free energy landscape, over here and over there.
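The populations of two such minima follow from Boltzmann weighting of their free-energy difference. Here is a minimal sketch; the 5 kJ/mol gap between the hypothetical states A and B is purely illustrative.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)

def populations(dG_BA, T=298.15):
    """Relative populations of states A and B whose free-energy minima
    differ by dG_BA = G_B - G_A (in kJ/mol), via Boltzmann weighting."""
    K = math.exp(-dG_BA / (R * T))  # equilibrium constant [B]/[A]
    pA = 1.0 / (1.0 + K)
    return pA, 1.0 - pA

# Illustrative example: state B (say, the active state) lies 5 kJ/mol above A
pA, pB = populations(5.0)
print(f"pA = {pA:.2f}, pB = {pB:.2f}")
```

A gap of only a few kJ/mol already skews the populations strongly toward the lower minimum, which is why minor states are hard to see with most methods.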
00:06:35.27 The interconversion, meaning going from state A to state B,
00:06:39.23 is going over the transition state.
00:06:42.19 So, in the energy landscape, the transition state is defined as this saddle point and
00:06:48.22 the high point of the free energy landscape.
00:06:51.16 So, what I described right now are the two major states.
00:06:54.24 We call them kinetically distinct states because they are separated by a pretty big free energy
00:07:01.02 barrier. And as you know, it's the height of the free energy barrier that defines
00:07:04.22 the rate of interconversion.
00:07:06.06 Big barrier means slow.
00:07:08.01 Little barrier means fast.
00:07:09.18 So, when we go down from tier 0 to tier 1, over there, we of course
00:07:16.06 have smaller barriers, meaning that these interconversions
00:07:21.21 are faster, for instance, on the nanosecond timescale.
00:07:24.13 We call them tier 1.
00:07:26.01 And even further down we have more microstates, which could be hundreds of thousands
00:07:31.12 within that A state, which are interconverting in the picosecond timescale.
00:07:34.26 And we call them entropic substates.
00:07:38.24 And finally, if you want to really completely describe the systems, we would like to know
00:07:42.10 the structures of all these snapshots, the structures which are being sampled.
00:07:48.21 So, that's really the big-picture goal, because if we can describe our system in this way,
00:07:52.26 we have fully understood it.
00:07:57.07 Now, what kind of motions belong to these picosecond/nanosecond/[microsecond] timescales?
00:08:04.08 The very fast timescales, as you have learned in chemistry, are really bond vibrations.
00:08:10.00 Moving up from femtoseconds to picoseconds to nanoseconds, we go into rotations of
00:08:16.09 single bonds, loop motions, and sidechain rotations.
00:08:19.17 And then finally, going all the way up to milliseconds, seconds, or maybe even hours,
00:08:23.06 we are talking about very collective motions of many atoms together.
00:08:27.14 The last question, now, is, what kind of methods could be used?
00:08:32.20 So, I've pinpointed a few methods, which you can see are used to describe different timescales
00:08:39.25 of motions, over here.
00:08:41.02 And this will be, now, part of the talk, to explain to you how to use these methods to
00:08:46.27 figure out motions on these different timescales.
00:08:49.17 For the remainder of the talk, in order to be actually, really, like, hands-on, I'm choosing
00:08:57.04 to describe how to study protein motions and how the motions are dictating biological function
00:09:04.25 on one system, which is the catalytic power of enzymes during catalysis.
00:09:09.08 It has always intrigued me how amazing enzymes are in their catalytic power.
00:09:15.08 This enzyme, for instance, is an absolutely essential enzyme in every cell in every organism
00:09:20.24 because it catalyzes the interconversion between ATP plus AMP and two ADP molecules.
00:09:28.15 So, why is this reaction so essential for every cell?
00:09:31.17 It maintains the right nucleotide concentrations in the cell.
00:09:38.00 You might not know, but if your ATP concentration in the cell drops below 1 millimolar,
00:09:42.07 you're dead.
00:09:43.17 So, this reaction, without the enzyme, would take 7000 years.
00:09:47.27 The enzyme can complete this chemical reaction within 10 milliseconds.
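Taking the quoted "7000 years" and "10 milliseconds" at face value as effective reaction times, the implied rate enhancement is a quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope rate enhancement from the times quoted above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

t_uncatalyzed = 7000 * SECONDS_PER_YEAR  # ~7000 years without the enzyme
t_catalyzed = 10e-3                      # ~10 ms with the enzyme

enhancement = t_uncatalyzed / t_catalyzed
print(f"rate enhancement ~ {enhancement:.1e}")  # on the order of 1e13-fold
```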
00:09:52.24 So, that's sort of, really, the fascinating power of enzymes.
00:09:56.13 And we are far away from understanding how these sophisticated proteins can do it.
00:10:01.28 So, now, what have scientists done so far?
00:10:04.13 The traditional way of describing enzymatic powers is using, actually, turnover numbers
00:10:10.04 -- how fast the enzyme is turning over a substrate -- or, for instance, a Michaelis-Menten constant,
00:10:14.26 which is really the, sort of, affinity of the protein and the substrate.
00:10:19.22 What I wanted to do -- and that was sort of my, you know, dream since I was an undergrad --
00:10:23.26 was to watch the enzyme in action at atomic resolution, sort of shown with that little
00:10:29.01 blur of the picture, here.
00:10:33.07 So, how do we get there?
00:10:34.20 I told you the very next step is that we have to figure out the free energy landscape.
00:10:40.26 Before we do that, of course I told you, we have to describe all states.
00:10:44.15 In order to figure out which states are actually sampled during catalysis, a very important step
00:10:50.15 is to go from the enzyme to actually putting down the scheme of the reaction.
00:10:57.26 And by the scheme, I mean writing down every single state we can imagine the enzyme is in.
00:11:04.27 So, what you see over here: the enzyme, the free enzyme over there, can bind substrates.
00:11:13.25 It has to bind, actually, two substrates, ATP and AMP, to make the ternary complex.
00:11:18.21 Then this enzyme can do conformational change.
00:11:21.13 Then chemistry happens.
00:11:23.00 Then another conformational change.
00:11:24.13 And then, finally, substrates have to be released.
00:11:26.16 So already, on this, you know, only 20-kilodalton protein, it's going to sample at least 10
00:11:34.24 microscopic steps.
00:11:36.10 Now, from this scheme, we'd finally want to actually arrive at figuring out
00:11:42.06 the relative populations and the conversion rates of structures,
00:11:44.28 which would be the free energy landscape.
00:11:48.12 The first method I want to introduce is NMR spectroscopy, not only because it's
00:11:52.24 close to my heart, but because I think it is really the unique method to watch every single atom
00:11:59.19 in the protein doing its function, dancing.
00:12:03.02 And the reason is because we can figure out the entire time range of motions from picoseconds
00:12:09.08 all the way to hours by just choosing the right methods -- down, over here --
00:12:14.07 in our NMR toolbox.
00:12:17.06 I told you we can measure
00:12:19.06 every single atom in a protein.
00:12:21.11 Another unique thing is that we actually can measure things at equilibrium.
00:12:25.05 So, in many other spectroscopic methods, you have to disturb the equilibrium and you're
00:12:30.00 watching how you're re-equilibrating, versus in NMR you can actually measure the equilibrium.
00:12:35.11 And most importantly, I would say, we can measure under physiological conditions.
00:12:39.04 We can change temperature, pH, the substrate concentration.
00:12:43.24 We are in solution, having our protein do what it wants to do and, while
00:12:49.13 it's doing its function, watching.
00:12:51.15 So, this is sort of why I really think that it is a very powerful method for protein dynamics.
00:12:57.08 So, what's the physics behind it?
00:12:59.28 I thought of comparing it to a method, which I'm sure everybody has encountered already,
00:13:06.17 which is magnetic resonance imaging, where, if you have, for instance, an injury,
00:13:11.11 your whole body gets pushed into a big magnet to see what kind of injury, for instance,
00:13:17.10 you have on the knee.
00:13:19.04 So, it's an imaging method.
00:13:20.17 It turns out we use exactly the same method for nuclear magnetic resonance, except that
00:13:25.21 we don't put a whole body in there to image tissue, but rather our individual proteins
00:13:30.26 that we want to actually watch, or look at every single atom.
00:13:35.11 So, the frequency of excitation is in the radio wavelengths, which means you can actually
00:13:41.28 listen to the excitation of the individual atoms, which I'll show you right now.
00:13:54.27 So, what you just heard was actually the free induction decay, the release of
00:14:02.13 the energy, at radio wavelengths, of a single atom.
00:14:06.04 But, of course, a protein, over here, consists of thousands of atoms.
00:14:11.03 And we would like to watch or listen to every single atom.
00:14:15.10 So, for that, we can actually excite all atoms in this protein and record their frequencies,
00:14:20.28 which are different.
00:14:22.07 For instance, here I show you a proton-nitrogen correlated spectrum.
00:14:26.28 So, each point over here belongs to one amide in your protein.
00:14:31.02 It's a signature which has a unique frequency.
00:14:34.05 So, now we can listen to the entire protein resonating, which I'll play right now.
00:14:50.10 Does it sound just like a song?
00:14:52.22 So, this is some of the physics behind it, right?, to figure out at which frequencies it resonates.
00:14:57.24 But now you want to figure out how fast they dance.
00:15:00.17 So, we have to do one more trick to our spins, or our nuclei.
00:15:04.17 So, what we're going to do is we're going to record the relaxation of these spins from
00:15:10.04 an excited state.
00:15:11.22 And I decided, again, to put some cartoons on top, here, for the biologists and then
00:15:17.26 some little equations for the physics, down here.
00:15:21.16 So, let's compare our spins with runners.
00:15:24.02 If we actually put them in the excited state, what we actually do is we put them in the
00:15:29.16 stadium at the starting line and they all start together, running around the
00:15:34.04 stadium at these frequencies you just heard.
00:15:36.27 They're together, so they are on resonance -- that's what you call it in physics.
00:15:41.00 But all the time, some will run faster and some will run slower.
00:15:45.00 And it's this dephasing in our stadium which we call transverse relaxation.
00:15:50.02 And the speed of this relaxation directly tells us about how fast these proteins move.
00:15:56.27 And I've shown you, of course, that these are single exponential decays for the people
00:16:00.22 who actually like physics.
00:16:01.28 And so we can measure that.
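The single-exponential decay of transverse relaxation can be fit in the usual log-linear way. Here is a minimal sketch on synthetic, noise-free data; the R2 value of 20 s^-1 and the delay times are illustrative, not measured values from the talk.

```python
import math

def fit_exponential_rate(times, intensities):
    """Recover R2 from I(t) = I0*exp(-R2*t) by a least-squares fit of
    ln I(t) against t; the slope of that line is -R2."""
    ys = [math.log(i) for i in intensities]
    n = len(times)
    mx = sum(times) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(times, ys))
             / sum((x - mx) ** 2 for x in times))
    return -slope

# Synthetic, noise-free relaxation data with R2 = 20 s^-1 (illustrative)
R2_true = 20.0
times = [i * 0.005 for i in range(1, 11)]      # delays from 5 to 50 ms
signal = [math.exp(-R2_true * t) for t in times]
print(f"fitted R2 = {fit_exponential_rate(times, signal):.1f} s^-1")  # -> 20.0
```

With real, noisy intensities one would fit the exponential directly rather than its logarithm, but the idea is the same: the decay rate of the signal reports on the motion.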
00:16:03.22 One more thing we have to do to our runners, we don't just want to say, okay, these atoms
00:16:08.27 are moving.
00:16:09.27 We actually want to describe our free energy landscape.
00:16:12.07 Remember what our goal was: to figure out the relative populations,
00:16:17.00 the rate of interconversions, and give structural information.
00:16:21.05 And very, very exciting, the Palmer lab, almost 20 years ago now, figured out
00:16:29.05 one more trick we can do to our runners.
00:16:30.24 I told you they are running with different speeds.
00:16:33.21 If we now, during this relaxation time, flip the runners, so that Bolt, who was the fastest,
00:16:39.26 will be last, and the one who was last will be the fastest,
00:16:44.18 they will actually finish in the stadium at the same time.
00:16:47.18 And what we call that in physics, we are refocusing our pulses.
00:16:51.13 And it's this extra trick, where we actually refocus our relaxation in the XY plane,
00:16:56.22 which allows us to get all the information we need about the timescales,
00:17:03.27 the populations, and structural information.
00:17:07.02 I think I have overloaded you, maybe, with a little bit too much NMR, but I just
00:17:11.10 get excited about it.
00:17:13.10 Let's look at some data now.
00:17:15.00 My postdocs, Magnus and Vu, did these experiments on this enzyme during catalysis.
00:17:21.20 I told you we can watch the enzyme during catalysis.
00:17:24.13 So, we are putting our enzyme in the NMR tube.
00:17:27.21 And as soon as we add substrate, it actually catalyzes this reaction.
00:17:31.15 And it turns out, because it's a fully reversible reaction, it's going to do this forever.
00:17:37.00 We have samples which have been running through this enzymatic cycle for two months.
00:17:41.06 And while. while it's doing its job, we can measure these relaxation times, and therefore
00:17:46.24 get information about protein dynamics.
00:17:49.08 We found a very exciting result.
00:17:52.19 What I show you here is that all these red residues, over here, actually seem to be
00:17:58.09 moving around during catalysis.
00:18:00.06 And what we found is that what we are actually detecting is the opening and closing of
00:18:04.12 this protein, which I show you, here, in the open state and, here, the closed state.
00:18:09.06 And it turns out that in this opening and closing, the closing is very fast,
00:18:14.05 the opening is way slower.
00:18:17.08 The opening is 40 per second.
00:18:19.06 If we now compare that rate constant with how fast this enzyme makes ATP, it's actually
00:18:24.18 within experimental error.
00:18:25.24 So, what we learned was our first surprise.
00:18:28.20 The rate-limiting step, the highest barrier in the energy landscape, is actually
00:18:32.28 not the chemical step of breaking a phosphate and putting it over there, but rather a large
00:18:38.23 conformational change of opening and closing.
00:18:41.19 And it turns out that this actually seems to be a reoccurring theme in biology,
00:18:46.26 that very often the limiting step in biology is actually how fast these enzymes or proteins
00:18:53.01 can dance.
00:18:54.21 But the whole point of this enzyme is not to dance around, but to actually make that
00:18:59.28 chemistry, right?
00:19:00.28 This is what classic enzymology is focusing on.
00:19:04.04 How can the protein take ATP and AMP to make two ADP molecules?
00:19:11.07 So, break a phosphoanhydride bond and make a new one, right?
00:19:14.01 So, here we have the two ADP molecules.
00:19:17.00 The reason it's hard to measure is because it's not the rate-limiting step.
00:19:20.24 And, not being the rate-limiting step, it's basically hard to visualize because it's too fast.
00:19:26.17 So, the way we went around it is we used a different method, X-ray crystallography.
00:19:31.00 So, you could argue, crystallography, you're freezing the crystal, it's a static method.
00:19:37.07 However, we recorded many crystallographic data sets at room temperature.
00:19:42.17 And it turns out that, at room temperature, things are still moving in the crystal.
00:19:46.03 And what I've shown you over here is two ADP molecules in the active site.
00:19:50.27 One is very much colored, because this smells like it's a dynamic molecule.
00:19:56.27 You see the spread of the phosphate?
00:19:59.10 Superimposing all of our crystal structures, it looks like this beta phosphate
00:20:04.20 is one which wants to fly over to make ATP.
00:20:07.20 Coinciding with this flexibility which we see in the active site is this arginine,
00:20:12.24 which seems to grab the negatively charged phosphate to move it over.
00:20:16.15 So, even though crystallography has a reputation as a static method, we actually can get quite
00:20:22.10 a lot of dynamic information.
00:20:24.02 So, this state over here is actually our enzyme-substrate complex.
00:20:30.08 Now, of course, you would like to know how it's actually doing the reaction to go over
00:20:34.08 the transition state.
00:20:35.13 The transition state is not populated, because it's on the top of the energy landscape.
00:20:39.22 So, the very next thing we can do is to try to mimic it with a transition state analog,
00:20:44.16 which I'm showing you over here.
00:20:46.04 So, an ADP molecule, a magnesium/aluminum fluoride, and an AMP molecule is mimicking
00:20:54.03 right when this beta phosphate is trying to fly from one step to the other.
00:20:59.00 And so Young-Jin finally succeeded in getting this high-resolution crystal structure.
00:21:03.24 And what you see is that the magnesium/aluminum fluoride, which mimics the phosphate in the
00:21:09.17 transition state, is directly in the middle between the donor and acceptor.
00:21:14.07 So, this gives us snapshots but still not dynamics.
00:21:18.03 Let me introduce you to a different method, which will give us more dynamic information.
00:21:24.05 So, that is time resolved single molecule fluorescence energy transfer.
00:21:30.23 It was a very fun collaboration with my little brother, Christian, and his grad student, Maria,
00:21:37.21 and my postdoc, Kathi.
00:21:40.16 And it's always good to have a more brilliant brother in the family.
00:21:45.27 So, I called my brother, I said, you know, we have to measure this protein while doing
00:21:51.19 its reaction, and it's a rare event, so I need your help.
00:21:55.12 And so the idea of a single molecule fluorescence energy transfer experiment is it's like a
00:21:59.26 spectroscopic ruler.
00:22:01.17 You're exciting the fluorescence donor, and this energy gets transferred to the acceptor
00:22:08.16 only when the two, donor and acceptor, are very close.
00:22:12.09 So, it's a direct spectroscopic ruler.
00:22:14.15 So, if we excite the green dye and we get a lot of fluorescence on the red one,
00:22:21.09 we know they are close together, whereas if you get very little over here they're further away.
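This "spectroscopic ruler" behavior follows from the standard Förster relation. The sketch below assumes the textbook form E = 1/(1 + (r/R0)^6), with an illustrative Förster radius of 50 Å; the actual R0 depends on the dye pair used.

```python
def fret_efficiency(r, R0=50.0):
    """Forster transfer efficiency E = 1 / (1 + (r/R0)^6) for a
    donor-acceptor distance r and Forster radius R0 (both in Angstrom)."""
    return 1.0 / (1.0 + (r / R0) ** 6)

# Efficiency drops steeply around R0, which is what makes FRET a ruler:
for r in (30.0, 50.0, 70.0):
    print(f"r = {r:.0f} A -> E = {fret_efficiency(r):.2f}")
```

Because of the sixth-power dependence, the efficiency swings from near 1 to near 0 over a few tens of Ångströms, so opening and closing of the enzyme shows up as large, easily read jumps in the signal.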
00:22:25.27 So, we can do that on a single molecule level.
00:22:28.16 We can attach, right over here, a single enzyme molecule on a cover slide, and watch in
00:22:36.24 real time the red and green fluorescence correlating.
00:22:44.17 So again, if there's very little transfer,
00:22:46.14 they're far away.
00:22:49.21 And if there's a lot of transfer, they are coming closer together.
00:22:53.17 So, this is the thought experiment.
00:22:55.20 And now I'm showing you the real data.
00:22:59.09 And so here's where our next big surprise was found.
00:23:03.21 We measured first, actually, the enzyme, the happy enzyme in the presence of magnesium.
00:23:09.26 I sort of forgot to mention to you -- I showed you in the structure --
00:23:13.16 but it turns out magnesium is very essential for the catalysis for kinases.
00:23:18.14 And it was described that magnesium is essential for that chemical step, the phosphotransfer,
00:23:24.17 right?, when the phosphate goes from one step to the other.
00:23:27.01 But here we are measuring the opening and closing of the enzyme, right?
00:23:32.13 So, if the fluorescent energy transfer is efficient, close to 1, over there,
00:23:38.17 we are in the closed state.
00:23:40.17 And if it's further away, we have very inefficient fluorescence transfer.
00:23:44.10 So, what you can see is that in real time, within milliseconds, this enzyme is opening
00:23:48.18 and closing.
00:23:49.27 But see what happens when you take the magnesium out.
00:23:52.20 We thought, if you take the magnesium out, only the chemical step, which is just the
00:23:56.16 movement of a phosphate by one Angstrom, is stopped.
00:23:59.08 But, in fact, it looks like now the opening and closing is slowed down by about
00:24:04.28 three orders of magnitude, because if you look at that timescale, over here, you're staying
00:24:08.25 in the closed state for several seconds.
00:24:11.15 How can that be?
00:24:12.15 Is the literature wrong?
00:24:13.20 So, since we were so puzzled about that result, that magnesium is actually very essential
00:24:19.06 for protein dynamics, we designed yet another method, an experiment which is based on
00:24:24.15 a new method, which I want to introduce now.
00:24:28.01 It's called quench-flow kinetics.
00:24:29.28 So, what we can measure is a single turnover of a protein, for instance, an enzyme,
00:24:36.08 in which it's doing its first turnover.
00:24:38.24 It's a fast mixing method, so we're taking our enzyme over here, mixing it
00:24:43.14 with a substrate very quickly, and the reaction goes.
00:24:47.01 Then, in a second step, we can quench the reaction and then collect a sample
00:24:54.11 for data analysis.
00:24:55.15 So, what does it allow us to do?
00:24:57.05 It allows us to look at the phosphotransfer, over here, in the first turnover, and measure
00:25:03.16 this step even though the later step might be slower.
00:25:07.16 So, we could figure out how fast this fast phosphotransfer is in the first turnover.
00:25:14.02 So, this is the data I show you next.
00:25:17.25 And here's the good and bad news.
00:25:19.28 The good news is the experiment worked and the enzyme is wicked fast for its phosphotransfer.
00:25:26.17 The bad news is it's too fast to be even caught, because what you can see over here,
00:25:32.02 within the dead time of the experiment, about 50 milliseconds, the entire enzyme has converted
00:25:38.26 ADP to ATP.
00:25:41.17 So, our method is actually time-limited.
00:25:44.09 So, in other words, nature has perfected the chemical step to be superfast, okay?
00:25:51.15 And then what you see over here are multiple turnovers, now.
00:25:54.07 That is the slow opening of the enzyme.
00:25:56.17 So, what happens when you take the magnesium out?
00:25:59.20 Remember what the single molecule FRET experiment showed us, that the opening is very slow.
00:26:04.25 But what happens to the chemical step?
00:26:06.19 Again, we see a burst, which is the phosphotransfer step followed by the slow opening.
00:26:13.25 Yes, the opening is very slow, but now you see that the phosphotransfer,
00:26:19.02 that first burst, is also very slow.
00:26:21.12 So, what we have learned is actually a very amazing thing, what nature does.
00:26:27.13 First of all, I can tell you why you have to eat magnesium.
00:26:31.15 I don't know what bio-transformed magnesium means -- you figure that out.
00:26:35.26 But what we have learned is that nature is that efficient or clever that it uses a
00:26:43.26 single atom to catalyze every essential step in this entire reaction.
00:26:49.18 Think about it: one single atom is doing this entire job.
00:26:52.27 It can only do it when it's bound to the enzyme.
00:26:55.18 It accelerates the phosphotransfer step by more than five orders of magnitude
00:27:01.25 relative to the uncatalyzed rate.
00:27:03.01 But at the same time, it's the same magnesium which catalyzes the conformational change
00:27:08.21 by three orders of magnitude.
00:27:10.00 So, I would call it moonlighting of a single atom in an enzyme.
00:27:15.12 You, being curious, may ask the question, how can a single atom do that?
00:27:22.01 And for this we sort of tried to do a couple of clever experiments.
00:27:26.08 So, the first thing we did is we replaced the magnesium with other divalent cations.
00:27:30.10 And what we learned is, by doing enzyme kinetic experiments, that the rate of chemistry is
00:27:36.28 significantly slower for all the other divalent cations.
00:27:40.17 So, nature has optimized the chemical step, really, for magnesium.
00:27:45.19 In contrast, the conformational change, which was our new surprise that is also essential
00:27:51.14 for this reaction, is actually independent of which divalent metal you use, which means
00:27:57.04 it seems to be just a pure electrostatic effect.
00:27:59.27 So, these are NMR data collected with different metals in the active site.
00:28:05.03 Which brings me to the next method you do want to know about for studying protein
00:28:13.14 dynamics: computational methods.
00:28:16.22 Of course, in the computer, you can calculate everything.
00:28:18.27 So, the first thing I'm going to show you is the calculation of the electrostatic potential
00:28:24.17 in the active site.
00:28:26.00 You can see our two ADP molecules, over here and over there.
00:28:30.24 You see our arginines coming down.
00:28:32.22 Arginines, as you know, are positively charged, interacting with the negatively charged phosphates.
00:28:37.13 I told you we need that for chemistry, to move the phosphate around.
00:28:41.26 But, at the end of the day, these interactions need to be broken to open up the enzyme to
00:28:47.01 release the substrate.
00:28:48.07 And remember, that's our rate-limiting step.
00:28:50.20 Very strong charge interactions are very, very hard to break.
00:28:53.20 So, what is nature doing?
00:28:55.11 It's putting another positively charged magnesium on the opposite side of this interaction,
00:29:01.27 thereby weakening the electrostatic interactions.
00:29:04.05 A very clever mechanism.
00:29:07.15 What else can we do with computational methods?
00:29:11.02 One of the most famous things, of course, you've all heard about is
00:29:15.23 molecular dynamics simulations,
00:29:17.27 where we can finally get these dancing proteins, which look
00:29:22.18 very interesting.
00:29:23.27 What is actually the physics behind it?
00:29:26.23 All we are going to do is calculate
00:29:30.13 Newton's laws of motion between the atoms.
00:29:33.06 So, it's a classical description of the system.
00:29:36.14 And so what I show you over here is actually the opening and closing of the enzyme,
00:29:41.02 that rate-limiting step.
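[The classical description mentioned here -- integrating Newton's laws of motion for every atom -- can be sketched in a few lines. Below is a minimal, illustrative velocity Verlet integrator applied to a single particle on a harmonic spring; it is a toy sketch, not the simulation code used in this work:

```python
import numpy as np

def velocity_verlet(pos, vel, mass, force_fn, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme.

    pos, vel: (N, 3) arrays; mass: (N,) array; force_fn(pos) -> (N, 3) forces.
    Illustrative only -- real MD engines add thermostats, constraints,
    neighbor lists, etc. on top of this same core loop.
    """
    forces = force_fn(pos)
    for _ in range(n_steps):
        vel = vel + 0.5 * dt * forces / mass[:, None]   # half kick
        pos = pos + dt * vel                             # drift
        forces = force_fn(pos)                           # recompute forces
        vel = vel + 0.5 * dt * forces / mass[:, None]    # half kick
    return pos, vel

# Toy example: one particle on a harmonic spring (analytically a cosine).
k = 1.0
force = lambda x: -k * x
p, v = velocity_verlet(np.array([[1.0, 0.0, 0.0]]),
                       np.zeros((1, 3)), np.ones(1), force, 0.01, 100)
```

Production MD engines wrap this same half-kick/drift/half-kick update with far more machinery, but the classical core is exactly this simple.]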
00:29:42.05 There's one other thing I want to mention over there.
00:29:45.12 We can also visualize, in the computer, single water molecules, which of course we
00:29:50.26 cannot do in an experiment.
00:29:52.13 And that answered a very puzzling question for me.
00:29:57.02 My question really has been, how can this enzyme efficiently do phosphotransfer but
00:30:03.20 avoid hydrolysis, which is the energetically favorable reaction?
00:30:07.08 And it turns out, if you think about enzymes, the hardest thing is actually to suppress
00:30:13.19 energetically favorable reactions, and not to catalyze the reaction you want to.
00:30:18.26 And the favored reaction by far is hydrolysis.
00:30:20.24 I told you if we hydrolyze our ATP we will be dead in a minute.
00:30:24.22 So, this enzyme has to be extremely good at suppressing hydrolysis.
00:30:29.12 I thought what's going to happen is that it's going to strip out all the waters
00:30:33.21 from the active site.
00:30:34.22 But that's not what happens.
00:30:36.09 All these little balls over here, these are all water molecules right in the cavity of
00:30:40.20 the active site.
00:30:42.11 How is it avoiding hydrolysis?
00:30:45.00 It's actually keeping the water molecules far enough away from the phosphate that
00:30:50.15 it cannot do hydrolysis.
00:30:54.22 And finally, using quantum mechanical calculations, we can directly watch that phosphotransfer
00:31:00.22 step: you break the phosphate bond over here and you put it over there.
00:31:06.17 Of course, we need quantum mechanical calculations there because we have to break bonds.
00:31:10.04 So, this is just giving you a flavor of what kind of questions you can answer
00:31:17.27 with computational methods.
00:31:19.21 Clearly, computational methods need to be combined with experiments.
00:31:23.12 Coming back, what we're showing you here is how combining a really complementary
00:31:32.17 array of biophysical methods allowed us to figure out,
00:31:37.20 on a free energy landscape, how this enzyme works.
00:31:40.14 And how it's doing its amazing power of catalysis.
00:31:43.13 At the end, I want to actually come to something which was very surprising.
00:31:49.16 People thought that if you take these enzymes and they don't see a substrate, they are
00:31:54.04 basically resting, they are asleep -- which is not true.
00:31:56.28 If you look, actually, at this enzyme in the asleep state, meaning that there's no substrate
00:32:02.12 bound, there's a lot of motion going on.
00:32:04.22 And in fact, the motions are already mimicking what the enzyme needs to do during catalysis.
00:32:10.20 So, what we are saying is that the dynamic personality is built
00:32:15.11 into the enzyme and it underlies catalysis.
00:32:18.08 So, let me just quickly show you how we actually saw that the motions which are vitally important
00:32:23.23 for catalysis are really an intrinsic property of the enzyme.
00:32:27.09 Over there, I show you in red, orange, and yellow three structures
00:32:34.17 from adenylate kinase, from this kinase, in the resting state, meaning there's no substrate bound.
00:32:39.24 What you can see is that the lids are already partially closing.
00:32:44.01 So, what we had were three molecules in the asymmetric unit, and they had different structures.
00:32:49.27 And they moved exactly along the trajectory where it should go during catalysis,
00:32:55.04 which is the green structure.
00:32:56.11 So, there's a partial closure already, directly along the direction where it wants to go
00:33:02.08 during catalysis.
00:33:03.22 So, the question is, of course, crystallography gets us snapshots but not timescales.
00:33:08.20 That's why we used molecular dynamics simulations to ask the question, what is the timescale
00:33:15.00 of this kind of motion?
00:33:16.12 And it turns out these partial lid closures happen on the nanosecond timescale.
00:33:20.20 All good. except we did one too many experiments.
00:33:24.06 We actually now looked at this free enzyme by the NMR spectroscopy I just showed you.
00:33:29.13 And what we found there was that there's motion on the millisecond timescale.
00:33:33.15 That's orders of magnitude slower.
00:33:34.27 So, what are we missing?
00:33:37.13 So, again, our single molecule FRET experiments answered that question, because what we
00:33:41.27 could see is that, in real time, the enzyme indeed does a full closure, but it is much slower.
00:33:48.00 So, in other words, you have a picture where you have fast timescale partial closure,
00:33:54.25 and then a rare event, which is much slower, of full closure.
00:34:02.26 Which brings me back to the beginning of my talk, to the
00:34:08.27 hierarchy in space and time.
00:34:10.21 What do I mean by the hierarchy in space and time?
00:34:12.11 We want to link the slow motions, over here, to the fast motions.
00:34:17.18 And they're in fact linked, okay?
00:34:20.02 So, I told you a lot about experiments to look at how we go over the big mountain, right? --
00:34:25.04 the slow motions.
00:34:26.04 But how about measuring these fast motions, here?
00:34:29.05 For this, we can measure how fast a bond vector, shown over here, an amide bond vector,
00:34:35.22 is actually wiggling.
00:34:37.19 And not only do we get the timescale but we also get the amplitude.
00:34:40.13 So, we have measured those by NMR, again, that's now pico to nanoseconds, going down
00:34:45.26 to these faster motions.
00:34:48.04 And I show you those data as a continuous color scale, blue meaning that it's a very
00:34:54.27 small amplitude -- so 1 would mean there's no amplitude change and 0 would be
00:35:00.13 a completely isotropic motion.
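[The 1-to-0 amplitude scale described here corresponds to the generalized (Lipari-Szabo) order parameter S² commonly extracted from NMR relaxation data. As a sketch, assuming you have a trajectory of bond vectors, S² can be computed from time-averaged outer products; the function name and array layout are illustrative assumptions, not from this talk:

```python
import numpy as np

def order_parameter(bond_vecs):
    """Generalized order parameter S^2 of a bond vector over a trajectory.

    bond_vecs: (T, 3) array of bond vectors over T snapshots.
    S^2 = 1 means a rigid vector (no internal motion);
    S^2 = 0 means fully isotropic wobble.
    Standard Lipari-Szabo expression: S^2 = (3/2) * sum_ij <mu_i mu_j>^2 - 1/2.
    """
    v = bond_vecs / np.linalg.norm(bond_vecs, axis=1, keepdims=True)
    # Time-averaged outer products <mu_i mu_j>
    m = np.einsum('ti,tj->ij', v, v) / v.shape[0]
    return 1.5 * np.sum(m * m) - 0.5
```

A perfectly rigid vector gives 1, and vectors drawn uniformly over the sphere give a value approaching 0, matching the color scale described above.]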
00:35:01.22 So, what you see is that there are hotspots, dynamic hotspots.
00:35:06.21 The red ones, right?
00:35:07.21 So, one more thing I want to explain to you, on the left side I have data on a mesophilic protein,
00:35:13.08 which lives in normal, happy environments, and on the right side
00:35:18.19 is actually a hyperthermophilic enzyme, which actually lives in nature
00:35:23.20 in the hot vents.
00:35:24.20 So, what happens to these dynamic hotspots, here?
00:35:28.07 It turns out these are my elbows.
00:35:31.06 These are the hinges which need to be flexible so that my lid can move.
00:35:35.16 So, what we see is fast timescale motions in my elbow, so that, actually,
00:35:40.25 my hand can move.
00:35:42.13 So, fast timescale motions lead to slow timescale motions.
00:35:47.11 Interestingly, this is directly linked to activity.
00:35:51.09 Because if you look at the mesophilic and thermophilic enzymes, well, we know that thermophilic
00:35:55.27 enzymes at low temperatures, at 20 degrees, are very lousy.
00:35:59.11 And the reason why they are slow is that the fast timescale motions in their hinges
00:36:04.20 are not red and yellow yet.
00:36:06.22 They have no large amplitudes.
00:36:09.13 But see what happens when you go to a temperature where, now, the activities are matched.
00:36:17.02 The activities from here and here are matched.
00:36:21.01 And so are the dynamics in my elbow.
00:36:24.06 So, that allows us now to put together this hierarchy in space and time: fast timescale
00:36:31.22 motions, more localized, leading to slower motions of more global changes.
00:36:37.23 Of course, knowing the structures, we can ask the question, what dictates the differential
00:36:42.13 flexibility in these hinges?
00:36:44.26 And I'm just mentioning two things.
00:36:47.03 One is aromatic packing -- we rigidify the thermophile by packing a lot of aromatics
00:36:52.08 together and therefore rigidly holding it together.
00:36:55.05 And the second trick nature uses are prolines, because prolines, as you know, are imides
00:37:02.13 and restrict the backbone.
00:37:06.14 To summarize what I would like you to take home, I hope I have stimulated your appetite
00:37:16.09 to be intrigued about protein dynamics.
00:37:21.08 Coming back to the beginning, there is no life without dynamics.
00:37:27.23 And of course what is moving are structural ensembles.
00:37:30.26 So, whether it's signaling, membrane transport, neurobiology, drug binding, what we
00:37:38.00 need to do is figure out the free energy landscapes of the systems to understand how the healthy,
00:37:44.06 happy proteins in our body work.
00:37:46.23 And then to finally apply that to do rational drug design to intervene with diseases,
00:37:53.18 which will be the second part of my lecture series.
00:37:56.27 The most important slide is of course my team.
00:38:01.08 I am standing here giving you the story, but in fact these are the guys who did
00:38:06.08 all the work.
00:38:07.14 My amazing team at Brandeis.
00:38:09.07 And I must say, they are the reason why I love to go to work every morning.
00:38:13.06 Thank you.
Variants in the TGFBR2 kinase domain cause several human diseases and can increase propensity for cancer. The widespread application of next generation sequencing within the setting of Individualized Medicine (IM) is increasing the rate at which TGFBR2 kinase domain variants are being identified. However, their clinical relevance is often uncertain. Consequently, we sought to evaluate the use of molecular modeling and molecular dynamics (MD) simulations for assessing the potential impact of variants within this domain. We documented the structural differences revealed by these models across 57 variants using independent MD simulations for each. Our simulations revealed various mechanisms by which variants may lead to functional alteration: some are revealed energetically, while others structurally or dynamically. We found that the ATP binding site and activation loop dynamics may be affected by variants at positions throughout the structure. This prediction cannot be made from the linear sequence alone. We present our structure-based analyses alongside those obtained using several commonly used genomics-based predictive algorithms. We believe the further mechanistic information revealed by molecular modeling will be useful in guiding the examination of clinically observed variants throughout the exome, as well as those likely to be discovered in the near future by clinical tests leveraging next-generation sequencing through IM efforts.
Citation: Zimmermann MT, Urrutia R, Oliver GR, Blackburn PR, Cousin MA, Bozeck NJ, et al. (2017) Molecular modeling and molecular dynamic simulation of the effects of variants in the TGFBR2 kinase domain as a paradigm for interpretation of variants obtained by next generation sequencing. PLoS ONE 12(2): e0170822. https://doi.org/10.1371/journal.pone.0170822
Editor: Freddie Salsbury Jr, Wake Forest University, UNITED STATES
Received: October 14, 2016 Accepted: January 11, 2017 Published: February 9, 2017
Copyright: © 2017 Zimmermann et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: We thank the Mayo Clinic Center for Individualized Medicine for funding. RU was supported by Grants from NIDDK: National Institute of Diabetes and Digestive and Kidney Diseases - RO1 52913, - P30 084567 - P50CA102701 and the Mayo Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The time step of atomistic molecular dynamics (MD) simulations is determined by the fastest motions in the system and is typically limited to 2 fs. An increasingly popular approach is to increase the mass of the hydrogen atoms to ∼3 amu and decrease the mass of the parent atom by an equivalent amount. This approach, known as hydrogen-mass repartitioning (HMR), permits time steps up to 4 fs with reasonable simulation stability. While HMR has been applied in many published studies to date, it has not been extensively tested for membrane-containing systems. Here, we compare the results of simulations of a variety of membranes and membrane–protein systems run using a 2 fs time step and a 4 fs time step with HMR. For pure membrane systems, we find almost no difference in structural properties, such as area-per-lipid, electron density profiles, and order parameters, although there are differences in kinetic properties such as the diffusion constant. Conductance through a porin in an applied field, partitioning of a small peptide, hydrogen-bond dynamics, and membrane mixing show very little dependence on HMR and the time step. We also tested a 9 Å cutoff as compared to the standard CHARMM cutoff of 12 Å, finding significant deviations in many properties tested. We conclude that HMR is a valid approach for membrane systems, but a 9 Å cutoff is not.
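[A minimal sketch of the HMR bookkeeping the abstract describes: each hydrogen's mass is raised to roughly 3 amu and the difference is subtracted from its bonded heavy atom, leaving the total system mass unchanged. The 3.024 amu target and the dictionary-based layout here are illustrative assumptions, not values taken from the paper:

```python
def repartition_masses(masses, h_bonds, h_mass=3.024):
    """Hydrogen-mass repartitioning (HMR) sketch.

    Raise each hydrogen to ~3 amu and subtract the added mass from its
    bonded heavy atom, so the total mass is conserved. h_mass=3.024 amu
    (3x the hydrogen mass) is a common choice, assumed here.

    masses:  dict atom_index -> mass in amu
    h_bonds: list of (hydrogen_index, heavy_atom_index) pairs
    """
    new = dict(masses)
    for h, heavy in h_bonds:
        delta = h_mass - new[h]   # mass moved onto this hydrogen
        new[h] += delta
        new[heavy] -= delta       # taken from the bonded heavy atom
    return new
```

Because equilibrium thermodynamic properties depend on the potential energy surface and not on how mass is distributed among bonded atoms, this redistribution slows the fastest bond vibrations and permits the longer 4 fs time step discussed above.]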
Three-dimensional structures of proteins have contributed to detailed knowledge about biological processes at the molecular level. During the last two decades, advances in experimental techniques such as X-ray crystallography, NMR, and cryo-electron microscopy led to the determination of almost 170,000 atomic resolution structures. However, the number of solved structures is still many orders of magnitude smaller than the 195 million sequence entries in UniProt. Computational structure prediction could bridge this gap, but models with near experimental quality are required to study structure-function relationships and enable rational drug design.
Numerous computational approaches for protein structure prediction have been developed. A driving force of method development has been the Critical Assessment of protein Structure Prediction (CASP) experiments. The CASP assessments have challenged the research community to predict the 3D structures of proteins based on amino acid sequences. Models submitted to CASP are compared with experimentally determined structures, which are not available to participants during the assessment. The results enable comparisons of the performance of available methods and thereby benchmark the progress in the field. Protein structure prediction methods can broadly be divided into two classes, template-based (e.g. homology modeling and fold recognition) and ab initio (e.g. fragment assembly and physics-based methods, also known as template-free modeling). Both template-based and ab initio models will contain errors (e.g. in secondary structure, side chain packing, and loop regions) and need to be improved to be useful in applications that are sensitive to molecular details. For this reason, there is an increasing interest in methods that can enhance model quality. However, the CASP assessments have demonstrated that model refinement is very challenging. In CASP8, 70% of the participants in an evaluation of protein structure refinement failed to improve agreement with experimental data compared to initial models. In the last decade, access to more computational power led to an increase of the use of physics-based methods such as molecular dynamics (MD) simulations and force fields in protein structure prediction [5–9]. Whereas physics-based methods initially had very limited success in CASP, performance has improved in recent assessments. In fact, MD simulation with physics-based force fields was used by eight out of the ten top-performing refinement methods in the recent CASP12, and all of these yielded an average improvement of structural accuracy.
Consistent improvement of model quality with physics-based techniques could have a major impact on studies of protein function and enhance the predictive power of structure-based drug design.
Two research areas that have generally not been evaluated by the CASP assessments are prediction of membrane protein structures and refinement of protein-drug complexes. As many membrane proteins (e.g. ion channels and G protein-coupled receptors) are therapeutically relevant [11,12] and structural coverage is still relatively limited for this group, there is a great need for accurate modeling approaches. The utility of protein models in rational drug design will be dependent on whether structures of complexes with ligands can be predicted. The three GPCR Dock assessments (GPCR Dock 2008, 2010, and 2013) [13–15] challenged the research community to predict structures of G protein-coupled receptors (GPCRs) in complex with ligands. The results of the first assessment in 2008, which focused on predicting the structure of the A2A adenosine receptor in complex with an antagonist ligand, were discouraging. The vast majority of the participating research groups (93%) failed to predict the ligand binding mode accurately (RMSD > 3 Å). As expected, the best predictions of the receptor structures in the three GPCR Dock assessments were obtained with homology modeling, and model accuracy was correlated with how closely related the target was to available GPCR crystal structures. However, even if templates with high sequence identity were available in the two last assessments, prediction of ligand binding modes remained challenging. In fact, around 60%, 80%, and 70% of the research groups failed to model the ligand binding mode accurately (RMSD > 3 Å) for the D3 dopamine, 5-HT1B serotonin, and 5-HT2B serotonin receptors, respectively. In the GPCR Dock assessments, prediction of the receptor-ligand complexes was typically performed based on a static homology model with no additional refinement steps.
Only a few of the participating research groups used MD refinement to generate the models (based on supporting methods from the assessment participants [13–15]) and the impact of using simulations on the structural accuracy of the receptor-ligand complex was not evaluated.
In this study, we assessed the use of MD refinement to improve structural models of GPCRs, ligand binding modes, and virtual screening performance. Simulations of the D3 dopamine receptor (D3R) in complex with the ligand eticlopride, which was one of the targets of GPCR Dock 2010, were performed for 30 of the models submitted to the assessment. Two protocols based on different force fields were used to generate a total MD simulation time of close to 60 μs. Snapshots from the simulation trajectories were compared to the crystal structure of the D3R-eticlopride complex to determine the accuracy of the transmembrane (TM) region, loops, binding site, and ligand binding mode. Strategies for improving the MD refinement protocols by using restraints in the simulations were also explored. Advantages and drawbacks of using MD refined GPCR-ligand complexes were analyzed based on comparisons to the models used as starting structure, simulations of the crystal structure, and the virtual screening performance of the models.
The current study provides information that the surfactant stabilizes the ubiquitin conformation at low temperatures and high SDS concentrations. The Rg and DSSP analyses revealed that ubiquitin loses its native conformation and adopts a random coil structure over the entire simulation time. The results also suggested that the partial atomic charges not only can affect the type and level of interactions in the protein-SDS complex but also can change the orientation, distribution, and assembly of SDS molecules. Moreover, we demonstrated that the SDS surfactant aggregates to form a membrane-like structure and induces global unfolding in the protein conformation at high temperatures. This study demonstrates that maintaining the hydration shell plays an important role in the unfolding mechanism of ubiquitin. The MD simulations also indicated that neither SDS molecules nor temperature can be used alone for inducing the fully unfolded state in the protein structure and both are required. We believe that these findings could be useful in protein biochemistry, protein folding/unfolding and structural biology. Additionally, the SDS surfactant can mimic the biological membrane environment, and investigating its interactions with proteins are of interest in the field of membrane biology. Therefore, our findings can be productive and helpful for any direct examinations of ubiquitin-membrane interactions.
Anton, A Special-Purpose Machine For Molecular Dynamics Simulation
The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macromolecules could in principle provide answers to some of the most important currently outstanding questions in the fields of biology, chemistry, and medicine. A wide range of biologically interesting phenomena, however, occur over timescales on the order of a millisecond, several orders of magnitude beyond the duration of the longest current MD simulations.
We describe a massively parallel machine called Anton, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems. The machine, which is scheduled for completion by the end of 2008, is based on 512 identical MD-specific ASICs that interact in a tightly coupled manner using a specialized highspeed communication network. Anton has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation. The remainder of the simulation algorithm is executed by a programmable portion of each chip that achieves a substantial degree of parallelism while preserving the flexibility necessary to accommodate anticipated advances in physical models and simulation methods.
Molecular dynamics (MD) simulations can be used to model the motions of molecular systems, including proteins, cell membranes, and DNA, at an atomic level of detail. A sufficiently long and accurate MD simulation could allow scientists and drug designers to visualize for the first time many critically important biochemical phenomena that cannot currently be observed in laboratory experiments, including the "folding" of proteins into their native three-dimensional structures, the structural changes that underlie protein function, and the interactions between two proteins or between a protein and a candidate drug molecule. Such simulations could answer some of the most important open questions in the fields of biology and chemistry, and have the potential to make substantial contributions to the process of drug development.
Many of the most important biological processes occur over timescales on the order of a millisecond. MD simulations on this timescale, however, lie several orders of magnitude beyond the reach of current technology: only a few MD runs have thus far reached even a microsecond of simulated time, and the vast majority have been limited to the nanosecond timescale. Millisecond-scale simulations of a biomolecular system containing tens of thousands of atoms will in practice require that the forces exerted by all atoms on all other atoms be calculated in just a few microseconds, a process that must be repeated on the order of 10^12 times. These requirements far exceed the current capabilities of even the most powerful commodity clusters or general-purpose scientific supercomputers.
This paper describes a specialized, massively parallel machine, named Anton, that is designed to accelerate MD simulations by several orders of magnitude, bringing millisecond-scale simulations within reach for molecular systems involving tens of thousands of atoms. The machine, which is scheduled for completion by the end of 2008, will comprise 512 processing nodes in its initial configuration, each containing a specialized MD computation engine implemented as a single ASIC. The molecular system to be simulated is decomposed spatially among these processing nodes, which are connected through a specialized high-performance network to form a three-dimensional torus. Anton's expected performance advantage stems from a combination of MD-specific hardware that achieves a very high level of arithmetic density and novel parallel algorithms that enhance scalability by reducing both intra- and inter-chip communication. Figure 1 is a photograph of one of the first Anton ASICs.
In designing Anton and its associated software, we have attempted to attack a somewhat different problem than the ones addressed by several other projects that have deployed significant computational resources for MD simulations. The Folding@home project, 16 for example, has obtained a number of significant and interesting scientific results by using as many as 250,000 PCs (made available over the Internet by volunteers) to simulate a very large number of separate molecular trajectories, each of which is limited to the timescale accessible on a single PC. While a great deal can be learned from a large number of independent MD trajectories, many other important problems require the examination of a single, very long trajectory, the principal task for which Anton is designed. Other projects, such as FASTRUN, 6 MDGRAPE, 22 and MD Engine, 23 have produced special-purpose hardware to accelerate the most computationally expensive elements of an MD simulation. Such hardware reduces the cost of MD simulations, particularly for large molecular systems, but Amdahl's law and communication bottlenecks prevent the efficient use of enough such chips in parallel to extend individual simulations beyond microsecond timescales.
Anton is named after Anton van Leeuwenhoek, whose contributions to science and medicine we hope to emulate in our own work. In the 17th century, van Leeuwenhoek, often referred to as the "father of microscopy," built high-precision optical instruments that allowed him to visualize for the first time an entirely new biological world that had previously been unknown to the scientists of his day. We view Anton (the machine) as a sort of "computational microscope." To the extent that we and other researchers are able to increase the length of MD simulations, we would hope to provide contemporary biological and biomedical researchers with a tool for understanding organisms and their diseases on a still smaller length scale.
2. MD Computation on Anton
An MD computation simulates the motion of a collection of atoms (the chemical system) over a period of time according to the laws of classical physics. 1 Time is broken into a series of discrete time steps, each representing a few femtoseconds of simulated time. A time step has two major phases. Force calculation computes the force on each particle due to other particles in the system. Integration uses the net force on each particle to update that particle's position and velocity.
Interatomic forces are calculated based on a molecular mechanics force field (or simply force field), which models the forces on each atom as a function of the spatial coordinates of all atoms. In commonly used biomolecular force fields, 9,11,15 the forces consist of three components: bond forces, involving groups of atoms separated by no more than three covalent bonds; van der Waals forces, computed between pairs of atoms separated by less than some cutoff radius (usually chosen between 5 and 15 Å); and electrostatic forces, which are the most computationally intensive as they must be computed between all pairs of atoms.
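The range-limited part of this force calculation (van der Waals plus cutoff electrostatics) can be illustrated with a naive O(N²) pair loop. Parameters and units below are generic placeholders, not those of any specific force field, and production codes replace the double loop with neighbor lists and parallel decompositions like the NT method described next:

```python
import numpy as np

def pair_forces(pos, charges, sigma, epsilon, cutoff):
    """Range-limited nonbonded forces: Lennard-Jones + direct Coulomb.

    Naive O(N^2) sketch of the pairwise force components described in
    the text (bond terms omitted). Gaussian units for the Coulomb term.
    """
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r_vec = pos[i] - pos[j]
            r = np.linalg.norm(r_vec)
            if r >= cutoff:
                continue          # range-limited: skip distant pairs
            # Lennard-Jones: -dU/dr for U = 4*eps*((s/r)^12 - (s/r)^6)
            sr6 = (sigma / r) ** 6
            f_lj = 24 * epsilon * (2 * sr6 ** 2 - sr6) / r
            # Direct Coulomb: U = q_i * q_j / r
            f_c = charges[i] * charges[j] / r ** 2
            f = (f_lj + f_c) * r_vec / r
            forces[i] += f        # Newton's third law: equal and
            forces[j] -= f        # opposite forces on the pair
    return forces
```

At the Lennard-Jones minimum distance 2^(1/6)·σ the force vanishes, and like charges repel, which makes the sketch easy to sanity-check.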
Anton uses the k-space Gaussian split Ewald method (k-GSE) 18 to reduce the computational workload associated with the electrostatic interactions. This method divides the electrostatic force calculation into two components. The first decays rapidly with particle separation and is computed directly for all particle pairs separated by less than a cutoff radius. We refer to this contribution, together with the van der Waals interactions, as range-limited interactions. The second component, long-range interactions, decays more slowly, but can be computed efficiently by mapping charge from particles to a regular mesh (charge spreading), taking the fast Fourier transform (FFT) of the mesh charges, multiplying by an appropriate function in Fourier space, performing an inverse FFT, and then computing forces on the particles from the resulting mesh values (force interpolation).
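The splitting idea can be illustrated on a single pair's potential energy: erfc(βr)/r is the rapidly decaying part summed directly inside the cutoff, while the smooth erf(βr)/r remainder is what the mesh/FFT machinery evaluates. The splitting parameter β below is an arbitrary illustrative value, not one used by Anton:

```python
import math

def split_coulomb(q1, q2, r, beta=0.35):
    """Ewald-style splitting of the Coulomb energy q1*q2/r (sketch).

    Returns (short_range, long_range):
      short_range = q1*q2*erfc(beta*r)/r  -- decays rapidly, summed
                    directly for pairs inside the cutoff
      long_range  = q1*q2*erf(beta*r)/r   -- smooth remainder, evaluated
                    on a charge mesh via FFTs in methods like k-GSE
    The two parts sum exactly to the full Coulomb interaction.
    """
    short = q1 * q2 * math.erfc(beta * r) / r
    long_range = q1 * q2 * math.erf(beta * r) / r
    return short, long_range
```

Because erfc(βr) falls off faster than exponentially, the short-range term is negligible beyond a modest cutoff, which is what makes the direct sum affordable.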
To parallelize range-limited interactions, our machine uses an algorithm we developed called the NT method. 19 The NT method achieves both asymptotic and practical reductions in required interprocessor communication bandwidth relative to traditional parallelization methods. It is one of a number of neutral territory methods that employ a spatial assignment of particles to nodes, but that often compute the interaction between two particles using a node on which neither particle resides. 4,7,10,14,17,21
The integration phase uses the results of force calculation to update atomic positions and velocities, numerically integrating a set of ordinary differential equations describing the motion of the atoms. The numerical integrators used in MD are nontrivial for several reasons. First, the integration algorithm and the manner in which numerical issues are handled can have a significant effect on accuracy. Second, some simulations require the integrator to calculate and adjust global properties such as temperature and pressure. Finally, one can significantly accelerate most simulations by incorporating constraints that eliminate the fastest vibrational motions. For example, constraints are typically used to fix the lengths of bonds to all hydrogen atoms and to hold water molecules rigid.
3. Why Specialized Hardware?
A natural question is whether a specialized machine for molecular simulation can gain a significant performance advantage over general-purpose hardware. After all, history is littered with the corpses of specialized machines, spanning a huge gamut from Lisp machines to database accelerators. Performance and transistor count gains predicted by Moore's law, together with the economies of scale behind the development of commodity processors, have driven a history of general-purpose microprocessors outpacing special-purpose solutions. Any plan to build specialized hardware must account for the expected exponential growth in the capabilities of general-purpose hardware.
We concluded that special-purpose hardware is warranted in this case because it leads to a much greater improvement in absolute performance than the expected speedup predicted by Moore's law over our development time period, and because we are currently at the cusp of simulating timescales of great biological significance. We expect Anton to run simulations over 1000 times faster than was possible when we began this project. Assuming that transistor densities continue to double every 18 months and that these increases translate into proportionally faster processors and communication links, one would expect approximately a tenfold improvement in commodity solutions over the five-year development time of our machine (from conceptualization to bring-up). We therefore expect that a specialized solution will be able to access biologically critical millisecond timescales significantly sooner than commodity hardware.
To simulate a millisecond within a couple of months, we must complete a time step every few microseconds, or every few thousand clock ticks. The sequential dependence of successive time steps in an MD simulation makes speculation across time steps extremely difficult. Fortunately, specialization offers unique opportunities to accelerate an individual time step using a combination of architectural features that reduce both computational latency and communication latency.
For example, we reduced computational latency by designing:
- Dedicated, specialized hardware datapaths and control logic to evaluate the range-limited interactions and to perform charge spreading and force interpolation. In addition to packing much more computational logic on a chip than is typical of general-purpose architectures, these pipelines use customized precision for each operation.
- Specialized, yet programmable, processors to compute bond forces and the FFT and to perform integration. The instruction set architecture (ISA) of these processors is tailored to the calculations they perform. Their programmability provides flexibility to accommodate various force fields and integration algorithms.
- Dedicated support in the memory subsystem to accumulate forces for each particle.
We reduced communication latency by designing:
- A low-latency, high-bandwidth network, both within an ASIC and between ASICs, that includes specialized routing support for common MD communication patterns such as multicast and compressed transfers of sparse data structures.
- Support for choreographed "push"-based communication. Producers send results to consumers without the consumers having to request the data beforehand.
- A set of autonomous direct memory access (DMA) engines that offload communication tasks from the computational units, allowing greater overlap of communication and computation.
- Admission control features that prioritize packets carrying certain algorithm-specific data types.
We balance our design very differently from a general-purpose supercomputer architecture. Relative to other high-performance computing applications, MD uses much communication and computation but surprisingly little memory. The entire architectural state of an MD simulation of 25,000 particles, for example, is just 1.6 MB, or 3.2 KB per node in a 512-node system. We exploit this property by using only SRAMs and small L1 caches on our ASIC, with all code and data fitting on-chip in normal operation. Rather than spending silicon area on large caches and aggressive memory hierarchies, we instead dedicate it to communication and computation.
It is serendipitous that the most computationally intensive parts of MD (in particular, the electrostatic interactions) are also the most well established and unlikely to change as force field models evolve, making them particularly amenable to hardware acceleration. Dramatically accelerating MD simulation, however, requires that we accelerate more than just an "inner loop."
Calculation of electrostatic and van der Waals forces accounts for roughly 90% of the computational time for a representative MD simulation on a single general-purpose processor. Amdahl's law states that no matter how much we accelerate this calculation, the remaining computations, left unaccelerated, would limit our maximum speedup to a factor of 10. Hence, we dedicated a significant fraction of silicon area to accelerating other tasks, such as bond force computation, constraint computation, and velocity and position updates, incorporating programmability as appropriate to accommodate a variety of force fields and integration methods.
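The Amdahl's law arithmetic behind that factor of 10 is worth making explicit. If a fraction f of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 − f) + f/s); with f = 0.90, even an infinitely fast force pipeline caps the speedup at 10x:

```python
# Amdahl's law: overall speedup when a fraction f of the work is
# accelerated by a factor s.

def amdahl_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

print(amdahl_speedup(0.90, float("inf")))  # ~10, no matter how fast s is
print(amdahl_speedup(0.90, 100.0))         # still well under 10
```

This is why Anton dedicates silicon to bond forces, constraints, and integration as well, rather than accelerating the pairwise force calculation alone.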
4. System Architecture
The building block of the system is a node, depicted in Figure 2. Each node comprises an MD-specific ASIC, attached DRAM, and six ports to the system-wide interconnection network. Each ASIC has four major subsystems, which are described briefly in this section. The nodes, which are logically identical, are connected in a three-dimensional torus topology (which maps naturally to the periodic boundary conditions frequently used in MD simulations). The initial version of Anton will be a 512-node torus with eight nodes in each dimension, but our architecture also supports larger and smaller toroidal configurations. The ASICs are clocked at a modest 400 MHz, with the exception of one double-clocked component in the high-throughput interaction subsystem (HTIS), discussed in the following section.
4.1. High-Throughput Interaction Subsystem
The HTIS calculates range-limited interactions and performs charge spreading and force interpolation. The HTIS, whose internal structure is shown in Figure 3, applies massive parallelism to these operations, which constitute the bulk of the calculation in MD. It provides tremendous arithmetic throughput using an array of 32 pairwise point interaction modules (PPIMs) (Figure 3), each of which includes a force calculation pipeline that runs at 800 MHz and is capable of computing the combined electrostatic and van der Waals interactions between a pair of atoms at every cycle. This 26-stage pipeline (Figure 4) includes adders, multipliers, function evaluation units, and other specialized datapath elements. Inside this pipeline, we use customized numerical precisions: functional unit width varies across the different pipeline stages but still produces a sufficiently accurate 32-bit result.
In order to keep the pipelines busy with useful computation, the remainder of the HTIS must determine pairs of atoms that need to interact, feed them to the pipelines, and aggregate the pipelines' outputs. This proves a formidable challenge given communication bandwidth limitations between ASICs, between the HTIS and other subsystems on the same ASIC, and between pipelines within the HTIS. We address this problem using an architecture tailored for direct product selection reduction operations (DPSRs), which take two sets of points and perform computation proportional to the product of the set sizes but only require input and output volume proportional to the sum of their sizes. The HTIS considers interactions between all atoms in a region called the tower and all atoms in a region called the plate. Each atom in the tower is assigned to one PPIM, while each atom in the plate streams by all the PPIMs. Eight match units in each PPIM perform several tests, including a low-precision distance check, to determine which pairs of plate and tower particles are fed to the force calculation pipeline. Because the HTIS is a streaming architecture, with no feedback in its computational path, it is simple to scale the PPIM array to any number of PPIMs. The HTIS also includes an interaction control block processor, which controls the flow of data through the HTIS. More detail about the HTIS and about DPSR operations can be found in the proceedings of this year's HPCA conference. 13
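The tower/plate streaming idea can be caricatured in software. The toy sketch below is not Anton's actual match-unit logic; it only shows the DPSR shape of the problem: input volume proportional to |tower| + |plate|, candidate work proportional to |tower| × |plate|, with a cheap distance check gating which pairs reach the expensive interaction pipeline.

```python
# Toy sketch of tower/plate pair selection (not Anton's real logic).
# Each tower atom occupies one PPIM-like slot; every plate atom streams
# past all slots, and a low-cost distance test selects the pairs that
# would be fed to the force calculation pipeline.

def stream_interactions(tower, plate, cutoff):
    selected = []
    for t in tower:                       # one slot per tower atom
        for p in plate:                   # plate atoms stream past all slots
            d2 = sum((a - b) ** 2 for a, b in zip(t, p))
            if 0 < d2 <= cutoff ** 2:     # match unit: distance check
                selected.append((t, p, d2))
    return selected

tower = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
plate = [(0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(len(stream_interactions(tower, plate, 1.0)))
```

In hardware the outer loop is spatial (32 PPIMs in parallel) rather than temporal, and the per-slot results are accumulated on the fly, which is what keeps the input/output volume proportional to the sum of the set sizes.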
The PPIMs are the most hard-wired component of our architecture, reflecting the fact that they handle the most computationally intensive parts of the MD calculation. That said, even the PPIMs include programmability where we anticipate potential future changes to force fields. For instance, the functional forms for van der Waals and electrostatic interactions are specified using SRAM lookup tables, whose contents are determined at runtime.
4.2. Flexible Subsystem
The flexible subsystem controls the ASIC and handles all other computations, including the bond force calculations, the FFT, and integration. Figure 5 shows the components of the flexible subsystem. Four identical processing slices form the core of the flexible subsystem. Each slice comprises a general-purpose core with its caches, a remote access unit (RAU) that performs autonomous data transfers, and two geometry cores (GCs), which are programmable cores that perform most of the flexible subsystem's computation. The RAU is a programmable data transfer engine that enables the flexible subsystem to participate in "push" communication, both offloading messages sent from the processor cores and tracking incoming messages to determine when work is ready to be done. Each GC is a dual-issue, statically scheduled, 4-way SIMD processor with pipelined multiply accumulate support and instruction set extensions to support common MD calculations. Other components of the flexible subsystem include a correction pipeline, which computes force correction terms a racetrack, which serves as a local, internal interconnect for the flexible subsystem components and a ring interface unit, which allows the flexible subsystem components to transfer packets to and from the communication subsystem. More detail about the flexible subsystem is given in a second paper at this year's HPCA conference. 12
4.3. Communication Subsystem
The communication subsystem provides high-speed, low-latency communication both between ASICs and among the subsystems within an ASIC. Between chips, each torus link provides 5.3 GB/s full-duplex communication with a hop latency around 50 ns. Within a chip, two 256-bit, 400 MHz communication rings link all subsystems and the six inter-chip torus ports. The communication subsystem supports efficient multicast, provides flow control, and provides class-based admission control with rate metering. The communication subsystem also allows access to an external host computer system for input and output of simulation data.
4.4. Memory Subsystem
The memory subsystem provides access to the ASIC's attached DRAM. In addition to basic memory read/write access, the memory subsystem supports accumulation and synchronization. Special memory write operations numerically add incoming write data to the contents of the memory location specified in the operation. These operations implement force, energy, potential, and spread charge accumulations, reducing the computation and communication load on the flexible subsystem. By taking advantage of the attached DRAM, Anton will be able to simulate chemical systems with billions of atoms.
5. Performance and Accuracy Measurements
In this section, we show that the performance of Anton significantly exceeds that of other MD platforms, and that Anton is capable of performing simulations of high numerical accuracy. Because we do not yet have a working 512-node segment, performance estimates for our machine come from our performance simulator. The cycle fidelity of this simulator varies across components, but we expect overall fidelity better than ±20%.
5.1. Performance Comparison
We compare the performance of various MD platforms in terms of simulation rate (nanoseconds of simulated time per day of execution) on a particular chemical system. In this section and in Section 5.2, we use a system with 23,558 atoms in a cubic box measuring 62.2 Å on a side. This system represents dihydrofolate reductase (DHFR), a protein targeted by various cancer drugs, surrounded by water.
The highest-performing MD codes achieve a simulation rate of a few nanoseconds per day for DHFR on a single state-of-the-art commodity processor core. 8 Existing multiprocessor machines with high-performance interconnects achieve simulation rates up to a few hundred nanoseconds per day using many hundreds or thousands of processor cores. 2,3,5
We expect a 512-node Anton system to achieve a simulation rate of approximately 14,500 nanoseconds per day for DHFR, enabling a millisecond simulation in just over two months. While the performance of general-purpose machines will undoubtedly continue to improve, Anton's performance advantage over other MD platforms significantly exceeds the speedup predicted by Moore's law over the next few years. A more detailed performance comparison of Anton and other MD platforms is given in the proceedings of last year's ISCA conference. 20
5.2. Accuracy
To quantify the accuracy of force computation on Anton, we measured the relative rms force error, defined as the rms error in the force on all particles divided by the rms force. 18 For the DHFR system with typical simulation parameters, Anton achieves a relative rms force error of 1.5 × 10⁻⁴. A relative rms force error below 10⁻³ is generally considered sufficiently accurate for biomolecular MD simulations. 25
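The relative rms force error defined above is straightforward to compute given a set of reference forces and the forces produced by the machine under test. The toy numbers below are purely illustrative:

```python
import math

# Relative rms force error: rms of the per-atom force errors divided by
# the rms of the reference forces. Forces are 3-vectors, one per atom.

def rms(vectors):
    return math.sqrt(sum(x * x for v in vectors for x in v) / len(vectors))

def relative_rms_force_error(computed, reference):
    errors = [[c - r for c, r in zip(fc, fr)]
              for fc, fr in zip(computed, reference)]
    return rms(errors) / rms(reference)

reference = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
computed = [[1.0001, 0.0, 0.0], [0.0, 1.9998, 0.0]]
print(relative_rms_force_error(computed, reference))
```

A value of 1.5 × 10⁻⁴, as reported for Anton on DHFR, means the typical per-atom force error is less than two parts in ten thousand of the typical force magnitude.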
We also measured energy drift to quantify the overall accuracy of our simulations. An exact MD simulation would conserve energy exactly. Errors in the simulation generally lead to an increase in the overall energy of the simulated system with time, a phenomenon known as energy drift. We measured energy drift over 5 ns of simulated time (2 million time steps) for DHFR using a bit-accurate numerical emulator that exactly duplicates Anton's arithmetic. While the simulation exhibited short-term energy fluctuations of a few kcal/mol (about 0.001% of the total system energy), there was no detectable long-term trend in total energy. Simulations with significantly higher energy drift are generally considered more than adequate for MD studies. 24
5.3. Scaling with Chemical System Size
Figure 6 shows the scaling of performance with chemical system size. Within the range where chemical systems fit in on-chip memory, we expect performance to scale roughly linearly with the number of atoms, albeit with occasional jumps as different operating parameters change to optimize performance while maintaining accuracy. The largest discontinuity in simulation rate occurs at a system volume of approximately 500,000 Å³, when we change from a 32 × 32 × 32 FFT grid to a 64 × 64 × 64 FFT grid, reflecting the fact that our code supports only power-of-two-length FFTs. This lengthens the long-range calculation because the number of grid points increases by a factor of 8. Overall, the results are consistent with supercomputer scale-up studies: as we increase chemical system size, Anton's efficiency improves because of better overlap of communication and computation, and because calculation pipelines operate closer to peak efficiency.
6. Conclusion
We are currently in the process of building a specialized, massively parallel machine, called Anton, for the high-speed execution of MD simulations. We expect Anton to be capable of simulating the dynamic, atomic-level behavior of proteins and other biological macromolecules in an explicitly represented solvent environment for periods on the order of a millisecond, about three orders of magnitude beyond the reach of current MD simulations. The machine uses specialized ASICs, each of which performs a very large number of application-specific calculations during each clock cycle. Novel architectural and algorithmic techniques are used to minimize intra- and inter-chip communication, providing an unusually high degree of scalability.
While it contains programmable elements that could in principle support the parallel execution of algorithms for a wide range of other applications, Anton was not designed to function as a general-purpose scientific supercomputer, and would not in practice be well suited for such a role. Rather, we envision Anton serving as a computational microscope, allowing researchers to observe for the first time a wide range of biologically important structures and processes that have thus far proven inaccessible to both computational modeling and laboratory experiments.
1. Adcock, S.A. and McCammon, J.A. Molecular dynamics: Survey of methods for simulating the activity of proteins. Chemical Reviews, 106:1589–1615, 2006.
2. Bhatele, A., Kumar, S., Mei, C., Phillips, J.C., Zheng, G., and Kale, L.V. Overcoming scaling challenges in biomolecular simulations across multiple platforms, to appear in Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, FL, 2008.
3. Bowers, K.J., Chow, E., Xu, H., Dror, R.O., Eastwood, M.P., Gregersen, B.A., Klepeis, J.L., Kolossvary, I., Moraes, M.A., Sacerdoti, F.D., Salmon, J.K., Shan, Y., and Shaw, D.E. Scalable algorithms for molecular dynamics simulations on commodity clusters. Proceedings of the ACM/IEEE Conference on Supercomputing (SC06), Tampa, FL, 2006.
4. Bowers, K.J., Dror, R.O., and Shaw, D.E. Zonal methods for the parallel execution of range-limited N-body problems. Journal of Computational Physics, 221(1):303–329, 2007.
5. Fitch, B.G., Rayshubskiy, A., Eleftheriou, M., Ward, T.J.C., Giampapa, M.E., Pitman, M.C., Pitera, J.W., Swope, W.C., and Germain, R.S. Blue matter: scaling of N-body simulations to one atom per node. IBM Journal of Research and Development, 52(1/2), 2008.
6. Fine, R.D., Dimmler, G., and Levinthal, C. FASTRUN: A special purpose, hardwired computer for molecular simulation. Proteins: Structure, Function, and Genetics, 11(4):242–253, 1991 (erratum: 14(3):421–422, 1992).
7. Germain, R.S., Fitch, B., Rayshubskiy, A., Eleftheriou, M., Pitman, M.C., Suits, F., Giampapa, M., and Ward, T.J.C. Blue matter on blue gene/L: Massively parallel computation for biomolecular simulation. Proceedings of the Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '05), New York, NY, 2005.
8. Hess, B., Kutzner, C., van der Spoel, D., and Lindahl, E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of Chemical Theory and Computation, 4(2):435–447, 2008.
9. Jorgensen, W.L., Maxwell, D.S., and Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. Journal of the American Chemical Society, 118(45):11225–11236, 1996.
10. Kalé, L., Skeel, R., Bhandarkar, M., Brunner, R., Gursoy, A., Krawetz, N., Phillips, J., Shinozaki, A., Varadarajan, K., and Schulten, K. NAMD2: Greater scalability for parallel molecular dynamics. Journal of Computational Physics, 151(1):283–312, 1999.
11. Kollman, P.A., Dixon, R.W., Cornell, W.D., Fox, T., Chipot, C., and Pohorille, A. The development/application of a "Minimalist" organic/biomolecular mechanic force field using a combination of ab initio calculations and experimental data. In Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, van Gunsteren, W.F. and Weiner, P.K., eds., Dordrecht, Netherlands: ESCOM, pp. 83–96, 1997.
12. Kuskin, J.S., Young, C., Grossman, J.P., Batson, B., Deneroff, M.M., Dror, R.O., and Shaw, D.E. Incorporating flexibility in Anton, a specialized machine for molecular dynamics simulation. Proceedings of the 14th International Symposium on High-Performance Computer Architecture (HPCA-14), Salt Lake City, UT, 2008.
13. Larson, R.H., Salmon, J.K., Dror, R.O., Deneroff, M.M., Young, C., Grossman, J.P., Shan, Y., Klepeis, J.L., and Shaw, D.E. High-throughput pairwise point interactions in Anton, a specialized machine for molecular dynamics simulation. Proceedings of the 14th International Symposium on High-Performance Computer Architecture (HPCA-14), Salt Lake City, UT, 2008.
14. Liem, S.Y., Brown, D., and Clarke, J.H.R. Molecular dynamics simulations on distributed memory machines. Computer Physics Communications, 67(2):261–267, 1991.
15. MacKerell, A.D. Jr., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau, F.T.K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D.T., Prodhom, B., Reiher, III, W.E., Roux, B., Schlenkrich, M., Smith, J.C., Stote, R., Straub, J., Watanabe, M., Wiórkiewicz-Kuczera, J., Yin, D., and Karplus, M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B, 102(18):3586–3616, 1998.
16. Pande, V.S., Baker, I., Chapman, J., Elmer, S.P., Khaliq, S., Larson, S.M., Rhee, Y.M., Shirts, M.R., Snow, C.D., Sorin, E.J., and Zagrovic, B. Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing. Biopolymers, 68(1):91–109, 2003.
17. Plimpton, S.J., Attaway, S., Hendrickson, B., Swegle, J., Vaughan, C., and Gardner, D. Transient dynamics simulations: Parallel algorithms for contact detection and smoothed particle hydrodynamics. Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing '96), Pittsburgh, PA, 1996.
18. Shan, Y., Klepeis, J.L., Eastwood, M.P., Dror, R.O., and Shaw, D.E. Gaussian split Ewald: A fast Ewald mesh method for molecular simulation. Journal of Chemical Physics, 122:054101, 2005.
19. Shaw, D.E. A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions. Journal of Computational Chemistry, 26(13):1318–1328, 2005.
20. Shaw, D.E., Deneroff, M.M., Dror, R.O., Kuskin, J.S., Larson, R.H., Salmon, J.K., Young, C., Batson, B., Bowers, K.J., Chao, J.C., Eastwood, M.P., Gagliardo, J., Grossman, J.P., Ho, C.R., Ierardi, D.J., Kolossváry, I., Klepeis, J.L., Layman, T., McLeavey, C., Moraes, M.A., Mueller, R., Priest, E.C., Shan, Y., Spengler, J., Theobald, M., Towles, B., and Wang, S.C. Anton, a special-purpose machine for molecular dynamics simulation. Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA '07), San Diego, CA, 2007.
21. Snir, M. A note on N-body computations with cutoffs. Theory of Computing Systems, 37:295–318, 2004.
22. Taiji, M., Narumi, T., Ohno, Y., Futatsugi, N., Suenaga, A., Takada, N., and Konagaya, A. Protein explorer: A petaflops special-purpose computer system for molecular dynamics simulations. Proceedings of the ACM/IEEE Conference on Supercomputing (SC03), Phoenix, AZ, 2003.
23. Toyoda, S., Miyagawa, H., Kitamura, K., Amisaki, T., Hashimoto, E., Ikeda, H., Kusumi, A., and Miyakawa, N. Development of MD engine: High-speed accelerator with parallel processor design for molecular dynamics simulations. Journal of Computational Chemistry, 20(2):185–199, 1999.
3. Statistical Mechanics
Molecular dynamics simulations generate information at the microscopic level, including atomic positions and velocities. The conversion of this microscopic information to macroscopic observables such as pressure, energy, heat capacities, etc., requires statistical mechanics. Statistical mechanics is fundamental to the study of biological systems by molecular dynamics simulation. In this section, we provide a brief overview of some main topics. For more detailed information, refer to the numerous excellent books available on the subject.
Introduction to Statistical Mechanics:
In a molecular dynamics simulation, one often wishes to explore the macroscopic properties of a system through microscopic simulations, for example, to calculate changes in the binding free energy of a particular drug candidate, or to examine the energetics and mechanisms of conformational change. The connection between microscopic simulations and macroscopic properties is made via statistical mechanics, which provides the rigorous mathematical expressions that relate macroscopic properties to the distribution and motion of the atoms and molecules of the N-body system; molecular dynamics simulations provide the means to solve the equations of motion of the particles and evaluate these mathematical formulas. With molecular dynamics simulations, one can study both thermodynamic properties and time-dependent (kinetic) phenomena.
D. McQuarrie, Statistical Mechanics (Harper & Row, New York, 1976)
D. Chandler, Introduction to Modern Statistical Mechanics (Oxford University Press, New York, 1987)
Statistical mechanics is the branch of physical sciences that studies macroscopic systems from a molecular point of view. The goal is to understand and to predict macroscopic phenomena from the properties of individual molecules making up the system. The system could range from a collection of solvent molecules to a solvated protein-DNA complex. In order to connect the macroscopic system to the microscopic system, time independent statistical averages are often introduced. We start this discussion by introducing a few definitions.
The thermodynamic state of a system is usually defined by a small set of parameters, for example, the temperature, T, the pressure, P, and the number of particles, N. Other thermodynamic properties may be derived from the equations of state and other fundamental thermodynamic equations.
The mechanical or microscopic state of a system is defined by the atomic positions, q, and momenta, p; these can also be considered as coordinates in a multidimensional space called phase space. For a system of N particles, this space has 6N dimensions. A single point in phase space, denoted by Γ, describes the state of the system. An ensemble is a collection of points in phase space satisfying the conditions of a particular thermodynamic state. A molecular dynamics simulation generates a sequence of points in phase space as a function of time; these points belong to the same ensemble, and they correspond to the different conformations of the system and their respective momenta. Several different ensembles are described below.
An ensemble is a collection of all possible systems which have different microscopic states but have an identical macroscopic or thermodynamic state.
There exist different ensembles with different characteristics.
- Microcanonical ensemble (NVE) : The thermodynamic state characterized by a fixed number of atoms, N, a fixed volume, V, and a fixed energy, E. This corresponds to an isolated system.
- Canonical Ensemble (NVT): This is a collection of all systems whose thermodynamic state is characterized by a fixed number of atoms, N, a fixed volume, V, and a fixed temperature, T.
- Isobaric-Isothermal Ensemble (NPT): This ensemble is characterized by a fixed number of atoms, N, a fixed pressure, P, and a fixed temperature, T.
- Grand canonical Ensemble (μVT): The thermodynamic state for this ensemble is characterized by a fixed chemical potential, μ, a fixed volume, V, and a fixed temperature, T.
Calculating Averages from a Molecular Dynamics Simulation
An experiment is usually made on a macroscopic sample that contains an extremely large number of atoms or molecules sampling an enormous number of conformations. In statistical mechanics, averages corresponding to experimental observables are defined in terms of ensemble averages; one justification for this is that there has been good agreement with experiment. An ensemble average is an average taken over a large number of replicas of the system considered simultaneously.
In statistical mechanics, average values are defined as ensemble averages.
The ensemble average is given by

\langle A \rangle_{\text{ensemble}} = \iint dp^N \, dr^N \, A(p^N, r^N) \, \rho(p^N, r^N)

where A(p^N, r^N) is the observable of interest, expressed as a function of the momenta, p, and the positions, r, of the system. The integration is over all possible values of r and p.

The probability density of the ensemble is given by

\rho(p^N, r^N) = \frac{1}{Q} \exp\left[ -H(p^N, r^N) / k_B T \right]

where H is the Hamiltonian, T is the temperature, k_B is Boltzmann's constant, and Q is the partition function

Q = \iint dp^N \, dr^N \, \exp\left[ -H(p^N, r^N) / k_B T \right]
This integral is generally extremely difficult to calculate because one must calculate all possible states of the system. In a molecular dynamics simulation, the points in the ensemble are calculated sequentially in time, so to calculate an ensemble average, the molecular dynamics simulation must pass through all possible states corresponding to the particular thermodynamic constraints.
Another way, as done in an MD simulation, is to determine a time average of A, which is expressed as

\langle A \rangle_{\text{time}} = \lim_{\tau \to \infty} \frac{1}{\tau} \int_0^{\tau} A(p^N(t), r^N(t)) \, dt \approx \frac{1}{M} \sum_{t=1}^{M} A(p^N, r^N)

where τ is the simulation time, M is the number of time steps in the simulation, and A(p^N, r^N) is the instantaneous value of A.
The dilemma appears to be that one can calculate time averages by molecular dynamics simulation, but the experimental observables are assumed to be ensemble averages. Resolving this leads us to one of the most fundamental axioms of statistical mechanics, the ergodic hypothesis, which states that the time average equals the ensemble average.
The basic idea is that if one allows the system to evolve in time indefinitely, that system will eventually pass through all possible states. One goal, therefore, of a molecular dynamics simulation is to generate enough representative conformations such that this equality is satisfied. If this is the case, experimentally relevant information concerning structural, dynamic and thermodynamic properties may then be calculated using a feasible amount of computer resources. Because the simulations are of fixed duration, one must be certain to sample a sufficient amount of phase space.
Some examples of time averages:
Average potential energy

\langle V \rangle = \frac{1}{M} \sum_{i=1}^{M} V_i

where M is the number of configurations in the molecular dynamics trajectory and V_i is the potential energy of each configuration.
Average kinetic energy

\langle K \rangle = \frac{1}{M} \sum_{j=1}^{M} \left( \sum_{i=1}^{N} \frac{1}{2} m_i v_i \cdot v_i \right)_j

where M is the number of configurations in the simulation, N is the number of atoms in the system, m_i is the mass of particle i, and v_i is the velocity of particle i.
A molecular dynamics simulation must be sufficiently long so that enough representative conformations have been sampled.
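The time averages above are simple to compute from saved trajectory frames. The sketch below uses an entirely synthetic three-atom "trajectory" (all numbers made up for illustration) to show the bookkeeping: the average potential energy is a mean over frames, while the kinetic energy first sums (1/2) m_i v_i² over atoms within each frame.

```python
# Time averages over M saved configurations of a synthetic trajectory.

potential_energies = [-100.0, -98.5, -101.2, -99.8]   # V_i, one per frame
masses = [16.0, 1.0, 1.0]                             # m_i for N = 3 atoms
velocity_frames = [                                   # v_i per frame (3D)
    [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]],
    [[0.0, 0.1, 0.0], [0.2, 0.0, 0.0], [0.3, 0.0, 0.0]],
    [[0.1, 0.1, 0.0], [0.0, 0.0, 0.2], [0.0, 0.3, 0.0]],
    [[0.0, 0.0, 0.1], [0.2, 0.0, 0.0], [0.0, 0.0, 0.3]],
]

M = len(potential_energies)
avg_potential = sum(potential_energies) / M

def kinetic_energy(frame):
    # Sum of (1/2) m_i v_i . v_i over the N atoms of one configuration.
    return sum(0.5 * m * sum(c * c for c in v)
               for m, v in zip(masses, frame))

avg_kinetic = sum(kinetic_energy(f) for f in velocity_frames) / M
print(avg_potential, avg_kinetic)
```

Under the ergodic hypothesis, these time averages converge to the corresponding ensemble averages as the number of sampled configurations grows.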
Much of science can be explained by the movement and interaction of molecules. Molecular dynamics (MD) is a computational technique used to explore these phenomena, from noble gases to biological macromolecules. Molly.jl is a pure Julia package for MD, and for the simulation of physical systems more broadly.
At the minute the package is a proof of concept for MD in Julia. It is not production ready. It can simulate a system of atoms with arbitrary interactions as defined by the user. Implemented features include:
- Interface to allow definition of new forces, simulators, thermostats, neighbor finders, loggers etc.
- Read in pre-computed Gromacs topology and coordinate files with the OPLS-AA forcefield and run MD on proteins with given parameters. In theory it can do this for any regular protein, but in practice this is untested.
- Non-bonded interactions - Lennard-Jones van der Waals/repulsion force, electrostatic Coulomb potential, gravitational potential, soft sphere potential, Mie potential.
- Bonded interactions - covalent bonds, bond angles, torsion angles.
- Andersen thermostat.
- Velocity Verlet and velocity-free Verlet integration.
- Explicit solvent.
- Periodic boundary conditions in a cubic box.
- Neighbor list to speed up calculation of non-bonded forces.
- Automatic multithreading.
- GPU acceleration on CUDA-enabled devices.
- Run with Float64 or Float32.
- Some analysis functions, e.g. RDF.
- Physical agent-based modelling.
- Visualise simulations as animations.
- Differentiable molecular simulation on an experimental branch - see the relevant docs.
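Several of the implemented pieces listed above - Lennard-Jones forces, velocity Verlet integration, periodic boundary conditions and the minimum-image convention - can be sketched in a few dozen lines. The following is a toy Python illustration of those algorithms, not Molly's Julia implementation; the atom count, box size and all parameter values are invented, and reduced units are used throughout:

```python
import numpy as np

def lennard_jones_forces(coords, box, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces with minimum-image periodic boundaries."""
    forces = np.zeros_like(coords)
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dr = coords[i] - coords[j]
            dr -= box * np.round(dr / box)   # minimum-image convention
            r2 = np.dot(dr, dr)
            inv_r6 = (sigma * sigma / r2) ** 3
            # -dU/dr of 4*eps*((s/r)^12 - (s/r)^6), projected onto dr
            f = 24.0 * eps * (2.0 * inv_r6 ** 2 - inv_r6) / r2 * dr
            forces[i] += f
            forces[j] -= f                   # Newton's third law
    return forces

def velocity_verlet(coords, vels, masses, box, dt, n_steps):
    """Velocity Verlet: half-kick, drift (with wrapping), recompute, half-kick."""
    f = lennard_jones_forces(coords, box)
    for _ in range(n_steps):
        vels = vels + 0.5 * dt * f / masses[:, None]
        coords = (coords + dt * vels) % box  # wrap into the periodic box
        f = lennard_jones_forces(coords, box)
        vels = vels + 0.5 * dt * f / masses[:, None]
    return coords, vels

# Eight atoms on a grid in a cubic box, with small random initial velocities.
rng = np.random.default_rng(1)
box = 10.0
coords = np.array([[x, y, z] for x in (2.5, 7.5)
                             for y in (2.5, 7.5)
                             for z in (2.5, 7.5)])
vels = rng.normal(0.0, 0.1, size=(8, 3))
masses = np.ones(8)
coords, vels = velocity_verlet(coords, vels, masses, box, dt=0.002, n_steps=100)
```

Because each pair force is applied equally and oppositely, the integrator conserves total momentum; a production code would add a neighbor list (as Molly does) so the force loop scales better than O(N²).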
Features not yet implemented include:
- Protein force fields other than OPLS-AA.
- Water models.
- Energy minimisation.
- Other temperature or pressure coupling methods.
- Cell-based neighbor list.
- Protein preparation - solvent box, add hydrogens etc.
- Trajectory/topology file format readers/writers.
- Quantum mechanical modelling.
- High test coverage.
Julia v1.6 or later is required to get the latest version of Molly. Install Molly from the Julia REPL: enter the package mode by pressing ] and run add Molly .
Some examples are given here, see the documentation for more on how to use the package.
One of the examples is the simulation of a Lennard-Jones gas; the resulting 1 ps trajectory can be viewed as an animation in VMD.