Biomolecular interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm
Biological macromolecules function in highly crowded cellular environments. The structure and dynamics of proteins and nucleic acids are well characterized in vitro, but in vivo crowding effects remain unclear. Using molecular dynamics simulations of a comprehensive atomistic model cytoplasm we found that protein-protein interactions may destabilize native protein structures, whereas metabolite interactions may induce more compact states due to electrostatic screening. Protein-protein interactions also resulted in significant variations in reduced macromolecular diffusion under crowded conditions, while metabolites exhibited significant two-dimensional surface diffusion and altered protein-ligand binding that may reduce the effective concentration of metabolites and ligands in vivo. Metabolic enzymes showed weak non-specific association in cellular environments attributed to solvation and entropic effects. These effects are expected to have broad implications for the in vivo functioning of biomolecules. This work is a first step towards physically realistic in silico whole-cell models that connect molecular with cellular biology.
Graphics Processing Unit Acceleration and Parallelization of GENESIS for Large-Scale Molecular Dynamics Simulations
The graphics processing unit (GPU) has become a popular computational platform for molecular dynamics (MD) simulations of biomolecules. A significant speedup in the simulations of small- or medium-size systems using only a few computer nodes with a single or multiple GPUs has been reported. Because of GPU memory limitation and slow communication between GPUs on different computer nodes, it is not straightforward to accelerate MD simulations of large biological systems that contain a few million or more atoms on massively parallel supercomputers with GPUs. In this study, we develop a new scheme in our MD software, GENESIS, to reduce the total computational time on such computers. Computationally intensive real-space nonbonded interactions are computed mainly on GPUs in the scheme, while less intensive bonded interactions and communication-intensive reciprocal-space interactions are performed on CPUs. On the basis of the midpoint cell method as a domain decomposition scheme, we invent the single particle interaction list for reducing the GPU memory usage. Since total computational time is limited by the reciprocal-space computation, we utilize the RESPA multiple time-step integration and reduce the CPU resting time by assigning a subset of nonbonded interactions on CPUs as well as on GPUs when the reciprocal-space computation is skipped. We validated our GPU implementations in GENESIS on BPTI and a membrane protein, porin, by MD simulations and an alanine-tripeptide by REMD simulations. Benchmark calculations on TSUBAME supercomputer showed that an MD simulation of a million atoms system was scalable up to 256 computer nodes with GPUs.
Dimensionality of Collective Variables for Describing Conformational Changes of a Multi-Domain Protein
Collective variables (CVs) are often used in molecular dynamics simulations based on enhanced sampling algorithms to investigate large conformational changes of a protein. The choice of CVs in these simulations is essential because it affects simulation results and impacts the free-energy profile, the minimum free-energy pathway (MFEP), and the transition-state structure. Here we examine how many CVs are required to capture the correct transition-state structure during the open-to-close motion of adenylate kinase using a coarse-grained model in the mean forces string method to search the MFEP. Various numbers of large amplitude principal components are tested as CVs in the simulations. The incorporation of local coordinates into CVs, which is possible in higher dimensional CV spaces, is important for capturing a reliable MFEP. The Bayesian measure proposed by Best and Hummer is sensitive to the choice of CVs, showing sharp peaks when the transition-state structure is captured. We thus evaluate the required number of CVs needed in enhanced sampling simulations for describing protein conformational changes.
Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations
Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
Domain Motion Enhanced (DoME) Model for Efficient Conformational Sampling of Multidomain Proteins
Large conformational changes of multidomain proteins are difficult to simulate using all-atom molecular dynamics (MD) due to the slow time scale. We show that a simple modification of the structure-based coarse-grained (CG) model enables a stable and efficient MD simulation of those proteins. “Motion Tree”, a tree diagram that describes conformational changes between two structures in a protein, provides information on rigid structural units (domains) and the magnitudes of domain motions. In our new CG model, which we call the DoME (domain motion enhanced) model, interdomain interactions are defined as being inversely proportional to the magnitude of the domain motions in the diagram, whereas intradomain interactions are kept constant. We applied the DoME model in combination with the Go model to simulations of adenylate kinase (AdK). The results of the DoME–Go simulation are consistent with an all-atom MD simulation for 10 μs as well as known experimental data. Unlike the conventional Go model, the DoME–Go model yields stable simulation trajectories against temperature changes and conformational transitions are easily sampled despite domain rigidity. Evidently, identification of domains and their interfaces is useful approach for CG modeling of multidomain proteins.
Sequential data assimilation for single-molecule FRET photon-counting data
Data assimilation is a statistical method designed to improve the quality of numerical simulations in combination with real observations. Here, we develop a sequential data assimilation method that incorporates one-dimensional time-series data of smFRET (single-molecule Förster resonance energy transfer) photon-counting into conformational ensembles of biomolecules derived from “replicated” molecular dynamics (MD) simulations. A particle filter using a large number of “replicated” MD simulations with a likelihood function for smFRET photon-counting data is employed to screen the conformational ensembles that match the experimental data. We examine the performance of the method using emulated smFRET data and coarse-grained (CG) MD simulations of a dye-labeled polyproline-20. The method estimates the dynamics of the end-to-end distance from smFRET data as well as revealing that of latent conformational variables. The particle filter is also able to correct model parameter dependence in CG MD simulations. We discuss the applicability of the method to real experimental data for conformational dynamics of biomolecules.
Replica state exchange metadynamics for improving the convergence of free energy estimates
Metadynamics (MTD) is a powerful enhanced sampling method for systems with rugged energy landscapes. It constructs a bias potential in a predefined collective variable (CV) space to overcome barriers between metastable states. In bias-exchange MTD (BE-MTD), multiple replicas approximate the CV space by exchanging bias potentials (replica conditions) with the Metropolis–Hastings (MH) algorithm. We demonstrate that the replica-exchange rates and the convergence of free energy estimates of BE-MTD are improved by introducing the infinite swapping (IS) or the Suwa-Todo (ST) algorithms. Conceptually, IS and ST perform transitions in a replica state space rather than exchanges in a replica condition space. To emphasize this, the proposed scheme is called the replica state exchange MTD (RSE-MTD). Benchmarks were performed with alanine polypeptides in vacuum and water. For the systems tested in this work, there is no significant performance difference between IS and ST.
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations
GENESIS (Generalized‐Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all‐atom force‐field models as well as coarse‐grained Go‐like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large‐scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T‐REMD, REUS, multi‐dimensional REMD for both all‐atom and Go‐like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three‐dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real‐space and reciprocal‐space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models.
Hierarchical domain-motion analysis of conformational changes in sarcoplasmic reticulum Ca2+-ATPase
Sarco(endo)plasmic reticulum Ca2+-ATPase transports two Ca2+ per ATP-hydrolyzed across biological membranes against a large concentration gradient by undergoing large conformational changes. Structural studies with X-ray crystallography revealed functional roles of coupled motions between the cytoplasmic domains and the transmembrane helices in individual reaction steps. Here, we employed “Motion Tree (MT),” a tree diagram that describes a conformational change between two structures, and applied it to representative Ca2+-ATPase structures. MT provides information of coupled rigid-body motions of the ATPase in individual reaction steps. Fourteen rigid structural units, “common rigid domains (CRDs)” are identified from seven MTs throughout the whole enzymatic reaction cycle. CRDs likely act as not only the structural units, but also the functional units. Some of the functional importance has been newly revealed by the analysis. Stability of each CRD is examined on the morphing trajectories that cover seven conformational transitions. We confirmed that the large conformational changes are realized by the motions only in the flexible regions that connect CRDs. The Ca2+-ATPase efficiently utilizes its intrinsic flexibility and rigidity to response different switches like ligand binding/dissociation or ATP hydrolysis. The analysis detects functional motions without extensive biological knowledge of experts, suggesting its general applicability to domain movements in other membrane proteins to deepen the understanding of protein structure and function.
CHARMM Force-Fields with Modified Polyphosphate Parameters Allow Stable Simulation of the ATP-Bound Structure of Ca2+-ATPase
Adenosine triphosphate (ATP) is an indispensable energy source in cells. In a wide variety of biological phenomena like glycolysis, muscle contraction/relaxation, and active ion transport, chemical energy released from ATP hydrolysis is converted to mechanical forces to bring about large-scale conformational changes in proteins. Investigation of structure–function relationships in these proteins by molecular dynamics (MD) simulations requires modeling of ATP in solution and ATP bound to proteins with accurate force-field parameters. In this study, we derived new force-field parameters for the triphosphate moiety of ATP based on the high-precision quantum calculations of methyl triphosphate. We tested our new parameters on membrane-embedded sarcoplasmic reticulum Ca2+-ATPase and four soluble proteins. The ATP-bound structure of Ca2+-ATPase remains stable during MD simulations, contrary to the outcome in shorter simulations using original parameters. Similar results were obtained with the four ATP-bound soluble proteins. The new force-field parameters were also tested by investigating the range of conformations sampled during replica-exchange MD simulations of ATP in explicit water. Modified parameters allowed a much wider range of conformational sampling compared with the bias toward extended forms with original parameters. A diverse range of structures agrees with the broad distribution of ATP conformations in proteins deposited in the Protein Data Bank. These simulations suggest that the modified parameters will be useful in studies of ATP in solution and of the many ATP-utilizing proteins.
Midpoint Cell Method for Hybrid (MPI+OpenMP) Parallelization of Molecular Dynamics Simulations
We have developed a new hybrid (MPI+OpenMP) parallelization scheme for molecular dynamics (MD) simulations by combining a cell-wise version of the midpoint method with pair-wise Verlet lists. In this scheme, which we call the midpoint cell method, simulation space is divided into subdomains, each of which is assigned to a MPI processor. Each subdomain is further divided into small cells. The interaction between two particles existing in different cells is computed in the subdomain containing the midpoint cell of the two cells where the particles reside. In each MPI processor, cell pairs are distributed over OpenMP threads for shared memory parallelization. The midpoint cell method keeps the advantages of the original midpoint method, while filtering out unnecessary calculations of midpoint checking for all the particle pairs by single midpoint cell determination prior to MD simulations. Distributing cell pairs over OpenMP threads allows for more efficient shared memory parallelization compared with distributing atom indices over threads. Furthermore, cell grouping of particle data makes better memory access, reducing the number of cache misses. The parallel performance of the midpoint cell method on the K computer showed scalability up to 512 and 32,768 cores for systems of 20,000 and 1 million atoms, respectively. One MD time step for long-range interactions could be calculated within 4.5 ms even for a 1 million atoms system with particle-mesh Ewald electrostatics.
Surface-tension replica-exchange molecular dynamics method for enhanced sampling of biological membrane systems
Conformational sampling is fundamentally important for simulating complex bio-molecular systems. The generalized-ensemble algorithm, especially the temperature replica-exchange molecular dynamics method (T-REMD), is one of the most powerful methods to explore structures of bio-molecules such as proteins, nucleic acids, carbohydrates, and also of lipid membranes. T-REMD simulations have focused on soluble proteins rather than membrane proteins or lipid bilayers, because explicit membranes do not keep their structural integrity at high temperature. Here, we propose a new generalized-ensemble algorithm for membrane systems, which we call the surface-tension REMD method. Each replica is simulated in the NPγT ensemble, and surface tensions in a pair of replicas are exchanged at certain intervals to enhance conformational sampling of the target membrane system. We test the method on two biological membrane systems: a fully hydrated DPPC (1,2-dipalmitoyl-sn-glycero-3-phosphatidylcholine) lipid bilayer and a WALP23–POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) membrane system. During these simulations, a random walk in surface tension space is realized. Large-scale lateral deformation (shrinking and stretching) of the membranes takes place in all of the replicas without collapse of the lipid bilayer structure. There is accelerated lateral diffusion of DPPC lipid molecules compared with conventional MD simulation, and a much wider range of tilt angle of the WALP23 peptide is sampled due to large deformation of the POPC lipid bilayer and through peptide-lipid interactions. Our method could be applicable to a wide variety of biological membrane systems.
Efficient lookup table using a linear function of inverse distance squared
The major bottleneck in molecular dynamics (MD) simulations of biomolecules exist in the calculation of pairwise nonbonded interactions like Lennard-Jones and long-range electrostatic interactions. Particle-mesh Ewald (PME) method is able to evaluate long-range electrostatic interactions accurately and quickly during MD simulation. However, the evaluation of energy and gradient includes time-consuming inverse square roots and complementary error functions. To avoid such time-consuming operations while keeping accuracy, we propose a new lookup table for short-range interaction in PME by defining energy and gradient as a linear function of inverse distance squared. In our lookup table approach, densities of table points are inversely proportional to squared pair distances, enabling accurate evaluation of energy and gradient at small pair distances. Regardless of the inverse operation here, the new lookup table scheme allows fast pairwise nonbonded calculations owing to efficient usage of cache memory.
Reduced Native State Stability in Crowded Cellular Environment Due to Protein–Protein Interactions
The effect of cellular crowding environments on protein structure and stability is a key issue in molecular and cellular biology. The classical view of crowding emphasizes the volume exclusion effect that generally favors compact, native states. Here, results from molecular dynamics simulations and NMR experiments show that protein crowders may destabilize native states via protein–protein interactions. In the model system considered here, mixtures of villin head piece and protein G at high concentrations, villin structures become increasingly destabilized upon increasing crowder concentrations. The denatured states observed in the simulation involve partial unfolding as well as more subtle conformational shifts. The unfolded states remain overall compact and only partially overlap with unfolded ensembles at high temperature and in the presence of urea. NMR measurements on the same systems confirm structural changes upon crowding based on changes of chemical shifts relative to dilute conditions. An analysis of protein–protein interactions and energetic aspects suggests the importance of enthalpic and solvation contributions to the crowding free energies that challenge an entropic-centered view of crowding effects.
Improved constrained optimization method for reaction-path determination in the generalized hybrid orbital quantum mechanical/molecular mechanical calculations
The nudged elastic band (NEB) and string methods are widely used to obtain the reaction path of chemical reactions and phase transitions. In these methods, however, it is difficult to define an accurate Lagrangian to generate the conservative forces. On the other hand, the constrained optimization with locally updated planes (CO-LUP) scheme defines target function properly and suitable for micro-iteration optimizations in quantum mechanical/molecular mechanical (QM/MM) systems, which uses the efficient second order QM optimization. However, the method does have problems of inaccurate estimation of reactions and inappropriate accumulation of images around the energy minimum. We introduce three modifications into the CO-LUP scheme to overcome these problems: (1) An improved tangent estimation of the reaction path, which is used in the NEB method, (2) redistribution of images using an energy-weighted interpolation before updating local tangents, and (3) reduction of the number of constraints, in particular translation/rotation constraints, for improved convergence. First, we test the method on the isomerization of alanine dipeptide without QM/MM calculation, showing that the method is comparable to the string method both in accuracy and efficiency. Next, we apply the method for defining the reaction paths of the rearrangement reaction catalyzed by chorismate mutase (CM) and of the phosphoryl transfer reaction catalyzed by cAMP-dependent protein kinase (PKA) using generalized hybrid orbital QM/MM calculations. The reaction energy barrier of CM is in high agreement with the experimental value. The path of PKA reveals that the enzyme reaction is associative and there is a late transfer of the substrate proton to Asp 166, which is in agreement with the recently published result using the NEB method.
Minimum Free Energy Path of Ligand-Induced Transition in Adenylate Kinase
Large-scale conformational changes in proteins involve barrier-crossing transitions on the complex free energy surfaces of high-dimensional space. Such rare events cannot be efficiently captured by conventional molecular dynamics simulations. Here we show that, by combining the on-the-fly string method and the multi-state Bennett acceptance ratio (MBAR) method, the free energy profile of a conformational transition pathway in Escherichia coli adenylate kinase can be characterized in a high-dimensional space. The minimum free energy paths of the conformational transitions in adenylate kinase were explored by the on-the-fly string method in 20-dimensional space spanned by the 20 largest-amplitude principal modes, and the free energy and various kinds of average physical quantities along the pathways were successfully evaluated by the MBAR method. The influence of ligand binding on the pathways was characterized in terms of rigid-body motions of the lid-shaped ATP-binding domain (LID) and the AMP-binding (AMPbd) domains. It was found that the LID domain was able to partially close without the ligand, while the closure of the AMPbd domain required the ligand binding. The transition state ensemble of the ligand bound form was identified as those structures characterized by highly specific binding of the ligand to the AMPbd domain, and was validated by unrestrained MD simulations. It was also found that complete closure of the LID domain required the dehydration of solvents around the P-loop. These findings suggest that the interplay of the two different types of domain motion is an essential feature in the conformational transition of the enzyme.
Protein Crowding Affects Hydration Structure and Dynamics
The effect of protein crowding on the structure and dynamics of water was examined from explicit solvent molecular dynamics simulations of a series of protein G and protein G/villin systems at different protein concentrations. Hydration structure was analyzed in terms of radial distribution functions, three-dimensional hydration sites, and preservation of tetrahedral coordination. Analysis of hydration dynamics focused on self-diffusion rates and dielectric constants as a function of crowding. The results show significant changes in both structure and dynamics of water under highly crowded conditions. The structure of water is altered mostly beyond the first solvation shell. Diffusion rates and dielectric constants are significantly reduced following linear trends as a function of crowding reflecting highly constrained water in crowded environments. The reduced dynamics of diffusion is expected to be strongly related to hydrodynamic properties of crowded cellular environments while the reduced dielectric constant under crowded conditions has implications for the stability of biomolecules in crowded environments. The results from this study suggest a prescription for modeling solvation in simulations of cellular environments.