• Structural and energetic analysis of metastable intermediate states in the E1P-E2P transition of Ca2+-ATPase.

    Chigusa Kobayashi, Yasuhiro Matsunaga, Jaewoon Jung, Yuji Sugita
    Sarcoplasmic reticulum (SR) Ca2+-ATPase transports two Ca2+ ions from the cytoplasm to the SR lumen against a large concentration gradient. X-ray crystallography has revealed the atomic structures of the protein before and after the dissociation of Ca2+, while biochemical studies have suggested the existence of intermediate states in the transition between E1P⋅ADP⋅2Ca2+ and E2P. Here, we explore the pathway and free energy profile of the transition using atomistic molecular dynamics simulations with the mean-force string method and umbrella sampling. The simulations suggest that a series of structural changes accompany the ordered dissociation of ADP, the A-domain rotation, and the rearrangement of the transmembrane (TM) helices. The luminal gate then opens to release Ca2+ ions toward the SR lumen. Intermediate structures on the pathway are stabilized by transient sidechain interactions between the A- and P-domains. Lipid molecules between TM helices play a key role in the stabilization. Free energy profiles of the transition assuming different protonation states suggest rapid exchanges between Ca2+ ions and protons when the Ca2+ ions are released toward the SR lumen.
  • Optimized hydrogen mass repartitioning scheme combined with accurate temperature/pressure evaluations from thermodynamic and kinetic properties of biological systems.

    Jaewoon Jung, Kento Kasahara, Chigusa Kobayashi, Hiraku Oshima, Takaharu Mori, Yuji Sugita
    In recent years, molecular dynamics (MD) simulations with larger time steps have been performed with the hydrogen-mass-repartitioning (HMR) scheme, where the mass of each hydrogen atom is increased to reduce high-frequency motion while the mass of a non-hydrogen atom bonded to a hydrogen atom is decreased to keep the total molecular mass unchanged. Here, we optimize the scaling factors in HMR and combine them with previously developed accurate temperature/pressure evaluations. The heterogeneous HMR scaling factors are useful to avoid the structural instability of amino acid residues having a five- or six-membered ring in MD simulations with larger time steps. It also reproduces kinetic properties, namely translational and rotational diffusions, if the HMR scaling factors are applied to only solute molecules. The integration scheme is tested for biological systems that include soluble/membrane proteins and lipid bilayers for about 200 μs MD simulations in total and give consistent results in MD simulations with both a small time step of 2.0 fs and a large, multiple time step integration with time steps of 3.5 fs (for fast motions) and 7.0 fs (for slower motions). We also confirm that the multiple time step integration scheme used in this study provides more accurate energy conservations than the RESPA/C1 and is comparable to the RESPA/C2 in NAMD. In summary, the current integration scheme combining the optimized HMR with accurate temperature/pressure evaluations can provide stable and reliable MD trajectories with a larger time step, which are computationally more than 2-fold efficient compared to the conventional methods.
  • New parallel computing algorithm of molecular dynamics for extremely huge scale biological systems.

    Jaewoon Jung, Chigusa Kobayashi, Kento Kasahara, Cheng Tan, Akiyoshi Kuroda, Kazuo Minami, Shigeru Ishiduki, Tatsuo Nishiki, Hikaru Inoue, Yutaka Ishikawa, Michael Feig, Yuji Sugita
    In this paper, we address high performance extreme‐scale molecular dynamics (MD) algorithm in the GENESIS software to perform cellular‐scale molecular dynamics (MD) simulations with more than 100,000 CPU cores. It includes (1) the new algorithm of real‐space nonbonded interactions maximizing the performance on ARM CPU architecture, (2) reciprocal‐space nonbonded interactions minimizing communicational cost, (3) accurate temperature/pressure evaluations that allows a large time step, and (4) effective parallel file inputs/outputs (I/O) for MD simulations of extremely huge systems. The largest system that contains 1.6 billion atoms was simulated using MD with a performance of 8.30 ns/day on Fugaku supercomputer. It extends the available size and time of MD simulations to answer unresolved questions of biomacromolecules in a living cell.
  • Elucidation of Interactions Regulating Conformational Stability and Dynamics of SARS-CoV-2 S-Protein.

    Takahru Mori, Jaewoon Jung, Chigusa Kobayashi, Hisham M. Dokainish, Suyong Re, Yuji Sugita
    Ongoing COVID-19 pandemic caused by new coronavirus, SARS-CoV-2, calls for urgent developments of vaccines and antiviral drugs. The spike protein of SARS-CoV-2 (S-protein), which consists of trimeric polypeptide chains with glycosylated residues on the surface, triggers the virus entry into a host cell. Extensive structural and functional studies on this protein have rapidly advanced our understanding of the S-protein structure at atomic resolutions, while most of structural studies overlook the effect of glycans attached to S-protein on the conformational stability and functional motions between the inactive Down and the active Up forms. Here, we performed all-atom molecular dynamics (MD) simulations of both Down and Up forms of a fully glycosylated S-protein in solution as well as targeted MD (TMD) simulations between them to elucidate key inter-domain interactions for stabilizing each form and inducing the large-scale conformational transitions. The residue-level interaction analysis of the simulation trajectories detects distinct amino-acid residues and N-glycans as determinants on conformational stability of each form. During the conformational transitions between them, inter-domain interactions mediated by glycosylated residues are switched to play key roles on the stabilization of another form. Electrostatic interactions as well as hydrogen bonds between the three receptor binding domains work as driving forces to initiate the conformational transitions toward the active form. This study sheds light on the mechanisms underlying conformational stability and functional motions of S-protein, which are relevant for vaccines and antiviral drugs developments.


  • Group-based evaluation of temperature and pressure for molecular dynamics simulation with a large time step.

    Jaewoon Jung, Yuji Sugita
    Recently, we proposed novel temperature and pressure evaluations in molecular dynamics (MD) simulation to preserve the accuracy up to the third order of a time step, δt (Jung et al., J. Chem. Theory Comput. 15, 84-94 (2019) and Jung et al., J. Chem. Phys. 148, 164109 (2018)). These approaches allow us to extend δt of MD simulation in isothermal-isobaric condition up to 5 fs with velocity Verlet integrator. Here, we further improve the isothermal-isobaric MD integration by introducing the group-based evaluations of system temperature and pressure to our previous approach. The group-based scheme increases the accuracy even using inaccurate temperature and pressure evaluations by neglecting the high-frequency vibrational motions of hydrogen atoms. It also improves the overall performance by avoiding iterations in thermostat and barostat updates and by allowing a multiple time step (MTS) integration such as r-REPSA (reversible reference system propagation algorithm) with our proposed high-precision evaluations of temperature and pressure. Now, the improved integration scheme conserves physical properties of lipid-bilayer systems up to δt = 5 fs with velocity Verlet as well as δt = 3.5 fs for fast motions in r-RESPA, respectively.
  • A singularity-free torsion angle potential for coarse-grained molecular dynamics simulations.

    Cheng Tan, Jaewoon Jung, Chigusa Kobayashi, Yuji Sugita
    Conventional torsion angle potentials used in molecular dynamics (MD) have a singularity problem when three bonded particles are collinearly aligned. This problem is often encountered in coarse-grained (CG) simulations. Here, we propose a new form of the torsion angle potential, which introduces an angle-dependent modulating function. By carefully tuning the parameters for this modulating function, our method can eliminate the problematic angle-dependent singularity while being combined with existing models. As an example, we optimized the modulating function of the torsion angle potential for popular CG models of biomolecules based on the statistics over experimental structures deposited in the Protein Data Bank. By applying our method to designed and natural biomolecules, we show that the new torsion angle potential is able to eliminate the singularity problem while maintaining the structural features in the original models. Furthermore, by comparing our design with previous methods, we found that our new potential has advantages in computational efficiency and numerical stability. We strongly recommend the usage of our new potential in the CG simulations of flexible molecules.
  • Use of single-molecule time-series data for refining conformational dynamics in molecular simulations

    Yasuhiro Matsunaga, Yuji Sugita
    Atomically detailed description of conformational dynamics in biomolecules is often essential to understand biological functions. Combining experimental measurements with molecular simulations significantly improves the outcome. Ensemble refinements, where the simulations are utilized to refine ensemble averaged data in NMR, SAXS, or cryo-EM, are a popular approach in integrative structural biology. Single-molecule time-series data contain rich temporal information of biomolecular dynamics. However, direct usage of the time-series data together with molecular simulations is just beginning. Here, we review data-assimilation approaches linking molecular simulations with experimental time-series data and discuss current limitations and potential applications of this approach in integrative structural biology.
  • Free Energy Analysis of a Conformational Change of Heme ABC Transporter BhuUV-T.

    Koichi Tamura, Yuji Sugita
    The heme ATP-binding cassette (ABC) transporter BhuUV-T of bacterial pathogen Burkholderia cenocepacia is required to transport heme across the inner cell membrane. The current hypothesis is that the binding of two ATPs to the nucleotide-binding domains of the transporter drives the initial steps of the transport cycle in which the empty transport sites are reoriented from the cytosol to the periplasm. Molecular details are missing because the structure of a key occluded intermediate remains hypothetical. Here we perform molecular simulations to analyze the free energy surface (FES) of the first step of the reorientation, namely the transition from an open inward-facing (IF) transport site to an occluded (Occ) conformation. We have modeled the latter structure in silico in a previous study. A simple annealing procedure removes residual bias originating from non-equilibrium targeted molecular dynamics. The calculated FES reveals the role of the ATPs in inducing the IF → Occ conformational change and validates the modeled Occ conformation.
  • Role of the N-Terminal Transmembrane Helix Contacts in the Activation of FGFR3

    Daisuke Matsuoka, Motoshi Kamiya, Takeshi Sato, Yuji Sugita
    Fibroblast growth factor receptor 3 (FGFR3) is a member of receptor tyrosine kinases, which is involved in skeletal cell growth, differentiation, and migration. FGFR3 transduces biochemical signals from the extracellular ligand-binding domain to the intracellular kinase domain through the conformational changes of the transmembrane (TM) helix dimer. Here, we apply generalized replica exchange with solute tempering method to wild type (WT) and G380R mutant (G380R) of FGFR3. The dimer interface in G380R is different from WT and the simulation results are in good agreement with the solid-state nuclear magnetic resonance (NMR) spectroscopy. TM helices in G380R are extended more than WT, and thereby, G375 in G380R contacts near the N‐termini of the TM helix dimer. Considering that both G380R and G375C show the constitutive activation, the formation of the N‐terminal contacts of the TM helices can be generally important for the activation mechanism.


  • Building a macro-mixing dual-basin Gō model using the Multistate Bennett Acceptance Ratio

    Ai Shinobu, Chigusa Kobayashi, Yasuhiro Matsunaga, and Yuji Sugita
  • Chemo-Mechanical Coupling in the Transport Cycle of a Heme ABC Transporter

    Koichi Tamura, Hiroshi Sugimoto, Yoshitsugu Shiro, and Yuji Sugita
    The heme importer from pathogenic bacteria is a member of the ATP-binding cassette (ABC) transporter family, which uses the energy of ATP-binding and hydrolysis for extensive conformational changes. Previous studies have indicated that conformational changes after heme translocation are triggered by ATP-binding to nucleotide binding domains (NBDs) and then, in turn, induce conformational transitions of the transmembrane domains (TMDs). In this study, we applied a template-based iterative all-atom molecular dynamics (MD) simulation to predict the ATP-bound outward-facing conformation of the Burkholderia cenocepacia heme importer BhuUV-T. The resulting model showed a stable conformation of the TMD with the cytoplasmic gate in the closed state and the periplasmic gate in the open state. Furthermore, targeted MD simulation predicted the intermediate structure of an occluded form (Occ) with bound ATP, in which both ends of the heme translocation channel are closed. The MD simulation of the predicted Occ revealed that Ser147 on the ABC signature motifs (LSGG[Q/E]) of NBDs occasionally flips and loses the active conformation required for ATP-hydrolysis. The flipping motion was found to be coupled to the inter-NBD distance. Our results highlight the functional significance of the signature motif of ABC transporters in regulation of ATPase and chemo-mechanical coupling mechanism.
  • Scaling molecular dynamics beyond 100,000 processor cores for large-scale biophysical simulations

    Jaewoon Jung, Wataru Nishima, Marcus Daniels, Gavin Bascom, Chigusa Kobayashi, Adetokunbo Adedoyin, Michael Wall, Anna Lappala, Dominic Phillips, William Fischer, Chang-Shung Tung, Tamar Schlick, Yuji Sugita, and Karissa Y. Sanbonmatsu
    The growing interest in the complexity of biological interactions is continuously driving the need to increase system size in biophysical simulations, requiring not only powerful and advanced hardware but adaptable software that can accommodate a large number of atoms interacting through complex forcefields. To address this, we developed and implemented strategies in the GENESIS molecular dynamics package designed for large numbers of processors. Long‐range electrostatic interactions were parallelized by minimizing the number of processes involved in communication. A novel algorithm was implemented for nonbonded interactions to increase single instruction multiple data (SIMD) performance, reducing memory usage for ultra large systems. Memory usage for neighbor searches in real‐space nonbonded interactions was reduced by approximately 80%, leading to significant speedup. Using experimental data describing physical 3D chromatin interactions, we constructed the first atomistic model of an entire gene locus (GATA4). Taken together, these developments enabled the first billion‐atom simulation of an intact biomolecular complex, achieving scaling to 65,000 processes (130,000 processor cores) with 1 ns/day performance. Published 2019. This article is a U.S. Government work and is in the public domain in the USA.
  • Optimal Temperature Evaluation in Molecular Dynamics Simulations with a Large Time Step

    Jaewoon Jung, Chigusa Kobayashi, and Yuji Sugita
    In molecular dynamics (MD) simulations, an accurate evaluation of temperature is essential for controlling temperature as well as pressure in the isothermal–isobaric conditions. According to the Tolman’s equipartition theorem, all motions of all particles should share a single temperature. However, conventional temperature estimation from kinetic energy does not include Hessian terms properly, and thereby, the equipartition theorem is not satisfied with a large time step. In this paper, we show how to evaluate temperature the most accurately without increasing computational cost. We define two kinds of kinetic energies, evaluated at full- and half-time steps that underestimate or overestimate temperature, respectively. A combination of these two kinetic energies provides an optimal instantaneous temperature up to the third order of the time step. The method is tested for a one-dimensional harmonic oscillator, pure water molecules, a Bovine pancreatic trypsin inhibitor (BPTI) protein in water molecules, and a hydrated 1,2-dispalmitoyl-sn-phosphatidylcholine (DPPC) lipid bilayer in water molecules. In all tests, the optimal temperature estimator fulfills the equipartition theorem better than existing methods and reproduces well the usual physical properties for time steps up to and including 5 fs.


  • Flexible selection of the solute region in replica exchange with solute tempering: Application to protein-folding simulations

    Motoshi Kamiya and Yuji Sugita
    Replica-exchange molecular dynamics (REMD) and their variants have been widely used in simulations of the biomolecular structure and dynamics. Replica exchange with solute tempering (REST) is one of the methods where temperature of a pre-defined solute molecule is exchanged between replicas, while solvent temperatures in all the replicas are kept constant. REST greatly reduces the number of replicas compared to the temperature REMD, while replicas at low temperatures are often trapped under their conditions, interfering with the conformational sampling. Here, we introduce a new scheme of REST, referred to as generalized REST (gREST), where the solute region is defined as a part of a molecule or a part of the potential energy terms, such as the dihedral-angle energy term or Lennard-Jones energy term. We applied this new method to folding simulations of a β-hairpin (16 residues) and a Trp-cage (20 residues) in explicit water. The protein dihedral-angle energy term is chosen as the solute region in the simulations. gREST reduces the number of replicas necessary for good random walks in the solute-temperature space and covers a wider conformational space compared to the conventional REST2. Considering the general applicability, gREST should become a promising tool for the simulations of protein folding, conformational dynamics, and an in silico drug design.
  • Linking time-series of single-molecule experiments with molecular dynamics simulations by machine learning

    Yasuhiro Matsunaga and Yuji Sugita
    Single-molecule experiments and molecular dynamics (MD) simulations are indispensable tools for investigating protein conformational dynamics. The former provide timeseries data, such as donor-acceptor distances, whereas the latter give atomistic information, although this information is often biased by model parameters. Here, we devise a machine-learning method to combine the complementary information from the two approaches and construct a consistent model of conformational dynamics. It is applied to the folding dynamics of the formin-binding protein WW domain. MD simulations over 400 μs led to an initial Markov state model (MSM), which was then “refined” using single-molecule Förster resonance energy transfer (FRET) data through hidden Markov modeling. The refined or data-assimilated MSM reproduces the FRET data and features hairpin one in the transition-state ensemble, consistent with mutation experiments. The folding pathway in the data-assimilated MSM suggests interplay between hydrophobic contacts and turn formation. Our method provides a general framework for investigating conformational transitions in other proteins.
  • Refining Markov state models for conformational dynamics using ensemble-averaged data and time-series trajectories

    Yasuhiro Matsunaga and Yuji Sugita
    A data-driven modeling scheme is proposed for conformational dynamics of biomolecules based on molecular dynamics (MD) simulations and experimental measurements. In this scheme, an initial Markov State Model (MSM) is constructed from MD simulation trajectories, and then, the MSM parameters are refined using experimental measurements through machine learning techniques. The second step can reduce the bias of MD simulation results due to inaccurate force-field parameters. Either time-series trajectories or ensemble-averaged data are available as a training data set in the scheme. Using a coarse-grained model of a dye-labeled polyproline-20, we compare the performance of machine learning estimations from the two types of training data sets. Machine learning from time-series data could provide the equilibrium populations of conformational states as well as their transition probabilities. It estimates hidden conformational states in more robustways compared to that from ensemble-averaged data although there are limitations in estimating the transition probabilities between minor states. We discuss how to use the machine learning scheme for various experimental measurements including single-molecule time-series trajectories.
  • Kinetic energy definition in velocity Verlet integration for accurate pressure evaluation

    Jaewoon Jung, Chigusa Kobayash and Yuji Sugita
    In molecular dynamics (MD) simulations, a proper definition of kinetic energy is essential for controlling pressure as well as temperature in the isothermal-isobaric condition. The virial theorem provides an equation that connects the average kinetic energy with the product of particle coordinate and force. In this paper, we show that the theorem is satisfied in MD simulations with a larger time step and holonomic constraints of bonds, only when a proper definition of kinetic energy is used. We provide a novel definition of kinetic energy, which is calculated from velocities at the half-time steps (t − Δt/2 and t + Δt/2) in the velocity Verlet integration method. MD simulations of a 1,2-dispalmitoyl-sn-phosphatidylcholine (DPPC) lipid bilayer and a water box using the kinetic energy definition could reproduce the physical properties in the isothermal-isobaric condition properly. We also develop a multiple time step (MTS) integration scheme with the kinetic energy definition. MD simulations with the MTS integration for the DPPC and water box systems provided the same quantities as the velocity Verlet integration method, even when the thermostat and barostat are updated less frequently.
  • Energetics and conformational pathways of functional rotation in the multidrug transporter AcrB

    Yasuhiro Matsunaga, Tsutomu Yamane, Tohru Terada, Kei Moritsugu, Hiroshi Fujisaki, Satoshi Murakami, Mitsunori Ikeguchi, and Akinori Kidera
    The multidrug transporter AcrB transports a broad range of drugs out of the cell by means of the proton-motive force. The asymmetric crystal structure of trimeric AcrB suggests a functionally rotating mechanism for drug transport. Despite various supportive forms of evidence from biochemical and simulation studies for this mechanism, the link between the functional rotation and proton translocation across the membrane remains elusive. Here, calculating the minimum free energy pathway of the functional rotation for the complete AcrB trimer, we describe the structural and energetic basis behind the coupling between the functional rotation and the proton translocation at atomic resolution. Free energy calculations show that protonation of Asp408 in the transmembrane portion of the drug-bound protomer drives the functional rotation. The conformational pathway identifies vertical shear motions among several transmembrane helices, which regulate alternate access of water in the transmembrane as well as peristaltic motions that pump drugs in the periplasm.


  • GENESIS 1.1: A hybrid-parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms

    Chigusa Kobayashi*, Jaewoon Jung*, Yasuhiro Matsunaga, Takaharu Mori, Tadashi Ando, Koichi Tamura, Motoshi Kamiya, and Yuji Sugita (*equally contributed)
    GENeralized-Ensemble SImulation System (GENESIS) is a software package for molecular dynamics (MD) simulation of biological systems. It is designed to extend limitations in system size and accessible time scale by adopting highly parallelized schemes and enhanced conformational sampling algorithms. In this new version, GENESIS 1.1, new functions and advanced algorithms have been added. The all-atom and coarse-grained potential energy functions used in AMBER and GROMACS packages now become available in addition to CHARMM energy functions. The performance of MD simulations has been greatly improved by further optimization, multiple time-step integration, and hybrid (CPU + GPU) computing. The string method and replica-exchange umbrella sampling with flexible collective variable choice are used for finding the minimum free-energy pathway and obtaining free-energy profiles for conformational changes of a macromolecule. These new features increase the usefulness and power of GENESIS for modeling and simulation in biological research. © 2017 Wiley Periodicals, Inc..
  • Flexible fitting to cryo-EM density map using ensemble molecular dynamics simulations

    Osamu Miyashita, Chigusa Kobayashi, Takaharu Mori, Yuji Sugita, and Florence Tama
    Flexible fitting is a computational algorithm to derive a new conformational model that conforms to low-resolution experimental data by transforming a known structure. A common application is against data from cryo-electron microscopy to obtain conformational models in new functional states. The conventional flexible fitting algorithms cannot derive correct structures in some cases due to the complexity of conformational transitions. In this study, we show the importance of conformational ensemble in the refinement process by performing multiple fittings trials using a variety of different force constants. Application to simulated maps of Ca2+ ATPase and diphtheria toxin as well as experimental data of release factor 2 revealed that for these systems, multiple conformations with similar agreement with the density map exist and a large number of fitting trials are necessary to generate good models. Clustering analysis can be an effective approach to avoid over-fitting models. In addition, we show that an automatic adjustment of the biasing force constants during the fitting process, implemented as replica-exchange scheme, can improve the success rate.
  • Multiple program/multiple data molecular dynamics method with multiple time step integrator for large biological systems

    Jaewoon Jung and Yuji Sugita
    Parallelization of molecular dynamics (MD) simulation is essential for investigating conformational dynamics of large biological systems, such as ribosomes, viruses, and multiple proteins in cellular environments. To improve efficiency in the parallel computation, we have to reduce the amount of data transfer between processors by introducing domain decomposition schemes. Also, it is important to optimize the computational balance between real-space non-bonded interactions and reciprocal-space interactions for long-range electrostatic interactions. Here, we introduce a novel parallelization scheme for large-scale MD simulations on massively parallel supercomputers consisting of only CPUs. We make use of a multiple program/multiple data (MPMD) approach for separating the real-space and reciprocal-space computations on different processors. We also utilize the r-RESPA multiple time step integrator on the framework of the MPMD approach in an efficient way: when the reciprocal-space computations are skipped in r-RESPA, processors assigned for them are utilized for half of the real-space computations. The new scheme allows us to use twice as many as processors that are available in the conventional single program approach. The best performances of all-atom MD simulations for 1 million (STMV), 8.5 million (8_STMV), and 28.8 million (27_STMV) atom systems on K computer are 65, 36, and 24 ns/day, respectively. The MPMD scheme can accelerate 23.4, 10.2, and 9.2 ns/day from the maximum performance of single-program approach for STMV, 8_STMV, and 27_STMV systems, respectively, which correspond to 57%, 39%, and 60% speed up. This suggests significant speedups by increasing the number of processors without losing parallel computational efficiency.