Merge pull request #398 from choderalab/multistate

Bring multistate samplers into openmmtools
choderalab · Feb 3, 2019 · 8db070e
2 parents 384c555 + 5b3a36e

Showing 26 changed files with 30,956 additions and 15 deletions.
diff --git a/README.md b/README.md
@@ -19,10 +19,27 @@ Features include:
  - enhanced sampling methods, including replica-exchange (REMD) and self-adjusted mixture sampling (SAMS)
  - factories for generating [alchemically-modified](http://alchemistry.org) systems for absolute and relative free energy calculations
  - a suite of test systems for benchmarking, validation, and debugging
- - user-friendly storage interface layer to remove requirement that user know how to store all their data-types on disk 
+ - user-friendly storage interface layer to remove requirement that user know how to store all their data-types on disk
 
 See the [documentation](http://openmmtools.readthedocs.io) at [ReadTheDocs](http://openmmtools.readthedocs.io).
 
 #### License
 
-OpenMMTools is distributed under the MIT License.
+OpenMMTools is distributed under the [MIT License](https://opensource.org/licenses/MIT).
+
+#### Contributors
+
+A complete list of contributors can be found [here](https://github.com/choderalab/openmmtools/graphs/contributors).
+
+Major contributors include:
+
+* Andrea Rizzi `<[email protected]>` (WCMC)
+* John D. Chodera `<[email protected]>` (MSKCC)
+* Levi N. Naden `<[email protected]>` (MSKCC)
+* Patrick Grinaway `<[email protected]>` (MSKCC)
+* Kyle A. Beauchamp `<[email protected]>` (MSKCC)
+* Josh Fass `<[email protected]>` (MSKCC)
+* Bas Rustenburg `<[email protected]>` (MSKCC)
+* Gregory Ross `<[email protected]>` (MSKCC)
+* David W.H. Swenson `<[email protected]>`
+* Hannah Bruce Macdonald `<hannah.brucemacdonald>` (MSKCC)
diff --git a/devtools/conda-recipe/meta.yaml b/devtools/conda-recipe/meta.yaml
@@ -13,19 +13,25 @@ requirements:
   build:
     - python
     - setuptools
-    - openmm ==7.3
+    - openmm >=7.3
+    - cython
 
   run:
     - python
     - numpy
     - scipy
     - six
-    - openmm ==7.3
+    - openmm >=7.3
     - parmed
     - mdtraj
-    - netcdf4
+    - netcdf4 >=1.4.2 # after bugfix: "always return masked array by default, even if there are no masked values"
+    - libnetcdf >=4.6.2 # workaround for libssl issues
+    - pyyaml
+    - cython
+    - sphinxcontrib-bibtex
+    - mpiplus
+    - pymbar
     - pyyaml
-
 
 test:
   requires:

diff --git a/docs/conf.py b/docs/conf.py
@@ -20,6 +20,7 @@
 import os
 import sys
 sys.path.insert(0, os.path.abspath('..'))
+import sphinx_rtd_theme
 
 
 # -- General configuration ------------------------------------------------
@@ -40,6 +41,7 @@
     'sphinx.ext.todo',
     'sphinx.ext.coverage',
     'sphinx.ext.viewcode',
+    'sphinxcontrib.bibtex',
     #'sphinx.ext.githubpages'
     ]
 

diff --git a/docs/environment.yml b/docs/environment.yml
@@ -2,16 +2,20 @@ name: openmmtools
 channels:
     - conda-forge
     - omnia
-    - omnia/label/rc
 dependencies:
     - python
     - setuptools
     - openmm >=7.3
+    - cython
     - numpy
     - scipy
     - six
     - parmed
     - mdtraj
     - numpydoc
-    - netCDF4
+    - netcdf4 >=1.4.2 # after bugfix: "always return masked array by default, even if there are no masked values"
+    - libnetcdf >=4.6.2 # workaround for libssl issues
+    - sphinxcontrib-bibtex
+    - mpiplus
+    - pymbar
     - pyyaml
diff --git a/docs/index.rst b/docs/index.rst
@@ -56,7 +56,7 @@ Modules
   states
   cache
   mcmc
-  sampling
+  multistate
   alchemy
   forces
   forcefactories

diff --git a/docs/multistate.rst b/docs/multistate.rst
@@ -0,0 +1,170 @@
+.. _multistate:
+
+Sampling multiple thermodynamic states
+======================================
+
+``openmmtools`` provides several schemes for sampling from multiple thermodynamic states within a single calculation:
+
+* ``MultiStateSampler``: Independent simulations at distinct thermodynamic states
+* ``ReplicaExchangeSampler``: Replica exchange among thermodynamic states (also called Hamiltonian exchange if only the Hamiltonian is changing)
+* ``SAMSSampler``: Self-adjusted mixture sampling (also known as optimally-adjusted mixture sampling)
+
+While the thermodynamic states sampled usually differ only in their alchemical parameters, other thermodynamic parameters (such as temperature) can also be modulated at intermediate alchemical states.
+This can be useful, for example, when experimenting with ways to reduce correlation times.
+
+In all of these schemes, one or more **replicas** are simulated.
+Each iteration includes the following phases:
+
+* Replicas may switch thermodynamic states (optional).
+* Each replica samples a new configuration using Markov chain Monte Carlo (MCMC).
+* Each replica computes the potential energy of its current configuration in multiple thermodynamic states.
+* Data are written to disk.
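The four phases above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the ``openmmtools`` implementation: the one-dimensional harmonic states, the Metropolis random-walk propagator, and the names ``reduced_potential``, ``mcmc_step``, ``run_iterations`` and ``mix_states`` are all hypothetical, and appending to a list stands in for writing to disk. With ``mix_states=None``, every replica stays in its starting state, which mirrors the independent-simulation case.

```python
import numpy as np

# Hypothetical toy model: three 1D harmonic states with dimensionless reduced
# potentials u_l(x) = 0.5 * K_l * x**2 (energies already divided by kT).
SPRING_CONSTANTS = np.array([1.0, 2.0, 4.0])


def reduced_potential(state, x):
    """Reduced potential of configuration x in thermodynamic state `state`."""
    return 0.5 * SPRING_CONSTANTS[state] * x**2


def mcmc_step(state, x, rng, n_steps=10, step_size=0.5):
    """Metropolis random-walk moves at a fixed thermodynamic state."""
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)
        delta_u = reduced_potential(state, x_new) - reduced_potential(state, x)
        if delta_u <= 0 or rng.random() < np.exp(-delta_u):
            x = x_new
    return x


def run_iterations(n_iterations, mix_states=None, seed=0):
    """Run the four phases of each iteration; returns the stored history."""
    rng = np.random.default_rng(seed)
    n_states = len(SPRING_CONSTANTS)
    states = np.arange(n_states)          # replica k starts in state k
    positions = np.zeros(n_states)        # one scalar configuration per replica
    history = []                          # stands in for on-disk storage
    for _ in range(n_iterations):
        # 1. (optional) let replicas switch thermodynamic states
        if mix_states is not None:
            states = mix_states(states, positions, rng)
        # 2. propagate each replica with MCMC at its current state
        positions = np.array([mcmc_step(s, x, rng) for s, x in zip(states, positions)])
        # 3. reduced potential of every replica's configuration in every state
        energies = np.array([[reduced_potential(l, x) for l in range(n_states)]
                             for x in positions])
        # 4. "write" this iteration's data
        history.append((states.copy(), positions.copy(), energies))
    return history


history = run_iterations(n_iterations=5)
print(len(history), history[-1][2].shape)
```

Storing the full replica-by-state energy matrix at every iteration is what later allows estimators such as MBAR to use all sampled configurations.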
+
+Below, we describe some of the aspects of these samplers.
+
+``MultiStateSampler``: Independent simulations at multiple thermodynamic states
+-------------------------------------------------------------------------------
+
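+The ``MultiStateSampler`` runs independent simulations at multiple thermodynamic states.
+In this case, the MCMC scheme propagates each replica :math:`k` by sampling from a fixed thermodynamic state: at iteration :math:`n`, the state index :math:`s_{k,n}` is left unchanged and a new configuration :math:`x_{k,n+1}` is drawn from that state.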
+
+.. math::
+
+   s_{k,n+1} = s_{k, n} \\
+   x_{k,n+1} \sim p(x | s_{k, n+1})
+
+An inclusive "neighborhood" of thermodynamic states around each replica's state defines the thermodynamic states at which the reduced potential is computed after each iteration.
+If all thermodynamic states are included in this neighborhood (the default), MBAR :cite:`Shirts2008statistically` can be used to optimally estimate free energies and their uncertainties.
+If a restricted neighborhood is used (to reduce the time spent in the energy evaluation stage), a variant of locally weighted histogram analysis (L-WHAM) :cite:`Tan2017:SAMS`, which builds on the weighted histogram analysis method :cite:`kumar1992weighted`, is used to extract an estimate from all available information.
+
+.. currentmodule:: openmmtools.multistate
+.. autosummary::
+    :nosignatures:
+    :toctree: api/generated/
+
+    MultiStateSampler
+    MultiStateSamplerAnalyzer
+
+``ReplicaExchangeSampler``: Replica exchange among thermodynamic states
+-----------------------------------------------------------------------
+
+The ``ReplicaExchangeSampler`` implements a Hamiltonian replica exchange scheme with Gibbs sampling :cite:`Chodera2011` to sample multiple thermodynamic states in a manner that improves mixing of the overall Markov chain.
+By allowing replicas to execute a random walk in thermodynamic state space, correlation times may be reduced when sampling certain thermodynamic states (such as those with alchemically-softened potentials or elevated temperatures).
+
+In the basic version of this scheme, a swap of configurations between two alchemical states *i* and *j* is proposed and accepted by comparing the energy of each configuration in each state, using the Metropolis acceptance probability
+
+.. math::
+
+    P_{\text{accept}}(i, x_i, j, x_j) &= \min\left\{1, \frac{e^{-\left[u_i(x_j) + u_j(x_i)\right]}}{e^{-\left[u_i(x_i) + u_j(x_j)\right]}}\right\} \\
+        &= \min\left\{1, \exp\left[\Delta u_{ji}(x_i) + \Delta u_{ij}(x_j)\right]\right\}
+
+where :math:`x_i` and :math:`x_j` are the configurations currently in states :math:`i` and :math:`j`, :math:`u_i` is the reduced potential energy of state :math:`i`, and :math:`\Delta u_{ij}(x) \equiv u_j(x) - u_i(x)`.
+While this scheme is typically applied to neighboring states only, we also implement a much more efficient form of Gibbs sampling in which many swaps are attempted to generate an approximately uncorrelated sample of the state permutation over all :math:`K` thermodynamic states :cite:`Chodera2011`.
+This speeds up mixing and reduces the number of iterations needed to produce uncorrelated samples.
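The acceptance rule above, and the idea of attempting many swaps over all states, can be sketched in NumPy. This assumes one replica per state (so the state assignment is a permutation) and a reduced-potential matrix ``u_kl[k, l]`` for replica ``k`` evaluated in state ``l``. The function names and the ``n_attempts`` parameter are hypothetical, and the sketch uses uniformly random pair proposals with Metropolis acceptance rather than reproducing the exact scheme used by ``ReplicaExchangeSampler``.

```python
import numpy as np


def swap_acceptance(u_kl, states, i, j):
    """Metropolis probability of swapping the configurations held by states i and j.

    u_kl[k, l] : reduced potential of replica k's configuration in state l
    states[k]  : state currently occupied by replica k (a permutation of 0..K-1)
    """
    rep_i = int(np.flatnonzero(states == i)[0])   # replica holding x_i
    rep_j = int(np.flatnonzero(states == j)[0])   # replica holding x_j
    u_old = u_kl[rep_i, i] + u_kl[rep_j, j]       # u_i(x_i) + u_j(x_j)
    u_new = u_kl[rep_i, j] + u_kl[rep_j, i]       # u_j(x_i) + u_i(x_j)
    if u_new <= u_old:
        return 1.0
    return float(np.exp(u_old - u_new))


def mix_replicas(u_kl, states, rng, n_attempts):
    """Attempt n_attempts swaps between randomly chosen pairs of all K states."""
    states = np.array(states, copy=True)
    n_states = len(states)
    for _ in range(n_attempts):
        i, j = rng.choice(n_states, size=2, replace=False)
        if rng.random() < swap_acceptance(u_kl, states, i, j):
            rep_i = int(np.flatnonzero(states == i)[0])
            rep_j = int(np.flatnonzero(states == j)[0])
            states[rep_i], states[rep_j] = j, i
    return states


rng = np.random.default_rng(1)
u_kl = rng.normal(size=(4, 4))                    # toy reduced potentials
print(mix_replicas(u_kl, np.arange(4), rng, n_attempts=4**3))
```

Because the pair proposal is symmetric and each swap is accepted with the Metropolis probability, every attempt preserves the joint distribution; repeating many attempts per iteration moves the permutation toward an independent draw, which is the intuition behind the Gibbs-sampling variant.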
+
+.. currentmodule:: openmmtools.multistate
+.. autosummary::
+    :nosignatures:
+    :toctree: api/generated/
+
+    ReplicaExchangeSampler
+    ReplicaExchangeAnalyzer
+
+``SAMSSampler``: Self-adjusted mixture sampling
+-----------------------------------------------
+
+The ``SAMSSampler`` implements self-adjusted mixture sampling (SAMS; also known as optimally adjusted mixture sampling) :cite:`Tan2017:SAMS`.
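+This combines one or more replicas that sample from an expanded ensemble with an asymptotically optimal Wang-Landau-like weight update scheme.
+At each iteration, each replica :math:`k` first draws a new state :math:`s_{k,n+1}` given its current configuration :math:`x_{k,n}` and then samples a new configuration at that state: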
+
+.. math::
+
+   s_{k,n+1} \sim p(s | x_{k,n}) \\
+   x_{k,n+1} \sim p(x | s_{k, n+1})
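The state update above is a draw from the conditional distribution of the state given the current configuration; drawing over all states in this way corresponds to a global jump. The sketch below assumes the log weights enter as p(s | x) ∝ exp(g_s - u_s(x)). That convention and the name ``global_jump`` are assumptions for illustration, not the ``SAMSSampler`` API.

```python
import numpy as np


def global_jump(u_k, log_weights, rng):
    """Draw s ~ p(s | x) over all states for one replica.

    u_k[l]         : reduced potential of the current configuration x in state l
    log_weights[l] : SAMS log weight g_l (assumed: p(s | x) ∝ exp(g_s - u_s(x)))
    Returns the new state index and the full conditional p(s | x).
    """
    log_p = np.asarray(log_weights, dtype=float) - np.asarray(u_k, dtype=float)
    log_p -= log_p.max()                  # shift before exp to avoid overflow
    p = np.exp(log_p)
    p /= p.sum()
    return int(rng.choice(len(p), p=p)), p


rng = np.random.default_rng(0)
new_state, p = global_jump(u_k=[0.0, 0.0, 0.0], log_weights=[0.0, np.log(2.0), 0.0], rng=rng)
print(new_state, p)                       # p = [0.25, 0.5, 0.25]
```

Returning the full conditional alongside the sampled index is a design choice: a Rao-Blackwellized weight update needs those probabilities, not just the visited state.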
+
+SAMS state update schemes
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Several state update schemes are available:
+
+* ``global-jump`` (default): The sampler can jump to any thermodynamic state (RECOMMENDED)
+* ``restricted-range-jump``: The sampler can jump to any thermodynamic state within the specified local neighborhood (EXPERIMENTAL; DISABLED)
+* ``local-jump``: Only proposals within the specified neighborhood are considered, but rejection rates may be high (EXPERIMENTAL; DISABLED)
+
+SAMS Locality
+^^^^^^^^^^^^^
+
+The local neighborhood is specified by the ``locality`` parameter.
+If this is a positive integer, the neighborhood will be defined by state indices ``[k - locality, k + locality]``.
+Reducing locality will restrict the range of states for which reduced potentials are evaluated, which can speed up the energy evaluation stage of each iteration at the cost of restricting the amount of information available for free energy estimation.
+By default, the ``locality`` is global, such that energies at all thermodynamic states are computed; this allows the use of MBAR in data analysis.
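The locality rule can be written as a small helper. Treating ``locality=None`` as global and clipping the window at the first and last state are assumptions of this sketch, and ``neighborhood`` is a hypothetical name.

```python
def neighborhood(state_index, n_states, locality=None):
    """States whose reduced potentials are evaluated for a replica in state_index.

    locality=None is treated as global (every state). A positive integer gives
    [state_index - locality, state_index + locality], clipped to 0..n_states-1
    (the clipping is an assumption of this sketch).
    """
    if locality is None:
        return list(range(n_states))
    first = max(0, state_index - locality)
    last = min(n_states - 1, state_index + locality)
    return list(range(first, last + 1))


# With 10 states and locality=2, a replica in state 0 evaluates 3 states
# instead of 10, which is where the savings in the energy stage come from.
print(neighborhood(0, 10, locality=2), neighborhood(5, 10, locality=2))
```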
+
+SAMS weight adaptation algorithm
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+SAMS provides two ways of accumulating log weights each iteration:
+
+* ``optimal`` accumulates weight only in the currently visited state ``s``
+* ``rao-blackwellized`` accumulates fractional weight in all states within the energy evaluation neighborhood
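The difference between the two accumulation modes can be sketched as a single update of the log weights with gain ``gamma``. This is a minimal sketch, not the ``SAMSSampler`` code: it assumes the convention p(s | x) ∝ exp(g_s - u_s(x)), uniform target proportions by default, and pins the first log weight to zero. The function name and signature are hypothetical, and the gain schedule used in practice is not reproduced here.

```python
import numpy as np


def update_log_weights(log_weights, current_state, p_state, gamma,
                       scheme="optimal", target=None):
    """One sketched SAMS log-weight update with gain gamma.

    Assumes p(s | x) ∝ exp(g_s - u_s(x)), so lowering the weight of visited
    states pushes replicas toward the states they have visited less.
    p_state : p(s | x) for the current configuration (rao-blackwellized only)
    target  : target state proportions pi (uniform if None)
    """
    log_weights = np.asarray(log_weights, dtype=float)
    n_states = len(log_weights)
    pi = np.full(n_states, 1.0 / n_states) if target is None else np.asarray(target, dtype=float)
    if scheme == "optimal":
        visits = np.zeros(n_states)
        visits[current_state] = 1.0                   # only the visited state
    elif scheme == "rao-blackwellized":
        visits = np.asarray(p_state, dtype=float)     # fractional, all states
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    updated = log_weights - gamma * visits / pi
    return updated - updated[0]                       # pin g_0 = 0 (arbitrary offset)


print(update_log_weights([0.0, 0.0], current_state=0, p_state=None, gamma=0.5))
```

Both modes push weight away from where the replica currently is; the Rao-Blackwellized variant spreads that push according to p(s | x), which typically lowers the variance of the update.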
+
+SAMS initial weight adaptation stage
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Because the asymptotically optimal weight adaptation scheme works well only when the log weights are already close to optimal, a heuristic initial stage is used to adapt the log weights more rapidly before switching to the asymptotically optimal scheme.
+The behavior of this first stage can be controlled by setting two parameters:
+
+* ``gamma0`` controls the initial rate of weight adaptation. By default, this is 1.0, but it can be set larger (e.g., 10.0) if the free energy differences between states are large.
+* ``flatness_threshold`` controls the number of (fractional) visits to each thermodynamic state that must be accumulated before the asymptotically optimal weight adaptation scheme is used.
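One reading of the ``flatness_threshold`` rule above, sketched with hypothetical helper names: fractional visits (for example, p(s | x) at each iteration) are accumulated per state, and the asymptotically optimal scheme takes over once every state has reached the threshold. The criterion actually used by ``SAMSSampler`` may differ in detail.

```python
import numpy as np


def accumulate_visits(visit_counts, p_state):
    """Add one iteration's (fractional) visits, e.g. p(s | x) for the current configuration."""
    return np.asarray(visit_counts, dtype=float) + np.asarray(p_state, dtype=float)


def use_asymptotic_stage(visit_counts, flatness_threshold):
    """True once every state has accumulated at least flatness_threshold visits."""
    return bool(np.all(np.asarray(visit_counts) >= flatness_threshold))


counts = np.zeros(3)
for p in ([0.6, 0.4, 0.0], [0.5, 0.5, 0.0], [0.0, 0.2, 0.8], [0.0, 0.0, 0.7]):
    counts = accumulate_visits(counts, p)
print(counts, use_asymptotic_stage(counts, flatness_threshold=1.0))
```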
+
+.. currentmodule:: openmmtools.multistate
+.. autosummary::
+    :nosignatures:
+    :toctree: api/generated/
+
+    SAMSSampler
+    SAMSAnalyzer
+
+Parallel tempering
+------------------
+
+.. currentmodule:: openmmtools.multistate
+.. autosummary::
+    :nosignatures:
+    :toctree: api/generated/
+
+    ParallelTemperingSampler
+    ParallelTemperingAnalyzer
+
+Multistate Reporters
+--------------------
+
+.. currentmodule:: openmmtools.multistate
+.. autosummary::
+    :nosignatures:
+    :toctree: api/generated/
+
+    MultiStateReporter
+
+Analysis of multiple thermodynamic transformations
+--------------------------------------------------
+
+.. currentmodule:: openmmtools.multistate
+.. autosummary::
+    :nosignatures:
+    :toctree: api/generated/
+
+    MultiPhaseAnalyzer
+
+Miscellaneous support classes
+-----------------------------
+
+.. currentmodule:: openmmtools.multistate.multistateanalyzer
+.. autosummary::
+    :nosignatures:
+    :toctree: api/generated/
+
+    ObservablesRegistry
+    CachedProperty
+    InsufficientData
+    PhaseAnalyzer
diff --git a/docs/references.bib b/docs/references.bib
@@ -0,0 +1,45 @@
+@article{Chodera2011,
+   author = {Chodera, John D. and Shirts, Michael R.},
+   title = {Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing},
+   journal = {The Journal of Chemical Physics},
+   year = {2011},
+   volume = {135},
+   number = {19},
+   eid = {194110},
+   url = {http://scitation.aip.org/content/aip/journal/jcp/135/19/10.1063/1.3660669},
+   doi = {10.1063/1.3660669},
+}
+
+@article{Tan2017:SAMS,
+  title={Optimally adjusted mixture sampling and locally weighted histogram analysis},
+  author={Tan, Zhiqiang},
+  journal={Journal of Computational and Graphical Statistics},
+  volume={26},
+  number={1},
+  pages={54--65},
+  year={2017},
+  publisher={Taylor \& Francis}
+}
+
+
+@article{Shirts2008statistically,
+  title={Statistically optimal analysis of samples from multiple equilibrium states},
+  author={Shirts, Michael R and Chodera, John D},
+  journal={The Journal of Chemical Physics},
+  volume={129},
+  number={12},
+  pages={124105},
+  year={2008},
+  publisher={AIP}
+}
+
+@article{kumar1992weighted,
+  title={The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method},
+  author={Kumar, Shankar and Rosenberg, John M and Bouzida, Djamal and Swendsen, Robert H and Kollman, Peter A},
+  journal={Journal of Computational Chemistry},
+  volume={13},
+  number={8},
+  pages={1011--1021},
+  year={1992},
+  publisher={Wiley Online Library}
+}
diff --git a/docs/references.rst b/docs/references.rst
@@ -0,0 +1,64 @@
+.. _references:
+
+**********
+References
+**********
+
+Here is a list of references for the various components and algorithms used in ``openmmtools``.
+
+OpenMM GPU-accelerated molecular mechanics library
+""""""""""""""""""""""""""""""""""""""""""""""""""
+
+  Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, LeGrand S, Beberg AL, Ensign DL, Bruns CM, and Pande VS. Accelerating molecular dynamic simulations on graphics processing units.
+  J. Comput. Chem. 30:864, 2009.
+  http://dx.doi.org/10.1002/jcc.21209
+
+  Eastman P and Pande VS. OpenMM: A hardware-independent framework for molecular simulations.
+  Comput. Sci. Eng. 12:34, 2010.
+  http://dx.doi.org/10.1109/MCSE.2010.27
+
+  Eastman P and Pande VS. Efficient nonbonded interactions for molecular dynamics on a graphics processing unit.
+  J. Comput. Chem. 31:1268, 2010.
+  http://dx.doi.org/10.1002/jcc.21413
+
+  Eastman P and Pande VS. Constant constraint matrix approximation: A robust, parallelizable constraint method for molecular simulations.
+  J. Chem. Theory Comput. 6:434, 2010.
+  http://dx.doi.org/10.1021/ct900463w
+
+  Eastman P, Friedrichs M, Chodera JD, Radmer RJ, Bruns CM, Ku JP, Beauchamp KA, Lane TJ, Wang LP, Shukla D, Tye T, Houston M, Stich T, Klein C, Shirts M, and Pande VS. OpenMM 4: A Reusable, Extensible,
+  Hardware Independent Library for High Performance Molecular Simulation. J. Chem. Theory Comput. 9:461, 2013.
+  http://dx.doi.org/10.1021/ct300857j
+
+Replica-exchange with Gibbs sampling
+""""""""""""""""""""""""""""""""""""
+
+  Chodera JD and Shirts MR. Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing.
+  J. Chem. Phys. 135:194110, 2011.
+  http://dx.doi.org/10.1063/1.3660669
+
+MBAR for estimation of free energies from simulation data
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+  Shirts MR and Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states.
+  J. Chem. Phys. 129:124105, 2008.
+  http://dx.doi.org/10.1063/1.2978177
+
+Long-range dispersion corrections for explicit solvent free energy calculations
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+  Shirts MR, Mobley DL, Chodera JD, and Pande VS. Accurate and efficient corrections for missing dispersion interactions in molecular simulations.
+  J. Phys. Chem. B 111:13052, 2007.
+  http://dx.doi.org/10.1021/jp0735987
+
+
+Bibliography
+""""""""""""
+
+.. The :all: option searches subfolders for uses of :cite: to resolve references correctly.
+   However, this has the effect of dropping all citations in the .bib file here, and
+   the compiler complains about unused citations.
+   As such, unused articles in the .bib file are simply commented out rather than deleted, in case they are needed in the future.
+
+.. bibliography:: references.bib
+   :style: unsrt
+   :all: