diff --git a/notebooks/HMM_01_exercise.ipynb b/notebooks/HMM_01_exercise.ipynb
new file mode 100644
index 0000000..7f11651
--- /dev/null
+++ b/notebooks/HMM_01_exercise.ipynb
@@ -0,0 +1,228 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# MSMs versus HMMs\n",
+    "\n",
+    "In this notebook we will study how to select the lag time of Markov state models or Hidden Markov models, and how the models can be validated in order to decide whether they can be used in order to make reliable predictions about the long-term kinetics of the molecular system studied.\n",
+    "\n",
+    "We start with a few general imports and settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "import matplotlib.pyplot as plt\n",
+    "import matplotlib as mpl\n",
+    "from pyemma import msm\n",
+    "import numpy as np\n",
+    "import pyemma.plots as mplt\n",
+    "mpl.rcParams.update({'font.size': 14})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load double well data\n",
+    "import pyemma.datasets\n",
+    "double_well_data = pyemma.datasets.load_2well_discrete()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Draw double well and discretization\n",
+    "-------\n",
+    "We will use simulation data generated for a double-well potential. This data is part of the standard datasets in PyEMMA. In this notebook, we compare two discretizations: An excellent discretization with six states ('good discretization'), and a poor two-state discretization where the dividing surface is far away from the transition state ('bad discretization').\n",
+    "\n",
+    "Let's look at the double well potential and these two discretizations:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mpl.rcParams.update({'font.size': 10})\n",
+    "fig, axis = plt.subplots(1, 2, figsize=(10, 2))\n",
+    "P = double_well_data.transition_matrix\n",
+    "mu = msm.markov_model(P).stationary_distribution\n",
+    "E = -np.log(mu)\n",
+    "# plot 1\n",
+    "i = 0\n",
+    "axis[i].set_title('good discretization')\n",
+    "axis[i].plot(E - 2.0, linewidth=2, color='black')\n",
+    "axis[i].fill_between(range(len(E)), np.zeros(len(E)), E - 2.0, color='lightgrey')\n",
+    "axis[i].set_xlim(20, 80); axis[i].set_xlabel('x'); axis[i].set_ylim(0,8); axis[i].set_ylabel('free energy / kT')\n",
+    "for b in [40, 45, 50, 55, 60]:\n",
+    "    axis[i].plot([b, b], [0, 8], linewidth=2, linestyle='dashed', color='black')\n",
+    "# plot 2\n",
+    "i = 1\n",
+    "axis[i].set_title('bad discretization')\n",
+    "axis[i].plot(E - 2.0, linewidth=2, color='black')\n",
+    "axis[i].fill_between(range(len(E)), np.zeros(len(E)), E - 2.0, color='lightgrey')\n",
+    "axis[i].set_xlim(20, 80); axis[i].set_xlabel('x'); axis[i].set_ylim(0, 8); axis[i].set_ylabel('free energy / kT')\n",
+    "axis[i].set_ylabel(''); axis[i].yaxis.set_ticklabels([]); \n",
+    "axis[i].plot([40, 40], [0, 8], linewidth=2, linestyle='dashed', color='black')\n",
+    "#\n",
+    "#savefig('figs/fig_selval_ab.png', bbox_inches='tight')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Timescales: MSM\n",
+    "-----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First we want to build Markov state models. Since the discretization is given, the only parameter that still needs to be selected is the lag time. Both the discretization and the lag time are crucial for the approximation quality of a Markov state model [1,2].\n",
+    "\n",
+    "One common approach to select a lagtime is to compute the relaxation timescales implied by the estimated Markov state model [3], shortly called implied timescales: \n",
+    "\n",
+    "$    t_i = -\\tau / \\ln | \\lambda_i (\\tau)| $\n",
+    "\n",
+    "where $\\lambda_i(\\tau)$ are the eigenvalues of the Markov state model estimated at lag time $\\tau$. As the relaxation timescales are physical properties of the system the implied timescales should be independent of the lag time tau. In practice, this will not be fulfilled for short lag times due to discretization errors [2]. It will neither be fulfilled for very long lag times exceeding the implied timescales as the Markov model eigenvalues are then essentially resulting from the projection on random vectors [4]."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "its_good_bmsm = msm.timescales_msm([double_well_data.dtraj_T100K_dt10_n6good], lags = 100, errors='bayes')\n",
+    "its_bad_bmsm = msm.timescales_msm([double_well_data.dtraj_T100K_dt10_n2bad], lags = 100, errors='bayes')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Plotting these implied timescales already show that the good discretization converges at a lag time of about 20-40 steps to a timescale somewhat about 300, while the bad discretization doesn't appear to converge. According to the variational principle of conformation dynamics [5], we know in principle that the model with longer timescales are preferable, so at this point we might select the first discretization and a lag time of 20-40 steps..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We see that the uncertainties (by default two sigma's = 95% confidence are shown) are small compared to the tau-dependent changes of the timescales in the bad discretization and we can reject that discretization even without knowing about the good discretization. The good discretization has a timescale that is indistinguishable from a constant af lag times of about 30 steps or longer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mpl.rcParams.update({'font.size': 10})\n",
+    "fig, axis = plt.subplots(1, 2, figsize=(10, 2))\n",
+    "#\n",
+    "mplt.plot_implied_timescales(its_good_bmsm, ax=axis[0], ylog=False)\n",
+    "axis[0].set_ylim(0, 500); \n",
+    "axis[0].set_title('good discretization')\n",
+    "#\n",
+    "mplt.plot_implied_timescales(its_bad_bmsm, ax=axis[1], ylog=False)\n",
+    "axis[1].set_ylim(0, 500); axis[1].set_ylabel(''); axis[1].yaxis.set_ticklabels([]);\n",
+    "axis[1].set_title('bad discretization')\n",
+    "#\n",
+    "#savefig('figs/fig_selval_cd.png', bbox_inches='tight')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Hidden Markov state models\n",
+    "------"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, let's play with Hidden Markov state models. Discrete Hidden Markov models (HMMs) are describe in [7]. For metastable systems such as this one, discrete HMMs can be shown to be excellent approximations to PMMs, that are exact descriptions of the non-Markovian dynamics on discretized state spaces for metastable dynamics [8]. We estimate the HMM as described in [8], which involves to first estimate a coarse-grained Markov state model and then running a HMM estimation on top of it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "1. Compute the implied timescales for the same lag times as above, but using HMSMs. Check out the documentation of the msm package.\n",
+    "2. Compute the implied timescales with Bayesian errors. We do this by using the algorithms described in [9].\n",
+    "3. Estimate a Bayesian HMSM at lag time of 5 steps for both discretizations\n",
+    "4. Conduct a Chapman-Kolmogorov Test for both models. What do you conclude?\n",
+    "5. Analyze the transition matrix, the timescales and the observation probabilities of both models. Try to explain your observations\n",
+    "6. Adapt the lag time in the bad discretization so as to get a successful HMM model. What can be said about this model in comparison to an MSM?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "References\n",
+    "---------\n",
+    "[1] Prinz, J.-H., H. Wu, M. Sarich, B. G. Keller, M. Senne, M. Held, J. D. Chodera, Ch. Schütte and F. Noé: Markov models of molecular kinetics: Generation and Validation. J. Chem. Phys. 134, 174105 (2011)\n",
+    "\n",
+    "[2] Sarich, M., F. Noé, Ch. Schütte: On the Approximation Quality of Markov State Models. Multiscale Model. Simul. 8, 1154-1177 (2010)\n",
+    "\n",
+    "[3] Swope, W. C., J. W. Pitera and F. Suits: Describing protein folding kinetics by molecular dynamics simulations: 1. Theory, J. Phys. Chem. B. 108, 6571-6581 (2004)\n",
+    "\n",
+    "[4] Beauchamp, K. A., R. McGibbon, Y. S. Lin and V. S. Pande: Simple few-state models reveal hidden complexity in protein folding. Proc. Natl. Acad. Sci. USA 109, 17807-17813 (2012)\n",
+    "\n",
+    "[5] Noé, F. and F. Nüske: A variational approach to modeling slow processes in stochastic dynamical systems. SIAM Multiscale Model. Simul. 11. 635-655 (2013).\n",
+    "\n",
+    "[6] Trendelkamp-Schroer, B., H. Wu, F. Paul and F. Noé: Estimation and uncertainty of reversible Markov models. arxiv.org/pdf/1507.05990 (2015)\n",
+    "\n",
+    "[7] Rabiner, L. R.: A tutorial on hidden markov models and selected applications in speech recognition. Proc. IEEE 77, 257--286 (1989)\n",
+    "\n",
+    "[8] Noé, F., H. Wu, J.-H. Prinz and N. Plattner, N.: Projected and Hidden Markov Models for calculating kinetics and metastable states of complex molecules. J. Chem. Phys. 139, 184114 (2013)\n",
+    "\n",
+    "[9] Chodera, J. D., P. Elms, F. Noé, B. Keller, C. M. Kaiser, A. Ewall-Wice, S. Marqusee, C. Bustamante, N. Singhal Hinrichs: Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty. http://arxiv.org/abs/1108.1430."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
diff --git a/notebooks/MEMM_01_bias_energies_lecture.ipynb b/notebooks/MEMM_01_bias_energies_lecture.ipynb
new file mode 100644
index 0000000..87d58b1
--- /dev/null
+++ b/notebooks/MEMM_01_bias_energies_lecture.ipynb
@@ -0,0 +1,380 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "import matplotlib as mpl\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "import pyemma\n",
+    "import pyemma.thermo.util\n",
+    "import mdshare\n",
+    "import shortcuts_thermo as shortcuts"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Analysing Umbrella sampling simulations "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with np.load(mdshare.load('pyemma-tutorial-us-data.npz', working_directory='data')) as fh:\n",
+    "    # load biased data\n",
+    "    order_parameter_trajs = [fh['us_traj_%03d.npy' % i] for i in range(100)]\n",
+    "    rest_positions = fh['umbrella_centers'].tolist()\n",
+    "    spring_constants = fh['force_constants'].tolist()\n",
+    "    # load unbiased data\n",
+    "    adw_md_trajs = [fh['md_traj_%03d.npy' % i] for i in range(5)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We use 20 different harmonic bias potentials (umbrella potentials).\n",
+    "They are defined by their sping constants $k^{(i)}$ and their 20 different rest positions $x^{(i)}$. \n",
+    "\n",
+    "$$b^{(i)}(x) = \\frac{k^{(i)}}{2} \\left\\Vert x - x^{(i)} \\right\\Vert^2.$$\n",
+    "\n",
+    "With each bias potential we ran 5 simulations which results in a total number of 20\\*5=100 trajectories.\n",
+    "\n",
+    "For each simulation $i$ we saved:\n",
+    "\n",
+    "  * The spring constant $k{(i)}$ that was active in the simulation (variable `spring_constants[i]`)\n",
+    "  * The rest position $x{(i)}$ of the bias potential that was ative in the simulation (variable `rest_positions[i]`)\n",
+    "  * The time series $x(t)$ for the order parameter (variable `order_parameter_trajs[i]`)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print('len(spring_constants) =', len(spring_constants))\n",
+    "print('len(rest_positions) =', len(rest_positions))\n",
+    "print('len(order_parameter_trajs) =', len(order_parameter_trajs))\n",
+    "print('order_parameter_trajs[0].shape =', order_parameter_trajs[0].shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before we start with the analysis, let's get an overview over the data. \n",
+    "For this, we make a histogram of every order parameter time series.\n",
+    "We also mark the umbrella rest positions in the same plot."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for order_parameter_traj in order_parameter_trajs:\n",
+    "    plt.hist(order_parameter_traj)\n",
+    "plt.plot(rest_positions, np.ones_like(rest_positions)+100, 'xk')\n",
+    "plt.ylabel('counts')\n",
+    "plt.xlabel('order parameter / a.u.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Using the bare metal API"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To run the estimation methods, we have to define three quantities for every simulations:\n",
+    "    \n",
+    "   * (A) The thermodynamics state trajectory (`ttraj`), that contain the index of the bias potential used in the simulation.\n",
+    "   * (B) The bias energy trajectory (`btraj`) that contain the energies of every frame evaluated with all the bias potentials.\n",
+    "   * (C) The discrete microstate trajectory (`dtraj`)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### (A) The thermodynamic state trajectories"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For every trajectory, we compute the index of the active bias.\n",
+    "Until now we didn't assign indices to the different bias potentials. We only have the lists `spring_constants` and `rest_positions` that contain the parameters that were used in the simulations. But because the numbering of the simualtions is arbitray, so is the ordering of `spring_constants` and `rest_positions`.\n",
+    "\n",
+    "So let's assign indices to the potentials. We start by removing duplicates from the list:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "unique_spring_constants = np.unique(spring_constants)\n",
+    "unique_rest_positions = np.unique(rest_positions)\n",
+    "\n",
+    "K = len(unique_rest_positions)\n",
+    "print('number of ensembles:', K)\n",
+    "print('unique_spring_constants', unique_spring_constants)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The order of the parameters in the lists now defines the indices, that is `unique_rest_positions[i]` is a parameter of the `i`'th potential.\n",
+    "\n",
+    "Now that the order of the potentials is defined, we can compute the thermodynamic state trajectories."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ttrajs = []\n",
+    "for i, rest_position in enumerate(rest_positions):\n",
+    "    # get index of active rest position in the list of all possible rest positions\n",
+    "    ensemble_index = np.where(rest_position == unique_rest_positions)[0][0] \n",
+    "    # define the ttraj \n",
+    "    ttraj = np.array([ensemble_index]*len(order_parameter_trajs[i]))\n",
+    "    ttrajs.append(ttraj) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### (B) The bias energy trajectories"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For every simulation we generate a 2-D array `btraj[t, k]` with the elements $$b^{(i)}(x_t) = \\frac{k^{(i)}}{2} \\left\\Vert x_t - x^{(i)} \\right\\Vert^2.$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Such an expression can be writtien very succinctly with `numpy` operations:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "btrajs = []\n",
+    "for order_parameter_traj in order_parameter_trajs:\n",
+    "    btraj = np.zeros((len(order_parameter_traj), K))\n",
+    "    for k in range(K):\n",
+    "        btraj[:, k] = 0.5*unique_spring_constants*(order_parameter_traj[:, 0] - unique_rest_positions[k])**2\n",
+    "    btrajs.append(btraj)\n",
+    "    \n",
+    "# The array indexing in order_parameter_traj[:, 0] is used to get rid of an unused extra array dimension."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### (C) The discrete microstate trajectory\n",
+    "\n",
+    "Here we just use one of the clustering methods from PyEmma.\n",
+    "Since in this example, the state space is one-dimensional, we can just cluster the order parameter. \n",
+    "In general it is not enough to cluster the order parameters but other system coordiantes have to be included into the definition of microstates."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "clustering_obj = pyemma.coordinates.cluster_regspace(order_parameter_trajs, max_centers=500, dmin=0.2)\n",
+    "dtrajs = clustering_obj.dtrajs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we're ready to run the estiamtors. We start with `wham`.\n",
+    "As explained in the lecture, WHAM assumes that the bias energy is constant on every microstate. However, above we computed bias energy values for every frame (conformation). We therefore have to coarse-grain the bias energy values. This is achieved by the utility functions `get_averaged_bias_matrix`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "wham = pyemma.thermo.wham(ttrajs=ttrajs, dtrajs=dtrajs, \n",
+    "                          bias=pyemma.thermo.util.get_averaged_bias_matrix(btrajs, dtrajs))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To quickly check the result, we plot minus the logarithm of the equilibrium distribution that was estimated by WHAM. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.plot(clustering_obj.clustercenters, -np.log(wham.pi_full_state), 'o')\n",
+    "plt.xlabel('order parameter / a.u.')\n",
+    "plt.ylabel('free energy / kT');"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Using the user friendly API"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The umbrella sampling API not only supports one-dimensional umbrella sampling with the bias\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\frac{k^{(i)}}{2} \\left\\Vert \\mathbf{x} - \\mathbf{x}^{(i)} \\right\\Vert^2.$$\n",
+    "\n",
+    "but also the more general case, where the bias energy is given with a quadratic form with a possibly non-symmetric force matrix:\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\frac{1}{2} \\left\\langle \\mathbf{x} - \\mathbf{x}^{(i)} \\middle\\vert \\mathbf{k}^{(i)} \\middle\\vert \\mathbf{x} - \\mathbf{x}^{(i)} \\right\\rangle.$$\n",
+    "\n",
+    "\n",
+    "For these simulation types, the `pyemma.thermo` module provides the API function\n",
+    "\n",
+    "```python\n",
+    "def estimate_umbrella_sampling(\n",
+    "    us_trajs, us_dtrajs, us_centers, us_force_constants,\n",
+    "    md_trajs=None, md_dtrajs=None, kT=None,\n",
+    "    maxiter=10000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    estimator='wham', lag=1, dt_traj='1 step', init=None):\n",
+    "    ...\n",
+    "\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "wham = pyemma.thermo.estimate_umbrella_sampling(\n",
+    "    order_parameter_trajs, dtrajs, rest_positions, spring_constants,\n",
+    "    maxiter=100000, maxerr=1.0E-15, save_convergence_info=50, estimator='wham')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note also that we used the optinal parameter `save_convergence_info=50`. This instructs wham to track the convergence of the log-lokelihood and the change in the estimated free energy matrix."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pyemma.plots.plot_convergence_info(wham)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's check the results:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "us_estimator = wham\n",
+    "adw_us_x, adw_us_f = shortcuts.adw_match_reference_to_binning(order_parameter_trajs, clustering_obj.clustercenters)\n",
+    "fig, ax = plt.subplots(1, 2, figsize=(2 * 6, 0.75*6))\n",
+    "ax[0].plot(\n",
+    "    clustering_obj.clustercenters, us_estimator.f_full_state, 's', markersize=10, label=us_estimator.name)\n",
+    "ax[0].plot(adw_us_x, adw_us_f, '-*', linewidth=2, markersize=9, color='black', label='Reference')\n",
+    "ax[0].set_xlabel(r\"configuration state\", fontsize=20)\n",
+    "ax[0].set_ylabel(r\"f / kT\", fontsize=20)\n",
+    "ax[1].plot(unique_rest_positions, us_estimator.f_therm, 's', markersize=10, label=us_estimator.name)\n",
+    "ax[1].set_xlabel(r\"umbrella_center\", fontsize=20)\n",
+    "ax[1].set_ylabel(r\"f_therm / kT\", fontsize=20)\n",
+    "for _ax in ax:\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "    _ax.set_ylim([0, 12])\n",
+    "    _ax.legend(loc=4, fontsize=10, fancybox=True, framealpha=0.5)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/notebooks/MEMM_02_us_lecture.ipynb b/notebooks/MEMM_02_us_lecture.ipynb
new file mode 100644
index 0000000..2bc34cb
--- /dev/null
+++ b/notebooks/MEMM_02_us_lecture.ipynb
@@ -0,0 +1,506 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import print_function\n",
+    "%matplotlib inline\n",
+    "import matplotlib as mpl\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "import pyemma\n",
+    "import pyemma.thermo.util\n",
+    "import mdshare\n",
+    "\n",
+    "# import some functions which should not clutter the notebook\n",
+    "import shortcuts_thermo as shortcuts\n",
+    "\n",
+    "# figure size parameters\n",
+    "pw = 6\n",
+    "ph = 0.75 * pw"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Umbrella sampling simulations\n",
+    "\n",
+    "The bias is computed via a harmonic potential based on the deviation of a frame from a reference structure. In the usual one-dimensional case, this reads\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\frac{k^{(i)}}{2} \\left\\Vert \\mathbf{x} - \\mathbf{x}^{(i)} \\right\\Vert^2.$$\n",
+    "\n",
+    "In the more general case, though, one can use a non-symmetric force matrix:\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\frac{1}{2} \\left\\langle \\mathbf{x} - \\mathbf{x}^{(i)} \\middle\\vert \\mathbf{k}^{(i)} \\middle\\vert \\mathbf{x} - \\mathbf{x}^{(i)} \\right\\rangle.$$\n",
+    "\n",
+    "## API functions for umbrella sampling\n",
+    "\n",
+    "For these simulation types, the `pyemma.thermo` module provides the API functions\n",
+    "\n",
+    "```python\n",
+    "def estimate_umbrella_sampling(\n",
+    "    us_trajs, us_dtrajs, us_centers, us_force_constants,\n",
+    "    md_trajs=None, md_dtrajs=None, kT=None,\n",
+    "    maxiter=10000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    estimator='wham', lag=1, dt_traj='1 step', init=None):\n",
+    "    ...\n",
+    "\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example Model 1: one-dimensional asymmetric double well potential\n",
+    "\n",
+    "We start by looking at the stationary distribution and free energy profile which are available analytically."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_x, adw_f, adw_pi = shortcuts.adw_reference(-1, 5, 100)\n",
+    "\n",
+    "fig, ax = plt.subplots(1, 2, figsize=(2 * pw, ph))\n",
+    "ax[0].plot(adw_x, adw_pi, linewidth=3, color='black')\n",
+    "ax[0].set_ylabel(r\"$\\pi(x)$\", fontsize=20)\n",
+    "ax[0].semilogy()\n",
+    "ax[1].plot(adw_x, adw_f, linewidth=3, color='black')\n",
+    "ax[1].set_ylabel(r\"$f(x)$ / kT\", fontsize=20)\n",
+    "for _ax in ax:\n",
+    "    _ax.set_xlabel(r\"$x$ / a.u.\", fontsize=20)\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, ax = plt.subplots(1, 2, figsize=(2 * pw, ph))\n",
+    "# plot the thermodynamic ground/unbiased state (kT=1.0)\n",
+    "ax[0].plot(adw_x, adw_pi, linewidth=3, color='black', label='unbiased')\n",
+    "ax[1].plot(adw_x, adw_f, linewidth=3, color='black', label='unbiased')\n",
+    "# plot the sixth umbrella\n",
+    "_, adw_f2, adw_pi2 = shortcuts.adw_reference(adw_x[0], adw_x[-1], adw_x.shape[0], k_bias=30.0, x_bias=1.07894737)\n",
+    "ax[0].plot(adw_x, adw_pi2, linewidth=3, color='blue', label='umbrella 6')\n",
+    "ax[1].plot(adw_x, adw_f2, linewidth=3, color='blue', label='umbrella 6')\n",
+    "# plot the 10th umbrella\n",
+    "_, adw_f2, adw_pi2 = shortcuts.adw_reference(adw_x[0], adw_x[-1], adw_x.shape[0], k_bias=30.0, x_bias=2.13157895)\n",
+    "ax[0].plot(adw_x, adw_pi2, linewidth=3, color='green', label='umbrella 10')\n",
+    "ax[1].plot(adw_x, adw_f2, linewidth=3, color='green', label='umbrella 10')\n",
+    "# plot the 14th umbrella\n",
+    "_, adw_f2, adw_pi2 = shortcuts.adw_reference(adw_x[0], adw_x[-1], adw_x.shape[0], k_bias=30.0, x_bias=3.18421053)\n",
+    "ax[0].plot(adw_x, adw_pi2, linewidth=3, color='red', label='umbrella 14')\n",
+    "ax[1].plot(adw_x, adw_f2, linewidth=3, color='red', label='umbrella 14')\n",
+    "# finish the figure\n",
+    "ax[0].set_ylabel(r\"$\\pi^{(j)}(x)$\", fontsize=20)\n",
+    "ax[0].semilogy()\n",
+    "ax[0].set_ylim([1.0E-10, 1.0])\n",
+    "ax[0].legend(loc=3, fontsize=12, fancybox=True, framealpha=0.5)\n",
+    "ax[1].set_ylabel(r\"$f^{(j)}(x) - f^{(j)}$ / kT\", fontsize=20)\n",
+    "ax[1].set_ylim([0.0, 30.0])\n",
+    "ax[1].legend(loc=2, fontsize=12, fancybox=True, framealpha=0.5)\n",
+    "for _ax in ax:\n",
+    "    _ax.set_xlabel(r\"$x$ / a.u.\", fontsize=20)\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "First step: import the data from 100 precomputed umbrella sampling trajectories as listed in the file ``meta.dat``..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with np.load(mdshare.load('pyemma-tutorial-us-data.npz', working_directory='data')) as fh:\n",
+    "    # load biased data\n",
+    "    adw_us_trajs = [fh['us_traj_%03d.npy' % i] for i in range(100)]\n",
+    "    adw_us_umbrella_centers = fh['umbrella_centers'].tolist()\n",
+    "    adw_us_force_constants = fh['force_constants'].tolist()\n",
+    "    # load unbiased data\n",
+    "    adw_md_trajs = [fh['md_traj_%03d.npy' % i] for i in range(5)]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(len(adw_us_trajs))\n",
+    "print(len(adw_us_umbrella_centers))\n",
+    "print(len(adw_us_force_constants))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Second step: run a clustering algorithm on the configuration trajectories to define the bins\n",
+    "(and to compute the bin counts later on)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_us_cluster = pyemma.coordinates.cluster_regspace(adw_us_trajs, max_centers=500, dmin=0.2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Third step: run ``WHAM``  estimations using the ``estimate_umbrella_sampling`` API function and plot the convergence info..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_us_estimator = pyemma.thermo.estimate_umbrella_sampling(\n",
+    "    adw_us_trajs, adw_us_cluster.dtrajs, adw_us_umbrella_centers, adw_us_force_constants,\n",
+    "    maxiter=100000, maxerr=1.0E-15, save_convergence_info=50, estimator='wham')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pyemma.plots.plot_convergence_info(adw_us_estimator)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Fourth step: plot the free energies ``f`` and ``f_therm``..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_us_x, adw_us_f = shortcuts.adw_match_reference_to_binning(adw_us_trajs, adw_us_cluster.clustercenters)\n",
+    "\n",
+    "fig, ax = plt.subplots(1, 2, figsize=(2 * pw, ph))\n",
+    "ax[0].plot(\n",
+    "    adw_us_cluster.clustercenters[adw_us_estimator.active_set, 0], adw_us_estimator.f, 's', markersize=10, label=adw_us_estimator.name)\n",
+    "ax[0].plot(adw_us_x, adw_us_f, '-*', linewidth=2, markersize=9, color='black', label='Reference')\n",
+    "ax[0].set_xlabel(r\"configuration state\", fontsize=20)\n",
+    "ax[0].set_ylabel(r\"f / kT\", fontsize=20)\n",
+    "ax[1].plot(adw_us_estimator.umbrella_centers[:, 0], adw_us_estimator.f_therm, 's', markersize=10, label=adw_us_estimator.name)\n",
+    "ax[1].set_xlabel(r\"umbrella_center\", fontsize=20)\n",
+    "ax[1].set_ylabel(r\"f_therm / kT\", fontsize=20)\n",
+    "for _ax in ax:\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "    _ax.set_ylim([0, 12])\n",
+    "    _ax.legend(loc=4, fontsize=10, fancybox=True, framealpha=0.5)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Mixed simulations data: US simulations + unbiased simulations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# redo clustering with both, biased and unbiased data\n",
+    "adw_us_cluster = pyemma.coordinates.cluster_regspace(adw_us_trajs + adw_md_trajs, max_centers=500, dmin=0.2)\n",
+    "\n",
+    "# split dtrajs into biased and unbiased\n",
+    "adw_us_dtrajs = adw_us_cluster.dtrajs[:len(adw_us_trajs)]\n",
+    "adw_md_dtrajs = adw_us_cluster.dtrajs[len(adw_us_trajs):]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# plot order parameter trajectories of the unbiased simulations\n",
+    "for t in adw_md_trajs:\n",
+    "    plt.plot(t)\n",
+    "plt.ylabel('x')\n",
+    "plt.xlabel('step')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# run the estimator again for a sequence of lag times\n",
+    "lags = [1, 2, 5, 7, 10, 15, 20, 30, 40, 50, 70, 100]\n",
+    "\n",
+    "memms = pyemma.thermo.estimate_umbrella_sampling(\n",
+    "    adw_us_trajs, adw_us_dtrajs, adw_us_umbrella_centers, adw_us_force_constants,\n",
+    "    md_trajs=adw_md_trajs, md_dtrajs=adw_md_dtrajs,\n",
+    "    lag=lags,\n",
+    "    maxiter=100000, maxerr=1.0E-15, save_convergence_info=50, estimator='dtram')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TRAM\n",
+    "#lags = [1, 10, 50]\n",
+    "#memms = pyemma.thermo.estimate_umbrella_sampling(\n",
+    "#    adw_us_trajs, adw_us_dtrajs, adw_us_umbrella_centers, adw_us_force_constants,\n",
+    "#    md_trajs=adw_md_trajs, md_dtrajs=adw_md_dtrajs,\n",
+    "#    lag=lags,\n",
+    "#    maxiter=100000, maxerr=1.0E-6, init_maxerr=1.0, save_convergence_info=50, estimator='tram', direct_space=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "[ m.name for m in memms ]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# plot implied time scales depending on lag time\n",
+    "pyemma.plots.plot_memm_implied_timescales(memms)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# at 10 steps the implied time scales look converged, pick that model for analysis\n",
+    "print(memms[4].lag)\n",
+    "dtram_estiamtor = memms[4]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# for TRAM\n",
+    "#print(memms[1].lag)\n",
+    "#dtram_estiamtor = memms[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# plot estimate of the stationary distribution\n",
+    "adw_us_x, adw_us_f = shortcuts.adw_match_reference_to_binning(adw_us_trajs, adw_us_cluster.clustercenters)\n",
+    "\n",
+    "plt.figure(figsize=(2 * pw, ph))\n",
+    "plt.plot(\n",
+    "    adw_us_cluster.clustercenters[dtram_estiamtor.active_set, 0], dtram_estiamtor.f, 's', markersize=10, label=dtram_estiamtor.name)\n",
+    "plt.plot(adw_us_x, adw_us_f, '-*', linewidth=2, markersize=9, color='black', label='Reference')\n",
+    "plt.xlabel(r\"configuration state\", fontsize=20)\n",
+    "plt.ylabel(r\"f / kT\", fontsize=20)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The MSM of the unbiased ensemble can be accessed via dtram_estiamtor.msm\n",
+    "unbiased_msm = dtram_estiamtor.msm\n",
+    "\n",
+    "# We can do all the usual MSM analyses now, e. g. coarse-graining with PCCA and computing MFPTs.\n",
+    "pcca = unbiased_msm.pcca(2)\n",
+    "\n",
+    "print(\"MFPT[blue->green] = %7.1f steps\" %  unbiased_msm.mfpt(pcca.metastable_sets[0], pcca.metastable_sets[1]))\n",
+    "print(\"MFPT[green->blue] = %7.1f steps\" %  unbiased_msm.mfpt(pcca.metastable_sets[1], pcca.metastable_sets[0]))\n",
+    "\n",
+    "plt.plot(adw_us_x, adw_us_f, '-*', linewidth=2, markersize=9, color='black')\n",
+    "plt.scatter(\n",
+    "    adw_us_cluster.clustercenters[unbiased_msm.active_set, 0],\n",
+    "    -np.log(unbiased_msm.stationary_distribution[unbiased_msm.active_set]),\n",
+    "    s=120, c=pcca.metastable_assignment, cmap=mpl.cm.brg)\n",
+    "\n",
+    "plt.xlabel(r\"configuration state\", fontsize=20)\n",
+    "plt.ylabel(r\"f / kT\", fontsize=20)\n",
+    "plt.tick_params(labelsize=15)\n",
+    "plt.xlim([-1, 5])\n",
+    "plt.ylim([0, 12])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# PyEMMA's general thermo API\n",
+    "\n",
+    "## binned estimators\n",
+    "The `pyemma.thermo` module provides the following API functions to perform ``dTRAM`` and ``WHAM`` estimations:\n",
+    "\n",
+    "```python\n",
+    "def dtram(\n",
+    "    ttrajs, dtrajs, bias, lag,\n",
+    "    maxiter=10000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    dt_traj='1 step', init=None):\n",
+    "    ...\n",
+    "    \n",
+    "def wham(\n",
+    "    ttrajs, dtrajs, bias,\n",
+    "    maxiter=100000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    dt_traj='1 step'):\n",
+    "    ...\n",
+    "```\n",
+    "\n",
+    "- ``ttrajs`` is a list of ``numpy.ndarray`` objects with ``shape=(T_i,)``, where ``T_i`` denotes the number of frames in trajectory ``i``. The entries indicate in which thermodynamic state each frame was created.\n",
+    "- ``dtrajs`` is a list of ``numpy.ndarray`` objects with ``shape=(T_i,)``, where ``T_i`` denotes the number of frames in trajectory ``i``. The entries indicate to which discrete configuration states each frame belongs.\n",
+    "- ``bias`` is a ``numpy.ndarray`` with ``shape=(K, N)``, where ``K`` is the number of thermodynamic states and ``N`` is the number of discrete configuration states. The elements are the dimensionless bias energies for all combinations of discrete configuration and thermodynamic states.\n",
+    "- ``lag`` is the lag time in steps at which transitions are counted.\n",
+    "\n",
+    "\n",
+    "## bin-less estimators\n",
+    "\n",
+    "```python\n",
+    "def tram(\n",
+    "    ttrajs, dtrajs, bias, lag,\n",
+    "    maxiter=10000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    dt_traj='1 step', init=None, direct_space=False):\n",
+    "    ...\n",
+    "    \n",
+    "def mbar(\n",
+    "    ttrajs, dtrajs, bias,\n",
+    "    maxiter=100000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    dt_traj='1 step', direct_space=False):\n",
+    "    ...\n",
+    "```\n",
+    "\n",
+    "The ``bias`` parameter of bin-less estimators has a different formet than for binned estimators:\n",
+    "\n",
+    "\n",
+    "- ``bias`` is a ``(numpy.ndarray(T, num_therm_states)``, or list of ``numpy.ndarray(T_i, num_therm_states))`` – A single reduced bias energy trajectory or a list of reduced bias energy trajectories. For every simulation frame seen in trajectory `i` and time step `t`, ``btrajs[i][t, k]`` is the reduced bias energy of that frame evaluated in the `k`’th thermodynamic state (i.e. at the `k`’th umbrella/Hamiltonian/temperature)\n",
+    "\n",
+    "The parameter ``direct_space`` allows to optimize the calculation for speed.\n",
+    "\n",
+    "- ``direct_space`` is an optional boolean parameter that is false by default. – Whether to perform the self-consitent iteration with Boltzmann factors (direct space) or free energies (log-space). Calculations in direct space are faster. When analyzing data from multi-temperature simulations, direct-space is not recommended.\n",
+    "\n",
+    "To make the preparation of ``ttrajs`` and ``bias`` easier, we provide two further API functions to handle the preparation for certain types of simulations, i.e., multi-temperature and umbrella sampling with harmonic bias potentials."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.1"
+  },
+  "widgets": {
+   "state": {
+    "50db3d5cc6a04d16bd12570766f0e4ff": {
+     "views": [
+      {
+       "cell_index": 23
+      }
+     ]
+    },
+    "5fdb67a20a6149edae7d492d460f5ad0": {
+     "views": [
+      {
+       "cell_index": 23
+      }
+     ]
+    },
+    "7a51b1fc2b2b41d1abcdb0109735a776": {
+     "views": [
+      {
+       "cell_index": 23
+      }
+     ]
+    },
+    "bca84f0d22cb4f559518403513a679f2": {
+     "views": [
+      {
+       "cell_index": 23
+      }
+     ]
+    },
+    "d82393608eb8499fad40c83caa2dd272": {
+     "views": [
+      {
+       "cell_index": 23
+      }
+     ]
+    }
+   },
+   "version": "1.2.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
diff --git a/notebooks/MEMM_03_mt_lecture.ipynb b/notebooks/MEMM_03_mt_lecture.ipynb
new file mode 100644
index 0000000..bdbe61e
--- /dev/null
+++ b/notebooks/MEMM_03_mt_lecture.ipynb
@@ -0,0 +1,331 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import print_function\n",
+    "%matplotlib inline\n",
+    "import matplotlib as mpl\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "import pyemma\n",
+    "import mdshare\n",
+    "\n",
+    "# import some functions which should not clutter the notebook\n",
+    "import shortcuts_thermo as shortcuts\n",
+    "\n",
+    "# figure size parameters\n",
+    "pw = 6\n",
+    "ph = 0.75 * pw"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example - one-dimensional asymmetric double well potential\n",
+    "\n",
+    "We start by looking at the stationary distribution and free energy profile which are available analytically."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_x, adw_f, adw_pi = shortcuts.adw_reference(-1, 5, 100)\n",
+    "\n",
+    "fig, ax = plt.subplots(1, 2, figsize=(2 * pw, ph))\n",
+    "ax[0].plot(adw_x, adw_pi, linewidth=3, color='black')\n",
+    "ax[0].set_ylabel(r\"$\\pi(x)$\", fontsize=20)\n",
+    "ax[0].semilogy()\n",
+    "ax[1].plot(adw_x, adw_f, linewidth=3, color='black')\n",
+    "ax[1].set_ylabel(r\"$f(x)$ / kT\", fontsize=20)\n",
+    "for _ax in ax:\n",
+    "    _ax.set_xlabel(r\"$x$ / a.u.\", fontsize=20)\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Multi-temperature (simulated tempering data)\n",
+    "\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\left( \\frac{1}{\\text{k}_\\text{B} T^{(i)}} - \\frac{1}{\\text{k}_\\text{B} T^{\\circ}} \\right) U(\\mathbf{x}) = \\left( \\frac{1}{\\text{k}_\\text{B} T^{(i)}} - \\frac{1}{\\text{k}_\\text{B} T^{\\circ}} \\right) \\text{k}_\\text{B} T^{(j)} u^{(j)}(\\mathbf{x})$$\n",
+    "\n",
+    "Let's plot the stationary distributions and free energy profiles for four different kT values:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, ax = plt.subplots(1, 2, figsize=(2 * pw, ph))\n",
+    "# plot the thermodynamic ground/unbiased state (kT=1.0)\n",
+    "ax[0].plot(adw_x, adw_pi, linewidth=3, color='black', label='kT=1.0')\n",
+    "ax[1].plot(adw_x, adw_f, linewidth=3, color='black', label='kT=1.0')\n",
+    "# plot the kT=2.0 case\n",
+    "_, adw_f2, adw_pi2 = shortcuts.adw_reference(adw_x[0], adw_x[-1], adw_x.shape[0], kT=2.0)\n",
+    "ax[0].plot(adw_x, adw_pi2, linewidth=3, color='blue', label='kT=2.0')\n",
+    "ax[1].plot(adw_x, adw_f2, linewidth=3, color='blue', label='kT=2.0')\n",
+    "# plot the kT=3.5 case\n",
+    "_, adw_f2, adw_pi2 = shortcuts.adw_reference(adw_x[0], adw_x[-1], adw_x.shape[0], kT=3.5)\n",
+    "ax[0].plot(adw_x, adw_pi2, linewidth=3, color='green', label='kT=3.5')\n",
+    "ax[1].plot(adw_x, adw_f2, linewidth=3, color='green', label='kT=3.5')\n",
+    "# plot the kT=7.5 case\n",
+    "_, adw_f2, adw_pi2 = shortcuts.adw_reference(adw_x[0], adw_x[-1], adw_x.shape[0], kT=7.5)\n",
+    "ax[0].plot(adw_x, adw_pi2, linewidth=3, color='red', label='kT=7.5')\n",
+    "ax[1].plot(adw_x, adw_f2, linewidth=3, color='red', label='kT=7.5')\n",
+    "# finish the figure\n",
+    "ax[0].set_ylabel(r\"$\\pi^{(j)}(x)$\", fontsize=20)\n",
+    "ax[0].semilogy()\n",
+    "ax[0].legend(loc=8, fontsize=12, fancybox=True, framealpha=0.5)\n",
+    "ax[1].set_ylabel(r\"$f^{(j)}(x) - f^{(j)}$ / kT\", fontsize=20)\n",
+    "ax[1].legend(loc=9, fontsize=12, fancybox=True, framealpha=0.5)\n",
+    "for _ax in ax:\n",
+    "    _ax.set_xlabel(r\"$x$ / a.u.\", fontsize=20)\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "``pyemma.thermo`` provides an API function for this simulation type:\n",
+    "\n",
+    "```python\n",
+    "def estimate_multi_temperature(\n",
+    "    energy_trajs, temp_trajs, dtrajs,\n",
+    "    energy_unit='kcal/mol', temp_unit='K', reference_temperature=None,\n",
+    "    maxiter=10000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    estimator='wham', lag=1, dt_traj='1 step', init=None):\n",
+    "    ...\n",
+    "```\n",
+    "\n",
+    "First step: import the data from 20 precomputed trajectories..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with np.load(mdshare.load('pyemma-tutorial-mt-data.npz', working_directory='data')) as fh:\n",
+    "    adw_st_conf_trajs = [fh['conf_traj_%03d.npy' % i] for i in range(20)]\n",
+    "    adw_st_temp_trajs = [fh['temp_traj_%03d.npy' % i] for i in range(20)]\n",
+    "    adw_st_energy_trajs = [fh['energy_traj_%03d.npy' % i] for i in range(20)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For one trajectory, plot all three types of data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_st_index = 4\n",
+    "fig, ax = plt.subplots(3, 1, figsize=(2 * pw, 3 * ph), sharex=True)\n",
+    "ax[0].plot(adw_st_conf_trajs[adw_st_index][:, 0], '-x')\n",
+    "ax[1].plot(adw_st_temp_trajs[adw_st_index], '-x')\n",
+    "ax[2].plot(adw_st_energy_trajs[adw_st_index], '-x')\n",
+    "ax[2].set_xlabel(r\"time / steps\", fontsize=20)\n",
+    "ax[0].set_ylabel(r\"configuration state / a.u.\", fontsize=20)\n",
+    "ax[1].set_ylabel(r\"heat bath / kT\", fontsize=20)\n",
+    "ax[2].set_ylabel(r\"energy / kT\", fontsize=20)\n",
+    "for _ax in ax:\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Second step: run a clustering algorithm on the configuration trajectories..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_st_cluster = pyemma.coordinates.cluster_regspace(adw_st_conf_trajs, max_centers=500, dmin=0.2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Third step: run ``WHAM`` and ``dTRAM`` estimations using the ``estimate_multi_temperature`` API function..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_st_estimator = pyemma.thermo.estimate_multi_temperature(\n",
+    "    adw_st_energy_trajs, adw_st_temp_trajs, adw_st_cluster.dtrajs,\n",
+    "    energy_unit='kT', temp_unit='kT',\n",
+    "    maxiter=10000, maxerr=1.0E-14, save_convergence_info=1, estimator='dtram')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pyemma.plots.plot_convergence_info(adw_st_estimator)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Fourth step: plot the free energies ``f`` and ``f_therm``..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adw_st_x, adw_st_f = shortcuts.adw_match_reference_to_binning(adw_st_conf_trajs, adw_st_cluster.clustercenters)\n",
+    "\n",
+    "fig, ax = plt.subplots(1, 2, figsize=(2 * pw, ph))\n",
+    "ax[0].plot(\n",
+    "    adw_st_cluster.clustercenters[adw_st_estimator.active_set, 0], adw_st_estimator.f, 's', markersize=10, label=adw_st_estimator.name)\n",
+    "ax[0].plot(adw_st_x, adw_st_f, '-*', linewidth=2, markersize=9, color='black', label='Reference')\n",
+    "ax[0].set_xlabel(r\"configuration state\", fontsize=20)\n",
+    "ax[0].set_ylabel(r\"f / kT\", fontsize=20)\n",
+    "ax[0].legend(loc=0, fontsize=10, fancybox=True, framealpha=0.5)\n",
+    "ax[1].plot(adw_st_estimator.temperatures, adw_st_estimator.f_therm, 's', markersize=10, label=adw_st_estimator.name)\n",
+    "ax[1].set_ylabel(r\"f_therm / kT\", fontsize=20)\n",
+    "ax[1].legend(loc=4, fontsize=10, fancybox=True, framealpha=0.5)\n",
+    "ax[1].set_xlabel(r\"kT\", fontsize=20)\n",
+    "for _ax in ax:\n",
+    "    _ax.tick_params(labelsize=15)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Plot the stationay distribution of the thermodynamic ground state."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.figure(figsize=(pw, ph))\n",
+    "plt.plot(\n",
+    "    adw_st_cluster.clustercenters[adw_st_estimator.active_set, 0], adw_st_estimator.pi, 's', markersize=10)\n",
+    "plt.xlabel(r\"configuration state\", fontsize=20)\n",
+    "plt.ylabel(r\"pi\", fontsize=20)\n",
+    "plt.ylim([1.0E-11, 1.0])\n",
+    "plt.yscale('log')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Last step: obtain some kinetic information.\n",
+    "In this case, we run ``PCCA`` on the ``MSM`` of the thermodynamic ground state and compute mean first passage times between the two metastable sets..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The MSM of the unbiased ensemble can be accessed via dtram_estiamtor.msm\n",
+    "unbiased_msm = adw_st_estimator.msm\n",
+    "\n",
+    "# We can do all the usual MSM analyses now, e. g. coarse-graining with PCCA and computing MFPTs.\n",
+    "pcca = unbiased_msm.pcca(2)\n",
+    "\n",
+    "print(\"MFPT[blue->green] = %7.1f steps\" %  unbiased_msm.mfpt(pcca.metastable_sets[0], pcca.metastable_sets[1]))\n",
+    "print(\"MFPT[green->blue] = %7.1f steps\" %  unbiased_msm.mfpt(pcca.metastable_sets[1], pcca.metastable_sets[0]))\n",
+    "\n",
+    "plt.plot(adw_st_x, adw_st_f, '-*', linewidth=2, markersize=9, color='black')\n",
+    "plt.scatter(\n",
+    "    adw_st_cluster.clustercenters[unbiased_msm.active_set, 0],\n",
+    "    -np.log(unbiased_msm.stationary_distribution[unbiased_msm.active_set]),\n",
+    "    s=120, c=pcca.metastable_assignment, cmap=mpl.cm.brg)\n",
+    "\n",
+    "plt.xlabel(r\"configuration state\", fontsize=20)\n",
+    "plt.ylabel(r\"f / kT\", fontsize=20)\n",
+    "plt.tick_params(labelsize=15)\n",
+    "plt.xlim([-1, 5])\n",
+    "plt.ylim([0, 12])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.1"
+  },
+  "widgets": {
+   "state": {
+    "9ae68ce602b741329d88dacb0bae770c": {
+     "views": [
+      {
+       "cell_index": 12
+      }
+     ]
+    }
+   },
+   "version": "1.2.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
diff --git a/notebooks/MEMM_04_exercise.ipynb b/notebooks/MEMM_04_exercise.ipynb
new file mode 100644
index 0000000..d938860
--- /dev/null
+++ b/notebooks/MEMM_04_exercise.ipynb
@@ -0,0 +1,550 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Understanding MEMMs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "import matplotlib.pyplot as plt\n",
+    "import matplotlib as mpl\n",
+    "import numpy as np\n",
+    "import pyemma\n",
+    "import mdshare\n",
+    "\n",
+    "mpl.rcParams.update({'font.size': 14})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Bias energies in multi-temperature simulations\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\left( \\frac{1}{\\text{k}_\\text{B} T^{(i)}} - \\frac{1}{\\text{k}_\\text{B} T^{\\circ}} \\right) U(\\mathbf{x}) = \\left( \\frac{1}{\\text{k}_\\text{B} T^{(i)}} - \\frac{1}{\\text{k}_\\text{B} T^{\\circ}} \\right) \\text{k}_\\text{B} T^{(j)} u^{(j)}(\\mathbf{x})$$\n",
+    "\n",
+    "**Note**: all simulations/fragments at $T=T^\\circ$ are **unbiased** while simulations/fragments at $T\\neq T^\\circ$ are **biased** with respect to the reference temperature.\n",
+    "\n",
+    "## API functions for multi-temperature simulations\n",
+    "\n",
+    "For these simulation types, the `pyemma.thermo` module provides the API function\n",
+    "\n",
+    "```python\n",
+    "def estimate_multi_temperature(\n",
+    "    energy_trajs, temp_trajs, dtrajs,\n",
+    "    energy_unit='kcal/mol', temp_unit='K', reference_temperature=None,\n",
+    "    maxiter=10000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    estimator='wham', lag=1, dt_traj='1 step', init=None):\n",
+    "    ...\n",
+    "```\n",
+    "\n",
+    "Let us revisit the example of the asymmetric doublewell potential:\n",
+    "\n",
+    "### Step 1\n",
+    "We start with loading the data and exploring the different trajectory types and realizations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with np.load(mdshare.load('pyemma-tutorial-mt-data.npz', working_directory='data')) as fh:\n",
+    "    trajs = [fh['conf_traj_%03d.npy' % i] for i in range(20)]\n",
+    "    temp_trajs = [fh['temp_traj_%03d.npy' % i] for i in range(20)]\n",
+    "    energy_trajs = [fh['energy_traj_%03d.npy' % i] for i in range(20)]\n",
+    "\n",
+    "fig, axes = plt.subplots(len(trajs), 3, figsize=(10, 1 * len(trajs)), sharex=True)\n",
+    "for i, (x, t, e) in enumerate(zip(trajs, temp_trajs, energy_trajs)):\n",
+    "    axes[i, 0].plot(x)\n",
+    "    axes[i, 1].plot(t)\n",
+    "    axes[i, 2].plot(e)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 2\n",
+    "We create a regular grid which covers the full range of all simulated trajectories and assign the latter to these grid points."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.linspace(np.min(trajs), np.max(trajs), 61)\n",
+    "centers = 0.5 * (x[:-1] + x[1:]).reshape(-1, 1)\n",
+    "dtrajs = pyemma.coordinates.assign_to_centers(trajs, centers=centers)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 3\n",
+    "Applying WHAM to a *new* dataset is a good way to get a first impression.\n",
+    "\n",
+    "We pass our three lists with configuration, temperature, and energy trajectories into the `estimate_multi_temperature()` API function. As both, the temperatures and energies, are given in units of $\\text{k}_{\\text{B}}T$, we have to pass this information into the API function, too. Finally, we choose a termination criterion and the level of convergence information output, and we make sure that the API function uses the WHAM estimator."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "wham = pyemma.thermo.estimate_multi_temperature(\n",
+    "    energy_trajs, temp_trajs, dtrajs,\n",
+    "    energy_unit='kT', temp_unit='kT',\n",
+    "    maxiter=100000, maxerr=5e-14, save_convergence_info=1,\n",
+    "    estimator='wham')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After the estimation process terminates, we can visualize the convergence behaviour using a convenience function from the `pyemma.plots` module."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pyemma.plots.plot_convergence_info(wham);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The WHAM estimator returns the most simple multiensemble model in `pyemma`. The model contains the stationary distribution and free energy profile of the unbiased state - whether we have provided unbiased data or not:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, (ax_pi, ax_f) = plt.subplots(1, 2, figsize=(10, 4))\n",
+    "ax_pi.plot(centers, wham.pi)\n",
+    "ax_pi.set_ylabel('$\\pi(x)$')\n",
+    "ax_f.plot(centers, wham.f)\n",
+    "ax_f.set_ylabel('$f(x)$')\n",
+    "for ax in (ax_pi, ax_f):\n",
+    "    ax.set_xlabel('$x$')\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In addition, we have a model for each of the provided thermodynamic states which all include a stationary distribution and free energy profile."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, (ax_pi, ax_f) = plt.subplots(1, 2, figsize=(10, 4))\n",
+    "ax_pi.plot(centers, wham.pi, ':o', color='black', label='unbiased')\n",
+    "ax_pi.set_ylabel('$\\pi(x)$')\n",
+    "ax_f.plot(centers, wham.f, ':o', color='black')\n",
+    "ax_f.set_ylabel('$f(x)$')\n",
+    "for i, model in enumerate(wham.models):\n",
+    "    ax_pi.plot(centers, model.pi, label='T=%.2f kT' % wham.temperatures[i])\n",
+    "    ax_f.plot(centers, model.f)\n",
+    "for ax in (ax_pi, ax_f):\n",
+    "    ax.set_xlabel('$x$')\n",
+    "ax_pi.legend()\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can see in the above figure that the model for $T=1\\text{k}_{\\text{B}}T$ yields the same result as the unbiased thermodynamic state of the WHAM estimation. This is because the `estimate_multi_temperature()` API function use, unless requested otherwise, the lowest temperature as reference and, hence, all simulations/fragments at $T=1\\text{k}_{\\text{B}}T$ are unbiased.\n",
+    "\n",
+    "### Step 4\n",
+    "To obtain kinetic results from our dataset, we apply DTRAM. The call is nearly the same as for WHAM: we just have to change the `estimator` parameter to `dtram` and specify one or more lag times. If we use a single lag time, the function returns a single `MEMM` and if we give more than one lag time, the function instead returns a list of `MEMM` objects."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dtram = pyemma.thermo.estimate_multi_temperature(\n",
+    "    energy_trajs, temp_trajs, dtrajs,\n",
+    "    energy_unit='kT', temp_unit='kT',\n",
+    "    maxiter=20000, maxerr=5e-14, save_convergence_info=20,\n",
+    "    estimator='dtram', lag=[i + 1 for i in range(10)])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Agan, we visualize the convergence behaviour. Now that we have estimates for several lag times, the plotting functions draws one set of convergence curves for each MEMM.\n",
+    "\n",
+    "In addition, we can also visualize how the `MEMM` implied timescales (for the unbiased thermodynamic state) converge with the chosen lag times."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, (ax_inc, ax_lli, ax_its) = plt.subplots(3, 1, figsize=(10, 15))\n",
+    "pyemma.plots.plot_convergence_info(dtram, axes=[ax_inc, ax_lli])\n",
+    "pyemma.plots.plot_memm_implied_timescales(dtram, ax=ax_its, nits=5)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Based on the implied timescale plot, we select an `MEMM` at a suitable lag time and continue with our analysis.\n",
+    "\n",
+    "Like WHAM, an MEMM contains the unbiased stationary distribution and free energy profile as well as a list of internal models for all provided thermodynamic states."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "memm = dtram[2]\n",
+    "\n",
+    "fig, (ax_pi, ax_f) = plt.subplots(1, 2, figsize=(10, 4))\n",
+    "ax_pi.plot(centers, memm.pi, ':o', color='black', label='unbiased')\n",
+    "ax_pi.set_ylabel('$\\pi(x)$')\n",
+    "ax_f.plot(centers, memm.f, ':o', color='black')\n",
+    "ax_f.set_ylabel('$f(x)$')\n",
+    "for i, model in enumerate(memm.models):\n",
+    "    ax_pi.plot(centers, model.pi, label='T=%.2f kT' % memm.temperatures[i])\n",
+    "    ax_f.plot(centers, model.f)\n",
+    "for ax in (ax_pi, ax_f):\n",
+    "    ax.set_xlabel('$x$')\n",
+    "ax_pi.legend()\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In contrast to WHAM, each internal model of an `MEMM` is itself a Markov state model object.\n",
+    "\n",
+    "**Note**: only if we have supplied unbised data can the estimated `MEMM` provide an MSM for the unbiased state!\n",
+    "\n",
+    "Now that we have an MSM, we can search for metastable states and compute, e.g., mean first passage times (MFPTs) between them."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# extract the unbiased MSM\n",
+    "msm = memm.msm\n",
+    "\n",
+    "# find metastable sets\n",
+    "pcca = msm.pcca(2)\n",
+    "\n",
+    "# visualize the metastable sets\n",
+    "fig, (ax_pi, ax_f) = plt.subplots(1, 2, figsize=(10, 4))\n",
+    "ax_pi.plot(centers, msm.pi, '--', color='black', label='unbiased')\n",
+    "ax_f.plot(centers, msm.f, '--', color='black')\n",
+    "for i, s in enumerate(pcca.metastable_sets):\n",
+    "    sm = msm.active_set[s]\n",
+    "    ax_pi.scatter(centers[sm], msm.pi[sm], c='C%d' % i, label='state %d' % (i + 1))\n",
+    "    ax_f.scatter(centers[sm], msm.f[sm], c='C%d' % i)\n",
+    "ax_pi.legend()\n",
+    "ax_pi.set_ylabel('$\\pi(x)$')\n",
+    "ax_f.set_ylabel('$f(x)$')\n",
+    "for ax in [ax_pi, ax_f]:\n",
+    "    ax.set_xlabel('$x$')\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Please note that only the highlighted microstates have been visited in the unbiased simulations/fragments."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print('MFPT[1 -> 2] = %7.1f steps' % (msm.mfpt(pcca.metastable_sets[0], pcca.metastable_sets[1])))\n",
+    "print('MFPT[2 -> 1] = %7.1f steps' % (msm.mfpt(pcca.metastable_sets[1], pcca.metastable_sets[0])))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Bias energies in umbrella sampling simulations\n",
+    "\n",
+    "The bias is computed via a harmonic potential based on the deviation of a frame from a reference structure. In the usual one-dimensional case, this reads\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\frac{k^{(i)}}{2} \\left\\Vert \\mathbf{x} - \\mathbf{x}^{(i)} \\right\\Vert^2.$$\n",
+    "\n",
+    "In the more general case, though, one can use a non-symmetric force matrix:\n",
+    "\n",
+    "$$b^{(i)}(\\mathbf{x}) = \\frac{1}{2} \\left\\langle \\mathbf{x} - \\mathbf{x}^{(i)} \\middle\\vert \\mathbf{k}^{(i)} \\middle\\vert \\mathbf{x} - \\mathbf{x}^{(i)} \\right\\rangle.$$\n",
+    "\n",
+    "## API functions for umbrella sampling\n",
+    "\n",
+    "For these simulation types, the `pyemma.thermo` module provides the API function\n",
+    "\n",
+    "```python\n",
+    "def estimate_umbrella_sampling(\n",
+    "    us_trajs, us_dtrajs, us_centers, us_force_constants,\n",
+    "    md_trajs=None, md_dtrajs=None, kT=None,\n",
+    "    maxiter=10000, maxerr=1.0E-15, save_convergence_info=0,\n",
+    "    estimator='wham', lag=1, dt_traj='1 step', init=None):\n",
+    "    ...\n",
+    "\n",
+    "```\n",
+    "\n",
+    "### Step 1\n",
+    "We start by loading and visualizing the data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with np.load(mdshare.load('pyemma-tutorial-us-data.npz', working_directory='data')) as fh:\n",
+    "    # load biased data\n",
+    "    us_trajs = [fh['us_traj_%03d.npy' % i] for i in range(100)]\n",
+    "    us_centers = fh['umbrella_centers'].tolist()\n",
+    "    us_force_constants = fh['force_constants'].tolist()\n",
+    "    # load unbiased data\n",
+    "    md_trajs = [fh['md_traj_%03d.npy' % i] for i in range(5)]\n",
+    "\n",
+    "fig, (ax_us, ax_md) = plt.subplots(1, 2, figsize=(10, 4))\n",
+    "for traj in us_trajs:\n",
+    "    ax_us.plot(traj)\n",
+    "for traj in md_trajs:\n",
+    "    ax_md.plot(traj)\n",
+    "for ax in [ax_us, ax_md]:\n",
+    "    ax.set_xlabel('steps')\n",
+    "    ax.set_ylabel('$x$')\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The umbrella sampling data seems to overlap nicely (left) but the unbiased data appears to be nonreversible (right).\n",
+    "\n",
+    "### Step 2\n",
+    "We create a regular grid which covers the full range of all simulated trajectories and assign the latter to these grid points."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.linspace(min(np.min(us_trajs), np.min(md_trajs)), max(np.max(us_trajs), np.max(md_trajs)), 61)\n",
+    "centers = 0.5 * (x[:-1] + x[1:]).reshape(-1, 1)\n",
+    "us_dtrajs = pyemma.coordinates.assign_to_centers(us_trajs, centers=centers)\n",
+    "md_dtrajs = pyemma.coordinates.assign_to_centers(md_trajs, centers=centers)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 3\n",
+    "Apply WHAM to **only** the biased data and visualize."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "wham = pyemma.thermo.estimate_umbrella_sampling(\n",
+    "    us_trajs, us_dtrajs, us_centers, us_force_constants,\n",
+    "    md_trajs=None, md_dtrajs=None,\n",
+    "    maxiter=100000, maxerr=5e-14, save_convergence_info=1,\n",
+    "    estimator='wham')\n",
+    "\n",
+    "fig, axes = plt.subplots(2, 2, figsize=(10, 8))\n",
+    "pyemma.plots.plot_convergence_info(wham, axes=axes[0, :])\n",
+    "axes[1, 0].plot(centers, wham.pi)\n",
+    "axes[1, 1].plot(centers, wham.f)\n",
+    "for ax in axes[1, :]:\n",
+    "    ax.set_xlabel('$x$')\n",
+    "axes[1, 0].set_ylabel('$\\pi(x)$')\n",
+    "axes[1, 1].set_ylabel('$f(x)$')\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Exercise!\n",
+    "Apply WHAM to the biased and unbiased data. What happens?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "wham2 = # FIXME\n",
+    "\n",
+    "fig, axes = plt.subplots(2, 2, figsize=(10, 8))\n",
+    "pyemma.plots.plot_convergence_info(wham2, axes=axes[0, :])\n",
+    "axes[1, 0].plot(centers, wham.pi, label='wham')\n",
+    "axes[1, 1].plot(centers, wham.f)\n",
+    "axes[1, 0].plot(centers, wham2.pi, label='wham2')\n",
+    "axes[1, 1].plot(centers, wham2.f)\n",
+    "for ax in axes[1, :]:\n",
+    "    ax.set_xlabel('$x$')\n",
+    "axes[1, 0].legend()\n",
+    "axes[1, 0].set_ylabel('$\\pi(x)$')\n",
+    "axes[1, 1].set_ylabel('$f(x)$')\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 4\n",
+    "Apply DTRAM to the **only** the biased data and visualize."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dtram = pyemma.thermo.estimate_umbrella_sampling(\n",
+    "    us_trajs, us_dtrajs, us_centers, us_force_constants,\n",
+    "    md_trajs=None, md_dtrajs=None,\n",
+    "    maxiter=20000, maxerr=5e-14, save_convergence_info=1,\n",
+    "    estimator='dtram', lag=[i + 1 for i in range(10)])\n",
+    "\n",
+    "fig, (ax_inc, ax_lli, ax_its) = plt.subplots(3, 1, figsize=(10, 15))\n",
+    "pyemma.plots.plot_convergence_info(dtram, axes=[ax_inc, ax_lli])\n",
+    "pyemma.plots.plot_memm_implied_timescales(dtram, ax=ax_its, nits=5)\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "What happend?\n",
+    "\n",
+    "**Remember**: an `MEMM` wonly provides an unbiased `MSM`, if we provide unbiased data. By default, the plotting function looks for the unbiased `MSM` which leads to the observed exception.\n",
+    "\n",
+    "For biased data only, the `MEMM` cannot give us unbiased kinetics. We can still plot the stationary properties, though."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "memm = dtram[0]\n",
+    "\n",
+    "fig, (ax_pi, ax_f) = plt.subplots(1, 2, figsize=(10, 4))\n",
+    "ax_pi.plot(centers, memm.pi, ':o', color='black', label='unbiased')\n",
+    "ax_pi.set_ylabel('$\\pi(x)$')\n",
+    "ax_f.plot(centers, memm.f, ':o', color='black')\n",
+    "ax_f.set_ylabel('$f(x)$')\n",
+    "for model in memm.models:\n",
+    "    ax_pi.plot(centers, model.pi)\n",
+    "    ax_f.plot(centers, model.f)\n",
+    "for ax in (ax_pi, ax_f):\n",
+    "    ax.set_xlabel('$x$')\n",
+    "fig.tight_layout()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Exercise!\n",
+    "Build an `MEMM` with a suitable lag time, visualize the stationary properties, and compute MFPTs between two metastable states."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# FIXME"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/notebooks/shortcuts_thermo.py b/notebooks/shortcuts_thermo.py
new file mode 100644
index 0000000..27d5af9
--- /dev/null
+++ b/notebooks/shortcuts_thermo.py
@@ -0,0 +1,91 @@
+import numpy as np
+import pyemma
+
+def adw_potential(x, k=0.0, x0=0.0):
+    r"""potential energy of confuguration state x in kT"""
+    return 0.5 * (x - 2.0)**4 - 3.0 * (x - 2.0)**2 + x - 2.0 + 0.5 * k * (x - x0)**2
+
+def adw_match_reference_to_binning(xtrajs, clustercenters, kT=1.0):
+    r"""reference free energy in kT averaged over discrete states"""
+    x = np.linspace(
+        np.min([xtraj.min() for xtraj in xtrajs]), np.max([xtraj.max() for xtraj in xtrajs]), 1000)
+    d = pyemma.coordinates.assign_to_centers(data=x, centers=clustercenters)[0]
+    pi_fine = np.exp(-adw_potential(x) / kT)
+    pi_fine /= pi_fine.sum()
+    pi_coarse = np.zeros(shape=(clustercenters.shape[0]), dtype=np.float64)
+    for i in range(clustercenters.shape[0]):
+        idx = np.where(d == i)[0]
+        pi_coarse[i] = pi_fine[idx].sum()
+    f = -np.log(pi_coarse)
+    srt = np.argsort(clustercenters, axis=0)
+    return clustercenters[srt, 0], f[srt]
+
+def adw_reference(xmin, xmax, nbins, kT=1.0, k_bias=0.0, x_bias=0.0):
+    x = np.linspace(xmin, xmax, nbins)
+    delta = 0.5 * (xmax - xmin) / float(nbins)
+    y = np.linspace(xmin - delta, xmax + delta, np.max([1000, 20 * nbins]))
+    d = pyemma.coordinates.assign_to_centers(data=y, centers=x.reshape((-1, 1)))[0]
+    pi_fine = np.exp(-adw_potential(y, k=k_bias, x0=x_bias) / kT)
+    pi_fine /= pi_fine.sum()
+    pi = np.zeros(shape=(nbins,), dtype=np.float64)
+    for i in range(nbins):
+        idx = np.where(d == i)[0]
+        pi[i] = pi_fine[idx].sum()
+    f = -np.log(pi)
+    return x, f, pi
+
+def epot_gauss_2d(r, r0, sigma, pf):
+    return pf * np.exp(-0.5 * (((r - r0) / sigma)**2).sum())
+
+def epot_harmonic_2d(r, r0, k):
+    return 0.5 * (k * (r - r0)**2).sum()
+
+class Hamiltonian(object):
+    def __init__(self):
+        self.pf = np.array([-8.0, -4.8, -8.0, -4.0], dtype=np.float64)
+        self.r0 = np.array([[15.0, 15.0], [9.0, 9.0], [9.0, 21.0], [21.0, 13.0]], dtype=np.float64)
+        self.sigma = np.array([10.0, 2.5, 2.5, 2.5], dtype=np.float64)
+    def potential_energy(self, r, r0=None, k=None):
+        epot = np.sum([epot_gauss_2d(r, self.r0[i], self.sigma[i], self.pf[i]) for i in range(4)])
+        if r0 is not None and k is not None:
+            epot += epot_harmonic_2d(r, r0, k)
+        return epot
+
+def tp_make_centers(n):
+    d = np.linspace(5, 25, n, endpoint=False)
+    d += 0.5 * (d[1] - d[0])
+    x, y = np.meshgrid(d, d)
+    return np.hstack((x.reshape((-1, 1)), y.reshape((-1, 1)))), d.reshape((-1, 1)).copy()
+
+def tp_reference_2d(nbins_per_axis, RT=1.0, k_bias=None, x_bias=None):
+    hamiltonian = Hamiltonian()
+    if x_bias is not None:
+        r0 = np.array([x_bias, 0])
+    else:
+        r0 = None
+    r, d = tp_make_centers(nbins_per_axis)
+    u = np.array([hamiltonian.potential_energy(
+        r[i, :], r0=r0, k=k_bias) for i in range(r.shape[0])]) / RT
+    pi = np.exp(-u).reshape((nbins_per_axis, nbins_per_axis))
+    pi /= pi.sum()
+    f = -np.log(pi)
+    return r[:, 0].reshape((nbins_per_axis, nbins_per_axis)), \
+        r[:, 1].reshape((nbins_per_axis, nbins_per_axis)), f, pi
+
+def tp_match_reference_to_binning(centers):
+    hamiltonian = Hamiltonian()
+    n = 5 * centers.shape[0]
+    centers2d, centers1d = tp_make_centers(n)
+    srt = np.argsort(centers, axis=0)
+    u2d = np.array(
+        [hamiltonian.potential_energy(centers2d[i, :]) for i in range(centers2d.shape[0])])
+    pi2d = np.exp(-u2d)
+    pi2d /= pi2d.sum()
+    pi1d = pi2d.reshape((n, n)).sum(axis=0)
+    dtraj = pyemma.coordinates.assign_to_centers(data=centers1d, centers=centers)[0]
+    pi_coarse = np.array([np.sum(pi1d[(dtraj == i)]) for i in range(centers.shape[0])])
+    return centers[srt, 0], pi_coarse[srt], -np.log(pi_coarse[srt])
+
+def a2_bias_potential(theta_traj, spring_constants, umbrella_centers):
+    return spring_constants[np.newaxis, :] * (
+        1.0 + np.cos(theta_traj[:, np.newaxis] - umbrella_centers[np.newaxis, :] - np.pi))
diff --git a/vamp_feature_selection.ipynb b/vamp_feature_selection.ipynb
new file mode 100644
index 0000000..917a16d
--- /dev/null
+++ b/vamp_feature_selection.ipynb
@@ -0,0 +1,464 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pyemma\n",
+    "import numpy as np\n",
+    "import mdshare\n",
+    "import itertools\n",
+    "import matplotlib.pyplot as plt\n",
+    "%matplotlib inline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Feature selection with VAMP\n",
+    "\n",
+    "One mission crucial step in building kinetic models like MSMs is to choose a set of molecular features to describe the system of interest. \n",
+    "\n",
+    "The variational approach to Markov processes (VAMP) provides a systematic way to select the set of molecular features. VAMP provides a set of scoring functions (the so-called VAMP scores) that allow to rank a set of features. Figuatively speaking, the VAMP scores measure the degree of \"slowness\" that can be captured with a given set of features. \n",
+    "\n",
+    "Here, we want demontrate the application of VAMP scores using the example of the pepentapeptide simulation data.\n",
+    "We will compare dihedrals angles, minimal residues distance and heavy atom contacts."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We start by downloading the pentapeptide data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# download the peptapeptide data\n",
+    "topfile = mdshare.load('pentapeptide-nowater.pdb')\n",
+    "traj_list = [mdshare.load('pentapeptide-%02d-500ns.xtc' % i) for i in range(25)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Dihedrals"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, we test dihedral angles. We start by computing the sine and cosine of all diheral angles at saving them as a list of numpy arrays."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feat = pyemma.coordinates.featurizer(topfile)\n",
+    "feat.add_backbone_torsions(cossin=True)\n",
+    "feat.add_sidechain_torsions(which=['chi1'])\n",
+    "data_dih = pyemma.coordinates.load(traj_list, features=feat)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we compute a so-called Koopman model.\n",
+    "\n",
+    "#### Scoring the features works by scoring a dynamic model\n",
+    "\n",
+    "\n",
+    "The Koopman operator $\\mathcal{K}$ is an integral operator\n",
+    "that describes conditional future expectation values. Let\n",
+    "$p(\\mathbf{x},\\,\\mathbf{y})$ be the conditional probability\n",
+    "density of visiting an infinitesimal phase space volume around\n",
+    "point $\\mathbf{y}$ at time $t+\\tau$ given that the phase\n",
+    "space point $\\mathbf{x}$ was visited at the earlier time\n",
+    "$t$ Then the action of the Koopman operator on a function\n",
+    "$f$ can be written as follows:\n",
+    "\n",
+    "$$ \\mathcal{K}f=\\int p(\\mathbf{x},\\,\\mathbf{y})f(\\mathbf{y})\\,\\mathrm{dy}=\\mathbb{E}\\left[f(\\mathbf{x}_{t+\\tau}\\mid\\mathbf{x}_{t}=\\mathbf{x})\\right] $$\n",
+    "\n",
+    "If we approximate $f$ by a linear superposition of ansatz\n",
+    "functions $\\boldsymbol{\\chi}$ of the conformational\n",
+    "degrees of freedom (features), the operator $\\mathcal{K}$\n",
+    "can be approximated by a (finite-dimensional) matrix $\\mathbf{K}$.\n",
+    "\n",
+    "The approximation is computed as follows: From the time-dependent\n",
+    "input features $\\boldsymbol{\\chi}(t)$, we compute the mean\n",
+    "$\\boldsymbol{\\mu}_{0}$ ($\\boldsymbol{\\mu}_{1}$) from\n",
+    "all data excluding the last (first) $\\tau$ steps of every\n",
+    "trajectory as follows:\n",
+    "\n",
+    "$$\n",
+    "  \\boldsymbol{\\mu}_{0}\t:=\\frac{1}{T-\\tau}\\sum_{t=0}^{T-\\tau}\\boldsymbol{\\chi}(t) \\\\\n",
+    "  \\boldsymbol{\\mu}_{1}\t:=\\frac{1}{T-\\tau}\\sum_{t=\\tau}^{T}\\boldsymbol{\\chi}(t)\n",
+    "$$\n",
+    "\n",
+    "Next, we compute the instantaneous covariance matrices\n",
+    "$\\mathbf{C}_{00}$ and $\\mathbf{C}_{11}$ and the\n",
+    "time-lagged covariance matrix $\\mathbf{C}_{01}$ as follows:\n",
+    "\n",
+    "$$\n",
+    "\\mathbf{C}_{00}\t:=\\frac{1}{T-\\tau}\\sum_{t=0}^{T-\\tau}\\left[\\boldsymbol{\\chi}(t)-\\boldsymbol{\\mu}_{0}\\right]\\left[\\boldsymbol{\\chi}(t)-\\boldsymbol{\\mu}_{0}\\right] \\\\\n",
+    "\\mathbf{C}_{11}\t:=\\frac{1}{T-\\tau}\\sum_{t=\\tau}^{T}\\left[\\boldsymbol{\\chi}(t)-\\boldsymbol{\\mu}_{1}\\right]\\left[\\boldsymbol{\\chi}(t)-\\boldsymbol{\\mu}_{1}\\right] \\\\\n",
+    "\\mathbf{C}_{01}\t:=\\frac{1}{T-\\tau}\\sum_{t=0}^{T-\\tau}\\left[\\boldsymbol{\\chi}(t)-\\boldsymbol{\\mu}_{0}\\right]\\left[\\boldsymbol{\\chi}(t+\\tau)-\\boldsymbol{\\mu}_{1}\\right]\n",
+    "$$\n",
+    "\n",
+    "The Koopman matrix is then computed as follows:\n",
+    "\n",
+    "$$ \\mathbf{K}=\\mathbf{C}_{00}^{-1}\\mathbf{C}_{01} $$\n",
+    "\n",
+    "We now estimate a Koopman model using the dihedral angles \n",
+    "as input features."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vamp_dih = pyemma.coordinates.vamp(data_dih, lag=20, dim=4)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "It can be shown that the leading singular functions of the\n",
+    "half-weighted Koopman matrix\n",
+    "$$ \\bar{\\mathbf{K}}:=\\mathbf{C}_{00}^{-\\frac{1}{2}}\\mathbf{C}_{01}\\mathbf{C}_{11}^{-\\frac{1}{2}} $$\n",
+    "encode the best reduced dynamical model for the time series.\n",
+    "The corresponding singular values $\\boldsymbol{\\sigma}$ measure the \n",
+    "amont \"slowness\" caputured in the reduced dynamical model.\n",
+    "\n",
+    "The so-called VAMP1-score of the model $\\mathbf{K}$ is just the sum of the first `dim` singular values.\n",
+    "Above, in the call to `pyemma.coordinates.vamp` we have fixed the dimension of the Koopman model to 4. Therefore the VAMP-1 score will be the sum of the largest 4 singular values + 1. The + 1 term comes from the constant singular value $\\sigma_0=1$ that every Koopman model posesses and which is accompanied by a constant singular functions. We do not include the constant singular function in the output (`vamp_dih.get_output()`) and it is never counted in `dim`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vamp_dih.score(score_method='VAMP1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternative choices are the VAMP-2 score and the VAMP-E score. For details, please refer to the docstring of `score`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Computing a generalizable score with the help of cross-validation\n",
+    "\n",
+    "Like all models, the Koopman model can be subject to overfitting. Overfitting means that model parameters learned do not\n",
+    "generalize to an independent data set.\n",
+    "\n",
+    "To assess how the results of a statistical analysis will generalize to an independent data set, we perform cross-validation\n",
+    "\n",
+    "One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to estimate a final predictive model.\n",
+    "\n",
+    "Next, we implement leave one out cross-validation. There we run as many rounds of estimation as we have trajectories. The training data in each run consits of all the trajectories except one trajectory, which will be used as the test data.\n",
+    "\n",
+    "#### Cross-validation requires a VAMP-score that depends on training data and test data\n",
+    "\n",
+    "The VAMP score in this case cannot be simply the sum of singular values of the model.\n",
+    "Here, *both* the training data *and* the test data need to be taken into account. \n",
+    "A sensible way of defined a VAMP scores that depends both data sets was propose by Wu et al.\n",
+    "and it is given by the equation\n",
+    "$$ \\text{VAMP-}r\\,\\text{score} = \\|(\\mathbf{U}^{\\text{train},T}\\mathbf{C}_{00}^{\\text{test}}\\mathbf{U}^{\\text{train}})^{-\\frac{1}{2}}(\\mathbf{U}^{\\text{train},T}\\mathbf{C}_{01}^{\\text{test}}\\mathbf{V}^{\\text{train}})(\\mathbf{V}^{\\text{train},T}\\mathbf{C}_{11}^{\\text{test}}\\mathbf{V}^{\\text{train}})^{-\\frac{1}{2}}\\|_{r}^{r} $$\n",
+    "\n",
+    "where $\\mathbf{C}_{ij}$ are the covariance matrices as defined above and $\\mathbf{U}$ and $\\mathbf{V}$ are transformed singular vectors the half-weighted Koopman matrix computed from the training data.\n",
+    "$$ \\bar{\\mathbf{K}}^{\\text{train}}=\\mathbf{U}^{\\prime \\text{train}}\\mathbf{S}\\mathbf{V}^{\\prime \\text{train}} $$\n",
+    "$$ \\mathbf{U}^{\\text{train}} = \\left( C_{00}^{\\text{train}} \\right)^{-\\frac{1}{2}} \\mathbf{U}^{\\prime \\text{train}} $$\n",
+    "$$ \\mathbf{V}^{\\text{train}} = \\left( C_{11}^{\\text{train}} \\right)^{-\\frac{1}{2}} \\mathbf{V}^{\\prime \\text{train}} $$\n",
+    "\n",
+    "In Pyemma this equation is implemented under the function `v.score(test_data)` where  `v` is a `VAMP` estiamtor that was trained with the training data and `test_data` is the test data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def leave_out_one_cv_score(trajs):\n",
+    "    scores = []\n",
+    "    for i in range(len(trajs)):\n",
+    "        # split the data into a training set and a test set\n",
+    "        traning_data = [ t for j,t in enumerate(trajs) if j!=i ]\n",
+    "        test_data = [trajs[i]]\n",
+    "        # train a Koopman model (VAMP) with the training data\n",
+    "        vamp = pyemma.coordinates.vamp(traning_data, lag=20, dim=4)\n",
+    "        # test the model that we just estimated with the test data\n",
+    "        scores.append(vamp.score(test_data, score_method='VAMP1'))\n",
+    "    return np.mean(scores), np.std(scores)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "score_dih = leave_out_one_cv_score(data_dih)\n",
+    "print(score_dih[0], '+-', score_dih[1])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To get an rough insight into the quality of the dynamic model, we plot a 2-D histogram where we project all the simulation data into the singualar fucntions of the half-weigted Koopman matrix."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def show_ic_fel(vamp):\n",
+    "    ics_trajs = vamp.get_output()\n",
+    "    ics = np.concatenate(ics_trajs)\n",
+    "    pyemma.plots.plot_free_energy(ics[:, 0], ics[:, 1])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "show_ic_fel(vamp_dih)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We see that the projection show well separated density blobs which is indicative of a metastable system that spends much time in a single blob and rarely transtions to a different density blob."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Residue mindists"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We repeat all steps for another feature set consititing of all minimal distances between all residues."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feat = pyemma.coordinates.featurizer(topfile)\n",
+    "feat.add_residue_mindist(list(itertools.combinations(range(5), 2)), periodic=False)\n",
+    "data_rmindist = pyemma.coordinates.load(traj_list, features=feat)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vamp_rmindist = pyemma.coordinates.vamp(data_rmindist, lag=20,  dim=4)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vamp_rmindist.score(score_method='VAMP1')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "score_rmindist = leave_out_one_cv_score(data_rmindist)\n",
+    "print(score_rmindist[0], '+-', score_rmindist[1])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Compared with the Koopman model estimated from dihedral angles, the Koopman model estimated from minimal residue distances has a lower VAMP score.\n",
+    "\n",
+    "Looking at the projection of the simulation data onto the singular functions of the half-weighted Koopman matrix that was estimated from minimal residue distances is in agreement with the low score. The projection look more fuzzy at shows less clearly separated density blobs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "show_ic_fel(vamp_rmindist)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "\n",
+    "To conclude, we present the scores graphically. This shows that dihedral angles yield a significantly larger score then the minimal residue distances. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, ax = plt.subplots()\n",
+    "rects1 = ax.bar([1, 2], [score_dih[0], score_rmindist[0]], yerr=[score_dih[1], score_rmindist[1]])\n",
+    "ax.set_ylabel('VAMP score')\n",
+    "ax.set_xlabel('feature set')\n",
+    "ax.set_xticks((1, 2))\n",
+    "ax.set_xticklabels(('dihedrals','residue dist.'));"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## A. Contacts between heavy atoms (computationally more expensive)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a third example, we investigate heavy atom constacts as a candidate for a set of good features.\n",
+    "Running this takes a couple of minutes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feat = pyemma.coordinates.featurizer(topfile)\n",
+    "feat.add_contacts(list(itertools.combinations(feat.select_Heavy(), 2)), threshold=0.45)\n",
+    "data_con = pyemma.coordinates.load(traj_list, features=feat)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vamp_con = pyemma.coordinates.vamp(data_con, lag=20,  dim=4)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vamp_con.score()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "score_con = leave_out_one_cv_score(data_con)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "show_ic_fel(vamp_con)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, ax = plt.subplots()\n",
+    "rects1 = ax.bar([1, 2, 3], [score_dih[0], score_rmindist[0], score_con[0]], \n",
+    "                yerr=[score_dih[1], score_rmindist[1], score_con[1]])\n",
+    "ax.set_ylabel('VAMP score')\n",
+    "ax.set_xlabel('feature set')\n",
+    "ax.set_xticks((1, 2, 3))\n",
+    "ax.set_xticklabels(('dihedrals', 'residue dist.', 'contacts'));"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}