MBAR gives a statistical error orders of magnitude larger than the bootstrap error. #519
Comments
Have you had a chance to check if you see the same behavior with pymbar 3?
So this is a fascinating case. I am not sure that it's necessarily wrong. Interestingly, the first state has nearly, but not exactly, zero overlap with the other 11 states (see `mbar.compute_overlap()['matrix']`), so the analytical estimate is definitely off for that state. However, bootstrapping has very little variation. One weird thing is the u_nk matrix: after sample 11, only state 0 has any energy; the rest are zeros. This seems odd, and likely unintended? Also, in sample 0, both states 0 and 11 have zero energy, which seems weird, since state 0 has large energy for the rest of the run.
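For reference, a minimal sketch of that overlap check in pymbar 4 (the file name, the (N, K) layout, and the equal-samples-per-state assumption are mine, not from this thread):

```python
import numpy as np
from pymbar import MBAR

# Assumed layout: u_nk.npy holds an (N, K) array of reduced potentials
# (alchemlyb convention), with samples grouped by origin state.
u_nk = np.load("u_nk.npy")
N, K = u_nk.shape
u_kn = u_nk.T                    # pymbar expects shape (K, N)
N_k = np.full(K, N // K)         # assumed: equal sample counts per state

mbar = MBAR(u_kn, N_k)
overlap = mbar.compute_overlap()["matrix"]

# Row 0: overlap of state 0 with every other state; values near zero in
# columns 1..K-1 are the poor overlap described above.
print(np.array_str(overlap[0], precision=3, suppress_small=True))
```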
@mrshirts Thanks for taking a look.
Which row or column do you mean?
The first paragraph above is completely correct; i.e., the first state has very little overlap with the other states, so the analytical estimate is going to be off. But, surprisingly, that doesn't rule out the bootstrap being correct in some situations. The interesting thing is that the covariance matrices from the analytical case and the bootstrap are very close to each other (within ~10%, about what you would expect between variance methods), except for the variances from state 0 to the other states, which are very different. Note also that the bootstrap estimate of the variance to state 0 from the other states is much larger than the estimate of the variance between the other 11 states, but by 1 order of magnitude instead of 3 orders of magnitude.

The second paragraph was wrong: I was trying to test a hypothesis from the first observation (basically, trying to change the reference state), but I failed to construct the correct u_kn matrix, so that part should be disregarded. Apologies for that. The hypothesis was: "Is the bootstrap variation low because I used the poorly overlapping state as the reference state, and would the variances be larger if I relabeled the states without changing the physics?" When I successfully switched the 0 and 11 state labels, I found that the uncertainties were essentially preserved (a huge analytical uncertainty from state 11, and a bootstrap uncertainty to state 11 of about 0.3), meaning that the low variance from the bootstrap is NOT an artifact of state 0.

So, my conclusion: I think this is a case where the bootstrap is right (or much closer to right) and the analytical estimate is wrong. But if you want better results, and uncertainties that agree, it is better to run simulations where there is good overlap between all states.
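A sketch of the relabeling test described above (pymbar 4; the loading assumptions are the same as in the earlier sketch, and the helper function and bootstrap count are illustrative):

```python
import numpy as np
from pymbar import MBAR

u_nk = np.load("u_nk.npy")              # assumed (N, K) layout, as above
u_kn = u_nk.T
K = u_kn.shape[0]
N_k = np.full(K, u_kn.shape[1] // K)    # assumed equal samples per state

def swap_state_labels(u_kn, N_k, i, j):
    """Relabel states i and j: swap rows of u_kn, the per-state sample
    blocks (columns), and the corresponding entries of N_k."""
    order = np.arange(len(N_k))
    order[[i, j]] = order[[j, i]]
    starts = np.concatenate(([0], np.cumsum(N_k)))
    cols = np.concatenate([np.arange(starts[k], starts[k + 1]) for k in order])
    return u_kn[order][:, cols], N_k[order]

u_kn2, N_k2 = swap_state_labels(u_kn, N_k, 0, 11)

mbar = MBAR(u_kn2, N_k2, n_bootstraps=200)
analytical = mbar.compute_free_energy_differences()
bootstrap = mbar.compute_free_energy_differences(uncertainty_method="bootstrap")

# If the small bootstrap variance were an artifact of the reference-state
# label, these rows would change character after the swap; they do not.
print(np.array_str(analytical["dDelta_f"][11], precision=2))
print(np.array_str(bootstrap["dDelta_f"][11], precision=2))
```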
* Problem: in the backward and forward convergence analysis, the initial points use only a small slice of the data (for example, the first 10%). With so few data points the overlap can be very poor, which gives a terrible statistical error. Solution: if the statistical error is too large, use the bootstrap error instead; see choderalab/pymbar#519 and the usage sketch after this list.
* New kwarg error_tol in convergence.forward_backward_convergence() lets the user specify an error tolerance; if error > error_tol, the bootstrap error is used instead.
* Update CHANGES.
* Add test.
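A hedged usage sketch of the new kwarg (alchemlyb API; the dataset, temperature, and tolerance value here are illustrative only):

```python
from alchemtest.gmx import load_benzene
from alchemlyb.parsing.gmx import extract_u_nk
from alchemlyb.convergence import forward_backward_convergence

# Build a list of u_nk DataFrames, one per lambda window.
u_nk_list = [extract_u_nk(xvg, T=300) for xvg in load_benzene().data["Coulomb"]]

# If the analytical MBAR error of a forward/backward slice exceeds
# error_tol, the bootstrap error is reported for that slice instead.
df = forward_backward_convergence(u_nk_list, estimator="MBAR", error_tol=3)
print(df)
```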
The statistical error is 40.48 while the bootstrap error is 0.35
The code to reproduce:
u_nk.npy.zip
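A minimal sketch consistent with the numbers quoted above (the array layout, sample counts, and bootstrap count are my assumptions):

```python
import numpy as np
from pymbar import MBAR

u_nk = np.load("u_nk.npy")              # assumed (N, K) reduced potentials
u_kn = u_nk.T
K = u_kn.shape[0]
N_k = np.full(K, u_kn.shape[1] // K)    # assumed equal samples per state

mbar = MBAR(u_kn, N_k, n_bootstraps=200)
analytical = mbar.compute_free_energy_differences()
bootstrap = mbar.compute_free_energy_differences(uncertainty_method="bootstrap")

# The reported discrepancy: an analytical error of ~40 versus a bootstrap
# error of ~0.35 for the free energy difference involving state 0.
print("analytical:", analytical["dDelta_f"][0, -1])
print("bootstrap: ", bootstrap["dDelta_f"][0, -1])
```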