MBAR gives a statistical error orders of magnitude larger than the bootstrap error. #519
Comments
Have you had a chance to check if you see the same behavior with pymbar 3?
So this is a fascinating case. I am not sure that it's necessarily wrong. Interestingly, the first state has nearly, but not exactly, zero overlap with the other 11 states (see `mbar.compute_overlap()['matrix']`), so the analytical estimate is definitely off for that state. However, bootstrapping has very little variation. One weird thing is the u_nk matrix: after sample 11, only state 0 has any energy; the rest are zeros. This seems odd, and likely unintended? Also, in sample 0, both states 0 and 11 have zero energy, which seems weird, since state 0 has large energy for the rest of the run.
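For reference, a minimal sketch of that overlap check in pymbar 4 (the file name, the (N, K) layout, and the equal-samples-per-state assumption are mine, not from this thread):

```python
import numpy as np
from pymbar import MBAR

# Assumed layout: u_nk.npy holds an (N, K) array of reduced potentials
# (alchemlyb convention), with samples grouped by origin state.
u_nk = np.load("u_nk.npy")
N, K = u_nk.shape
u_kn = u_nk.T                    # pymbar expects shape (K, N)
N_k = np.full(K, N // K)         # assumed: equal sample counts per state

mbar = MBAR(u_kn, N_k)
overlap = mbar.compute_overlap()["matrix"]

# Row 0: overlap of state 0 with every other state; values near zero in
# columns 1..K-1 are the poor overlap described above.
print(np.array_str(overlap[0], precision=3, suppress_small=True))
```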
@mrshirts Thanks for taking a look.
Which row or column do you mean?
The first paragraph above is completely correct; i.e., the first state has very little overlap with the other states, so the analytical estimate is going to be off. But, surprisingly, that doesn't rule out the bootstrap being correct in some situations. The interesting thing is that the covariance matrices from the analytical case and the bootstrap are very close to each other (within ~10%, about what you would expect between variance methods), except for the variances from state 0 to the other states, which are very different. Note also that the bootstrap estimate of the variance to state 0 from the other states is much larger than the estimate of the variance between the other 11 states, but by 1 order of magnitude instead of 3 orders of magnitude.

The second paragraph was wrong: I was trying to test a hypothesis from the first observation (basically, trying to change the reference state), but I failed to construct the correct u_kn matrix, so that part should be disregarded. Apologies for that. The hypothesis was: "Is the bootstrap variation low because I used the poorly overlapping state as the reference state, and would the variances be larger if I relabeled the states without changing the physics?" When I successfully switched the 0 and 11 state labels, I found that the uncertainties were essentially preserved (a huge analytical uncertainty from state 11, and a bootstrap uncertainty to state 11 of about 0.3), meaning that the low variance from the bootstrap is NOT an artifact of state 0.

So, my conclusion: I think this is a case where the bootstrap is right (or much closer to right) and the analytical estimate is wrong. But if you want better results, and uncertainties that agree, it is better to run simulations where there is good overlap between all states.
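A sketch of the relabeling test described above (pymbar 4; the loading assumptions are the same as in the earlier sketch, and the helper function and bootstrap count are illustrative):

```python
import numpy as np
from pymbar import MBAR

u_nk = np.load("u_nk.npy")              # assumed (N, K) layout, as above
u_kn = u_nk.T
K = u_kn.shape[0]
N_k = np.full(K, u_kn.shape[1] // K)    # assumed equal samples per state

def swap_state_labels(u_kn, N_k, i, j):
    """Relabel states i and j: swap rows of u_kn, the per-state sample
    blocks (columns), and the corresponding entries of N_k."""
    order = np.arange(len(N_k))
    order[[i, j]] = order[[j, i]]
    starts = np.concatenate(([0], np.cumsum(N_k)))
    cols = np.concatenate([np.arange(starts[k], starts[k + 1]) for k in order])
    return u_kn[order][:, cols], N_k[order]

u_kn2, N_k2 = swap_state_labels(u_kn, N_k, 0, 11)

mbar = MBAR(u_kn2, N_k2, n_bootstraps=200)
analytical = mbar.compute_free_energy_differences()
bootstrap = mbar.compute_free_energy_differences(uncertainty_method="bootstrap")

# If the small bootstrap variance were an artifact of the reference-state
# label, these rows would change character after the swap; they do not.
print(np.array_str(analytical["dDelta_f"][11], precision=2))
print(np.array_str(bootstrap["dDelta_f"][11], precision=2))
```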
* Problem: in the backward and forward convergence analysis, the initial points use only a small slice of the data (for example, the first 10%). With so few data points the overlap can be very poor, which gives a terrible statistical error. Solution: if the statistical error is too large, use the bootstrap error instead; see choderalab/pymbar#519 and the usage sketch after this list.
* New kwarg error_tol in convergence.forward_backward_convergence() lets the user specify an error tolerance; if error > error_tol, the bootstrap error is used instead.
* Update CHANGES.
* Add test.
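A hedged usage sketch of the new kwarg (alchemlyb API; the dataset, temperature, and tolerance value here are illustrative only):

```python
from alchemtest.gmx import load_benzene
from alchemlyb.parsing.gmx import extract_u_nk
from alchemlyb.convergence import forward_backward_convergence

# Build a list of u_nk DataFrames, one per lambda window.
u_nk_list = [extract_u_nk(xvg, T=300) for xvg in load_benzene().data["Coulomb"]]

# If the analytical MBAR error of a forward/backward slice exceeds
# error_tol, the bootstrap error is reported for that slice instead.
df = forward_backward_convergence(u_nk_list, estimator="MBAR", error_tol=3)
print(df)
```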
The statistical error is 40.48 while the bootstrap error is 0.35
The code to reproduce:
u_nk.npy.zip
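A minimal sketch consistent with the numbers quoted above (the array layout, sample counts, and bootstrap count are my assumptions):

```python
import numpy as np
from pymbar import MBAR

u_nk = np.load("u_nk.npy")              # assumed (N, K) reduced potentials
u_kn = u_nk.T
K = u_kn.shape[0]
N_k = np.full(K, u_kn.shape[1] // K)    # assumed equal samples per state

mbar = MBAR(u_kn, N_k, n_bootstraps=200)
analytical = mbar.compute_free_energy_differences()
bootstrap = mbar.compute_free_energy_differences(uncertainty_method="bootstrap")

# The reported discrepancy: an analytical error of ~40 versus a bootstrap
# error of ~0.35 for the free energy difference involving state 0.
print("analytical:", analytical["dDelta_f"][0, -1])
print("bootstrap: ", bootstrap["dDelta_f"][0, -1])
```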