Check any discrepancies/errors in AdLaLa #6
Comments
Might be good to bring this into
Thanks! Very interesting; I admittedly do not know this work. I think it would be great if we generally used PRs for new features to get line-by-line reviews; that also helps keep discussion focused inside the PR.
Ok, I have the following comments to make, ahead of even reading this paper. Can we find a training dataset with published results obtained using optimization that we try to reproduce exactly? Then one can compare inference methods etc. For instance, I am still confused about the noise model needing an epsilon; I have never seen that needed before. It points to a potential bug somewhere, or indeed to phenomena during training that we are ignoring.
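(It isn't stated in this thread which epsilon is meant. Purely as an illustration of one common pattern such an epsilon could be, here is a minimal sketch of a heteroscedastic Gaussian likelihood with a small floor on the predicted scale; `gaussian_nll`, `loc`, and `raw_scale` are hypothetical names, not from the codebase.)

```python
import torch
import torch.nn.functional as F

def gaussian_nll(loc, raw_scale, y, epsilon=1e-3):
    # Hypothetical regression head: `epsilon` floors the predicted scale
    # so the Gaussian likelihood cannot collapse onto a training point.
    scale = F.softplus(raw_scale) + epsilon
    # Negative log-likelihood up to an additive constant (log 2*pi dropped).
    return 0.5 * (((y - loc) / scale) ** 2 + 2 * torch.log(scale)).mean()

# Example inputs; in practice y comes from data, loc/raw_scale from the model.
y, loc, raw_scale = torch.randn(10), torch.zeros(10), torch.zeros(10)
print(gaussian_nll(loc, raw_scale, y))
```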
I'm not sure whether the primary goal here is "approximate sampling of a posterior distribution, possibly faster than HMC" or "finding a more reliable way to minimize the training loss," but that changes what experiments make sense.

I'm not sure about the other tasks within this project, but the observation so far on Slack was that Adam may be poorly suited to energy-regression tasks that contain a molecular-mechanics model as a layer: Adam fails to make the training loss small even with very small step sizes on very small datasets (even when the graph-net is removed from the model, so that each molecular-mechanics parameter is adjustable independently). To check whether this observation was optimizer-specific rather than due to some model-implementation or self-consistency issue, @yuanqing-wang kept the model and task the same but replaced Adam with L-BFGS, which solved the task in a few steps. I think the take-away was that we may need to look at optimizers other than Adam when the output of the graph-net is fed into a molecular-mechanics model. (A minimal sketch of this optimizer swap follows this comment.)

We're broadly curious about applying molecular-dynamics-flavored sampling and optimization methods. We're specifically interested in the recently proposed AdLaLa method, but we don't yet have much numerical experience with it on any graph-net problems, with or without a molecular-mechanics model as a "layer." One suggestion in the paper is that AdLaLa may be more robust than Adam in some circumstances, even when assessed as an optimizer; I think @yuanqing-wang and @jchodera are interested in checking whether graph-nets and/or MM layers provide such circumstances.

To assess this method's utility as a drop-in replacement for standard optimizers, the appropriate comparison is to Adam rather than to HMC. To assess its utility as an approximate sampler, the appropriate comparison is to HMC.

@yuanqing-wang: Regarding checking that this implementation is an accurate clone of the paper implementation, I think it would be helpful if the class included
It is helpful that there are already some line comments noting the intent of some code blocks; that helped with the self-consistency review in the Colab notebook last week. What I and other potential reviewers would ideally want is a side-by-side comparison of the authors' definitions and the current port, and anything that makes such a comparison easier will help build confidence in the implementation. Aside from comparing implementation details with documented intent, another way to build confidence in a complicated implementation like this is to confirm that it behaves similarly to a reliable reference implementation, for example by checking: whether the toy-task loss trajectories look similar to those in e.g. Fig. 9, whether the weight histograms after running on the toy task look similar to those in Figs. 6-7, and, most expensively, whether toy-task performance as a function of a couple of hyperparameters looks broadly similar to Fig. 15.
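As a concrete illustration of the Adam-to-L-BFGS swap described above, here is a minimal sketch on a stand-in least-squares problem; the model and data are placeholders, not the actual graph-net / molecular-mechanics setup. Note that PyTorch's L-BFGS requires a closure, because it re-evaluates the loss internally during its line search.

```python
import torch

torch.manual_seed(0)
x, y = torch.randn(64, 8), torch.randn(64, 1)   # stand-in data
model = torch.nn.Linear(8, 1)                   # stand-in for the real model

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

# Adam baseline: many small steps.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(1000):
    optimizer.step(closure)

# Same task with the optimizer swapped for L-BFGS: a handful of steps.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.LBFGS(model.parameters(), max_iter=100)
for _ in range(5):
    loss = optimizer.step(closure)
print(float(loss))
```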
Sorry, I already committed the code to master. I've created two more branches for code review, and I'll follow the guidelines closely from now on.
I re-implemented the AdLaLa integrator by Tiffany and Ben (https://github.com/TiffanyVlaar/ThermodynamicParameterizationOfNNs/blob/master/AdLaLa.ipynb) here:
https://github.com/choderalab/pinot/blob/master/pinot/inference/adlala.py
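For reviewers who want a feel for the shape of such a port, here is a minimal, self-contained sketch of just the Langevin ("LA") building block wrapped as a `torch.optim.Optimizer`: a gradient kick, an Ornstein-Uhlenbeck momentum mix, and a position drift. This is not the AdLaLa operator splitting from the paper or the notebook, and the hyperparameter names (`lr`, `gamma`, `tau`) are illustrative; refer to the linked files for the actual definitions.

```python
import math
import torch

class LangevinSketch(torch.optim.Optimizer):
    """Sketch of the Langevin ('LA') building block only; NOT the full
    AdLaLa splitting from the paper/notebook."""

    def __init__(self, params, lr=1e-3, gamma=1.0, tau=1e-4):
        super().__init__(params, dict(lr=lr, gamma=gamma, tau=tau))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            h, gamma, tau = group["lr"], group["gamma"], group["tau"]
            c1 = math.exp(-gamma * h)              # OU decay over one step
            c2 = math.sqrt(tau * (1.0 - c1 ** 2))  # matching noise amplitude
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "momentum" not in state:
                    state["momentum"] = torch.zeros_like(p)
                m = state["momentum"]
                m.add_(p.grad, alpha=-h)                        # kick by gradient
                m.mul_(c1).add_(torch.randn_like(m), alpha=c2)  # OU noise mix
                p.add_(m, alpha=h)                              # position drift
        return loss
```

A side-by-side review could then map each update line here (and in the actual port) back to the corresponding step in the notebook.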