check any discrepancies/errors in AdLaLa #6

Open
yuanqing-wang opened this issue Apr 27, 2020 · 6 comments
@yuanqing-wang
Member

I re-implemented the AdLaLa integrator by Tiffany and Ben (https://github.com/TiffanyVlaar/ThermodynamicParameterizationOfNNs/blob/master/AdLaLa.ipynb) here:

https://github.com/choderalab/pinot/blob/master/pinot/inference/adlala.py
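For anyone skimming this thread: below is a minimal, illustrative sketch of the kind of update such an integrator performs. This is plain underdamped Langevin dynamics, not the actual AdLaLa scheme (which additionally partitions the parameters into an adaptive-Langevin group and a Langevin group and evolves a thermostat variable); see the notebook and adlala.py above for the real splitting. All names and defaults here are hypothetical:

```python
import math
import torch

def langevin_step(params, momenta, loss_fn, h=1e-3, gamma=0.1, tau=1e-4):
    """One Euler-style step of underdamped Langevin dynamics (illustrative only)."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    c = math.exp(-gamma * h)  # momentum damping over one step of friction
    for p, m, g in zip(params, momenta, grads):
        m.mul_(c).add_(g, alpha=-h)                    # damp momentum, then kick by -h * grad
        m.add_(torch.randn_like(m),
               alpha=math.sqrt(tau * (1.0 - c ** 2)))  # inject noise at temperature tau
        p.data.add_(m, alpha=h)                        # drift parameters by h * momentum
    return loss

# toy usage
w = torch.randn(10, requires_grad=True)
mom = [torch.zeros_like(w)]
for _ in range(100):
    langevin_step([w], mom, lambda: (w ** 2).sum())
```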

@jchodera
Member

Might be good to bring this into master via a PR to get line-by-line review?

@karalets
Collaborator

Thanks! Very interesting; I obviously do not know this work yet.

I think it would be great if we generally did PRs for new features to get line-by-line reviews; that also helps focus discussion inside the PR.

@jchodera
Member

@karalets: Check out the paper from Ben Leimkuhler on using adaptive Langevin in layers.

We've been having trouble training graph nets and have tracked it down to the optimizer, so we're trying out some MCMC-inspired schemes from our collaborators at Edinburgh!

@karalets
Collaborator

karalets commented Apr 27, 2020

Ok, I have the following comments to make ahead of even reading this paper:

  • Evaluating training performance on graph nets should be done on a dataset for which there are published results and published code for specific methods. Then one can compare inference methods, etc.

  • This method seems to conflate optimization with Bayesian inference, and AFAIK we don't even have an HMC baseline for the networks, which would have been easy to run once the models are semi-solid in their definition. For instance, I am still confused about the noise model needing an epsilon; I have never seen that needed before. It points to a potential bug somewhere, or indeed to a phenomenon during training that we are ignoring.

  • Can we find a training dataset with published optimization results that we try to reproduce exactly? Jumping to a fairly complex Bayesian inference method is interesting, but may not help find errors/bugs that are upstream of inference.

@maxentile
Member

maxentile commented Apr 27, 2020

I'm not sure if the primary goal here is "approximate sampling of a posterior distribution, possibly faster than HMC" or if the goal is "find a more reliable way to minimize the training loss," but that changes what experiments make sense to do.

I'm not sure about the other tasks within this project, but the observation so far on Slack was that Adam may be poorly suited to energy regression tasks that contain a "molecular-mechanics model as a layer": Adam fails to make the training loss small even with very small step sizes on very small datasets, and even when the graph-net is removed from the model so that each molecular-mechanics parameter is adjustable independently. To check whether this observation was optimizer-specific rather than due to some model implementation / self-consistency issue, @yuanqing-wang kept the model and task the same but replaced Adam with L-BFGS, which solved the task in a few steps. I think the takeaway was that we may need to look at optimizers other than Adam when the output of the graph-net is being fed into a molecular-mechanics model.
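For concreteness, the swap described above looks something like the sketch below; unlike Adam, PyTorch's L-BFGS takes a closure that re-evaluates the loss on each step. The model and data here are random stand-ins, not the actual pinot model:

```python
import torch

# Stand-in model and data (the real task feeds a graph-net's output into a
# molecular-mechanics model; a linear layer is used here just to show the API).
model = torch.nn.Linear(8, 1)
x, y = torch.randn(32, 8), torch.randn(32, 1)
loss_fn = torch.nn.MSELoss()

# L-BFGS in place of Adam; it needs a closure that recomputes the loss.
optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)

def closure():
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

for _ in range(10):  # a handful of outer steps is often enough on tiny problems
    optimizer.step(closure)
```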

We're broadly curious about applying molecular-dynamics-flavored sampling and optimization methods. We're specifically interested in the recently proposed AdLaLa method, but don't have much numerical experience with this on any graph-net problems yet, with or without a molecular-mechanics model as a "layer." One suggestion in the paper is that AdLaLa may be more robust than Adam in some circumstances, even when assessed as an optimizer. I think @yuanqing-wang and @jchodera are interested in checking whether graph-nets and/or MM layers provide such circumstances. To assess this method's utility as a drop-in replacement for standard optimizers, an appropriate comparison is to Adam, rather than to HMC. To assess this method's utility as an approximate sampler, an appropriate comparison is to HMC.

@yuanqing-wang: Regarding checking that this implementation is an accurate clone of the paper's implementation, I think it would be helpful if the class included the following (a possible skeleton is sketched after this list):

  • a docstring describing all the parameters,
  • brief comments on intended similarities and differences relative to reference implementations,
  • more specific references into the paper or source implementation, such as page or equation numbers for each update.
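The parameter names and defaults in the skeleton below are hypothetical and should be matched to the actual signature in adlala.py; the bracketed placeholders mark where specific page/equation citations would go:

```python
import torch

class AdLaLa(torch.optim.Optimizer):
    """Adaptive Langevin / Langevin (AdLaLa) integrator.

    Parameters
    ----------
    h : float
        Step size. [cite the paper section/equation defining the splitting]
    gamma : float
        Friction coefficient for the Langevin partition.
    sigma : float
        Driving-noise amplitude for the adaptive-Langevin partition.
    epsilon : float
        Thermostat coupling. [note any intended deviation from the
        reference notebook here]
    tau : float
        Temperature.

    References
    ----------
    Leimkuhler, Matthews & Vlaar, "Partitioned integrators for
    thermodynamic parameterization of neural networks", and
    https://github.com/TiffanyVlaar/ThermodynamicParameterizationOfNNs
    """

    def __init__(self, params, h=0.25, gamma=0.1, sigma=0.01,
                 epsilon=0.05, tau=1e-4):  # hypothetical defaults
        defaults = dict(h=h, gamma=gamma, sigma=sigma,
                        epsilon=epsilon, tau=tau)
        super().__init__(params, defaults)
```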

It is helpful that there are already some line comments annotating the intent of some code blocks; these helped with the self-consistency review in the Colab notebook last week. What I and other potential reviewers would ideally like to do is a side-by-side comparison of the authors' definitions and the current port, so anything that makes such a comparison easier will build confidence in the implementation.

Aside from comparing implementation details with documented intent, another way to build confidence in a complicated implementation like this is to confirm that it behaves like a reliable reference implementation, for example by checking whether the toy-task loss trajectories look similar to those in e.g. Fig. 9, whether weight histograms after running on the toy task look similar to those in Figs. 6-7, and, most expensively, whether toy-task performance as a function of a couple of hyperparameters looks broadly similar to Fig. 15. A rough sketch of the histogram check follows.
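A minimal sketch of the weight-histogram comparison, assuming two trained models are in hand; the untrained linear layers below are stand-ins for the reference-notebook model and the pinot port trained on the same toy task and seed:

```python
import matplotlib.pyplot as plt
import torch

def weight_histogram(model, ax, label, bins=50):
    """Flatten all of `model`'s parameters and histogram them on `ax`."""
    w = torch.cat([p.detach().flatten() for p in model.parameters()])
    ax.hist(w.numpy(), bins=bins, alpha=0.5, label=label, density=True)

# Stand-ins: in practice, train the reference implementation and the port
# on the same toy task with the same seed, then compare (cf. Figs. 6-7).
reference_model = torch.nn.Linear(10, 1)
ported_model = torch.nn.Linear(10, 1)

fig, ax = plt.subplots()
weight_histogram(reference_model, ax, "reference")
weight_histogram(ported_model, ax, "port")
ax.set(xlabel="weight value", ylabel="density")
ax.legend()
plt.show()
```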

@yuanqing-wang
Member Author

yuanqing-wang commented Apr 29, 2020

Sorry, I already committed the code to master.

I've created another two branches for code review. I'll follow the guidelines closely from now on.

#7
