RFC: API reorganization #30
I wrote a long reply to this but didn't finish it and it got lost into the ether. So I'll just briefly add that (i) thanks for the hard work, this package is very useful, and (ii):

Again, great work!
Thanks for your comment. Regarding the log density: you can just define `logdensity(::Type{LogDensityProblems.ValueGradient}, ::Foo, ::AbstractVector)` and use that. Or define `logdensity(::Type{LogDensityProblems.Value}, ::Foo, ::AbstractVector)` and use one of the AD methods from that package. TransformVariables is just a convenience feature and is not required. But perhaps I should make this clearer. I would appreciate it if you could elaborate about the callbacks.
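For readers landing here, a minimal self-contained sketch of what such method definitions might look like. The `Value`/`ValueGradient` structs below are local stand-ins for the LogDensityProblems types mentioned above (in real use you would import them), and `Foo` with its standard-normal density is entirely hypothetical:

```julia
# Stand-ins for LogDensityProblems.Value / LogDensityProblems.ValueGradient,
# so this sketch runs on its own; in real use, import them from the package.
struct Value{T}
    value::T
end

struct ValueGradient{T,V}
    value::T
    gradient::V
end

# Hypothetical problem type: a standard multivariate normal log density.
struct Foo end

# Value-only method; an AD wrapper from LogDensityProblems could then
# supply the gradient automatically.
logdensity(::Type{Value}, ::Foo, x::AbstractVector) = Value(-sum(abs2, x) / 2)

# Value-and-gradient method, for when the gradient is known in closed form.
logdensity(::Type{ValueGradient}, ::Foo, x::AbstractVector) =
    ValueGradient(-sum(abs2, x) / 2, -x)
```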
Is there an example that uses that? It would be very helpful. I'm struggling too...
Ha! @zenna, it turns out I also need a callback when the sample is kept for the final tally; is this your use case? I'm using MCMC to integrate over an internal variable, so I need to save internal state, but only for the samples that are kept. What's the best way to achieve that now?
@GAIKA:

```julia
donothing(x...) = nothing

function train(..., cb = donothing)
    for i = 1:n
        dostuff
        cb(loss, otherdata)
    end
end
```

This works very well but has two major disadvantages. So I've been developing a combination of systems called Lens and Callbacks: https://github.com/zenna/Lens.jl/pulls

The high-level idea is that you annotate your code with lenses. It works fine; I use it in Omega (https://github.com/zenna/Omega.jl). It avoids problem (i), but it still suffers from problem (ii). The ultimately right solution is to use Cassette, which will allow you to solve both (i) and (ii).

So, to solve your problem right now: if it were me, I would make a fork of DynamicHMC and capture the state, either by passing in a callback or by using a Lens.
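As a self-contained illustration of the callback pattern described above, specialized to the use case of saving internal state only for kept samples. Everything here is hypothetical (a toy sampler loop standing in for DynamicHMC's transition loop):

```julia
# Default no-op callback, as in the train(...) pattern above.
donothing(args...) = nothing

# Toy MCMC-like loop: the callback fires only when a sample is kept,
# which is where per-sample internal state would be recorded.
function toy_mcmc(n; cb = donothing)
    x = 0.0
    samples = Float64[]
    for i in 1:n
        proposal = x + randn()
        kept = rand() < 0.5          # stand-in for the accept/keep decision
        if kept
            x = proposal
            cb(i, x)                 # fires only for kept samples
        end
        push!(samples, x)
    end
    samples
end

# Collect (iteration, state) pairs for kept samples only.
kept_log = Tuple{Int,Float64}[]
toy_mcmc(100; cb = (i, x) -> push!(kept_log, (i, x)))
```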
@GAIKA: do you need an example of how to code a posterior for which you have the gradient available? Please open an issue at https://github.com/tpapp/DynamicHMCExamples.jl/. @zenna: I don't think callbacks are the right approach. I would do this with an extra payload, which could be a thunk. But I am open to suggestions. It would be great to have a concrete example of a problem to focus the discussion; please open another issue so that we can discuss it.
Motivation
DynamicHMC.jl was started in June 2017, and initially released in February 2018. Since then, the architecture and the API have only undergone minor changes. However, various use cases are stretching the API a bit, and it is time for a redesign and a rewrite of some internals. I am opening this issue to discuss these; feel free to use it as a wishlist, or to share your recommendations or use cases.
I intend to keep the focus of the package the same, ie as a building block for Bayesian inference using variants of the NUTS sampler. The user is still expected to provide a log density (with gradient).
I briefly discuss the changes I am proposing below.
Low-level implementation changes
These would be mostly invisible to the user.
Non-allocating leapfrog calculations
The most important one is probably reducing allocations by reusing the vectors for position and momentum. This has a tiny impact (for nontrivial models, calculating the log density is the most costly part), but it is low-hanging fruit and has some significance for high-dimensional models. I am undecided about this though, as I would have to trust the log density calculations not to change the position vector. Ideas: perhaps make it optional, and allow `SVector` transparently?

EDIT: I abandoned this, because keeping the functional design (which is now generalized) makes it much easier for me to use multithreading in Julia 1.3.
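For concreteness, a minimal sketch of what a non-allocating leapfrog step could look like, assuming an identity mass matrix and a hypothetical in-place gradient function `∇logπ!` (and, as the paragraph above warns, trusting it not to mutate or retain the position vector):

```julia
# One in-place leapfrog step for position q and momentum p.
# ∇buf is a preallocated gradient buffer; ∇logπ!(∇buf, q) writes
# the gradient of the log density at q into ∇buf.
function leapfrog!(q, p, ∇buf, ∇logπ!, ϵ)
    ∇logπ!(∇buf, q)
    @. p += (ϵ / 2) * ∇buf        # half step for momentum
    @. q += ϵ * p                 # full step for position
    ∇logπ!(∇buf, q)
    @. p += (ϵ / 2) * ∇buf        # half step for momentum
    q, p
end
```

No vectors are allocated inside the loop, which is the point of the proposal; the trade-off is exactly the aliasing concern mentioned above.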
Mid-level API
Flexible NUTS step implementation
The NUTS sampling step, currently implemented by `DynamicHMC.NUTS_transition`, reports the reason for termination, the newly drawn position, and the average acceptance rate. However, obtaining the whole trajectory with probabilities could be useful, eg for debugging issues like SciML/DiffEqBayes.jl#60, and also for pedagogical purposes (eg visualizing HMC trajectories).

The interface should allow users to experiment with different step sizes (also jitter), momentum, kinetic energy, and `max_depth` specifications, and debug these. Eg if the user learns that most steps terminate because of divergence, they should be able to investigate why.

Allow jittered stepsize ϵ
A core stepsize should be adapted, while at the same time a random jitter factor is used to adjust it.
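One possible jitter scheme (purely illustrative; the name and the uniform jitter distribution are assumptions, not the package's design): adapt a core stepsize `ϵ₀`, but use a randomly perturbed value for each trajectory.

```julia
# Draw a stepsize uniformly from [(1 - j)ϵ₀, (1 + j)ϵ₀], where ϵ₀ is the
# adapted core stepsize and 0 ≤ j < 1 is the jitter fraction.
jittered_stepsize(ϵ₀, j) = ϵ₀ * (1 + j * (2 * rand() - 1))
```

Adaptation would then target `ϵ₀`, decoupled from the per-trajectory randomization.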
Interface for iterative application
Sometimes a more granular interface would be useful for tuning and adaptation. In #28 we arrived at an interface for performing NUTS steps, with the idea that `state` could be something that is tuned (eg the stepsize `ϵ`) in `stage`. Currently the API only exposes doing this for a pre-determined number of steps (see below).

High-level API
Logging
A new de-facto standard seems to be emerging for progress meters via `@logmsg`, eg see Atom.jl and timholy/ProgressMeter.jl#102. Progress reports should be using this. Cf #10.

Interface for initialization and adaptation
I envision each adaptation step as a transformation from the previous parameters of the algorithm (stepsize, kinetic energy) to new ones, using random realizations of MCMC draws.
The user could be interested in

1. the whole history of adaptation (currently possible by invoking steps manually),
2. the posterior and the adapted sampler (what is now returned by `NUTS_init_tune_mcmc`),
3. just the posterior.
Targeting (2) as the default interface may have been a mistake, as mostly I am interested in (3) when things go well, and (1) when they don't (cf #24, #9). Also, when samplers have to be parametrized manually, it would be useful to experiment with various initialization and adaptation strategies, eg
picking the initial position by a crude or sophisticated maximization algorithm (addressing #8 "discuss starting from the mode" and #25 "optimize before adapting stepsize"),
less or more aggressive adaptation of stepsize.
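As a sketch of the first strategy, a crude mode-finding step by gradient ascent could pick the initial position; this stands in for a real optimizer (eg Optim.jl), and `∇logπ` is a hypothetical gradient function:

```julia
# Crude mode-finding: gradient ascent on the log density.
# ∇logπ(q) returns the gradient at q; η is the step size.
function find_initial_position(∇logπ, q0; steps = 1000, η = 0.01)
    q = copy(q0)
    for _ in 1:steps
        q .+= η .* ∇logπ(q)
    end
    q
end
```

The adapted stepsize search and kinetic energy estimation would then start from the returned `q` rather than a random point.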
The proposed interface is the following: the user provides

- a chain of adaptation steps, eg as a `Tuple`,
- a parameter that specifies how much history should be kept.

Each step is applied to the previous state (initialized using `nothing`), with the target log density and the current parameters as given, and returns a new set of parameters and an adaptation history (when required). The high-level interface can then pick what to keep and return.
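A minimal sketch of such a chained-adaptation driver, with all names hypothetical and the per-step signature simplified (a real version would also thread the target log density and MCMC draws through each step):

```julia
# Fold a Tuple of adaptation steps over the parameters. Each step is a
# function (params, draws) -> new_params; make_draws(params) stands in for
# generating MCMC realizations under the current parameters.
function run_adaptation(steps::Tuple, params0, make_draws; keep_history = false)
    params = params0
    history = keep_history ? [params] : nothing
    for step in steps
        params = step(params, make_draws(params))
        keep_history && push!(history, params)
    end
    params, history
end

# Toy example: "parameters" is just a stepsize, and each stage halves it.
halve(params, _) = params / 2
final, hist = run_adaptation((halve, halve, halve), 1.0, p -> nothing;
                             keep_history = true)
```

The `keep_history` flag corresponds to the "how much history should be kept" parameter above; the high-level interface would return `final` by default and expose `hist` on request.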