Feature request: Add LKJCholesky #1629
Comments
There is already a related open issue in Bijectors: TuringLang/Bijectors.jl#134 (comment)
I think this is somewhat more involved than the suggestion in TuringLang/Bijectors.jl#134 (comment). I do not think it can be resolved only by changing/adding a bijector. It seems we are missing 3 things: (1) an `LKJCholesky` distribution in Distributions, (2) a corresponding bijector in Bijectors, and (3) support for Cholesky factors in the downstream distributions (e.g. `MvNormal`). This would enable a user to do something like

```julia
using Turing, PDMats

@model function gdemo(y, N = length(y))
    L ~ LKJCholesky(N, 2)
    σ ~ filldist(truncated(Normal(), 0, Inf), N)
    y ~ MvNormal(PDCholMat(L, σ))
end
```
Sure, it requires the LKJCholesky distribution and the bijector, and support for Cholesky factors in whatever distribution you want to use the samples with. The quick comment was merely to point out that this has been discussed before (also in some of the linked issues, IIRC) and would also require a specific bijector. It would probably also require a more general variable structure.
Okay, well in the meantime I'll try to push things along on the Distributions side, so that when the DynamicPPL/Bijectors work is finished, it's ready to use.
Yep, I'm working on this as we speak. Also, just to set the record straight: we don't require that the input and output are the same, but strange things can happen if you start breaking this assumption in compositions, batches, etc.
It might be a good idea to check MeasureTheory.jl about this. Since it's optimized to work specifically with PPLs, it might have some version of this already in place.
It's mentioned in the OP, together with a link to JuliaMath/MeasureTheory.jl#100.
In JuliaStats/Distributions.jl#1336 we're discussing possibly defining an `LKJCholesky` distribution.
No, it's not natively supported, but I would prefer samples of type `Cholesky`.
I created JuliaStats/PDMats.jl#132 to discuss modifications to PDMats for easy creation of a positive definite matrix from a Cholesky factor.
JuliaStats/Distributions.jl#1339 was just merged.
Oh, I missed that this was not the issue in Distributions. I'll reopen 😄
LKJCholesky is available in Distributions 0.25.20. However, as discussed above, support for it in Turing requires (probably major) changes in Bijectors and DynamicPPL, so I assume it will take a while until it is supported.
Really nice updates! @devmotion, I'm slightly surprised that we need changes on the Turing side to support new features in Distributions. Do you know what changes are needed in DynamicPPL/Bijectors to support this?
@mgmverburg was able to get something working in https://github.com/mgmverburg/Turing_examples/blob/f639302c5b28ecc30f6b05e90b9f95adf97be027/Utilities/lkjcholesky.jl#L277-L385. Is that too hackish to adapt for this specific distribution? I agree that ideally we'd have a general solution for distributions whose support is not an array (or is an array with different dimensionality than the distribution), but we've gotten by without that so far, and support for this distribution in Turing seems to be in high demand.
This distribution operates with and samples `Cholesky` objects rather than plain arrays.
I assume @torfjelde's rewrite of Bijectors will address and solve the problems with Bijectors. And I guess a probably more generally useful approach in VarInfo would be to perform linearization and mapping to unconstrained spaces only when explicitly requested and needed, and hence decouple it a bit more from Bijectors and from the assumption that every sample is an array.
@sethaxen It is definitely possible to ignore the current assumptions in the Bijectors API and just implement a bijection by reusing the existing code for the LKJ bijector.
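For reference, here is a minimal sketch of such a transform, independent of the Bijectors.jl API and assuming Stan's canonical-partial-correlation construction; `vec_to_corr_cholesky` is a hypothetical name, not an existing function:

```julia
using LinearAlgebra

# Sketch: map an unconstrained vector z of length K*(K-1)÷2 to the lower
# Cholesky factor L of a K×K correlation matrix, following Stan's
# canonical-partial-correlation construction.
function vec_to_corr_cholesky(z::AbstractVector, K::Int)
    w = tanh.(z)                     # partial correlations in (-1, 1)
    L = zeros(eltype(w), K, K)
    L[1, 1] = 1
    idx = 1
    for i in 2:K
        remainder = one(eltype(w))   # squared norm still available in row i
        for j in 1:(i - 1)
            L[i, j] = w[idx] * sqrt(remainder)
            remainder -= L[i, j]^2
            idx += 1
        end
        L[i, i] = sqrt(remainder)    # rows have unit norm, so L*L' has unit diagonal
    end
    return LowerTriangular(L)
end
```

A complete bijector would additionally supply the inverse map and the log-determinant of the Jacobian.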
Yes, @phipsgabler and I are currently discussing this in the AbstractPPL repo.
Perhaps for the short term, Turing could include non-exported functions for (1) defining the number of unconstrained variables in a distribution and (2) mapping a vector of unconstrained variables to a sample; (2) is more or less handled by Bijectors already. This way, a user who wants speed could manually create the vector of unconstrained variables, drawn from a flat prior, and transform it themselves (see the sketch below).
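A minimal sketch of that workaround, reusing the hypothetical `vec_to_corr_cholesky` helper above; `Flat` is Turing's improper flat prior, and the log-Jacobian of the transform is deliberately omitted:

```julia
using Turing, LinearAlgebra

# Workaround sketch: draw unconstrained variables from a flat prior and
# transform them manually; vec_to_corr_cholesky is the hypothetical helper
# sketched above.
@model function manual_lkj(y, K = size(y, 1))
    z ~ filldist(Turing.Flat(), K * (K - 1) ÷ 2)   # unconstrained variables
    L = vec_to_corr_cholesky(z, K)
    # Unnormalized LKJCholesky(K, η = 2) log kernel; a complete version would
    # also add the log-Jacobian of the z -> L transform.
    Turing.@addlogprob! sum((K - i + 2) * log(L[i, i]) for i in 2:K)
    for j in axes(y, 2)
        y[:, j] ~ MvNormal(zeros(K), Symmetric(L * L'))
    end
end
```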
@sethaxen Is this now fixed?
No. Distributions now has `LKJCholesky`, but Turing does not yet support it.
@devmotion any updates on this?
Duplicate of #1870
Issues
Hello, this issue is related to several other reported issues.
I have spent many hours tweaking more than 10 variants of the model, reading many of Stan's and PyMC3's docs, and posting the issue on Julia's Discourse. I believe the main issue that I, and many other people (see the issues linked above), face is that the `LKJ` distribution is numerically unstable. As per 24.1 LKJ Correlation Distribution | Stan Functions Reference, with reference to the `LKJ` distribution (without Cholesky factorisation), it is much better computationally to work with the Cholesky factor directly.

Many people in Turing, myself included, have been using `LKJ` directly without the Cholesky decomposition, resulting in those issues. I followed some of the workarounds in those issues, such as using `Symmetric`, but they don't solve the problem for more complex models.

We need a Cholesky-factored version of `LKJ`, something called `LKJCholesky` or a similar name. It's implemented in Stan, PyMC3, and also Soss.jl:

- `lkj_corr_cholesky`: 24.2 Cholesky LKJ Correlation Distribution | Stan Functions Reference
- `pymc3.LKJCholeskyCov`: LKJ Cholesky Covariance Priors for Multivariate Normal Models — PyMC3 3.10.0 documentation
- `LKJL` (implemented in MeasureTheory.jl): cscherrer/MeasureTheory.jl
LKJ Cholesky
For example, Stan's implementation of `lkj_corr_cholesky` is described in:

- 24 Correlation Matrix Distributions | Stan Functions Reference
- 24.1 LKJ Correlation Distribution | Stan Functions Reference, with reference to the `LKJ` distribution (without Cholesky factorisation)
- 24.2 Cholesky LKJ Correlation Distribution | Stan Functions Reference
So instead of the "raw version", a slightly better version uses the code by Seth Axen, which relies on the `quad_form_diag` function implemented in Stan. Better still is Stan/PyMC3's version with LKJ Cholesky and reparameterisation (LKJ Cholesky can also be used without reparameterisation); see, for example, Stan's implementation. A sketch of the `quad_form_diag` variant follows.
If Turing implements `LKJCholesky`, it should look something like the sketch below, or like a Soss.jl implementation (a full example was coded up by Seth Axen using Soss.jl).
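A hedged sketch of what that could look like, assuming `LKJCholesky` samples a `Cholesky` object as in Distributions; the way the covariance factor is assembled here is an assumption, not an agreed-upon API:

```julia
using Turing, LinearAlgebra, PDMats

# Sketch only: scale the correlation Cholesky factor by σ to get the Cholesky
# factor of the covariance, then pass it to MvNormal via PDMats.
@model function gdemo_lkjchol(y, K = size(y, 1))
    F ~ LKJCholesky(K, 2)                          # F isa Cholesky
    σ ~ filldist(truncated(Normal(), 0, Inf), K)
    Σ_L = LowerTriangular(Diagonal(σ) * F.L)       # Cholesky factor of the covariance
    Σ = PDMat(Cholesky(Σ_L))                       # no re-factorization needed
    for j in axes(y, 2)
        y[:, j] ~ MvNormal(zeros(K), Σ)
    end
end
```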
Performance
Comparing Stan's implementation with the current use of `LKJ` in Turing (parts of my issues are posted here): the Stan model, sampling 4 chains, finished in about 6 seconds per chain (total = warm-up + sampling). Summarized elapsed time:
Stan model
Stan code saved in “nonCentered.stan”
Sampling Stan model
Turing's reference model
To implement it in Turing, I found a similar model here: StatisticalRethinkingJulia/TuringModels.jl - non-centered chimpanzees.md. The model I implemented to replicate Stan's model above followed the same non-centered pattern (a generic sketch follows).
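Not the actual chimpanzees model, but a generic hedged sketch of the non-centered pattern, with all names illustrative:

```julia
using Turing, LinearAlgebra

# Non-centered parameterisation: draw standard-normal z-scores and build
# correlated effects β from the Cholesky factor of the correlation matrix.
@model function noncentered(y, K = size(y, 1), J = size(y, 2))
    Ω ~ LKJ(K, 2)
    σ ~ filldist(truncated(Normal(), 0, Inf), K)
    z ~ filldist(Normal(), K, J)                   # standard-normal z-scores
    # This cholesky call is exactly where PosDefException tends to strike.
    L = cholesky(Symmetric(Ω)).L
    β = Diagonal(σ) * (L * z)                      # columns β[:, j] ~ MvNormal(0, Σ)
    for j in 1:J
        y[:, j] ~ MvNormal(β[:, j], Diagonal(ones(K)))  # unit observation noise
    end
end
```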
To implement the Stan model, I also referred to @sethaxen's implementation above and the model from TuringModels.jl. Again, I coded up more than 10 variants, but hit the same issues.

I am looking into defining a custom distribution in Turing as per Advanced Usage - How to Define a Customized Distribution. With reference to Stan's implementation of `lkj_corr_cholesky`, I found these:

- stan-dev/math - lkj_corr_cholesky_log.hpp
- stan-dev/math - lkj_corr_cholesky_rng.hpp
- stan-dev/math - lkj_corr_cholesky_lpdf.hpp
But everything is written in C/C++, which I have not learned at all. It seems like I am stuck.
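The lpdf itself boils down to a short formula, though. Here is a hedged Julia port of the unnormalized log density, based on my reading of the Stan reference for `lkj_corr_cholesky` rather than Stan's C++ code (normalizing constant omitted):

```julia
using LinearAlgebra

# Unnormalized log density of the LKJ-Cholesky distribution with shape η for a
# K×K lower-triangular Cholesky factor L of a correlation matrix:
#   log p(L | η) = Σ_{k=2}^{K} (K - k + 2η - 2) * log L[k, k] + const.
function lkj_corr_cholesky_logkernel(L::LowerTriangular, η::Real)
    K = size(L, 1)
    return sum((K - k + 2η - 2) * log(L[k, k]) for k in 2:K)
end
```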
An implementation of `LKJCholesky` in Turing would help solve many of these issues. With that implementation, I can help write a tutorial for Turing at Tutorials. It's one of the most commonly used distributions for modeling covariance matrices in multivariate Gaussians, and it seems like many Turing users are using `LKJ` directly. It's also good practice to let other Turing and Julia PPL users know that `LKJCholesky` is a much better version, as noted by Stan with reference to the `LKJ` distribution (without Cholesky factorisation).