Don't preallocate GradientConfig in ForwardDiff backend by default #8
Conversation
It's used, e.g., in Turing: https://github.com/TuringLang/Turing.jl/blob/99a1d23333d852fdb1485457444181a94c2a52a0/src/essential/ad.jl#L107 Being able to provide a …
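(For context: if I read the linked Turing code correctly, it preallocates a ForwardDiff.GradientConfig with a custom tag up front and reuses it for every gradient call. The following is a paraphrased sketch with made-up names, not the actual Turing source.)

```julia
using ForwardDiff

struct MyTag end                     # stand-in for Turing's custom tag type
logdens(x) = -sum(abs2, x) / 2       # stand-in log density

x0 = zeros(3)
tag = ForwardDiff.Tag(MyTag(), eltype(x0))
cfg = ForwardDiff.GradientConfig(logdens, x0, ForwardDiff.Chunk{3}(), tag)

# The custom tag does not match `logdens`, so tag checking is disabled here.
ForwardDiff.gradient(logdens, x0, cfg, Val{false}())
```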
@devmotion: thanks for the thorough review and for correcting all my scatterbrained mistakes (I am at home with a sick child and less than fully focused on work). I reverted the last suggestion related to tags, since simply exposing the … Please review and let me know whether this would work with the way Turing.jl is using the package. @sethaxen, since you reported #6, please comment on whether this API is convenient for you; preallocation is still possible but not the default, and …
Hmm, I have to think a bit about whether this would work for Turing. At first glance, it seems to me that passing a fixed gradient config would make it easy to run into these threading issues with Turing, and I'm not sure we can prevent that from happening at all; users just declare/use …
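(To spell out the hazard with plain ForwardDiff, independent of Turing or this PR: a GradientConfig holds mutable work buffers, so sharing one instance across threads is a data race, while per-call or per-task configs are fine.)

```julia
using ForwardDiff

f(x) = sum(abs2, x)
x0 = randn(10)
shared_cfg = ForwardDiff.GradientConfig(f, x0)  # holds mutable work buffers

# UNSAFE: all tasks mutate the buffers inside `shared_cfg` concurrently,
# so gradients can be silently wrong.
Threads.@threads for i in 1:100
    ForwardDiff.gradient(f, randn(10), shared_cfg)
end

# Safe: each call gets its own config (a per-task config would also work).
Threads.@threads for i in 1:100
    ForwardDiff.gradient(f, randn(10), ForwardDiff.GradientConfig(f, x0))
end
```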
To me it still seems that this would not work well (i.e., be unsafe by default) when you want to use custom tags, as e.g. in Turing. As the …
The logic would be: …
The main difference from the PR would be that there is no default gradient config assembled from other options. If a user wants to cache the …
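(A rough sketch of how that logic might look, using a made-up helper and keyword names rather than the package's actual API: nothing is preallocated unless the caller supplies an example input.)

```julia
using ForwardDiff

# Hypothetical helper, only to illustrate the proposed logic.
function make_gradient(f; tag = nothing, x = nothing)
    if x === nothing
        # Default: a fresh GradientConfig is created inside every call,
        # so the result is safe to share across threads.
        return θ -> ForwardDiff.gradient(f, θ)
    else
        # Cached config: faster, but the caller must not share it across threads.
        _tag = tag === nothing ? ForwardDiff.Tag(f, eltype(x)) : tag
        cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk(x), _tag)
        return θ -> ForwardDiff.gradient(f, θ, cfg, Val{false}())
    end
end

g = make_gradient(x -> sum(abs2, x))                        # lazy default
g(randn(4))
g_cached = make_gradient(x -> sum(abs2, x); x = zeros(4))   # preallocated buffers
g_cached(randn(4))
```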
@devmotion: I am going to fix this along the lines you suggested, but I opened JuliaDiff/ForwardDiff.jl#63 about copying …
@devmotion: I added an option for a custom tag (and realized that we can just …). The code on the Turing.jl side would then look something like this:

```julia
function LogDensityProblemsAD.ADgradient(ad::ForwardDiffAD, ℓ::Turing.LogDensityFunction)
    θ = DynamicPPL.getparams(ℓ)
    f = Base.Fix1(LogDensityProblems.logdensity, ℓ)
    # Define the tag for ForwardDiff.
    tag = if standardtag(ad)
        ForwardDiff.Tag(Turing.TuringTag(), eltype(θ))
    else
        ForwardDiff.Tag(f, eltype(θ))
    end
    chunk_size = getchunksize(ad)
    return LogDensityProblemsAD.ADgradient(Val(:ForwardDiff), ℓ;
                                           tag = tag, chunk = chunk_size)
end
```

Please let me know if you are OK with this. The advantage is not exposing the whole mechanism for creating a closure etc.
Looks good to me apart from some minor comments! 👍
Fixes #6.
Changes:
- `copy` to make copies of AD'd densities. This is only necessary for ForwardDiff, but allows a consistent AD-agnostic interface for the caller in threaded code.
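(A sketch of the threaded pattern the `copy` addition is meant to support; the `StdNormal` toy problem is made up for illustration, while the LogDensityProblems calls are the standard interface.)

```julia
using LogDensityProblems, LogDensityProblemsAD, ForwardDiff

# Toy log density implementing the LogDensityProblems interface.
struct StdNormal
    dim::Int
end
LogDensityProblems.logdensity(::StdNormal, x) = -sum(abs2, x) / 2
LogDensityProblems.dimension(ℓ::StdNormal) = ℓ.dim
LogDensityProblems.capabilities(::Type{<:StdNormal}) =
    LogDensityProblems.LogDensityOrder{0}()

∇ℓ = ADgradient(Val(:ForwardDiff), StdNormal(3))

xs = [randn(3) for _ in 1:64]
lps = Vector{Float64}(undef, length(xs))
Threads.@threads for i in eachindex(xs)
    # Copying is only strictly needed when the ForwardDiff backend carries
    # preallocated buffers, but it is harmless otherwise, so callers can use
    # the same pattern for every AD backend.
    ∇ℓᵢ = copy(∇ℓ)
    lps[i], _ = LogDensityProblems.logdensity_and_gradient(∇ℓᵢ, xs[i])
end
```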