
Don't preallocate GradientConfig in ForwardDiff backend by default #8

Merged
21 commits merged into master from tp/fix-threaded-use on Mar 8, 2023

Conversation

@tpapp (Owner) commented Feb 13, 2023

Fixes #6.

Changes:

  • for the ForwardDiff backend, the default is now thread-safe. To preallocate a buffer, provide a type (this is a breaking API change, but I don't think anyone was using this feature anyway).
  • suggest copy to make copies of AD'd densities. This is only necessary for ForwardDiff, but it allows a consistent, AD-agnostic interface for callers in threaded code (see the sketch after this list).
  • documentation changes
  • various incidental changes
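
A minimal sketch of the intended threaded usage pattern (not code from this PR; ℓ stands for any object implementing the LogDensityProblems interface): each task takes its own copy of the AD'd density, so preallocated ForwardDiff buffers, when present, are never shared across threads.

using LogDensityProblems, LogDensityProblemsAD, ForwardDiff

∇ℓ = ADgradient(:ForwardDiff, ℓ)        # no buffer preallocated by default, hence thread-safe
dim = LogDensityProblems.dimension(ℓ)

gradients = Vector{Vector{Float64}}(undef, Threads.nthreads())
Threads.@threads for i in 1:Threads.nthreads()
    ∇ℓ_local = copy(∇ℓ)                 # per-task copy; cheap when nothing is preallocated
    _, g = LogDensityProblems.logdensity_and_gradient(∇ℓ_local, randn(dim))
    gradients[i] = g
end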

@devmotion (Collaborator)

> this is a breaking API change, but I don't think anyone was using this feature anyway.

It's used, e.g., in Turing: https://github.com/TuringLang/Turing.jl/blob/99a1d23333d852fdb1485457444181a94c2a52a0/src/essential/ad.jl#L107 Being able to provide a GradientConfig is (currently) crucial for using custom tags.
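
A hedged illustration (not taken from this PR or from Turing) of why custom tags require access to the config: ForwardDiff encodes the tag in its Dual type and, by default, checks it against Tag(f, eltype(x)), so the GradientConfig constructor is the place to supply a non-default tag. MyTag, f, and ℓ below are hypothetical names.

using ForwardDiff, LogDensityProblems

struct MyTag end                                  # hypothetical custom tag type

f = Base.Fix1(LogDensityProblems.logdensity, ℓ)   # ℓ: an assumed log density object
x = zeros(LogDensityProblems.dimension(ℓ))
tag = ForwardDiff.Tag(MyTag(), eltype(x))
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk(x), tag)

# Val{false}() skips ForwardDiff's tag consistency check, which would otherwise
# reject a tag that does not match Tag(f, eltype(x)).
ForwardDiff.gradient(f, x, cfg, Val{false}())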

ext/ForwardDiffExt.jl: 4 inline review comments (outdated, resolved)
@tpapp (Owner, Author) commented Feb 13, 2023

@devmotion: thanks for the thorough review and for correcting all my scatterbrained mistakes (I am at home with a sick child and am less than fully focused on work).

I reverted the last suggestion related to tags, since simply exposing the GradientConfig interface is much simpler and non-breaking. Supporting a combination of tags, types (needed for tags), and whether to pre-allocate the GradientConfig would complicate the API a lot.

Please review and let me know if this would work with the way Turing.jl is using the package.

@sethaxen, since you reported #6 please comment about whether this API is convenient for you; preallocation is still possible but not the default, and copy is needed for each thread.

@tpapp tpapp requested a review from devmotion February 13, 2023 11:24
@sethaxen

> @sethaxen, since you reported #6 please comment about whether this API is convenient for you; preallocation is still possible but not the default, and copy is needed for each thread.

Yes, this will work for me. Thanks!

@devmotion (Collaborator)

Hmm, I have to think a bit about whether this would work for Turing. At first glance, it seems that passing a fixed gradient config would make it easy to run into these threading issues with Turing, and I'm not sure we can prevent that at all: users just declare/use @threads in their models, and we don't have much control over it in Turing. There are some things in DynamicPPL that make log density accumulation somewhat thread-safe (it's not very nice and certainly contains bugs), so maybe we could do something there. But it's at least not completely obvious to me; I have to check that more carefully.

@devmotion (Collaborator)

To me it still seems that this would not work well (i.e., be unsafe by default) when you want to use custom tags, as e.g. in Turing.

As the GradientConfig constructor only takes two optional arguments (chunk and tag) in addition to the function to be differentiated and its input (https://github.com/JuliaDiff/ForwardDiff.jl/blob/4b143a199f541e7c1248dc5bc29b397be938e81d/src/config.jl#L118-L120), I think the easiest solution would be to support storing these two options (tag and chunk size):

  • Remove x::Union{Nothing,AbstractVector}
  • chunk::Union{Integer,ForwardDiff.Chunk} = _default_chunksize(ell): ForwardDiff chunk size (unchanged but always saved as ForwardDiff.Chunk)
  • tag::Union{Nothing,ForwardDiff.Tag} = nothing: ForwardDiff tag (new)
  • gradientconfig::Union{Nothing,ForwardDiff.GradientConfig} = nothing: gradient config (new default of nothing)

The logic would be:

  • If a gradientconfig !== nothing is supplied, the other two options are ignored and it is re-used in every logdensity_and_gradient call
  • Otherwise, in the logdensity_and_gradient call a new GradientConfig is constructed:
    function logdensity_and_gradient(fℓ::ForwardDiffLogDensity, x::AbstractVector{<:Real})
        @unpack ℓ, chunk, tag, gradient_config = fℓ
        buffer = _diffresults_buffer(x)
        f = Base.Fix1(logdensity, ℓ)
        config = if gradient_config === nothing
            if tag === nothing
                ForwardDiff.GradientConfig(f, x, chunk)
            else
                ForwardDiff.GradientConfig(f, x, chunk, tag)
            end
        else
            gradient_config
        end
        result = ForwardDiff.gradient!(buffer, f, x, config)
        return _diffresults_extract(result)
    end

The main difference from the PR would be that there is no default gradient config assembled from the other options. If a user wants to cache the GradientConfig, they have to specify it explicitly.
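
For concreteness, a hedged sketch of explicit caching under this proposal (not code from the PR; ℓ is an assumed log density, and the gradientconfig keyword follows the option named above):

using ForwardDiff, LogDensityProblems, LogDensityProblemsAD

f = Base.Fix1(LogDensityProblems.logdensity, ℓ)   # ℓ: an assumed log density object
x = zeros(LogDensityProblems.dimension(ℓ))
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{4}())

# The cached config is re-used in every logdensity_and_gradient call,
# so the resulting object must not be shared across threads.
∇ℓ = ADgradient(Val(:ForwardDiff), ℓ; gradientconfig = cfg)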

@tpapp (Owner, Author) commented Mar 7, 2023

@devmotion: I am going to fix this along the lines you suggested, but I opened JuliaDiff/ForwardDiff.jl#63 for copying GradientConfigs; it would be cleaner that way.

@tpapp (Owner, Author) commented Mar 8, 2023

@devmotion: I added an option for a custom tag (and realized that we can just copy a GradientConfig, so no PR to ForwardDiff is needed). With the latest version of this PR, Turing.jl could do something like the following:

function LogDensityProblemsAD.ADgradient(ad::ForwardDiffAD, ℓ::Turing.LogDensityFunction)
    θ = DynamicPPL.getparams(ℓ)
    f = Base.Fix1(LogDensityProblems.logdensity, ℓ)

    # Define configuration for ForwardDiff.
    tag = if standardtag(ad)
        ForwardDiff.Tag(Turing.TuringTag(), eltype(θ))
    else
        ForwardDiff.Tag(f, eltype(θ))
    end
    chunk_size = getchunksize(ad)

    return LogDensityProblemsAD.ADgradient(Val(:ForwardDiff), ℓ; tag = tag,
                                           chunk = chunk_size)
end

Please let me know if you are OK with this. The advantage is that the whole mechanism for creating a closure etc. does not need to be exposed.

@devmotion (Collaborator) left a review

Looks good to me apart from some minor comments! 👍

ext/LogDensityProblemsADForwardDiffExt.jl: 5 inline review comments (outdated, resolved)
@tpapp tpapp merged commit bc8c543 into master Mar 8, 2023
@tpapp tpapp deleted the tp/fix-threaded-use branch March 8, 2023 13:29

Successfully merging this pull request may close these issues.

AD with ForwardMode is not thread-safe
3 participants