Performance dip when using ForwardDiff compared to Turing #81

Open
burtonjosh opened this issue Jan 21, 2023 · 3 comments

Comments

@burtonjosh
Contributor

burtonjosh commented Jan 21, 2023

I've noticed that there's a performance dip when using ForwardDiff with a model defined in TuringGLM, compared to defining the same model directly in Turing. I've set up an MWE to show this.

First I set up four models: two in TuringGLM (with and without custom priors), and two in Turing that use the same default and custom priors as the TuringGLM models.

using Turing, TuringGLM, TuringBenchmarking, BenchmarkTools
using ReverseDiff: ReverseDiff
using CSV, DataFrames, LinearAlgebra

hibbs_df = CSV.read(
    download("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectionsEconomy/data/hibbs.dat"),
    DataFrame
);

# TuringGLM model
f = @formula(vote ~ growth)
m_glm = turing_model(f, hibbs_df)

# TuringGLM model with custom priors
priors = CustomPrior(Normal(0, 10), Normal(52, 14), nothing)
m_glm_custom = turing_model(f, hibbs_df; priors=priors)

# extract data for Turing models
y = TuringGLM.data_response(f, hibbs_df)
X = TuringGLM.data_fixed_effects(f, hibbs_df)

# model with default priors
@model function regression_default(X, y; residual=std(y))
    α ~ 50.755 + TDist(3.0)*6.071256084780443
    β ~ filldist(TDist(3.0), size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

m_turing = regression_default(X, y; residual=std(y))

# model with custom priors
@model function regression_custom(X, y; residual=std(y))
    α ~ Normal(52, 14)
    β ~ filldist(Normal(0, 10), size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

m_turing_custom = regression_custom(X, y; residual=std(y))

Then, using TuringBenchmarking.jl, I benchmark each of the four models with both the ForwardDiff and ReverseDiff backends (the benchmark calls are included with the full results below).

The results of the benchmark are shown in the table below. For ReverseDiff the timings are roughly the same, but with ForwardDiff the TuringGLM models take about 30-45% longer than the Turing models (equivalently, the Turing models take about 23-31% less time).

| Model | ForwardDiff, linked (time, μs) | ReverseDiff, linked (time, μs) | ForwardDiff, not linked (time, μs) | ReverseDiff, not linked (time, μs) |
|---|---|---|---|---|
| TuringGLM (default prior) | 3.967 | 2.772 | 3.976 | 1.990 |
| Turing (default prior) | 3.046 | 2.676 | 3.059 | 1.931 |
| TuringGLM (custom prior) | 4.013 | 2.102 | 3.905 | 1.868 |
| Turing (custom prior) | 2.776 | 1.986 | 2.827 | 1.829 |
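
The relative ForwardDiff slowdown can be checked directly from the table. A minimal sketch, using only the timings reported above (the variable names are illustrative, not part of any package):

```julia
# ForwardDiff timings in μs, copied from the table above.
# Order: default prior (linked), default prior (not linked),
#        custom prior (linked),  custom prior (not linked)
glm    = [3.967, 3.976, 4.013, 3.905]   # TuringGLM models
turing = [3.046, 3.059, 2.776, 2.827]   # hand-written Turing models

# How much longer the TuringGLM model takes, relative to the Turing model (%)
slowdown_pct = round.(100 .* (glm ./ turing .- 1); digits=1)

# How much less time the Turing model takes, relative to the TuringGLM model (%)
saving_pct = round.(100 .* (1 .- turing ./ glm); digits=1)

println("TuringGLM slower by (%): ", slowdown_pct)
println("Turing faster by (%):    ", saving_pct)
```

Measured against the Turing timings, the TuringGLM models are roughly 30 to 45 percent slower; measured against the TuringGLM timings, the Turing models take roughly 23 to 31 percent less time. Which figure is quoted depends on the baseline.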
Detailed output:

TuringGLM model 1 (default priors)

suite_glm = TuringBenchmarking.make_turing_suite(
    m_glm,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm)

Output:

2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.882 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.772 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.967 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.836 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.990 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.976 μs)

Turing model 1 (default priors)

suite_turing = TuringBenchmarking.make_turing_suite(
    m_turing,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing)

Output:

2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.256 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.676 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.046 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.207 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.931 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.059 μs)

TuringGLM model 2 (custom priors)

suite_glm_custom = TuringBenchmarking.make_turing_suite(
    m_glm_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm_custom)

Output:

2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.724 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.102 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.013 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.737 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.868 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.905 μs)

Turing model 2 (custom priors)

suite_turing_custom = TuringBenchmarking.make_turing_suite(
    m_turing_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing_custom)

Output:

2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.176 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.986 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.776 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.160 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.829 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.827 μs)

@storopoli
Member

This is really strange. Any hints as to why the performance is degraded?
TuringGLM only creates the model and the data for you.
Everything else is delegated to Turing itself.

@burtonjosh
Contributor Author

The only difference that I could think of was that TuringGLM uses the CustomPrior struct, so I tried to emulate this by defining my own and using that in a Turing model:

abstract type TuringPrior end

struct CustomTuringPrior <: TuringPrior
    predictors
    intercept
    auxiliary
end

@model function regression_custom_prior(X, y, priors; residual=std(y))
    α ~ priors.intercept
    β ~ filldist(priors.predictors, size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

turing_prior = CustomTuringPrior(Normal(0, 10), Normal(52, 14), nothing)

m_turing_prior = regression_custom_prior(X, y, turing_prior; residual=std(y))

suite_turing_prior = TuringBenchmarking.make_turing_suite(
    m_turing_prior,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing_prior)

The results from this are

| Model | ForwardDiff, linked (time, μs) | ReverseDiff, linked (time, μs) | ForwardDiff, not linked (time, μs) | ReverseDiff, not linked (time, μs) |
|---|---|---|---|---|
| Turing model 3 (custom prior struct) | 4.203 | 1.942 | 4.173 | 1.710 |

which shows the same slowdown as the TuringGLM model benchmarks. So the slowdown appears to be related to the prior struct, but I don't know why.
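
One possible explanation, offered as an assumption rather than a confirmed diagnosis: fields declared without type annotations are stored as `Any`, so the compiler cannot infer the type of `priors.intercept` or `priors.predictors` inside the model and has to fall back to dynamic dispatch. The sketch below shows the difference using plain stand-in values instead of distributions; `UntypedPrior` mirrors the struct above, and `TypedTuringPrior` is a hypothetical parametric variant, not something TuringGLM provides.

```julia
# Untyped fields, as in the struct above: every field is stored as `Any`.
struct UntypedPrior
    predictors
    intercept
    auxiliary
end

# Hypothetical parametric variant: each field's concrete type
# becomes part of the struct's own type.
struct TypedTuringPrior{TP,TI,TA}
    predictors::TP
    intercept::TI
    auxiliary::TA
end

# Stand-in values; in the model these would be distributions
# such as Normal(0, 10) and Normal(52, 14).
untyped = UntypedPrior(1.0, 2.0, nothing)
typed   = TypedTuringPrior(1.0, 2.0, nothing)

# Field types as the compiler sees them
println(fieldtype(typeof(untyped), :intercept))  # Any
println(fieldtype(typeof(typed), :intercept))    # Float64

# Inferred return type of a simple field access
get_intercept(p) = p.intercept
println(Base.return_types(get_intercept, (typeof(untyped),)))
println(Base.return_types(get_intercept, (typeof(typed),)))
```

If the untyped fields are indeed the cause, passing a parametric struct like this to `regression_custom_prior` should bring the timings back in line with `regression_custom`. This has not been benchmarked here, so it remains a hypothesis to test.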

Detailed output:

Turing model 3 (custom prior struct)

2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.953 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.942 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.203 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.975 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.710 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.173 μs)

@storopoli
Member

Yeah, that might add a little bit of overhead.
