Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

turing_model part 1: fixed-effects #17

Merged
merged 13 commits into from
Dec 19, 2021
Merged

Conversation

storopoli
Copy link
Member

Ok, this is a big PR. 50% of the package was done in this PR.

Mainly it implements everything for specifying and sampling models using @formula macro that do not have random-effects, i.e. the (1 | group), (x1 | group) or (1 + x1 | group) terms inside the @formula.

Implemented likelihoods:

  1. normal
  2. student-t
  3. bernoulli
  4. poisson
  5. negative binomial

Datasets in data/

I am also adding 3 datasets from stan-dev/rstanarm:

  1. kidiq
  2. wells
  3. roaches

Their license is GPL-3. They are used extensively in tutorials(see storopoli/Bayesian-Julia and also Gelman & Hill (2007), Gelman et al. (2013) (the BDA), Gelman et al. (2020) (RoS).

I know that @rikhuijzer hates data being hard-coded into a package but they are very small and are used for tests and tutorials (not implement yet, but in the roadmap)

The turing_model docstring deserves extra attention.

Relates to #2.

@yebai feel free to review or ask for others to review.

@storopoli storopoli added the enhancement New feature or request label Dec 11, 2021
@storopoli storopoli added this to the 0.1.0 milestone Dec 11, 2021
@storopoli storopoli self-assigned this Dec 11, 2021
@rikhuijzer
Copy link
Collaborator

I know that @rikhuijzer hates data being hard-coded into a package but they are very small and are used for tests and tutorials (not implement yet, but in the roadmap)

But the drawbacks from Artifacts are negligible. It could be something like

function dataset(name::AbstractString)
     path = ""
     if name == "roaches"
           path = joinpath(Artifacts"roaches", "roaches.csv")
    elseif ...
    end
    return read_data(path)
end

It's easier than it looks: https://pkgdocs.julialang.org/v1/artifacts/. It clutters the diff less, makes switching to a newer version of the dataset easier, you even have to handle paths slightly less yourself because you can just say joinpath(Artifact"roaches", "roaches.csv") and it makes adding larger datasets in the same way at a later moment possible.

Copy link
Collaborator

@rikhuijzer rikhuijzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes nice stuff 💯

src/TuringGLM.jl Show resolved Hide resolved
src/priors.jl Outdated Show resolved Hide resolved
src/priors.jl Show resolved Hide resolved
Comment on lines 105 to 119
@model function normal_model(
y,
X;
predictors=size(X, 2),
μ_X=μ_X,
σ_X=σ_X,
prior=custom_prior,
residual=1 / std(y),
)
α ~ prior.intercept
β ~ filldist(prior.predictors, predictors)
σ ~ Exponential(residual)
y ~ MvNormal(α .+ X * β, σ^2 * I)
return (; α, β, σ, y)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this closure be moved into a separate function? It's very difficult to read now whether it's a function call or function definition. E.g.

function construct_normal_model(y, X, predictors, ...)
     return @model function normal_model(
           ...
     end
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides that, it might also be interesting to be able to access the model definition from outside for other purposes. I don't know how realistic that is, but e.g., to allow people to just look at and work with the interiors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I was thinking in a turing_code function to return or print the Turing underlying Turing code.

I think that we can also have that function to be the main one. turing_model would just call turing_code and eval the string as a macro. Is that possible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, ideally that would work in DPPL itself. We have been thinking about adding the original expression to the model struct and use it for printing, which would be pretty minor change but just hasn't been considered to be of high urgency.

However, this will then only work on the instantiated model object, not the result of the @model itself, which after all is just a bunch of method definitions of the model evaluator function.

I wouldn't do eval within the function. If necessary, you could have a top-level dict of names to code and evaluate those on the top level, but I don't really see the advantage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will try to work on something during the week. This week is nasty because there are a lot of final committees and also tons of final semester assessments to grade. But it is the final week of the academic semester for me. I will add the turing_code suggestion in the comment down below /

Comment on lines 129 to 144
@model function student_model(
y,
X;
predictors=size(X, 2),
μ_X=μ_X,
σ_X=σ_X,
prior=custom_prior,
residual=1 / std(y),
)
α ~ prior.intercept
β ~ filldist(prior.predictors, predictors)
σ ~ Exponential(residual)
ν ~ prior.auxiliary
y ~ arraydist(LocationScale.(α .+ X * β, σ, TDist.(ν)))
return (; α, β, σ, ν, y)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. With more than 120 lines, this method is too long to be easily readable

test/turing_model.jl Show resolved Hide resolved
test/turing_model.jl Outdated Show resolved Hide resolved
test/turing_model.jl Outdated Show resolved Hide resolved
test/turing_model.jl Outdated Show resolved Hide resolved
test/turing_model.jl Outdated Show resolved Hide resolved
@storopoli
Copy link
Member Author

storopoli commented Dec 12, 2021

Things to do:

  • Break up the huge turing_model into smaller functions. Maybe make family a struct instead of a string? Yes Normal() is the default. It is a Distribution type and Normal.
  • Have turing_model call turing_code and eval it. turing_code should be exposed also.
  • Testing with priors instead of my_prior
  • Testing with f = @formula(...) instead of inside the turing_model
  • Testing Chains stuff with only one chn variable

@storopoli storopoli mentioned this pull request Dec 18, 2021
@storopoli storopoli requested a review from rikhuijzer December 19, 2021 08:45
@storopoli
Copy link
Member Author

Ready to review again. I could not implement easily turing_code function because I need to figure how to parse the Prior structs DefaultPrior CustomPrior to be displayed in the turing_code function. I think we should leave this for the 0.2.0 release or future releases.

The whole turing_model API for non-hierarchical models is done. I've created a custom type for the likelihood called Model. I had to be creative with the naming to avoid conflicts with the Distributions.jl types because we need them in the namespace for users to specify custom priors.

@storopoli storopoli merged commit ed55e3e into main Dec 19, 2021
@storopoli storopoli deleted the turing_model_fixed_effects branch December 19, 2021 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants