turing_model part 1: fixed-effects #17

storopoli · 2021-12-11T19:12:03Z

Ok, this is a big PR. 50% of the package was done in this PR.

Mainly, it implements everything needed to specify and sample models via the @formula macro that have no random effects, i.e. no (1 | group), (x1 | group), or (1 + x1 | group) terms inside the @formula.

Implemented likelihoods:

normal
student-t
bernoulli
poisson
negative binomial

Datasets in `data/`

I am also adding 3 datasets from stan-dev/rstanarm:

kidiq
wells
roaches

Their license is GPL-3. They are used extensively in tutorials (see storopoli/Bayesian-Julia and also Gelman & Hill (2007), Gelman et al. (2013) (BDA), and Gelman et al. (2020) (RoS)).

I know that @rikhuijzer hates data being hard-coded into a package, but these datasets are very small and are used for tests and tutorials (not implemented yet, but on the roadmap).

The turing_model docstring deserves extra attention.

Relates to #2.

@yebai feel free to review or ask for others to review.

rikhuijzer · 2021-12-12T09:04:16Z

> I know that @rikhuijzer hates data being hard-coded into a package but they are very small and are used for tests and tutorials (not implemented yet, but in the roadmap)

But the drawbacks from Artifacts are negligible. It could be something like

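using Artifacts  # provides the artifact"..." string macro

function dataset(name::AbstractString)
    path = ""
    if name == "roaches"
        # artifact"roaches" resolves to the directory of the installed artifact
        path = joinpath(artifact"roaches", "roaches.csv")
    elseif ...
    end
    # read_data is a placeholder for whatever reader the package uses
    return read_data(path)
end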

It's easier than it looks: https://pkgdocs.julialang.org/v1/artifacts/. It clutters the diff less, makes switching to a newer version of the dataset easier, and means you handle paths slightly less yourself, because you can just write joinpath(artifact"roaches", "roaches.csv"). It also makes it possible to add larger datasets in the same way later.

rikhuijzer

Yes nice stuff 💯

src/TuringGLM.jl

src/priors.jl

rikhuijzer · 2021-12-12T09:12:56Z

src/turing_model.jl

+            @model function normal_model(
+                y,
+                X;
+                predictors=size(X, 2),
+                μ_X=μ_X,
+                σ_X=σ_X,
+                prior=custom_prior,
+                residual=1 / std(y),
+            )
+                α ~ prior.intercept
+                β ~ filldist(prior.predictors, predictors)
+                σ ~ Exponential(residual)
+                y ~ MvNormal(α .+ X * β, σ^2 * I)
+                return (; α, β, σ, y)
+            end


Can this closure be moved into a separate function? Right now it's very difficult to tell whether this is a function call or a function definition. E.g.

function construct_normal_model(y, X, predictors, ...)
    return @model function normal_model(
        ...
    end
end

Besides that, it might also be interesting to be able to access the model definition from outside for other purposes. I don't know how realistic that is, but e.g., to allow people to just look at and work with the interiors.

Yes! I was thinking of a turing_code function that returns or prints the underlying Turing code.

I think we could also make that function the main one: turing_model would just call turing_code and eval the returned string as a macro. Is that possible?

Well, ideally that would work in DPPL itself. We have been thinking about adding the original expression to the model struct and using it for printing, which would be a pretty minor change but just hasn't been considered urgent.

However, this will then only work on the instantiated model object, not the result of the @model itself, which after all is just a bunch of method definitions of the model evaluator function.

I wouldn't do eval within the function. If necessary, you could have a top-level dict of names to code and evaluate those on the top level, but I don't really see the advantage.
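A minimal sketch of the top-level alternative described above, in plain Julia without Turing. The names MODEL_CODE, double, and increment are hypothetical and only stand in for real model definitions; this is an illustration of the pattern, not code from the PR.

```julia
# Hypothetical registry mapping names to code, stored as expressions
# rather than strings so that no parsing step is needed.
const MODEL_CODE = Dict{Symbol,Expr}(
    :double => :(double(x) = 2 * x),
    :increment => :(increment(x) = x + 1),
)

# Evaluate every definition once, at the top level of the module,
# instead of calling eval from inside a function body.
for ex in values(MODEL_CODE)
    @eval $ex
end

# The stored expression doubles as printable source code.
println(MODEL_CODE[:double])

# The evaluated definitions are ordinary functions.
println(double(21))
```

Because the definitions are evaluated at load time, callers never hit the world-age problems that come with calling eval inside a function and then invoking the freshly defined method in the same call.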

Ok, I will try to work on something during the week. This week is nasty because there are a lot of final committees and also tons of end-of-semester assessments to grade, but it is the final week of the academic semester for me. I will add the turing_code suggestion to the comment down below.

rikhuijzer · 2021-12-12T09:13:54Z

src/turing_model.jl

+            @model function student_model(
+                y,
+                X;
+                predictors=size(X, 2),
+                μ_X=μ_X,
+                σ_X=σ_X,
+                prior=custom_prior,
+                residual=1 / std(y),
+            )
+                α ~ prior.intercept
+                β ~ filldist(prior.predictors, predictors)
+                σ ~ Exponential(residual)
+                ν ~ prior.auxiliary
+                y ~ arraydist(LocationScale.(α .+ X * β, σ, TDist.(ν)))
+                return (; α, β, σ, ν, y)
+            end


Same as above. At more than 120 lines, this method is too long to be easily readable.

test/turing_model.jl

Co-authored-by: Rik Huijzer <[email protected]>

storopoli · 2021-12-12T11:00:51Z

Things to do:

Break up the huge turing_model into smaller functions. Maybe make family a struct instead of a string? Yes: Normal() is the default, and Normal is a Distribution type.
Have turing_model call turing_code and eval it. turing_code should be exposed also.
Testing with priors instead of my_prior
Testing with f = @formula(...) instead of inside the turing_model
Testing Chains stuff with only one chn variable
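To illustrate the "struct instead of a string" item, here is a minimal sketch of dispatching on a likelihood type rather than comparing strings. It is plain Julia; Family, NormalFamily, PoissonFamily, invlink, and mean_response are hypothetical names chosen for this example and are not the package's API.

```julia
# Hypothetical family types; each likelihood becomes its own type.
abstract type Family end
struct NormalFamily <: Family end
struct PoissonFamily <: Family end

# One method per family replaces an if/elseif chain over strings.
invlink(::NormalFamily, η) = η        # identity link
invlink(::PoissonFamily, η) = exp(η)  # log link

# The normal family is the default, mirroring the to-do item above.
mean_response(η; family::Family=NormalFamily()) = invlink(family, η)

println(mean_response(0.5))
println(mean_response(0.0; family=PoissonFamily()))
```

With this design, a misspelled or unsupported family fails with a MethodError at the call site instead of silently falling through a string comparison.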

storopoli · 2021-12-19T08:47:41Z

Ready to review again. I could not easily implement the turing_code function because I still need to figure out how to parse the Prior structs (DefaultPrior and CustomPrior) so they can be displayed in its output. I think we should leave this for the 0.2.0 release or a later one.

The whole turing_model API for non-hierarchical models is done. I've created a custom type for the likelihood called Model. I had to be creative with the naming to avoid conflicts with the Distributions.jl types because we need them in the namespace for users to specify custom priors.

Jose Storopoli added 3 commits December 11, 2021 10:57

models working with default priors

9b45dda

test with reading CSV done

c17f188

fixed-effect models done

4a34c30

storopoli added the enhancement label Dec 11, 2021

storopoli added this to the 0.1.0 milestone Dec 11, 2021

storopoli requested review from cpfiffer and rikhuijzer December 11, 2021 19:12

storopoli self-assigned this Dec 11, 2021

Jose Storopoli added 3 commits December 11, 2021 16:14

corrected signature NegativeBinomial2

fd62c77

apply format (oops)

2fb0aa6

more tolerance for turing_model tests

dfee3df

rikhuijzer reviewed Dec 12, 2021

Apply suggestions from code review

ae258c9

Co-authored-by: Rik Huijzer <[email protected]>

fixed DefaultPrior test

8ba6336

storopoli mentioned this pull request Dec 18, 2021

Added Gzipped datasets #18

Merged

Jose Storopoli added 4 commits December 18, 2021 07:47

Merge branch 'main' into turing_model_fixed_effects

aa60657

Break up turing_model into smaller functions

5976d06

apply format (oops)

9691a90

docstring turing_model

7190460

storopoli requested a review from rikhuijzer December 19, 2021 08:45

added spaces between turing_model methods

72e6d9a

rikhuijzer approved these changes Dec 19, 2021

storopoli merged commit ed55e3e into main Dec 19, 2021

storopoli deleted the turing_model_fixed_effects branch December 19, 2021 11:56
