Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduction to JuliaGPs + Turing #423

Merged
merged 3 commits into from
Sep 26, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
165 changes: 165 additions & 0 deletions tutorials/15-gaussian-processes/15_gaussian_processes.jmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,165 @@
---
title: Introduction to Gaussian Processes using JuliaGPs and Turing.jl
permalink: /tutorials/:name/
redirect_from: tutorials/15-gaussian-processes/
weave_options:
error : false
---

[JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) packages integrate well with Turing.jl because they implement the Distributions.jl
interface.
You should be able to understand what is going on in this tutorial if you know what a GP is.
For a more in-depth understanding of the
[JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) functionality
used here, please consult the
[JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) docs.

In this tutorial, we will model the putting dataset discussed in chapter 21 of
[Bayesian Data Analysis](http://www.stat.columbia.edu/%7Egelman/book/).
The dataset comprises the result of measuring how often a golfer successfully gets the ball
in the hole, depending on how far away from it they are.
The goal of inference is to estimate the probability of any given shot being successful at a
given distance.

Let's download the data and take a look at it:

```julia
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
using CSV, DataDeps, DataFrames

ENV["DATADEPS_ALWAYS_ACCEPT"] = true
register(
DataDep(
"putting",
"Putting data from BDA",
"http://www.stat.columbia.edu/~gelman/book/data/golf.dat",
"fc28d83896af7094d765789714524d5a389532279b64902866574079c1a977cc",
),
)

fname = joinpath(datadep"putting", "golf.dat")
df = CSV.read(fname, DataFrame; delim=' ', ignorerepeated=true)
df[1:5, :]
```

We've printed the first 5 rows of the dataset (which comprises only 19 rows in total).
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
Observe it has three columns:

1. `distance` -- how far away from the hole. I'll refer to `distance` as `d` throughout the rest of this tutorial
2. `n` -- how many shots were taken from a given distance
3. `y` -- how many shots were successful from a given distance

We will use a Binomial model for the data, whose success probability is parametrised by a
transformation of a GP. Something along the lines of:
$$
f \sim \operatorname{GP}(0, k) \\
y_j \mid f(d_j) \sim \operatorname{Binomial}(n_j, g(f(d_j))) \\
g(x) := \frac{1}{1 + e^{-x}}
$$

To do this, let's define our Turing.jl model:

```julia
using AbstractGPs, LogExpFunctions, Turing

@model function putting_model(d, n; jitter=1e-4)
v ~ Gamma(2, 1)
l ~ Gamma(4, 1)
f = GP(v * with_lengthscale(SEKernel(), l))
f_latent ~ f(d, jitter)
y ~ product_distribution(Binomial.(n, logistic.(f_latent)))
return (fx=f(d, jitter), f_latent=f_latent, y=y)
end
```

We first define an `AbstractGPs.GP`, which represents a distribution over functions, and
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
is entirely separate from Turing.jl.
We place a prior over its variance `v` and length-scale `l`.
`f(d, jitter)` constructs the multivariate Gaussian comprising the random variables
in `f` whose indices are in `d` (+ a bit of independent Gaussian noise with variance
`jitter` -- see [the docs](https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev/api/#FiniteGP-and-AbstractGP)
for more details).
`f(d, jitter) isa AbstractMvNormal`, and is the bit of AbstractGPs.jl that implements the
Distributions.jl interface, so it's legal to put it on the right hand side
of a `~`.
From this you should deduce that `f_latent` is distributed according to a multivariate
Gaussian.
The remaining lines comprise standard Turing.jl code that is encountered in other tutorials
and Turing documentation.

Before performing inference, we might want to inspect the prior that our model places over
the data, to see whether there is anything that is obviously wrong.
These kinds of prior predictive checks are straightforward to perform using Turing.jl, since
it is possible to sample from the prior easily by just calling the model:

```julia
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
m = putting_model(Float64.(df.distance), df.n)
m().y
```

We make use of this to see what kinds of datasets we simulate from the prior:

```julia
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
using Plots

function plot_data(d, n, y, xticks, yticks)
ylims = (0, round(maximum(n), RoundUp; sigdigits=2))
margin = -0.5 * Plots.mm
plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false)
bar!(plt, d, n; color=:red, label="", alpha=0.5)
bar!(plt, d, y; label="", color=:blue, alpha=0.7)
return plt
end

# Construct model and run some prior predictive checks.
m = putting_model(Float64.(df.distance), df.n)
hists = map(1:20) do j
xticks = j > 15 ? :auto : nothing
yticks = rem(j, 5) == 1 ? :auto : nothing
return plot_data(df.distance, df.n, m().y, xticks, yticks)
end
plot(hists...; layout=(4, 5))
```

In this case, the only prior knowledge I have is that the proportion of successful shots
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
ought to decrease monotonically as the distance from the hole increases, which should show
up in the data as the blue lines generally going down as we move from left to right on each
graph.
Unfortunately, there is not a simple way to enforce monotonicity in the samples from a GP,
and we can see this in some of the plots above, so we must hope that we have enough data to
ensure that this relationship approximately holds under the posterior.
In any case, you can judge for yourself whether you think this is the most useful
visualisation that we can perform -- if you think there is something better to look at,
please let us know!
Comment on lines +127 to +132
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is crazy (and I'm def not suggesting we put this in the tutorial), but could we perform a quick linear regression and use this as a prior for the GP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm I'm not entirely sure that I understand. If I've understood correctly, the suggestion is to look at the data in order to inform our prior, which I'm not sure makes sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would indeed make it, I dunno, "semi-Bayesian" or something, but it could be a useful trick to get monotonicity (though it wouldn't guarantee it). But I guess the proper way would be to just treat it as a multivariate normal and model the mean directly?

Another thing, is it sensible to put a prior on the GP prior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in some ways we are here -- we're placing a prior over the kernel parameters, which is implicitely placing a prior over the GP prior. Are you imagining something non-parametric though?


Moving on, we generate samples from the posterior using the default `NUTS` sampler.
We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has
better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on
this example. See Turing.jl's docs on Automatic Differentiation for more info.

```julia
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
using Random, ReverseDiff

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)

m_post = m | (y=df.y,)
chn = sample(Xoshiro(123456), m_post, NUTS(), 1_000)
```

We can use these samples and the `posterior` function from `AbstractGPs` to sample from the
posterior probability of success at any distance we choose:

```julia
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
d_pred = 1:0.2:21
samples = map(generated_quantities(m_post, chn)[1:10:end]) do x
return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
end
p = plot()
plot!(d_pred, reduce(hcat, samples); label="", color=:blue, alpha=0.2)
scatter!(df.distance, df.y ./ df.n; label="", color=:red)
```

We can see that the general trend is indeed down as the distance from the hole increases,
willtebbutt marked this conversation as resolved.
Show resolved Hide resolved
and that if we move away from the data, the posterior uncertainty quickly inflates.
This suggests that the model is probably going to do a reasonable job of interpolating
between observed data, but less good a job at extrapolating to larger distances.
Loading