Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tutorial on time series modelling #307

Merged
merged 17 commits into from
Jul 21, 2022
Merged
328 changes: 328 additions & 0 deletions tutorials/13-seasonal-time-series/13_seasonal_time_series.jmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,328 @@
---
title: Bayesian Time Series Analysis
permalink: /:collection/:name/
weave_options:
error : false
---

In time series analysis we are often interested in understanding how various real-life circumstances impact our quantity of interest.
These can be, for instance, season, day of week, or time of day.
To analyse this it is useful to decompose time series into simpler components (corresponding to relevant circumstances)
and infer their relevance.
In this tutorial we are going to use Turing for time series analysis and learn about useful ways to decompose time series.

# Modelling time series

Before we start coding, let us talk about what exactly we mean with time series decomposition.
In a nutshell, it is a divide-and-conquer approach where we express a time series as a sum or a product of simpler series.
For instance, the time series $f(t)$ can be decomposed into a sum of $n$ components

$$f(t) = \sum_{i=1}^n f_i(t),$$

or we can decompose $g(t)$ into a product of $m$ components

$$g(t) = \sum_{i=1}^m g_i(t).$$

We refer to this as *additive* or *multiplicative* decomposition respectively.
This type of decomposition is great since it lets us reason about individual components, which makes encoding prior information and interpreting model predictions very easy.
Two common components are *trends*, which represent the overall change of the time series (often assumed to be linear),
and *cyclic effects* which contribute oscillating effects around the trend.
Let us simulate some data with an additive linear trend and oscillating effects.

```julia
using StatsPlots, Turing, Statistics, LinearAlgebra, Random
Random.seed!(12345)
true_sin_freq = 2
true_sin_amp = 5
true_cos_freq = 7
true_cos_amp = 2.5
tmax = 10
β_true = 2
α_true = -1
tt = 0:0.05:tmax
f₁(t) = α_true + β_true * t
f₂(t) = true_sin_amp * sinpi(2 * t * true_sin_freq / tmax)
f₃(t) = true_cos_amp * cospi(2 * t * true_cos_freq / tmax)
f(t) = f₁(t) + f₂(t) + f₃(t)

plot(f, tt; label="f(t)", title="Observed time series", legend=:topleft, linewidth=3)
plot!(
[f₁, f₂, f₃],
tt;
label=["f₁(t)" "f₂(t)" "f₃(t)"],
style=[:dot :dash :dashdot],
linewidth=1,
)
```

Even though we use simple components, combining them can give rise to fairly complex time series.
In this time series, cyclic effects are just added on top of the trend.
If we instead multiply the components the cyclic effects cause the series to oscillate
between larger and larger values, since they get scaled by the trend.

```julia
g(t) = f₁(t) * f₂(t) * f₃(t)

plot(g, tt; label="f(t)", title="Observed time series", legend=:topleft, linewidth=3)
plot!([f₁, f₂, f₃], tt; label=["f₁(t)" "f₂(t)" "f₃(t)"], linewidth=1)
```

Unlike $f$, $g$ oscillates around $0$ since it is being multiplied with sines and cosines.
To let a multiplicative decomposition oscillate around the trend we could define it as
$\tilde{g}(t) = f₁(t) * (1 + f₂(t)) * (1 + f₃(t)),$
but for convenience we will leave it as is.
The inference machinery is the same for both cases.

# Model fitting

Having discussed time series decomposition, let us fit a model to the time series above and recover the true parameters.
Before building our model, we standardise the time axis to $[0, 1]$ and subtract the max of the time series.
This helps convergence while maintaining interpretability and the correct scales for the cyclic components.

```julia
σ_true = 0.35
t = collect(tt[begin:3:end])
t_min, t_max = extrema(t)
x = (t .- t_min) ./ (t_max - t_min)
yf = f.(t) .+ σ_true .* randn(size(t))
yf_max = maximum(yf)
yf = yf .- yf_max
devmotion marked this conversation as resolved.
Show resolved Hide resolved

scatter(x, yf; title="Standardised data", legend=false)
```

Let us now build our model.
We want to assume a linear trend, and cyclic effects.
Encoding a linear trend is easy enough, but what about cyclical effects?
We will take a scattergun approach, and create multiple cyclical features using both sine and cosine functions and let our inference machinery figure out which to keep.
To do this, we define how long a one period should be, and create features in reference to said period.
How long a period should be is problem dependent, but as an example let us say it is $1$ year.
If we then find evidence for a cyclic effect with a frequency of 2, that would mean a biannual effect. A frequency of 4 would mean quarterly etc.
Since we are using synthetic data, we are simply going to let the period be 1, which is the entire length of the time series.

```julia
freqs = 1:10
num_freqs = length(freqs)
period = 1
cyclic_features = [sinpi.(2 .* freqs' .* x ./ period) cospi.(2 .* freqs' .* x ./ period)]

plot_freqs = [1, 3, 5]
freq_ptl = plot(
cyclic_features[:, plot_freqs];
label=permutedims(["sin(2π$(f)x)" for f in plot_freqs]),
title="Cyclical features subset",
)
```

Having constructed the cyclical features, we can finally build our model. The model we will implement looks like this

$$
f(t) = \alpha + \beta_t t + \sum_{i=1}^F \beta_{\sin{},i} \sin{}(2\pi f_i t) + \sum_{i=1}^F \beta_{\cos{},i} \cos{}(2\pi f_i t),
$$

with a Gaussian likelihood $y \sim \mathcal{N}(f(t), \sigma^2)$.
For convenience we are treating the cyclical feature weights $\beta_{\sin{},i} and \beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For convenience we are treating the cyclical feature weights $\beta_{\sin{},i} and \beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.
For convenience we are treating the cyclical feature weights $\beta_{\sin{},i}$ and $\beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.

And just because it is so easy, we parameterise our model with the operation with which to apply the cyclic effects.
This lets us use the exact same code for both additive and multiplicative models.
Finally, we plot prior predictive samples to make sure our priors make sense.

```julia
@model function decomp_model(t, c, op)
α ~ Normal(0, 10)
βt ~ Normal(0, 2)
βc ~ MvNormal(zeros(size(c, 2)), I)
σ ~ truncated(Normal(0, 0.1); lower=0)

cyclic = c * βc
trend = α .+ βt .* t
μ = op(trend, cyclic)
y ~ MvNormal(μ, σ^2 * I)
return (; trend, cyclic)
end

y_prior_samples = mapreduce(hcat, 1:100) do _
rand(decomp_model(t, cyclic_features, +)).y
end
plot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label="", title="Prior samples")
scatter!(t, yf; color=2, label="Data")
```

With the model specified and with a reasonable prior we can now let Turing decompose the time series for us!

```julia
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTML file is quite large, probably due to the size of some of the SVGs. I assume the problem might be this plot (and the similar one below). Maybe we can enforce png output here by setting chunk options? Eg.,

Suggested change
```julia
```julia; fig_ext = ".png"

function mean_ribbon(samples)
qs = quantile(samples)
low = qs[:, Symbol("2.5%")]
up = qs[:, Symbol("97.5%")]
m = mean(samples)[:, :mean]
return m, (m - low, up - m)
end

function get_decomposition(model, x, cyclic_features, chain)
chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
return generated_quantities(model(x, cyclic_features, +), chain_params)
devmotion marked this conversation as resolved.
Show resolved Hide resolved
end

function plot_fit(x, y, decomp, ymax)
trend = mapreduce(x -> x.trend, hcat, decomp)
cyclic = mapreduce(x -> x.cyclic, hcat, decomp)

trend_plt = plot(
x,
trend .+ ymax;
color=1,
label=nothing,
alpha=0.2,
title="Trend",
xlabel="Time",
ylabel="f₁(t)",
)
ls = [ones(length(t)) t] \ y
α̂, β̂ = ls[1], ls[2:end]
plot!(
trend_plt,
t,
α̂ .+ t .* β̂ .+ ymax;
label="Least squares trend",
color=5,
linewidth=4,
)

scatter!(trend_plt, x, y .+ ymax; label=nothing, color=2, legend=:topleft)
cyclic_plt = plot(
x,
cyclic;
color=1,
label=nothing,
alpha=0.2,
title="Cyclic effect",
xlabel="Time",
ylabel="f₂(t)",
)
return trend_plt, cyclic_plt
end

chain = sample(decomp_model(x, cyclic_features, +) | (; y=yf), NUTS(), 2000)
yf_samples = predict(decomp_model(x, cyclic_features, +), chain)
m, conf = mean_ribbon(yf_samples)
predictive_plt = plot(
t,
m .+ yf_max;
ribbon=conf,
label="Posterior density",
title="Posterior decomposition",
xlabel="Time",
ylabel="f(t)",
)
scatter!(predictive_plt, t, yf .+ yf_max; color=2, label="Data", legend=:topleft)

decomp = get_decomposition(decomp_model, x, cyclic_features, chain)
decomposed_plt = plot_fit(t, yf, decomp, yf_max)
combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be

Suggested change
combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))

?

```

```julia, echo=false
@assert isapprox(mean(chain, :βt) / t_max, β_true; atol=0.5)
@assert isapprox(mean(chain, :α) + yf_max, α_true; atol=1.0)
@assert isapprox(mean(chain, :σ), σ_true; atol=0.1)
@assert isapprox(mean(chain, "βc[2]"), true_sin_amp; atol=0.5)
@assert isapprox(mean(chain, "βc[17]"), true_cos_amp; atol=0.5)
```

Inference is successful and the posterior beautifully captures the data.
We see that the least squares linear fit deviates somewhat from the posterior trend.
Since our model takes cyclic effects into account separately,
we get a better estimate of the true overall trend than if we would have just fitted a line.
But what frequency content did the model identify?

```julia
function plot_cyclic_features(βsin, βcos)
labels = reshape(["freq = $i" for i in freqs], 1, :)
colors = collect(freqs)'
style = reshape([i <= 10 ? :solid : :dash for i in 1:length(labels)], 1, :)
sin_features_plt = density(
βsin[:, :, 1];
title="Sine features posterior",
label=labels,
ylabel="Density",
xlabel="Weight",
color=colors,
linestyle=style,
legend=nothing,
)
cos_features_plt = density(
βcos[:, :, 1];
title="Cosine features posterior",
ylabel="Density",
xlabel="Weight",
label=nothing,
color=colors,
linestyle=style,
)

return seasonal_features_plt = plot(
sin_features_plt,
cos_features_plt;
layout=(2, 1),
size=(800, 600),
legend=:outerright,
)
end

βc = group(chain, :βc).value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .value is an undocumental internal implementation detail. It would be cleaner to work with the resulting chain directly or convert it to an array with Array(...).

plot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])
```

Plotting the posterior over the cyclic features reveals that the model managed to extract the true frequency content.

Since we wrote our model to accept a combining operator, we can easily run the same analysis for a multiplicative model.

```julia
yg = g.(t) .+ σ_true .* randn(size(t))

y_prior_samples = mapreduce(hcat, 1:100) do _
rand(decomp_model(t, cyclic_features, .*)).y
end
plot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label="", title="Prior samples")
scatter!(t, yf; color=2, label="Data")
```

```julia
chain = sample(decomp_model(x, cyclic_features, .*) | (; y=yg), NUTS(), 2000)
yg_samples = predict(decomp_model(x, cyclic_features, .*), chain)
m, conf = mean_ribbon(yg_samples)
predictive_plt = plot(
t,
m;
ribbon=conf,
label="Posterior density",
title="Posterior decomposition",
xlabel="Time",
ylabel="g(t)",
)
scatter!(predictive_plt, t, yg; color=2, label="Data", legend=:topleft)

decomp = get_decomposition(decomp_model, x, cyclic_features, chain)
decomposed_plt = plot_fit(t, yg, decomp, 0)
combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))

```

The model fits! What about the infered cyclic components?

```julia
βc = group(chain, :βc).value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

plot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])
```

While multiplicative model fits to the data, it does not recover the true parameters for this dataset.

# Wrapping up

In this tutorial we have seen how to implement and fit time series models using additive and multiplicative decomposition.
We also saw how to visualise the model fit, and how to interpret learned cyclical components.
devmotion marked this conversation as resolved.
Show resolved Hide resolved

```julia, echo=false, skip="notebook"
if isdefined(Main, :TuringTutorials)
Main.TuringTutorials.tutorial_footer(WEAVE_ARGS[:folder], WEAVE_ARGS[:file])
end
```
Loading