TuringLang · yebai · Jul 21, 2022 · Jul 3, 2022 · Jul 6, 2022 · Jul 6, 2022
diff --git a/tutorials/13-seasonal-time-series/13_seasonal_time_series.jmd b/tutorials/13-seasonal-time-series/13_seasonal_time_series.jmd
@@ -0,0 +1,328 @@
+---
+title: Bayesian Time Series Analysis
+permalink: /:collection/:name/
+weave_options:
+  error : false
+---
+
+In time series analysis we are often interested in understanding how various real-life circumstances impact our quantity of interest.
+These can be, for instance, season, day of week, or time of day.
+To analyse this it is useful to decompose time series into simpler components (corresponding to relevant circumstances)
+and infer their relevance.
+In this tutorial we are going to use Turing for time series analysis and learn about useful ways to decompose time series.
+
+# Modelling time series
+
+Before we start coding, let us talk about what exactly we mean with time series decomposition.
+In a nutshell, it is a divide-and-conquer approach where we express a time series as a sum or a product of simpler series.
+For instance, the time series $f(t)$ can be decomposed into a sum of $n$ components
+
+$$f(t) = \sum_{i=1}^n f_i(t),$$
+
+or we can decompose $g(t)$ into a product of $m$ components
+
+$$g(t) = \sum_{i=1}^m g_i(t).$$
+
+We refer to this as *additive* or *multiplicative* decomposition respectively.
+This type of decomposition is great since it lets us reason about individual components, which makes encoding prior information and interpreting model predictions very easy.
+Two common components are *trends*, which represent the overall change of the time series (often assumed to be linear),
+and *cyclic effects* which contribute oscillating effects around the trend.
+Let us simulate some data with an additive linear trend and oscillating effects.
+
+```julia
+using StatsPlots, Turing, Statistics, LinearAlgebra, Random
+Random.seed!(12345)
+true_sin_freq = 2
+true_sin_amp = 5
+true_cos_freq = 7
+true_cos_amp = 2.5
+tmax = 10
+β_true = 2
+α_true = -1
+tt = 0:0.05:tmax
+f₁(t) = α_true + β_true * t
+f₂(t) = true_sin_amp * sinpi(2 * t * true_sin_freq / tmax)
+f₃(t) = true_cos_amp * cospi(2 * t * true_cos_freq / tmax)
+f(t) = f₁(t) + f₂(t) + f₃(t)
+
+plot(f, tt; label="f(t)", title="Observed time series", legend=:topleft, linewidth=3)
+plot!(
+    [f₁, f₂, f₃],
+    tt;
+    label=["f₁(t)" "f₂(t)" "f₃(t)"],
+    style=[:dot :dash :dashdot],
+    linewidth=1,
+)
+```
+
+Even though we use simple components, combining them can give rise to fairly complex time series.
+In this time series, cyclic effects are just added on top of the trend.
+If we instead multiply the components the cyclic effects cause the series to oscillate
+between larger and larger values, since they get scaled by the trend.
+
+```julia
+g(t) = f₁(t) * f₂(t) * f₃(t)
+
+plot(g, tt; label="f(t)", title="Observed time series", legend=:topleft, linewidth=3)
+plot!([f₁, f₂, f₃], tt; label=["f₁(t)" "f₂(t)" "f₃(t)"], linewidth=1)
+```
+
+Unlike $f$, $g$ oscillates around $0$ since it is being multiplied with sines and cosines.
+To let a multiplicative decomposition oscillate around the trend we could define it as
+$\tilde{g}(t) = f₁(t) * (1 + f₂(t)) * (1 + f₃(t)),$
+but for convenience we will leave it as is.
+The inference machinery is the same for both cases.
+
+# Model fitting
+
+Having discussed time series decomposition, let us fit a model to the time series above and recover the true parameters.
+Before building our model, we standardise the time axis to $[0, 1]$ and subtract the max of the time series.
+This helps convergence while maintaining interpretability and the correct scales for the cyclic components.
+
+```julia
+σ_true = 0.35
+t = collect(tt[begin:3:end])
+t_min, t_max = extrema(t)
+x = (t .- t_min) ./ (t_max - t_min)
+yf = f.(t) .+ σ_true .* randn(size(t))
+yf_max = maximum(yf)
+yf = yf .- yf_max
+
+scatter(x, yf; title="Standardised data", legend=false)
+```
+
+Let us now build our model.
+We want to assume a linear trend, and cyclic effects.
+Encoding a linear trend is easy enough, but what about cyclical effects?
+We will take a scattergun approach, and create multiple cyclical features using both sine and cosine functions and let our inference machinery figure out which to keep.
+To do this, we define how long a one period should be, and create features in reference to said period.
+How long a period should be is problem dependent, but as an example let us say it is $1$ year.
+If we then find evidence for a cyclic effect with a frequency of 2, that would mean a biannual effect. A frequency of 4 would mean quarterly etc.
+Since we are using synthetic data, we are simply going to let the period be 1, which is the entire length of the time series.
+
+```julia
+freqs = 1:10
+num_freqs = length(freqs)
+period = 1
+cyclic_features = [sinpi.(2 .* freqs' .* x ./ period) cospi.(2 .* freqs' .* x ./ period)]
+
+plot_freqs = [1, 3, 5]
+freq_ptl = plot(
+    cyclic_features[:, plot_freqs];
+    label=permutedims(["sin(2π$(f)x)" for f in plot_freqs]),
+    title="Cyclical features subset",
+)
+```
+
+Having constructed the cyclical features, we can finally build our model. The model we will implement looks like this
+
+$$
+f(t) = \alpha + \beta_t t + \sum_{i=1}^F \beta_{\sin{},i} \sin{}(2\pi f_i t) + \sum_{i=1}^F \beta_{\cos{},i} \cos{}(2\pi f_i t),
+$$
+
+with a Gaussian likelihood $y \sim \mathcal{N}(f(t), \sigma^2)$.
+For convenience we are treating the cyclical feature weights $\beta_{\sin{},i} and \beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.
-For convenience we are treating the cyclical feature weights $\beta_{\sin{},i} and \beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.
+For convenience we are treating the cyclical feature weights $\beta_{\sin{},i}$ and $\beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.
-For convenience we are treating the cyclical feature weights $\beta_{\sin{},i} and \beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.
+For convenience we are treating the cyclical feature weights $\beta_{\sin{},i}$ and $\beta_{\cos{},i}$ the same in code and weight them with $\beta_c$.
+And just because it is so easy, we parameterise our model with the operation with which to apply the cyclic effects.
+This lets us use the exact same code for both additive and multiplicative models.
+Finally, we plot prior predictive samples to make sure our priors make sense.
+
+```julia
+@model function decomp_model(t, c, op)
+    α ~ Normal(0, 10)
+    βt ~ Normal(0, 2)
+    βc ~ MvNormal(zeros(size(c, 2)), I)
+    σ ~ truncated(Normal(0, 0.1); lower=0)
+
+    cyclic = c * βc
+    trend = α .+ βt .* t
+    μ = op(trend, cyclic)
+    y ~ MvNormal(μ, σ^2 * I)
+    return (; trend, cyclic)
+end
+
+y_prior_samples = mapreduce(hcat, 1:100) do _
+    rand(decomp_model(t, cyclic_features, +)).y
+end
+plot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label="", title="Prior samples")
+scatter!(t, yf; color=2, label="Data")
+```
+
+With the model specified and with a reasonable prior we can now let Turing decompose the time series for us!
+
+```julia
-```julia
+```julia; fig_ext = ".png"
-```julia
+```julia; fig_ext = ".png"
+function mean_ribbon(samples)
+    qs = quantile(samples)
+    low = qs[:, Symbol("2.5%")]
+    up = qs[:, Symbol("97.5%")]
+    m = mean(samples)[:, :mean]
+    return m, (m - low, up - m)
+end
+
+function get_decomposition(model, x, cyclic_features, chain)
+    chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
+    return generated_quantities(model(x, cyclic_features, +), chain_params)
+end
+
+function plot_fit(x, y, decomp, ymax)
+    trend = mapreduce(x -> x.trend, hcat, decomp)
+    cyclic = mapreduce(x -> x.cyclic, hcat, decomp)
+
+    trend_plt = plot(
+        x,
+        trend .+ ymax;
+        color=1,
+        label=nothing,
+        alpha=0.2,
+        title="Trend",
+        xlabel="Time",
+        ylabel="f₁(t)",
+    )
+    ls = [ones(length(t)) t] \ y
+    α̂, β̂ = ls[1], ls[2:end]
+    plot!(
+        trend_plt,
+        t,
+        α̂ .+ t .* β̂ .+ ymax;
+        label="Least squares trend",
+        color=5,
+        linewidth=4,
+    )
+
+    scatter!(trend_plt, x, y .+ ymax; label=nothing, color=2, legend=:topleft)
+    cyclic_plt = plot(
+        x,
+        cyclic;
+        color=1,
+        label=nothing,
+        alpha=0.2,
+        title="Cyclic effect",
+        xlabel="Time",
+        ylabel="f₂(t)",
+    )
+    return trend_plt, cyclic_plt
+end
+
+chain = sample(decomp_model(x, cyclic_features, +) | (; y=yf), NUTS(), 2000)
+yf_samples = predict(decomp_model(x, cyclic_features, +), chain)
+m, conf = mean_ribbon(yf_samples)
+predictive_plt = plot(
+    t,
+    m .+ yf_max;
+    ribbon=conf,
+    label="Posterior density",
+    title="Posterior decomposition",
+    xlabel="Time",
+    ylabel="f(t)",
+)
+scatter!(predictive_plt, t, yf .+ yf_max; color=2, label="Data", legend=:topleft)
+
+decomp = get_decomposition(decomp_model, x, cyclic_features, chain)
+decomposed_plt = plot_fit(t, yf, decomp, yf_max)
+combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
-combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
+plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))
-combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
+plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))
+```
+
+```julia, echo=false
+@assert isapprox(mean(chain, :βt) / t_max, β_true; atol=0.5)
+@assert isapprox(mean(chain, :α) + yf_max, α_true; atol=1.0)
+@assert isapprox(mean(chain, :σ), σ_true; atol=0.1)
+@assert isapprox(mean(chain, "βc[2]"), true_sin_amp; atol=0.5)
+@assert isapprox(mean(chain, "βc[17]"), true_cos_amp; atol=0.5)
+```
+
+Inference is successful and the posterior beautifully captures the data.
+We see that the least squares linear fit deviates somewhat from the posterior trend.
+Since our model takes cyclic effects into account separately,
+we get a better estimate of the true overall trend than if we would have just fitted a line.
+But what frequency content did the model identify?
+
+```julia
+function plot_cyclic_features(βsin, βcos)
+    labels = reshape(["freq = $i" for i in freqs], 1, :)
+    colors = collect(freqs)'
+    style = reshape([i <= 10 ? :solid : :dash for i in 1:length(labels)], 1, :)
+    sin_features_plt = density(
+        βsin[:, :, 1];
+        title="Sine features posterior",
+        label=labels,
+        ylabel="Density",
+        xlabel="Weight",
+        color=colors,
+        linestyle=style,
+        legend=nothing,
+    )
+    cos_features_plt = density(
+        βcos[:, :, 1];
+        title="Cosine features posterior",
+        ylabel="Density",
+        xlabel="Weight",
+        label=nothing,
+        color=colors,
+        linestyle=style,
+    )
+
+    return seasonal_features_plt = plot(
+        sin_features_plt,
+        cos_features_plt;
+        layout=(2, 1),
+        size=(800, 600),
+        legend=:outerright,
+    )
+end
+
+βc = group(chain, :βc).value
+plot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])
+```
+
+Plotting the posterior over the cyclic features reveals that the model managed to extract the true frequency content.
+
+Since we wrote our model to accept a combining operator, we can easily run the same analysis for a multiplicative model.
+
+```julia
+yg = g.(t) .+ σ_true .* randn(size(t))
+
+y_prior_samples = mapreduce(hcat, 1:100) do _
+    rand(decomp_model(t, cyclic_features, .*)).y
+end
+plot(t, y_prior_samples; linewidth=1, alpha=0.5, color=1, label="", title="Prior samples")
+scatter!(t, yf; color=2, label="Data")
+```
+
+```julia
+chain = sample(decomp_model(x, cyclic_features, .*) | (; y=yg), NUTS(), 2000)
+yg_samples = predict(decomp_model(x, cyclic_features, .*), chain)
+m, conf = mean_ribbon(yg_samples)
+predictive_plt = plot(
+    t,
+    m;
+    ribbon=conf,
+    label="Posterior density",
+    title="Posterior decomposition",
+    xlabel="Time",
+    ylabel="g(t)",
+)
+scatter!(predictive_plt, t, yg; color=2, label="Data", legend=:topleft)
+
+decomp = get_decomposition(decomp_model, x, cyclic_features, chain)
+decomposed_plt = plot_fit(t, yg, decomp, 0)
+combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
-combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
+plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))
-combined_plt = plot(predictive_plt, decomposed_plt...; layout=(4, 1), size=(700, 1000))
+plot(predictive_plt, decomposed_plt...; layout=(3, 1), size=(700, 1000))
+```
+
+The model fits! What about the infered cyclic components?
+
+```julia
+βc = group(chain, :βc).value
+plot_cyclic_features(βc[:, begin:num_freqs, :], βc[:, (num_freqs + 1):end, :])
+```
+
+While multiplicative model fits to the data, it does not recover the true parameters for this dataset.
+
+# Wrapping up
+
+In this tutorial we have seen how to implement and fit time series models using additive and multiplicative decomposition.
+We also saw how to visualise the model fit, and how to interpret learned cyclical components.
+
+```julia, echo=false, skip="notebook"
+if isdefined(Main, :TuringTutorials)
+    Main.TuringTutorials.tutorial_footer(WEAVE_ARGS[:folder], WEAVE_ARGS[:file])
+end
+```