-
Notifications
You must be signed in to change notification settings - Fork 1
/
index.Rmd
224 lines (179 loc) · 7.14 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
---
title: "State Space Recipes in R"
author: "Paul Teetor"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
output: bookdown::gitbook
documentclass: book
bibliography: [articles.bib, books.bib, packages.bib]
biblio-style: apalike
link-citations: yes
github-repo: pteetor/StateSpaceRecipes
description: "This monograph is a collection of recipes for creating state-space models in R."
---
```{r globals, include=FALSE}
#
# Document global definitions
#
suppressPackageStartupMessages({
library(magrittr)
library(tidyverse)
library(dlm)
})
knitr::opts_chunk$set(prompt=FALSE, echo=TRUE, comment="##")
# Globals for this document
ALT_COLOR = "red"
ALT_STYLE = "dashed"
```
# Introduction
This monograph is a collection of recipes for creating state-space models in R.
I like the power of state-space models,
and R has several excellent packages for building them.
Unfortunately, it's not quite an "out of the box" technology.
Using any package involves numerous little details,
and unless I used the package very recently,
building a model requires pulling out the package documentation,
reading it all over again, and trying to remember how the parts fit together.
One day I got tired of that,
so I put together these recipes.
This is not a tutorial for state-space models.
For a general introduction to state-space modeling,
I recommend the book by Commandeur and Koopman [@CommandeurKoopman2007].
In these notes, I use the `StructTS` function
to create the simpler models,
and I use the `dlm` package
for more complicated models.
There isn't room here to cover other R packages.
If you're interested in a survey of state-space packaqes for R,
I recommend the excellent review by Tusell [@Tusell2011].
## The Software
Packages used in this monograph include
------------------------- -----------------------------------------------
`dlm` State-space modeling
`ggplot2` and `ggfortify` Plotting time series data
`magrittr` Provides the convenient "pipe" operator (`%>%`)
`xts` and `zoo` Tools for time series data
------------------------- -----------------------------------------------
### The StructTS function
R includes a function, `StructTS`, which can quickly and easily estimate
the parameters of simple state-space models
such as the *local level* model or the *local linear trend* model [@Ripley2002].
`StructTS` is one function in a group of functions
which, together, provide many features of state-space modeling.
Function Purpose
--------- --------
StructTS Estimate parameters of a simple state-space model
tsdiag Plot diagnostics for state-space model
KalmanLike Calculate parameters' log-likelihood (Gaussian model)
KalmanRun Filter time series data
tsSmooth Smooth time series data (calls KalmanSmooth)
KalmanForecast Forecast time series points from model
makeARIMA Create state-space model equivalent to ARIMA model
### The dlm package
For the advanced recipes,
I use the `dlm` package originally created by Giovanni Petris [^Petris2010].
The package is very well documented,
and Petris has even written a book regarding state-space models in general
and the `dlm` package in particular [@Petris2009].
There is also an overview written by Petris and Petrone [@PetrisPetrone2011]
which discusses several R packages with an emphasis on the `dlm` package.
The package contains many useful functions.
This monograph uses these.
Function Purpose
------------ --------------------------------------------------
dlmModPoly Construct polynomial model
dlmModReg Construct regression model
dlmMLE Estimate maximum likelihood parameters of model
dlmFilter Filter a time series
dlmSmooth Smooth a time series
dlmBSample Draw from the posterior distribution
The package includes a very cool feature,
which is the ability to "add" models together into a compound model.
That feature is not illustrated here,
but I urge any serious user to study the feature.
It would let you, say, easily combine a regression model with an ARMA model
to create a better model your data.
## The Examples
Many recipes includes an example.
The examples are intended to be fully stand-alone,
meaning you can cut and paste them directly into R and watch them run.
All examples use some concrete dataset, typically the Nile River data included with R.
Many recipes start by assigning the time series data to variable $y$, like this.
```{r, eval=FALSE, echo=TRUE}
y <- datasets::Nile
```
The subsequent code is written in terms of $y$, not a specific dataset.
My goal was to let you copy the recipe, substitute your data for the Nile River data,
and try the recipe for yourself.
## Smoothing Versus Filtering {#smoothingVersusFiltering}
Every state-space model identifies a noise component in your data.
There are two algorithms for removing the noise from your data: *smoothing* and *filtering*.
The difference between them is what they assume.
Suppose you have a time series with observations
at times $t_0$ through $t_T$,
$$t_0, t_1, ..., t_T$$
and you want to remove the noise from the data,
including some intermediate point, say $t_i$.
$$t_0, t_1, ..., t_i, ..., t_T$$
*Smoothing* assumes the entire time series data is available to you,
start to finish, from $t_0$ through $t_T$.
*Filtering* assumes only the data from $t_0$ through $t_i$
is available;
that is, you have not yet seen the data from $t_i+1$ through $t_T$.
Because smoothing has the entire dataset available,
it can estimate the best model and, hence,
does the best job of separating the signal from the noise.
Filtering can only estimate the best "so far",
given what we've observed,
so it estimates an imperfect model and can mistake
some signal for noise (and vice versa).
The practical impact is that smoothing is possible
when analyzing historical data
but impossible for "real-time" data,
that is, data you're observing moment by moment, day by day,
week by week, and so forth.
For the real-time data, you are limited to filtering.
When more data arrives subsequently, the model estimates
may improve and, in fact, the filtered results may change.
## Plotting Time Series Data {#plotting}
### Base graphics
Base R includes a `plot` function for time series data.
The function is quick-and-easy,
but I find it awkward to adjust the plot's appearance.
```{r}
y <- datasets::Nile
plot(y)
```
### ggplot2 with `autoplot`
Use the `autoplot` function of the ggfortify package
to plot your time series data using ggplot2,
the popular graphics package.
```{r}
library(ggfortify)
y <- datasets::Nile
autoplot(y)
```
The result of `autoplot` is a ggplot2 object,
so you can easily customize the plot using ggplot2 functions.
```{r}
autoplot(y, ts.colour = "red") +
ggtitle("Nile river data") +
xlab("Year") + ylab("Annual flow") +
theme_linedraw()
```
### Plotting multivariate time series
Both `plot` and `autoplot` handle multivariate time series data.
We can use the `airquality` built-in dataset to illustrate
(after converting it to time series data).
```{r}
y <- as.ts(airquality[ , 1:4])
```
```{r}
plot(y)
```
```{r}
autoplot(y)
```
## Online materials
R code examples are available in a public Github repository.
> https://github.com/pteetor/StateSpaceRecipes