-
Notifications
You must be signed in to change notification settings - Fork 36
/
lab10.Rmd
151 lines (122 loc) · 5.96 KB
/
lab10.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
---
title: "In-Class Lab 10"
author: "ECON 4223 (Prof. Tyler Ransom, U of Oklahoma)"
date: "March 1, 2022"
output:
html_document:
df_print: paged
pdf_document: default
word_document: default
bibliography: biblio.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', fig.keep = 'none')
```
The purpose of this in-class lab is to use R to practice estimating time series regression models with standard errors corrected for heteroskedasticity and serial correlation (HAC). To get credit, upload your .R script to the appropriate place on Canvas.
## For starters
First, install the `pdfetch`, `tsibble`, `sandwich`, and `COVID19` packages. `pdfetch` stands for "Public Data Fetch" and is a slick way of downloading statistics on stock prices, GDP, inflation, unemployment, etc. `tsibble` is a package useful for working with time series data. It is the "tibble" for time series data. `sandwich` is helpful for obtaining HAC standard errors. `COVID19` pulls up-to-date data on COVID-19 cases, deaths, etc.
Open up a new R script (named `ICL10_XYZ.R`, where `XYZ` are your initials) and add the usual "preamble" to the top:
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(modelsummary)
library(broom)
library(car)
library(magrittr)
library(lmtest)
# new packages installed today:
library(pdfetch)
library(sandwich)
library(tsibble)
library(COVID19)
```
### Load the data
We're going to use data on US COVID-19 cases, deaths and other information.
```{r}
df <- covid19(c("US"))
df.ts <- as_tsibble(df, key=id, index=date)
```
Now it will be easy to include lags of various variables into our regression models.
## Plot time series data
Let's have a look at the data on **new** daily cases and deaths:
```{r}
df.ts %<>% mutate(new_deaths = difference(deaths), # data comes in cumulative format
new_tests = difference(tests), # "difference" converts it to
new_cases = difference(confirmed)) # "new cases" etc.
# plots
ggplot(df.ts, aes(date, new_cases)) + geom_line()
ggplot(df.ts, aes(date, new_deaths)) + geom_line()
# plots with 7-day rolling average
ggplot(df.ts, aes(date, new_cases)) + geom_line(aes(y=rollmean(new_cases, 7, na.pad=TRUE)))
ggplot(df.ts, aes(date, new_deaths)) + geom_line(aes(y=rollmean(new_deaths, 7, na.pad=TRUE)))
```
## Determinants of US COVID-19 cases
Now let's estimate the following regression model:
\[
\log(new\_cases_{t}) = \beta_0 + \beta_1 gath_t + \beta_2 gath_{t-7} + \beta_3 gath_{t-14} + \beta_4 \log(new\_cases_{t-7}) + u_t
\]
where $new\_cases$ is the number of new COVID cases, and $gath$ is a variable taking on values 0--4 representing severity of gatherings restricitons.
```{r}
df.ts %<>% mutate(log.new.cases = log(new_cases),
log.new.cases = replace(log.new.cases,new_cases==0,NA_real_)) # since log(0) = -Inf
est <- lm(log.new.cases ~ gatherings_restrictions + lag(gatherings_restrictions,7) +
lag(gatherings_restrictions,14) + lag(log.new.cases,7), data=df.ts)
```
1. Are any of these variables significant determinants of new COVID cases? If so, which ones?
## Correcting for Serial Correlation
Now let's compute HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors. To do so, we'll use the `vcov` option in the `modelsummary()` function along with the `NeweyWest` function from the `sandwich` package.
```{r}
modelsummary(est) # re-display baseline results
modelsummary(est, vcov=sandwich::NeweyWest(est))
```
or, putting them side-by-side:
```{r}
modelsummary(list(est,est),
vcov=list("iid",sandwich::NeweyWest(est))
)
```
2. How does your interpretation of the the effect of gathering restrictions change after using the Newey-West standard errors?
## Oklahoma COVID cases
```{r}
df.ok <- covid19(c("US"),level=2) %>%
filter(administrative_area_level_2=="Oklahoma")
df.ts.ok <- as_tsibble(df.ok, key=id, index=date)
df.ts.ok %<>% mutate(new_deaths = difference(deaths),
new_tests = difference(tests),
new_cases = difference(confirmed))
ggplot(df.ts.ok, aes(date, new_cases)) + geom_line(aes(y=rollmean(new_cases, 7, na.pad=TRUE)))
ggplot(df.ts.ok, aes(date, new_deaths)) + geom_line(aes(y=rollmean(new_deaths, 7, na.pad=TRUE)))
df.ts.ok %<>% mutate(log.new.cases = log(new_cases),
log.new.cases = replace(log.new.cases,new_cases==0,NA_real_)) # since log(0) = -Inf
est.ok <- lm(log.new.cases ~ gatherings_restrictions + lag(gatherings_restrictions,7) +
lag(gatherings_restrictions,14) + lag(log.new.cases,7), data=df.ts.ok)
```
With HAC standard errors:
```{r}
modelsummary(list(est.ok,est.ok),
vcov=list("iid",sandwich::NeweyWest(est.ok)))
```
## Traditional macroeconomic model of interest rates and inflation
### Load the data
We can also use data on US macroeconomic indicators. The `wooldridge` data set is called `intdef`.
```{r}
df.ts <- as_tsibble(intdef, key=NULL, index=year)
```
Now it will be easy to include lags of various variables into our regression models.
## Plot time series data
Let's have a look at the inflation rate for the US over the period 1948--2003:
```{r}
ggplot(df.ts, aes(year, inf)) + geom_line()
```
## Determinants of the interest rate
Now let's estimate the following regression model:
\[
i3_{t} = \beta_0 + \beta_1 inf_t + \beta_2 inf_{t-1} + \beta_3 inf_{t-2} + \beta_4 def_{t} + u_t
\]
where $i3$ is the 3-month Treasury Bill interest rate, $inf$ is the inflation rate (as measured by the CPI), and $def$ is the budget deficit as a percentage of GDP.
```{r}
est <- lm(i3 ~ inf + lag(inf,1) + lag(inf,2) + def, data=df.ts)
modelsummary(list(est,est), vcov=list("iid",sandwich::NeweyWest(est)))
```
[^1]: You may need to install the `sandwich` package.