-
Notifications
You must be signed in to change notification settings - Fork 36
/
lab12.Rmd
104 lines (81 loc) · 3.92 KB
/
lab12.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
---
title: "In-Class Lab 12"
author: "ECON 4223 (Prof. Tyler Ransom, U of Oklahoma)"
date: "March 24, 2022"
output:
pdf_document: default
html_document:
df_print: paged
word_document: default
bibliography: biblio.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', fig.keep = 'none')
```
The purpose of this in-class lab is to use R to practice with instrumental variables estimation. The lab should be completed in your group. To get credit, upload your .R script to the appropriate place on Canvas.
## For starters
You may need to install the packages `AER`, `flextable` and `modelsummary`. (`AER` may have already been installed when you previously installed `car` and `zoo`.)
Open up a new R script (named `ICL12_XYZ.R`, where `XYZ` are your initials) and add the usual "preamble" to the top:
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(broom)
library(AER)
library(magrittr)
library(modelsummary)
```
### Load the data
We're going to use data on fertility of Botswanian women.
```{r}
df <- as_tibble(fertil2)
```
### Summary statistics
Let's look at summary statistics of our data by using the `modelsummary` package. We can export this to a word document format if we'd like:
```{r, results="show"}
df %>% datasummary_skim(histogram=F,output="myfile.docx")
```
1. What do you think is going on when you see varying numbers of observations across the different variables?
## Determinants of fertility
Suppose we want to see if education causes lower fertility (as can be seen when comparing more- and less-educated countries):
\[
children = \beta_0 + \beta_1 educ + \beta_2 age + \beta_3 age^2 + u
\]
where $children$ is the number of children born to the woman, $educ$ is years of education, and $age$ is age (in years).
2. Interpret the estimates of the regression:
```{r}
est.ols <- lm(children ~ educ + age + I(age^2), data=df)
```
(Note: include `I(age^2)` puts the quadratic term in automatically without us having to use `mutate()` to create a new variable called `age.sq`.)
We can also use `modelsummary` to examine the output. It puts the standard errors of each variable in parentheses under the estimated coefficient.
```{r, results="show"}
modelsummary(est.ols)
```
### Instrumenting for endogenous education
We know that education is endogenous (i.e. people choose the level of education that maximizes their utility). A possible instrument for education is $firsthalf$, which is a dummy equal to 1 if the woman was born in the first half of the calendar year, and 0 otherwise.
Let's create this variable:
```{r}
df %<>% mutate(firsthalf = mnthborn<7)
```
We will assume that $firsthalf$ is uncorrelated with $u$.
3. Check that $firsthalf$ is correlated with $educ$ by running a regression. (I will suppress the code, since it should be old hat) Call the output `est.iv1`.
```{r include=FALSE, message=FALSE, warning=FALSE, paged.print=FALSE}
est.iv1 <- lm(educ ~ firsthalf, data=df)
modelsummary(est.iv1)
```
### IV estimation
Now let's do the IV regression:
```{r}
est.iv <- ivreg(children ~ educ + age + I(age^2) | firsthalf + age + I(age^2), data=df)
```
The variables on the right hand side of the `|` are the instruments (including the $x$'s that we assume to be exogenous, like $age$). The endogenous $x$ is the first one after the `~`.
Now we can compare the output for each of the models:
```{r, results="show"}
modelsummary(list(est.ols,est.iv1,est.iv))
```
We can also save the output of `modelsummary()` to an image, a text file or something else:
```{r}
modelsummary(list(est.ols,est.iv1,est.iv), output="results.jpg")
modelsummary(list(est.ols,est.iv1,est.iv), output="results.docx")
```
4. Comment on the IV estimates. Do they make sense? Discuss why the IV standard error is so much larger than the OLS standard error.