-
Notifications
You must be signed in to change notification settings - Fork 1
/
Analyzing Texas Vaccine Supply Workbook.Rmd
163 lines (116 loc) · 6.43 KB
/
Analyzing Texas Vaccine Supply Workbook.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
---
title: "Analyzing Vaccine Supply in Texas"
author:
- name: Matt Worthington
url: https://example.com/norajones
affiliation: The LBJ School of Public Affairs
affiliation_url: https://example.com/spacelysprokets
date: "`r Sys.Date()`"
output:
html_document:
theme: paper
toc: true
toc_float:
collapsed: false
smooth_scroll: false
word_document:
toc: yes
pdf_document: default
distill::distill_article:
code_folding: yes
toc: yes
toc_float: yes
description: |
A living analysis of vaccine supply in Texas.
---
# Analysis Setup
Before we start building out our reproducible analysis, let's go ahead and make sure any R packages are loaded and installed properly. The code to install necessary packages and load them can be viewed by clicking on the "Show Code" arrow.
```{r setup, include = TRUE, echo=TRUE, warning = FALSE, message = FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# In case these aren't installed, uncomment this and run it.
# install.packages("janitor", "tidyverse", "gt")
# devtools::install_github("utexas-lbjp-data/lbjdata")
library(janitor) # Package with useful + convenient data cleaning functions
library(tidyverse) # Core Set of R Data Science Tools (dplyr, ggplot2, tidyr, readr, etc.)
```
# Analysis
## Import Our Vaccine Provider and Supply Data
This data comes from the Texas Department of State Health Services and contains the list of vaccine providers across the state of Texas, which can be found on [this page](https://dshs.texas.gov/coronavirus/additionaldata/). They use it for their own interactive mapping application of vaccine provider sites.[^1] Each provider is assigned a type and has a report of how much vaccine supply they have for each of the three approved vaccines. We'll use the `read_csv()` function to read in the data straight from the DSHS website. This will help make sure our analysis is "living", meaning any chart we make will update whenever the feed from DSHS gets updated, and "reproducible", meaning anyone who takes this R Markdown document can run it in their RStudio IDE and get the exact same thing you did.
[^1]: The link for this map is google.com
<aside>
The `read_csv()` comes from the [`readr`](https://readr.tidyverse.org) package that was loaded when we ran `library(tidyverse)` in the setup chunk above (lines 18:30 in the RMarkdown document).
</aside>
```{r import-data}
provider_data_raw <- readr::read_csv("") %>%
janitor::clean_names() # This function makes column headers machine readable
dplyr::glimpse(provider_data_raw) # glimpse() lets you preview a data object
```
## Transform our Vaccine Data
Now that we've imported it and created a data object called `provider_data_raw`, we can call on that object and use a handful of functions from the [`dplyr`](https://dplyr.tidyverse.org) package to transform our data into the shape we want for visualizing.
**The question we'll trying to answer is simple**: "Among all providers, how much of each vaccine exists in Texas?"
```{r transform-data}
supply_data <- provider_data_raw %>%
dplyr::mutate(state = "Texas") %>% # This adds a column where every entry is the word "Texas"
dplyr::group_by(state) %>% # This groups any future functions I write by the state column I created
dplyr::summarise( # This begins the summarise() function
Pfizer = sum(pfizer_available), # Creates a column with all pfizer supply
Moderna = sum(moderna_available), # Creates a column with all pfizer supply
JandJ = sum(jj_available) # Creates a column with all pfizer supply
) %>% # This ends the summarise() function
tidyr::pivot_longer(cols = c(), # reshapes our data from wide to long
names_to = "vaccine_type",
values_to = "supply")
dplyr::glimpse(supply_data) # glimpse() lets you preview a data object
```
## Visualize our Vaccine Data
Now that our data's in shape, we'll make a simple bar chart to show the distribution of vaccine supply in Texas.
```{r visualize-data}
supply_chart <- supply_data %>% # Call on the data
ggplot2::ggplot() + # Draw A Chart Canvas
ggplot2::aes(x = vaccine_type, y = supply, fill = vaccine_type) + # Define How Data Gets Mapped
ggplot2::geom_col() + # Translate into a bar chart format
# ggplot2:: + # Add a basic ggplot2 theme
ggplot2::theme(legend.position = "", # Hide the legend
plot.title = element_text(face = "bold")) + # Make the title bold
ggplot2::labs(title = "Texas Vaccine Supply, by Type", # Add a title
subtitle = "Shown are the current supply of vaccines available in Texas", # Add a subtitle
caption = "Source: Texas Department of State Health Services", # Add a caption
x = "Vaccine Type", # Add an X axis title
y = "Current Supply in Texas") # Add a Y axis title
supply_chart
```
## Export our Transformed Dataset and Visualization
Now that we've done all of this, we want to share our data and our chart, so we'll use a couple of functions to save this each time we run it.
```{r export-data}
## Export Our Data to a CSV File For Sharing
readr::write_csv(supply_data, "clean_supply_data.csv")
## Export Our Chart to a PNG File For Sharing
ggplot2::ggsave("vaccine_supply_chart.png", supply_chart, device = "png", dpi=300, width = 10, height = 6)
```
# Bonus
## Regression Example
### Regresssion Table
```{r regression-table}
# install.packages("modelsummary") # Uncomment this if you have not installed modelsummary
library(modelsummary) # Load the {modelsummary package}
model_1 <- lm(formula=total_shipped ~ type, # Run a regression using base R
data=provider_data_raw)
modelsummary::modelsummary(model_1, stars = TRUE) # Show regression results in a table
```
### Regression Chart
```{r regression-chart, eval=FALSE}
modelsummary::modelplot(model_1) + # Draw a chart using modelsummary package
# ggplot2:: + # Add ggplot2 dark theme
ggplot2::theme(legend.position = "", # Hide legend
plot.title = element_text(face = "")) + # Make title bold
ggplot2::labs(title = "", # Add a title
subtitle = "", # Add a subtitle
caption = "Source: Texas Department of State Health Services", # Add a caption note
x = "", # Add a title for the X Axis
y = "") # Add a title for the Y Axis
```
### Regression Equation
```{r regression-equation}
# install.packages("equatiomatic")
equatiomatic::extract_eq(model_1) # Extract LaTeX equation with equatiomatic package
```