---
title: Tracking Hurricanes!
subtitle: Analyze historical storm data from NOAA
week: 9
type: Case Study
reading:
- Overview of the [International Best Track Archive for Climate Stewardship (IBTrACS)](https://www.ncei.noaa.gov/products/international-best-track-archive)
- Performing [Spatial Joins with sf](https://r-spatial.github.io/sf/reference/st_join.html)
tasks:
- Write a .Rmd script to perform the following tasks
- Intersect the storms with US states to quantify how many storms in the database have hit each state.
---
```{r, echo=FALSE, message=FALSE, results='hide', purl=FALSE}
source("functions.R")
source("knitr_header.R")
```
# Reading
```{r reading,results='asis',echo=F}
md_bullet(rmarkdown::metadata$reading)
```
# Tasks
```{r tasks,results='asis',echo=F}
md_bullet(rmarkdown::metadata$tasks)
```
## Libraries & Data
```{r message=F,warning=FALSE}
library(sf)
library(tidyverse)
library(ggmap)
library(spData)
data(world)
data(us_states)
```
## Objective
In this case study you will download storm track data from NOAA, make a summary plot, and use a spatial join (`st_join`) to quantify how many storms have hit each US state.
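If you have not used `st_join()` before, here is a minimal sketch of a spatial join with two small made-up layers (`pts` and `poly` below are hypothetical objects for illustration only, not part of the assignment data):
```{r st_join_sketch, eval=F, purl=F}
# minimal sketch of a spatial join (hypothetical data, not the storm dataset)
library(sf)

# two points and one unit-square polygon, all in WGS84 (EPSG:4326)
pts  <- st_as_sf(data.frame(id = 1:2, lon = c(0.5, 5), lat = c(0.5, 5)),
                 coords = c("lon", "lat"), crs = 4326)
poly <- st_sf(region = "A",
              geometry = st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0),
                                                      c(1, 1), c(0, 1),
                                                      c(0, 0)))),
                                crs = 4326))

# left = FALSE keeps only the points that fall inside a polygon
st_join(pts, poly, join = st_intersects, left = FALSE)
```
Here only the first point falls inside the square, so the join returns that point with the polygon's `region` attribute attached; the same logic attaches a state name to each storm observation below.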
### Your goal
```{r download_data2, eval=T, echo=T, purl=F, warning=F}
# Download a csv from noaa with storm track information
dataurl="https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r01/access/csv/ibtracs.NA.list.v04r01.csv"
storm_data <- read_csv(dataurl)
```
```{r, echo=F, eval=T, purl=F, warning=F}
# function to drop storm points far from the main cluster of tracks
# NOTE: currently a pass-through; the distance-based filter below is disabled
drop_outliers <- function(x) {
  return(x)                               # return the data unfiltered
  # disabled: keep only points within 100 km of the overall centroid
  central_point <- st_union(x) %>%
    st_centroid()
  dists <- st_distance(x, central_point)  # distances from the central point
  filter(x, as.numeric(dists) < 100000)
}

storms <- storm_data |>
  mutate(year = year(ISO_TIME)) |>
  filter(year >= 1950) |>
  drop_outliers() |>
  mutate_if(is.numeric,
            function(x) ifelse(x == -999.0, NA, x)) |>  # convert -999 to NA
  mutate(decade = floor(year / 10) * 10) |>             # add decade column
  st_as_sf(coords = c("LON", "LAT"), crs = 4326)        # convert to sf object

region <- st_bbox(storms)
```
Your desired figure looks something like the following:
```{r, echo=F, eval=T, purl=F, warning=F, fig.width=15, fig.height=15}
map1 <- ggplot() +
  geom_sf(data = world,
          inherit.aes = F, size = .1,
          fill = "grey", colour = "black") +
  facet_wrap(~decade) +
  stat_bin2d(data = storms,
             aes(y = st_coordinates(storms)[, 2],
                 x = st_coordinates(storms)[, 1]), bins = 100) +
  scale_fill_distiller(palette = "YlOrRd",
                       trans = "log",
                       direction = -1,
                       breaks = c(1, 10, 100, 1000)) +
  coord_sf(ylim = region[c(2, 4)],
           xlim = region[c(1, 3)]) +
  labs(x = "", y = "")
map1
```
Calculate a table of the five states that have experienced the most storms.
```{r, eval=T, echo=F, purl=F, warning=F, message=F}
library(knitr)
library(kableExtra)

states <- st_transform(us_states, st_crs(storms)) %>%
  dplyr::select(state = NAME)

storm_states <- st_join(storms, states,
                        join = st_intersects,
                        left = F)

storm_states %>%
  group_by(state) %>%
  summarize(storms = length(unique(NAME))) %>%
  st_set_geometry(NULL) %>%
  arrange(desc(storms)) %>%
  slice(1:5) %>%
  kable() %>%
  kable_styling()
```
<div class="well">
<button data-toggle="collapse" class="btn btn-primary btn-sm round" data-target="#demo1">Show Hints</button>
<div id="demo1" class="collapse">
## Steps

1. Use the code above to download the storm data and create an object called `storm_data`.
2. Wrangle the data (one way to chain these commands together is sketched after this list):
    * Create a new column with just the year using `year(ISO_TIME)` (`year()` is in the lubridate package).
    * Filter to storms from 1950 to the present with `filter()`.
    * Use `mutate_if()` from the `dplyr` package to convert `-999.0` to `NA` in all numeric columns: `mutate_if(is.numeric, function(x) ifelse(x==-999.0,NA,x))`.
    * Add a column for decade with `mutate(decade=floor(year/10)*10)`.
    * Convert the data to an sf object with `st_as_sf(coords=c("LON","LAT"),crs=4326)`.
    * Use `st_bbox()` to identify the bounding box of the storm data and save it as an object called `region`.
3. Make the first plot:
    * use `ggplot()` to plot the `world` polygon layer and add the following:
    * add `facet_wrap(~decade)` to create a panel for each decade
    * add `stat_bin2d(data=storms, aes(y=st_coordinates(storms)[,2], x=st_coordinates(storms)[,1]), bins=100)`
    * use `scale_fill_distiller(palette="YlOrRd", trans="log", direction=-1, breaks=c(1,10,100,1000))` to set the color ramp
    * use `coord_sf(ylim=region[c(2,4)], xlim=region[c(1,3)])` to crop the plot to the region
4. Calculate a table of the five states with the most storms:
    * Use `st_transform()` to reproject `us_states` to the reference system of the `storms` object (you can extract the CRS from an sf object with `st_crs(storms)`).
    * Rename the `NAME` column in the state data to `state` with `select(state=NAME)` to avoid confusion with the storm name.
    * Perform a spatial join between the storm database and the states object with `storm_states <- st_join(storms, states, join=st_intersects, left=F)`. This 'adds' the state to any storm point that was recorded within that state.
    * Use `group_by(state)` to group the next step by US state.
    * Use `summarize(storms=length(unique(NAME)))` to count how many unique storms occurred in each state.
    * Use `arrange(desc(storms))` to sort by the number of storms in each state.
    * Use `slice(1:5)` to keep only the top five states.
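If you get stuck on step 2, here is one way the wrangling commands might be chained together (a sketch only, assuming the IBTrACS column names `ISO_TIME`, `LON`, and `LAT` used above):
```{r wrangle_sketch, eval=F, purl=F}
# sketch of step 2: chain the wrangling commands (assumes IBTrACS column names)
library(tidyverse)
library(lubridate)
library(sf)

storms <- storm_data %>%
  mutate(year = year(ISO_TIME)) %>%                                  # extract the year
  filter(year >= 1950) %>%                                           # keep 1950-present
  mutate_if(is.numeric, function(x) ifelse(x == -999.0, NA, x)) %>%  # convert -999 to NA
  mutate(decade = floor(year / 10) * 10) %>%                         # add a decade column
  st_as_sf(coords = c("LON", "LAT"), crs = 4326)                     # convert to sf object

region <- st_bbox(storms)                                            # bounding box for the plot
```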
</div>
</div>
---
<div class="extraswell">
<button data-toggle="collapse" class="btn btn-link" data-target="#extras">
Extra time? Try this...
</button>
<div id="extras" class="collapse">
Try to replicate the following graphic using the data you transformed above.
```{r, eval=T, message=F, echo=F, purl=F, fig.height=10,fig.width=10}
storm_states_year <- storm_states %>%
  group_by(state, year) %>%
  summarize(storms = length(unique(NAME)))

ggplot(storm_states_year,
       # aes(y = fct_reorder(state, storms),
       aes(y = state,
           x = year,
           fill = storms)) +
  geom_tile() +
  scale_fill_viridis_c(name = "# Storms") +
  ylab("State")
```
Can you sort the rows (states) by storm frequency instead of alphabetically?
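One possible approach (a sketch, assuming the `storm_states_year` summary above) is to reorder the `state` variable by total storm count with `forcats::fct_reorder()` before plotting:
```{r sort_sketch, eval=F, purl=F}
# sketch: reorder states by their total storm count before plotting
library(forcats)

ggplot(storm_states_year,
       aes(y = fct_reorder(state, storms, .fun = sum),  # order states by total storms
           x = year,
           fill = storms)) +
  geom_tile() +
  scale_fill_viridis_c(name = "# Storms") +
  ylab("State")
```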
</div>
</div>