forked from geanders/RProgrammingForResearch
-
Notifications
You must be signed in to change notification settings - Fork 0
/
09a-reportingresults2.Rmd
950 lines (740 loc) · 32 KB
/
09a-reportingresults2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
# Reporting results #2
[Download](https://github.com/geanders/RProgrammingForResearch/raw/master/slides/CourseNotes_Week9a.pdf) a pdf of the lecture slides covering this topic.
```{r echo = FALSE, message = FALSE, warning = FALSE}
knitr::opts_knit$set(error = TRUE)
library(viridis)
```
## Course lectures
In 2019, we cancelled our in-course meeting for bad weather.
There are videos of the lecture available, covering ggplot2 extensions and mapping in ggplot:
<iframe width="560" height="315" src="https://www.youtube.com/embed/aUwsIHii_UQ" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/laZpOPOsN3s" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/RheCnDJ_SgU" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Example commit.
## Example data
This week, we'll be using some example data from NOAA's Storm Events Database.
This data lists major weather-related storm events during 2017.
For each event, it includes information like the start and end dates, where it
happened, associated deaths, injuries, and property damage, and some other
characteristics.
See the in-course
exercises for this week for more on getting and cleaning this data. As part of the in-course
exercise, you'll be making the following plot and saving it as the object `storm_plot`:
```{r echo = FALSE, message = FALSE, warning = FALSE, fig.width = 6, fig.height = 3}
library(readr)
library(dplyr)
library(lubridate)
library(stringr)
library(tidyr)
library(ggplot2)
storms_2017 <- read_csv("data/StormEvents_details-ftp_v1.0_d2017_c20180918.csv",
progress = FALSE)
storms_2017 <- storms_2017 %>%
select(BEGIN_DATE_TIME, END_DATE_TIME,
EPISODE_ID:STATE_FIPS, EVENT_TYPE:CZ_NAME, SOURCE,
BEGIN_LAT:END_LON) %>%
mutate(BEGIN_DATE_TIME = dmy_hms(BEGIN_DATE_TIME),
END_DATE_TIME = dmy_hms(END_DATE_TIME),
STATE = str_to_title(STATE),
CZ_NAME = str_to_title(CZ_NAME)) %>%
filter(CZ_TYPE == "C") %>%
select(-CZ_TYPE) %>%
mutate(STATE_FIPS = str_pad(STATE_FIPS, 2, side = "left", pad = "0"),
CZ_FIPS = str_pad(CZ_FIPS, 3, side = "left", pad = "0")) %>%
unite(fips, STATE_FIPS, CZ_FIPS, sep = "") %>%
rename_all(funs(str_to_lower(.)))
data("state")
us_state_info <- data_frame(state = state.name,
area = state.area,
region = state.region)
state_storms <- storms_2017 %>%
group_by(state) %>%
count() %>%
ungroup() %>%
right_join(us_state_info, by = "state")
storm_plot <- ggplot(state_storms, aes(x = area, y = n)) +
geom_point(aes(color = region)) +
labs(x = "Land area (square miles)",
y = "# of storm events in 2017")
storm_plot
```
We'll be using the data and this plot in the next sections.
## `ggplot2` extras and extensions
### `scales` package
The `scales` package gives you a few more options for labeling with
your `ggplot` scales. For example, if you wanted to change the notation
for the axes in the plot of state area versus number of storm events,
you could use the `scales` package to add commas to the numeric axis values.
For the rest of these slides, I've saved the `ggplot` object with out plot to
the object named `storm_plot`, so we don't have to repeat that code every time.
```{r warning = FALSE, message = FALSE, fig.width = 6, fig.height = 3}
library(scales)
storm_plot +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
```
The `scales` package also includes labeling functions for:
- dollars (`labels = dollar`)
- percent (`labels = percent`)
### `ggplot2` extensions
The `ggplot2` framework is set up so that others can create packages that "extend"
the system, creating functions that can be added on as layers to a ggplot object.
Some of the types of extensions available include:
- More themes
- Useful additions (things that you may be able to do without the package, but that
the package makes easier)
- Tools for plotting different types of data
There is a gallery with links to `ggplot2` extensions at https://www.ggplot2-exts.org.
This list may not be exhaustive---there may be other extensions on CRAN or on GitHub
that the package maintainer did not submit for this gallery.
### More `ggplot2` themes
You have already played around a lot with using `ggplot` themes to change how your
graphs look.
Several people have created packages with additional themes:
- `ggthemes`
- `ggthemr`
- `ggtech`
- `ggsci`
```{r fig.width = 12, fig.height = 7, message = FALSE, warning = FALSE}
library(ggthemes)
library(gridExtra)
a <- storm_plot +
theme_fivethirtyeight() +
ggtitle("Five Thirty Eight")
b <- storm_plot +
theme_economist() +
ggtitle("Economist")
c <- storm_plot +
theme_excel() +
ggtitle("Excel")
d <- storm_plot +
theme_few() +
ggtitle("Stephen Few")
grid.arrange(a, b, c, d, ncol = 2)
```
### Other useful `ggplot2` extensions
Other `ggplot2` extensions do things you might have been able to figure out how
to do without the extension, but the extension makes it much easier to do.
These tasks include:
- Highlighting interesting points
- "Repelling" text labels
- Arranging plots
#### Repelling / highlighting with text labels
The first is repelling text labels.
When you add labels to points on a plot, they often overlap:
```{r fig.width = 10, fig.height = 6}
storm_plot + facet_wrap(~ region) +
geom_label(aes(label = state))
```
The `ggrepel` package helps make sure that these labels don't overlap:
```{r fig.width = 10, fig.height = 6}
library(ggrepel)
storm_plot + facet_wrap(~ region) +
geom_label_repel(aes(label = state))
```
It may be too much to label every point. Instead, you may just want to
highlight notable point.
You can use the `gghighlight` package to do that.
```{r warning = FALSE, message = FALSE, fig.width = 8, fig.height = 3}
library(gghighlight)
storm_plot + facet_wrap(~ region) +
gghighlight(area > 150000 | n > 1500, label_key = state)
```
The `gghighlight` package also works for things like histograms. For example, you could
create a dataset with the count by day-of-year of certain types of events:
```{r warning = FALSE, message = FALSE}
storms_by_month <- storms_2017 %>%
filter(event_type %in% c("Flood", "Flash Flood", "Heavy Rain")) %>%
mutate(month = month(begin_date_time, label = TRUE)) %>%
group_by(month, event_type) %>%
count() %>%
ungroup()
storms_by_month %>%
slice(1:4)
```
```{r fig.width = 7, fig.height = 3}
ggplot(storms_by_month, aes(x = month, y = n, group = event_type)) +
geom_bar(stat = "identity") +
labs(x = "Month", y = "# of events") +
gghighlight(max(n) > 400, label_key = event_type) +
facet_wrap(~ event_type, ncol = 1)
```
#### Arranging plots
You may have multiple related plots you want to have as multiple panels of a
single figure.
There are a few packages that help with this. One very good one is `patchwork`.
You need to install this from GitHub:
```{r eval = FALSE}
devtools::install_github("thomasp85/patchwork")
```
Find out more: https://github.com/thomasp85/patchwork#patchwork
Say we want to plot seasonal patterns in events in the five counties with
the highest number of events in 2017. We can use `dplyr` to figure out these
counties:
```{r}
top_counties <- storms_2017 %>%
group_by(fips, state, cz_name) %>%
count() %>%
ungroup() %>%
top_n(5, wt = n)
```
Then create a plot with the time patterns:
```{r}
library(forcats)
top_counties_month <- storms_2017 %>%
semi_join(top_counties, by = "fips") %>%
mutate(month = month(begin_date_time),
county = paste(cz_name, " County, ", state, sep = "")) %>%
count(county, month) %>%
ggplot(aes(x = month, y = n)) +
geom_bar(stat = "identity") +
facet_wrap(~ fct_reorder(county, n, .fun = sum, .desc = TRUE), nrow = 2) +
scale_x_continuous(name = "", breaks = c(1, 4, 7, 10),
labels = c("Jan", "Apr", "Jul", "Oct")) +
scale_y_continuous(name = "Frequency", breaks = c(0, 20))
```
Here's this plot:
```{r fig.width = 6, fig.height = 3}
top_counties_month
```
Now that you have two ggplot objects (`storm_plot` and `top_counties_month`), you can use
`patchwork` to put them together:
```{r fig.height = 5, fig.width = 8, message = FALSE, warning = FALSE, fig.align = "center"}
library(patchwork)
storm_plot +
top_counties_month +
plot_layout(ncol = 1, heights = c(2, 1))
```
A slightly fancier version:
```{r fig.height = 5, fig.width = 8, message = FALSE, warning = FALSE, fig.align = "center"}
(storm_plot + theme(legend.position = "top") +
gghighlight(n > 1500 | area > 200000,
label_key = state)) +
top_counties_month +
plot_layout(ncol = 1, heights = c(2, 1))
```
Other packages for arranging `ggplot` objects include:
- `gridExtra`
- `cowplot`
## Simple features
### Introduction to simple features
`sf` objects: "Simple features"
- R framework that is in active development
- There will likely be changes in the near future
- Plays very well with tidyverse functions, including `dplyr` and `ggplot2` tools
```{r message = FALSE, warning = FALSE}
library(sf)
```
To show simple features, we'll pull in the Colorado county boundaries from the U.S. Census.
To do this, we'll use the `tigris` package, which accesses the U.S. Census API. It allows you
to pull geographic data for U.S. counties, states, tracts, voting districts, roads, rails,
and a number of other geographies.
To learn more about the `tigris` package, check out this article:
https://journal.r-project.org/archive/2016/RJ-2016-043/index.html
With `tigris`, you can read in data for county boundaries using the `counties`
function.
We'll use the option
`class = "sf"` to read these spatial dataframes in as `sf` objects.
```{r message = FALSE, warning = FALSE, results = "hide"}
library(tigris)
co_counties <- counties(state = "CO", cb = TRUE, class = "sf")
```
```{r}
class(co_counties)
```
You can think of an `sf` object as a dataframe, but with one special column called `geometry`.
```{r}
co_counties %>%
slice(1:3)
```
The `geometry` column has a special class (sfc):
```{r}
class(co_counties$geometry)
```
You'll notice there's some extra stuff up at the top, too:
- **Geometry type**: Points, polygons, lines
- **Dimension**: Often two-dimensional, but can go up to four (if you have, for example, time
for each measurement and some measure of measurement error / uncertainty)
- **Bounding box (bbox)**: The x- and y-range of the data included
- **EPSG**: The EPSG Geodetic Parameter Dataset code for the Coordinate Reference Systems
- **Projection (proj4string)**: How the data is currently projected, includes projection ("+proj") and datum ("+datum")
You can pull some of this information out of the `geometry` column. For example, you can pull
out the coordinates of the bounding box:
```{r}
st_bbox(co_counties$geometry) # For all counties
st_bbox(co_counties$geometry[1]) # Just for first county
```
You can add `sf` objects to `ggplot` objects using `geom_sf`:
```{r fig.width = 6, fig.height = 4, fig.align = "center"}
library(ggplot2)
ggplot() +
geom_sf(data = co_counties)
```
You can map one of the columns in the `sf` object to the fill aesthetic to make a **choropleth**:
```{r fig.width = 6, fig.height = 4, fig.align = "center"}
ggplot() +
geom_sf(data = co_counties, aes(fill = ALAND))
```
You can use all your usual `ggplot` tricks with this:
```{r fig.width = 6, fig.height = 4, fig.align = "center"}
library(viridis)
ggplot() +
geom_sf(data = co_counties, aes(fill = ALAND)) +
scale_fill_viridis(name = "Land area", label = comma) +
ggtitle("Land areas of Colorado counties") +
theme_dark()
```
Because simple features are a special type of dataframe, you can also use a lot of `dplyr` tricks.
For example, you could pull out just Larimer County, CO:
```{r}
larimer <- co_counties %>%
filter(NAME == "Larimer")
larimer
```
Note: You may need the development version of `ggplot2` for the next piece of code to work (`devtools::install_github("tidyverse/ggplot2")`).
```{r warning = FALSE, message = FALSE, fig.width = 6, fig.height = 4, fig.align = "center"}
ggplot() +
geom_sf(data = co_counties, color = "lightgray") +
geom_sf(data = larimer, fill = "darkcyan") +
geom_sf_text(data = larimer, aes(label = NAME), color = "white") +
theme_dark() + labs(x = "", y = "")
```
This operability with tidyverse functions means that you should now be able to figure out how to
create a map of the number of events listed in the NOAA Storm Events database (of those listed by
county) for each county in Colorado (for the code, see the in-course exercise):
```{r echo = FALSE, warning = FALSE, message = FALSE, fig.width = 6, fig.height = 4, fig.align = "center"}
co_event_counts <- storms_2017 %>%
filter(state == "Colorado") %>%
group_by(fips) %>%
count() %>%
ungroup()
co_county_events <- co_counties %>%
mutate(fips = paste(STATEFP, COUNTYFP, sep = "")) %>%
full_join(co_event_counts, by = "fips") %>%
mutate(n = ifelse(!is.na(n), n, 0))
ggplot() +
geom_sf(data = co_county_events, aes(fill = n)) +
scale_fill_viridis(name = "Number of events\n(2017)")
```
### State boundaries
The `tigris` package allows you to pull state boundaries, as well, but on some computers
mapping these seems to take a really long time.
Instead, for now I recommend that you pull the state boundaries using base R's `maps`
package and convert that to an `sf` object:
```{r}
library(maps)
us_states <- map("state", plot = FALSE, fill = TRUE) %>%
st_as_sf()
```
You can see these borders include an `ID` column that you can use to join by state:
```{r}
head(us_states)
```
As with other `sf` objects, you can map these state boundaries using `ggplot`:
```{r fig.width = 5, fig.align = "center"}
ggplot() +
geom_sf(data = us_states, color = "white",
fill = "darkcyan", alpha = 0.5)
```
As a note, you can use `xlim` and `ylim` with these plots, but remember that the x-axis is longitude in degrees West, which are negative:
```{r fig.height = 3, fig.align = "center"}
ggplot() +
geom_sf(data = us_states, color = "white",
fill = "darkcyan", alpha = 0.5) +
xlim(c(-90, -75)) + ylim(c(24, 38))
```
### Creating an `sf` object
You can create an `sf` object from a regular dataframe.
You just need to specify:
1. The coordinate information (which columns are longitudes and latitudes)
2. The Coordinate Reference System (CRS) (how to translate your coordinates to places in the world)
For the CRS, if you are mapping the new `sf` object with other, existing `sf` objects,
make sure that you use the same CRS for all `sf` objects.
Spatial objects can have different Coordinate Reference Systems (CRSs). CRSs can be *geographic* (e.g., WGS84, for longitude-latitude data) or *projected* (e.g., UTM, NADS83).
There is a website that lists projection strings and can be useful in setting projection information or re-projecting data: http://www.spatialreference.org
Here is an excellent resource on projections and maps in R from Melanie Frazier: https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf
Let's look at floods in Colorado. First, clean up the data:
```{r}
co_floods <- storms_2017 %>%
filter(state == "Colorado" &
event_type %in% c("Flood", "Flash Flood")) %>%
select(begin_date_time, event_id, begin_lat:end_lon) %>%
gather(key = "key", value = "value",
-begin_date_time, -event_id) %>%
separate(key, c("time", "key")) %>%
spread(key = key, value = value)
```
There are now two rows per event, one with the starting location and one with the ending location:
```{r}
co_floods %>%
slice(1:5)
```
Change to an `sf` object by saying which columns are the coordinates and setting a CRS:
```{r}
co_floods <- st_as_sf(co_floods, coords = c("lon", "lat")) %>%
st_set_crs(4269)
co_floods %>% slice(1:3)
```
Now you can map the data:
```{r fig.width = 6}
ggplot() +
geom_sf(data = co_counties, color = "lightgray") +
geom_sf(data = co_floods, aes(color = month(begin_date_time),
shape = time)) +
scale_color_viridis(name = "Month")
```
If you want to show lines instead of points, group by the appropriate ID and then summarize
within each event to get a line:
```{r}
co_floods <- co_floods %>%
group_by(event_id) %>%
summarize(month = month(first(begin_date_time)),
do_union = FALSE) %>%
st_cast("LINESTRING")
```
```{r}
head(co_floods)
```
Now this data will map as lines:
```{r fig.width = 6, fig.align = "center"}
ggplot() +
geom_sf(data = co_counties, color = "lightgray") +
geom_sf(data = co_floods,
aes(color = factor(month), fill = factor(month))) +
scale_fill_viridis(name = "Month", discrete = TRUE) +
scale_color_viridis(name = "Month", discrete = TRUE)
```
### Reading in from GIS files
You can also create `sf` objects by reading in data from files you would normally use for GIS.
For example, you can read in an `sf` object from a shapefile, which is a format often used for GIS in which a collection of several files jointly store geographic data. The files making up a shapefile can include:
- ".shp": The coordinates defining the shape of each geographic object. For a point, this
would be a single coordinate (e.g., latitude and longitude). For lines and polygons, there
will be multiple coordinates per geographic object.
- ".prf": Information on the projection of the data (how to get from the coordinates to
a place in the world).
- ".dbf": Data that goes along with each geographical object. For example, earlier we looked
at data on counties, and one thing measured for each county was its land area. Characteristics
like that would be included in the ".dbf" file in a shapefile.
Often, with geographic data, you will be given the option to downloaded a compressed
file (e.g., a zipped file). When you unzip the folder, it will include a number of
files in these types of formats (".shp", "prf", ".dbf", etc.).
Sometimes, that single folder will include multiple files from each extension. For
example, it might have several files that end with ".shp". In this case, you have
multiple **layers** of geographic information you can read in.
We've been looking at data on storms from NOAA for 2017. As an example, let's try to
pair that data up with some from the National Hurricane Center for the same year.
The National Hurricane Center allows you to access a variety of GIS data through
the webpage https://www.nhc.noaa.gov/gis/?text.
Let's pull some data on Hurricane Harvey in 2017 and map it with information from
the NOAA Storm Events database.
On https://www.nhc.noaa.gov/gis/?text, go to the section called "Preliminary Best Track".
Select the year 2017. Then select "Hurricane Harvey" and download "al092017_best_track.zip".
Depending on your computer, you may then need to unzip this file (many computers will
unzip it automatically). Base R has a function called `unzip` that can help with this.
You'll then have a folder with a number of different files in it. Move this folder
somewhere that is convenient for the working directory you use for class. For example,
I moved it into the "data" subdirectory of the working directory I use for the class.
You can use `list.files` to see all the files in this unzipped folder:
```{r}
list.files("data/al092017_best_track/")
```
You can use `st_layers` to find out the available layers in a shapefile directory:
```{r}
st_layers("data/al092017_best_track/")
```
Once you know which layer you want, you can use `read_sf` to read it in as an `sf` object:
```{r}
harvey_track <- read_sf("data/al092017_best_track/",
layer = "al092017_lin")
head(harvey_track)
```
```{r fig.width = 6, fig.align = "center"}
ggplot() +
geom_sf(data = filter(us_states, ID %in% c("texas", "louisiana"))) +
geom_sf(data = harvey_track, aes(color = STORMTYPE)) +
xlim(c(-107, -89)) + ylim(c(25, 37))
```
You can read in other layers:
```{r}
harvey_windswath <- read_sf("data/al092017_best_track/",
layer = "al092017_windswath")
head(harvey_windswath)
```
```{r fig.width = 6, fig.align = "center"}
ggplot() +
geom_sf(data = filter(us_states, ID %in% c("texas", "louisiana"))) +
geom_sf(data = harvey_windswath,
aes(fill = factor(RADII)), alpha = 0.2) +
xlim(c(-107, -89)) + ylim(c(25, 37)) +
scale_fill_viridis(name = "Wind (kts)", discrete = TRUE,
option = "B", begin = 0.6, direction = -1)
```
The `read_sf` function is very powerful and can read in data from lots of different
formats.
See Section 2 of the `sf` manual (https://cran.r-project.org/web/packages/sf/vignettes/sf2.html)
for more on this function.
You can find (much, much) more on working with spatial data in R online:
- **R Spatial**: http://rspatial.org/index.html
- **Geocomputation with R**: https://geocompr.robinlovelace.net
## In-course exercise
### Getting and cleaning the example data
This week, we'll be using some example data from NOAA's Storm Events Database.
This data lists major weather-related storm events during 2017.
For each event, it includes information like the start and end dates, where it
happened, associated deaths, injuries, and property damage, and some other
characteristics.
Each row is a separate **event**. However, often several events are grouped together
within the same **episode**.
Some of the event types are listed by their county ID (FIPS code) ("C"), but some are listed
by a forecast zone ID ("Z"). Which ID is used is given in the column `CZ_TYPE`.
- Go to https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/ and download
the bulk storm details data for 2017, in the file that starts
"StormEvents_details" and includes "d2017".
- Move this into a good directory for your current working directory and read it in
using `read_csv` from the `readr` package.
- Limit the dataframe to: the beginning and ending dates and times,
the episode ID, the event ID, the state name and
FIPS, the "CZ" name, type, and FIPS, the event type, the source, and the
begining latitude and longitude and ending latitude and longitude
- Convert the beginning and ending dates to a "date-time" class (there should
be one column for the beginning date-time and one for the ending date-time)
- Change state and county names to title case (e.g., "New Jersey" instead of "NEW JERSEY")
- Limit to the events listed by county FIPS (`CZ_TYPE` of "C") and then remove the
`CZ_TYPE` column
- Pad the state and county FIPS with a "0" at the beginning (hint: there's
a function in `stringr` to do this) and then unite the two columns to make
one `fips` column with the 5-digit county FIPS code
- Change all the column names to lower case (you may want to try the `rename_all`
function for this)
- There is data that comes with R on U.S. states (`data("state")`). Use that to create a dataframe
with the state name, area, and region
- Create a dataframe with the number of events per state in 2017. Merge in the state information
dataframe you just created. Remove any states that are not in the state information
dataframe
- Create the following plot:
```{r echo = FALSE, fig.width = 6, fig.height = 3}
library(ggplot2)
storm_plot <- ggplot(state_storms, aes(x = area, y = n)) +
geom_point(aes(color = region)) +
labs(x = "Land area (square miles)",
y = "# of storm events in 2017")
storm_plot
```
#### Example R code
Read in the data using `read_csv`. Here's the code I used. Yours might be a bit different,
depending on the current name of the file and where you moved it.
```{r}
library(readr)
library(dplyr)
storms_2017 <- read_csv("data/StormEvents_details-ftp_v1.0_d2017_c20180918.csv")
```
Here's what the first few columns and rows should look like:
```{r}
storms_2017 %>%
select(1:3) %>%
slice(1:3)
```
Once you've read the data in, here's the code that I used to clean the data:
```{r}
library(lubridate)
library(stringr)
library(tidyr)
storms_2017 <- storms_2017 %>%
select(BEGIN_DATE_TIME, END_DATE_TIME,
EPISODE_ID:STATE_FIPS, EVENT_TYPE:CZ_NAME, SOURCE,
BEGIN_LAT:END_LON) %>%
mutate(BEGIN_DATE_TIME = dmy_hms(BEGIN_DATE_TIME),
END_DATE_TIME = dmy_hms(END_DATE_TIME),
STATE = str_to_title(STATE),
CZ_NAME = str_to_title(CZ_NAME)) %>%
filter(CZ_TYPE == "C") %>%
select(-CZ_TYPE) %>%
mutate(STATE_FIPS = str_pad(STATE_FIPS, 2, side = "left", pad = "0"),
CZ_FIPS = str_pad(CZ_FIPS, 3, side = "left", pad = "0")) %>%
unite(fips, STATE_FIPS, CZ_FIPS, sep = "") %>%
rename_all(funs(str_to_lower(.)))
```
Here's what the data looks like now:
```{r}
storms_2017 %>%
slice(1:3)
```
There is data that comes with R on U.S. states (`data("state")`). Use that to create a dataframe
with the state name, area, and region:
```{r}
data("state")
us_state_info <- data_frame(state = state.name,
area = state.area,
region = state.region)
```
Create a dataframe with the number of events per state in 2017. Merge in the state information
dataframe you just created. Remove any states that are not in the state information
dataframe:
```{r}
state_storms <- storms_2017 %>%
group_by(state) %>%
count() %>%
ungroup() %>%
right_join(us_state_info, by = "state")
```
To create the plot:
Ultimately, in this group exercise, you will create a plot of state land area versus the number
of storm events in the state:
```{r fig.width = 6, fig.height = 3}
library(ggplot2)
storm_plot <- ggplot(state_storms, aes(x = area, y = n)) +
geom_point(aes(color = region)) +
labs(x = "Land area (square miles)",
y = "# of storm events in 2017")
storm_plot
```
### Trying out `ggplot2` extensions
- Go back through the notes so far for this week. Pick your favorite plot that's been shown
so far and recreate it. All the code is in the notes, but you'll need to work through it
closely to make sure that you understand how to add code from the extension into the
rest of the `ggplot2` code.
### Using simple features to map
- Re-create the following map of the number of events listed in the NOAA Storm Events database (of those listed by
county) for each county in Colorado:
```{r echo = FALSE, warning = FALSE, message = FALSE, fig.width = 6, fig.height = 4, fig.align = "center"}
co_event_counts <- storms_2017 %>%
filter(state == "Colorado") %>%
group_by(fips) %>%
count() %>%
ungroup()
co_county_events <- co_counties %>%
mutate(fips = paste(STATEFP, COUNTYFP, sep = "")) %>%
full_join(co_event_counts, by = "fips") %>%
mutate(n = ifelse(!is.na(n), n, 0))
ggplot() +
geom_sf(data = co_county_events, aes(fill = n)) +
scale_fill_viridis(name = "Number of events\n(2017)")
```
- If you have time, try this one, too. It shows the number of three certain types of events
by county. If a county had no events, it's shown in gray (as having a missing value when
you count up the events that did happen).
```{r echo = FALSE, fig.width = 8, fig.height = 3}
co_event_counts <- storms_2017 %>%
filter(state == "Colorado") %>%
filter(event_type %in% c("Tornado", "Heavy Rain", "Hail")) %>%
group_by(fips, event_type) %>%
count() %>%
ungroup()
co_county_events <- co_counties %>%
mutate(fips = paste(STATEFP, COUNTYFP, sep = "")) %>%
right_join(co_event_counts, by = "fips")
ggplot() +
geom_sf(data = co_counties, color = "lightgray") +
geom_sf(data = co_county_events, aes(fill = n)) +
scale_fill_viridis(name = "Number of events\n(2017)") +
theme(legend.position = "top") +
facet_wrap(~ event_type, ncol = 3) +
theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
```
#### Example R code
Here is some R code that could be used to create the figure. Note that the code to create `storms_2017` and
`co_counties` is available in the course notes.
```{r eval = FALSE}
library(viridis)
co_event_counts <- storms_2017 %>%
filter(state == "Colorado") %>%
group_by(fips) %>%
count() %>%
ungroup()
co_county_events <- co_counties %>%
mutate(fips = paste(STATEFP, COUNTYFP, sep = "")) %>%
full_join(co_event_counts, by = "fips") %>%
mutate(n = ifelse(!is.na(n), n, 0))
ggplot() +
geom_sf(data = co_county_events, aes(fill = n)) +
scale_fill_viridis(name = "Number of events\n(2017)")
```
Example code for second plot:
```{r eval = FALSE}
co_event_counts <- storms_2017 %>%
filter(state == "Colorado") %>%
filter(event_type %in% c("Tornado", "Heavy Rain", "Hail")) %>%
group_by(fips, event_type) %>%
count() %>%
ungroup()
co_county_events <- co_counties %>%
mutate(fips = paste(STATEFP, COUNTYFP, sep = "")) %>%
right_join(co_event_counts, by = "fips")
ggplot() +
geom_sf(data = co_counties, color = "lightgray") +
geom_sf(data = co_county_events, aes(fill = n)) +
scale_fill_viridis(name = "Number of events\n(2017)") +
theme(legend.position = "top") +
facet_wrap(~ event_type, ncol = 3) +
theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
```
### More on mapping
- See if you can put everything we've talked about together
to create the following map. Note that you can get the county boundaries using the `tigris`
package and the hurricane data from the NHC website mentioned in the text. The storm events
data is in the `storms_2017` dataframe created in code in the text. See how close you can
get to this figure.
```{r echo = FALSE, message = FALSE, fig.width = 7, fig.align = "center"}
harvey_events <- storms_2017 %>%
filter(state %in% c("Texas", "Louisiana") &
ymd_hms("2017-08-25 00:00:00") < begin_date_time &
begin_date_time < ymd_hms("2017-10-05 00:00:00")) %>%
group_by(fips) %>%
count() %>%
ungroup()
tx_la_counties <- counties(state = c("TX", "LA"), cb = TRUE, class = "sf") %>%
mutate(fips = paste(STATEFP, COUNTYFP, sep = "")) %>%
full_join(harvey_events, by = "fips") %>%
mutate(n = ifelse(is.na(n), 0, n))
ggplot() +
geom_sf(data = tx_la_counties, color = NA,
aes(fill = n)) +
geom_sf(data = filter(us_states, ID %in% c("texas", "louisiana")),
fill = NA, color = "lightgray") +
geom_sf(data = harvey_windswath,
aes(color = factor(RADII)), alpha = 0.4) +
geom_sf(data = harvey_track, color = "red",
alpha = 0.1, size = 1) +
xlim(c(-107, -89)) + ylim(c(25, 37)) +
scale_fill_viridis(name = "# of events", option = "A", direction = -1) +
scale_color_viridis(name = "Wind (kts)", discrete = TRUE,
option = "B", direction = -1) +
theme_dark() +
ggtitle("Number of events near Harvey's landfall",
subtitle = "# events starting Aug. 25 to Sept. 5, 2017")
```
#### Example R code
Here is the code I used to create the figure:
```{r eval = FALSE}
harvey_events <- storms_2017 %>%
filter(state %in% c("Texas", "Louisiana") &
ymd_hms("2017-08-25 00:00:00") < begin_date_time &
begin_date_time < ymd_hms("2017-10-05 00:00:00")) %>%
group_by(fips) %>%
count() %>%
ungroup()
tx_la_counties <- counties(state = c("TX", "LA"), cb = TRUE, class = "sf") %>%
mutate(fips = paste(STATEFP, COUNTYFP, sep = "")) %>%
full_join(harvey_events, by = "fips") %>%
mutate(n = ifelse(is.na(n), 0, n))
ggplot() +
geom_sf(data = tx_la_counties, color = NA,
aes(fill = n)) +
geom_sf(data = filter(us_states, ID %in% c("texas", "louisiana")),
fill = NA, color = "lightgray") +
geom_sf(data = harvey_windswath,
aes(color = factor(RADII)), alpha = 0.4) +
geom_sf(data = harvey_track, color = "red",
alpha = 0.1, size = 1) +
xlim(c(-107, -89)) + ylim(c(25, 37)) +
scale_fill_viridis(name = "# of events", option = "A", direction = -1) +
scale_color_viridis(name = "Wind (kts)", discrete = TRUE,
option = "B", direction = -1) +
theme_dark() +
ggtitle("Number of events near Harvey's landfall",
subtitle = "# events starting Aug. 25 to Sept. 5, 2017")
```