-
Notifications
You must be signed in to change notification settings - Fork 2
/
CS_04.Rmd
127 lines (102 loc) · 4.32 KB
/
CS_04.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
title: Farthest airport from New York City
week: 4
type: Case Study
subtitle: Joining Relational Data
reading:
- R4DS [Chapter 13 - Relational Data](http://r4ds.had.co.nz/relational-data.html){target='blank'}
tasks:
- Join two datasets using a common column
- Answer a question that requires understanding how multiple tables are related
- Save your script as a .R or .Rmd in your course repository
---
```{r setup, include=FALSE, purl=F}
options(knitr.duplicate.label = 'allow')
source("functions.R")
source("knitr_header.R")
```
# Reading
```{r reading,results='asis',echo=F,purl=F}
md_bullet(rmarkdown::metadata$reading)
```
# Background
In this exercise you will use various data wrangling tools to answer questions from the data held in separate tables. We'll use the data in the `nycflights13` package which has relationships between the tables as follows.
![](http://r4ds.had.co.nz/diagrams/relational-nycflights.png)
# Objective
> What is the full name (not the three letter code) of the destination airport farthest from any of the NYC airports in the `flights` table?
Save the name as a character value `farthest_airport`.
# Tasks
```{r tasks,results='asis',echo=F, purl=F}
md_bullet(rmarkdown::metadata$tasks)
```
You will need to load the necessary packages
```{r, message=FALSE}
library(tidyverse)
library(nycflights13)
```
[<i class="fa fa-file-code-o fa-1x" aria-hidden="true"></i> Download starter R script (if desired)](`r output_nocomment`){target="_blank"}
There are several ways to do this using at least two different joins. I found two solutions that use 5 or 6 functions separated by pipes (`%>%`). Can you do it in fewer?
<div class="well">
<button data-toggle="collapse" class="btn btn-primary btn-sm round" data-target="#demo1">Show Hints</button>
<div id="demo1" class="collapse">
The details below describe one possible approach.
1. Open the help file for the `nycflights13` package by searching in the "Help" panel in RStudio.
2. Look at the contents of the various tables to find the ones you need (`name`, `distance`, and `dest`). You can use `head()`, `glimpse()`, `View()`, `str()`.
2. In the table with distances, find the airport code that is farthest from the New York Airports (perhaps using `arrange()` and `slice()`)
3. Join this table with the one that has the full airport names. You will either need to rename the columns so they match the other table or use the `by` parameter in the join. e.g. check out `?left_join()`
4. `select()` only the `destName` column
5. Convert the data.frame to a single character value with `as.character()`. This converts the data.frame object into a single value.
5. Save the resulting airport name as an object called `farthest_airport`
</div>
</div>
```{r purl=F, echo=F, results='hide',message=FALSE, eval=F}
farthest_airport <- flights%>%
select(distance,dest)%>%
arrange(desc(distance))%>%
slice(1)%>%
left_join(airports,by=c("dest"="faa"))%>%
select(name) %>%
as.character()
#OR
select(airports,
dest=faa,
destName=name)%>%
right_join(flights)%>%
arrange(desc(distance)) %>%
slice(1) %>%
select(destName) %>%
as.character()
```
<div class="extraswell">
<button data-toggle="collapse" class="btn btn-link" data-target="#extras">
Extra time? Try these extra activities...
</button>
<div id="extras" class="collapse">
Soon we will introduce working with spatial data and doing similar kinds of operations. If you have time to play, see if you can figure out what this does:
```{r, message=FALSE, purl=F}
airports %>%
distinct(lon,lat) %>%
ggplot(aes(lon, lat)) +
borders("world") +
geom_point(col="red") +
coord_quickmap()
```
Can you figure out how to map mean delays by destination airport as shown below?
```{r, echo=F, purl=F}
airports %>%
filter(!tzone%in%c("Pacific/Honolulu",
"America/Anchorage",
"Asia/Chongqing" ))%>%
right_join(flights, c("faa" = "dest")) %>%
group_by(name,lat,lon)%>%
summarize(meandelay=mean(arr_delay, na.rm=T))%>%
na.omit()%>%
ggplot(aes(lon, lat,col=meandelay)) +
borders("state") +
geom_point(size=2) +
coord_quickmap()+
scale_color_distiller(type="div",palette = "RdBu",name="Mean\nDelay\n(minutes)")
```
</div>
</div>
<i> <small> Adapted from [R for Data Science](http://r4ds.had.co.nz/relational-data.html#filtering-joins) </small> </i>