forked from kbroman/datacarpentry_R_2017-01-10
---
output: slidy_presentation
---
# Challenge 1
What is the value of `y` after doing the following?
```{r, eval=FALSE}
x <- 50
y <- x * 2
x <- 80
```
---
# Challenge 2
Study the output of `str(surveys)`. How are the missing values being treated?
---
# Challenge 3
The function `nrow()` on a `data.frame` returns the number of rows.
Use `nrow()` in conjunction with `seq()` to create a new `data.frame` called
`surveys_by_10` that includes every 10th row of the `surveys` data frame,
starting at row 10 (10, 20, 30, ...).
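
As a rough illustration of the same pattern on different data, the sketch below uses the built-in `mtcars` data frame as a stand-in for the survey data. The names `idx` and `mtcars_by_5` are made up for this example. `nrow()` supplies the upper bound and `seq()` generates the row indices:

```r
# Illustration only: `mtcars` (32 rows) stands in for the survey data.
n <- nrow(mtcars)                     # number of rows in the data frame
idx <- seq(from = 5, to = n, by = 5)  # 5, 10, 15, 20, 25, 30
mtcars_by_5 <- mtcars[idx, ]          # keep those rows and all columns
nrow(mtcars_by_5)                     # 6
```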
---
# Challenge 4
The function `table()` tabulates observations.
```{r table, eval=FALSE}
expt <- c("treat1", "treat2", "treat1", "treat3", "treat1",
          "control", "treat1", "treat2", "treat3")
expt <- factor(expt)
table(expt)
```
* In which order are the treatments listed?
* How can you recreate this table with "`control`" listed last instead
of first?
---
# Challenge 5
Using pipes, subset the data to include individuals collected before 1995,
and retain the columns `year`, `sex`, and `weight`.
---
# Challenge 6
Create a new data frame from the survey data that meets the following
criteria:

- It contains only the `species_id` column and a column holding the
  square root of the `hindfoot_length` values (e.g. a new column
  `hindfoot_sqrt`).
- The `hindfoot_sqrt` column contains no `NA` values,
  and all of its values are less than 3.

Hint: think about how the commands should be ordered.
---
# Challenge 7
How many times was each `plot_type` surveyed?
---
# Challenge 8
Use `group_by()` and `summarize()` to find the mean, min, and max hindfoot
length for each species.
---
# Challenge 9
What was the heaviest animal measured in each year? Return the columns `year`,
`genus`, `species`, and `weight`.
Hint: Use `filter()` rather than `summarize()`.
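
To illustrate the difference the hint points at, here is a minimal sketch on the built-in `mtcars` data, not the survey data. The name `best` is made up for this example. A grouped `filter()` keeps whole rows, so the other columns stay available, whereas `summarize()` would collapse each group to the summary value alone:

```r
library(dplyr)

# Illustration only: `mtcars` stands in for the survey data.
# Within each `cyl` group, keep the row(s) with the largest `mpg`.
best <- mtcars %>%
  group_by(cyl) %>%
  filter(mpg == max(mpg)) %>%
  select(cyl, mpg, hp)

best
```

If the column being maximized contains missing values, `max()` returns `NA` unless `na.rm = TRUE` is supplied.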
---
# Challenge 10
Make a scatterplot of `hindfoot_length` vs `weight`, but only for
observations with `species_id` `"DM"`.
---
# Challenge 11
Use dplyr to calculate the mean `weight` and `hindfoot_length` as well
as the sample size for each species.
Make a scatterplot of mean `hindfoot_length` vs mean `weight`, with
the sizes of the points corresponding to the sample size.
---
# Challenge 12
Make a plot of counts of `species_id` `"DM"` and `"DS"` by year.
---
# Challenge 13
Try using `geom_histogram()` to make a histogram visualization of the
distribution of `weight`.
Hint: You want `weight` as the x-axis aesthetic. Try specifying `bins`
in `geom_histogram()`.
---
# Challenge 14
A variant on the box plot is the violin plot. Use `geom_violin()` to
make violin plots of `hindfoot_length` by `species_id`.
---
# Challenge 15
- Calculate counts grouped by `year`, `species_id`, and `sex`.
- Make the faceted plot, splitting further by sex (within each panel).
- Color by sex rather than species.
---
# Challenge 16
- Create a new R Markdown document.
- Delete all of the R code chunks and write a bit of Markdown (some sections, some italicized
text, and an itemized list).
- Convert the document to a webpage.
---
# Challenge 17
Add code chunks to:
- Load the ggplot2 package
- Read the portal data
- Create a plot
---
# Challenge 18
Use chunk options to control the size of a figure and to hide the
code.
---
# Challenge 19
Try out a bit of in-line R code.
# Capstone project
Create and compile an R Markdown report:
1. Load the `portal_data_joined.csv` data.
2. Create boxplots of weight by sex, _omitting individuals with
missing sex_.
3. Create a histogram of hindfoot lengths.
4. Create a scatterplot of hindfoot length vs weight for the species
`"DM"`, `"DO"`, and `"DS"`. _Use different colors for the three
species, and put the three species in different panels._
5. Create a table of counts of `"DM"` by plot type for the year 1977.
6. Create a line plot of the counts of `"DM"` in `"Rodent Exclosure"` plots over time.