\mainmatter
# Course information {-}
[Download](https://github.com/geanders/RProgrammingForResearch/raw/master/slides/CourseOverview.pdf)
a pdf of the lecture slides covering this topic.
## Course overview
This document provides the course notes for Colorado State University's **R
Programming for Research** courses (ERHS 535, ERHS 581A3, and ERHS 581A4). The
courses offer in-depth instruction on data collection, data management,
programming, and visualization, using data examples relevant to data-intensive
research.
## Time and place
Students for ERHS 535, ERHS 581A3, and ERHS 581A4 will meet together. Students in
ERHS 535 will meet for the entire semester, completing a three-credit course.
Students in ERHS 581A3 will meet for the first five weeks of the semester,
completing a one-credit course. Students in ERHS 581A4 will meet from the
sixth week to the final week of the semester, completing a two-credit course.
For the first five weeks of class, the course meets in the first-floor classroom
of the Military Sciences building on Mondays and Wednesdays, 10:00 am--11:50 am.
For the remaining weeks, the course meets in Room 120 of the Environmental
Health Building on Mondays and Wednesdays, 10:00 am--12:00 pm.
Exceptions to these meeting times are:
- There will be no meeting on Labor Day (Monday, Sept. 2).
- There are no course meetings the week of Thanksgiving (week of Nov. 25).

Office hours will be 10:00--11:00 am on Fridays in EH 120.
## Detailed schedule
Here is a more detailed view of the schedule for this course for Fall 2019:
```{r echo = FALSE}
# Week-by-week schedule for Fall 2019 (one row per week of the semester)
a <- data.frame(week = 1:16,
                date = c("Aug. 26, 28",
                         "Sept. 4",
                         "Sept. 9, 11",
                         "Sept. 16, 18",
                         "Sept. 23, 25",
                         "Sept. 30, Oct. 2",
                         "Oct. 7, 9",
                         "Oct. 14, 16",
                         "Oct. 21, 23",
                         "Oct. 28, 30",
                         "Nov. 4, 6",
                         "Nov. 11, 13",
                         "Nov. 18, 20",
                         "Dec. 2, 4",
                         "Dec. 9, 11",
                         "Week of Dec. 16"),
                level = c("Preliminary",
                          rep("Basic", 4),
                          rep("Intermediate", 4),
                          rep("Advanced", 6),
                          " "),
                content = c("R Preliminaries",
                            rep(c("Entering and cleaning data",
                                  "Exploring data",
                                  "Reporting data results",
                                  "Reproducible Research"), 2),
                            "Entering and cleaning data",
                            "Exploring data",
                            "Exploring data (mapping)",
                            "Reporting data results",
                            "Reproducible Research",
                            "Continuing in R",
                            "Group presentations"),
                due = c("", "Quiz (W)", "Quiz (W), HW #1 (F)", "Quiz (W)",
                        "Quiz (W), HW #2 (F)", "Quiz (W)",
                        "Quiz (W)", "Quiz (W), HW #3 (F)",
                        "Quiz (W)", "Quiz (W), HW #4 (F)",
                        "", "HW #5 (F)",
                        "", "", "Project draft (W), HW #6 (W)",
                        "Final project"))
knitr::kable(a, col.names = c("Week", "Class dates", "Level",
                              "Lecture content", "Graded items"))
```
Students in ERHS 581A3 will be in weeks 1--5 of this schedule. Students in ERHS 581A4 will
be in weeks 6--16 of this schedule.
## Grading
### Grading for ERHS 535
For ERHS 535, course grades will be determined by the following five components:
```{r echo = FALSE}
# Grade components for ERHS 535 (percentages sum to 100)
a <- data.frame(a = c("Final group project",
                      "Weekly in-class quizzes, weeks 2--10",
                      "Six homework assignments",
                      "Attendance and class participation",
                      "Weekly in-course group exercises"),
                b = c(30, 25, 25, 10, 10))
knitr::kable(a, col.names = c("Assessment component",
                              "Percent of grade"))
```
### Grading for ERHS 581A3
For ERHS 581A3, course grades will be determined by the following four components:
```{r echo = FALSE}
# Grade components for ERHS 581A3 (percentages sum to 100)
a <- data.frame(a = c("Weekly in-class quizzes, weeks 2--5",
                      "Two homework assignments",
                      "Attendance and class participation",
                      "Weekly in-course group exercises"),
                b = c(40, 30, 10, 20))
knitr::kable(a, col.names = c("Assessment component",
                              "Percent of grade"))
```
### Grading for ERHS 581A4
For ERHS 581A4, course grades will be determined by the following five components:
```{r echo = FALSE}
# Grade components for ERHS 581A4 (percentages sum to 100)
a <- data.frame(a = c("Final group project",
                      "Weekly in-class quizzes, weeks 1--5 (weeks 6--10 of the semester)",
                      "Four homework assignments",
                      "Attendance and class participation",
                      "Weekly in-course group exercises"),
                b = c(30, 25, 30, 5, 10))
knitr::kable(a, col.names = c("Assessment component",
                              "Percent of grade"))
```
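
To make the arithmetic behind these tables concrete, here is a minimal sketch of how the component weights could combine into a course total. The weights are copied from the tables above. The object and function names (`weights`, `course_total`) and the example scores are hypothetical, and the calculation assumes each component is scored as a fraction of its available points.

```r
# Component weights (percent of grade), copied from the tables above.
weights <- list(
  "ERHS 535"   = c(project = 30, quizzes = 25, homework = 25,
                   attendance = 10, exercises = 10),
  "ERHS 581A3" = c(quizzes = 40, homework = 30,
                   attendance = 10, exercises = 20),
  "ERHS 581A4" = c(project = 30, quizzes = 25, homework = 30,
                   attendance = 5, exercises = 10)
)

# Each course's weights should add up to 100.
weight_totals <- sapply(weights, sum)

# Hypothetical helper: `fraction_earned` is a named vector giving the share
# (0 to 1) of each component's points that a student earned.
course_total <- function(course, fraction_earned) {
  w <- weights[[course]]
  stopifnot(setequal(names(w), names(fraction_earned)))
  sum(w * fraction_earned[names(w)])
}

# Made-up example for a student in ERHS 535.
example_total <- course_total(
  "ERHS 535",
  c(project = 25 / 30, quizzes = 1, homework = 0.9,
    attendance = 1, exercises = 1)
)
```

In this made-up example, the student earns 25 of 30 project points, 90% of the homework points, and full credit elsewhere.
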
### Attendance and class participation
Because so much of the learning for this class is through interactive work in
class, it is critical that you come to class.
If you are in **ERHS 535**, out of a possible 10 points for class attendance,
you will get:
- **10 points** if you miss two or fewer classes
- **8 points** if you miss three classes
- **6 points** if you miss four classes
- **4 points** if you miss five classes
- **2 points** if you miss six classes
- **0 points** if you miss seven or more classes
If you are in **ERHS 581A3** or **ERHS 581A4**, out of a possible 10 points for
class attendance, you will get:
- **10 points** if you miss one or fewer classes
- **8 points** if you miss two classes
- **6 points** if you miss three classes
- **4 points** if you miss four classes
- **2 points** if you miss five classes
- **0 points** if you miss six or more classes
Exceptions:
- Attendance on the first day of class (Aug. 26) will not be counted.
- Absences for "University-sanctioned" activities (for example, attending a
conference or traveling to collect data for your dissertation) will be excused.
For these absences, you must provide a signed letter from your research adviser.
For more details, see [CSU's Academic Policies on Course
Attendance](http://catalog.colostate.edu/general-catalog/academic-standards/academic-policies/).
- If you have to miss class for a serious medical issue (e.g., operation,
sickness severe enough to require a doctor's visit), the absence will be excused
if you bring in a note from a doctor or other medical professional stating the
date you missed and that the absence was for a serious medical issue.
**For an absence to be excused, you must email me a copy of the letter by 5:00 pm
the Friday afternoon of the week of the class you missed.**
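
The two attendance scales above follow the same pattern: a set number of absences carries no penalty, and each further absence costs two points. As a rough sketch, this could be written as follows. The function name `attendance_points` is hypothetical, and `missed` is assumed to count only unexcused absences.

```r
# Hypothetical helper: attendance points (out of 10) from the number of
# unexcused absences, following the two scales listed above.
attendance_points <- function(missed, course = "ERHS 535") {
  # Number of classes that can be missed with no penalty
  free_absences <- if (course == "ERHS 535") 2 else 1
  # Two points are lost for each further absence, down to zero
  max(0, 10 - 2 * max(0, missed - free_absences))
}

attendance_points(2)                # ERHS 535, two absences
attendance_points(4, "ERHS 581A3")  # ERHS 581A3, four absences
```
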
### Weekly in-course group exercises
Part of each class will be spent doing in-course group exercises. As long
as you are in class and participate in these exercises, you will get full credit
for this component.
**If you miss a class,** to get credit towards this component of your grade, you
will need to turn in a few paragraphs describing what was covered in the exercise
and what you learned. To get credit, you must submit them to me by email
by 5:00 pm the Friday afternoon of the week of the class you missed.
All in-class exercises are included in the online course book at the end of the
chapter on the associated material.
### In-class quizzes
There will be weekly in-course quizzes for weeks 2--10 of the course. Students
in ERHS 535 will take all these quizzes. Students in ERHS 581A3 will take
quizzes in weeks 2--5. Students in ERHS 581A4 will take quizzes in weeks 6--10.
Each quiz will have at least 10 questions, and typically 12--15.
The grading of the quizzes is structured
so that you can get full credit for the quiz portion of the grade without
getting 100% of quiz questions right. Instead, if you get ten questions right
per quiz on average, you will get full credit for the quiz portion of the grade.
Once you reach the maximum possible points on quizzes, you can continue to take
the quizzes for practice, or you can choose to skip any following quizzes.
Quiz questions will be multiple choice, matching, or very short answers. The
["Vocabulary"
appendix](https://geanders.github.io/RProgrammingForResearch/appendix-a-vocabulary.html)
of our online book has the list of material for which you will be responsible
on each quiz. Most of the functions and concepts will have been covered in
class, but some may not. You are responsible for going through the list and, if
there are things you don't know or remember from class, learning them. To do
this, you can use help functions in R, Google, StackOverflow, books on R, a
friend, or any other resource you can find. The final version of the Vocabulary
list for which you will be responsible will be posted by the Wednesday evening before each
quiz. In general, using R frequently in your research or other coursework will
help you to prepare and do well on these quizzes.
Except in very unusual situations, the only time you will be able to make up a
quiz is during office hours of the same week when you missed the quiz. Note that
you can still get full credit on your total possible quiz points if you miss a
class, but it means you will have to work harder and get more questions right
for days you are in class.
#### Quiz grade calculations for ERHS 535
For students in ERHS 535, the **nine quizzes** in weeks 2--10 count for **25
points** of the final grade. The final quiz total for students in ERHS 535 will
be calculated as:
$$
\mbox{Quiz grade} = 25 * \frac{\mbox{Number of correct quiz answers}}{90}
$$
#### Quiz grade calculations for ERHS 581A3
For students in ERHS 581A3, the **four quizzes** in weeks 2--5 count for **40
points** of the final grade. The final quiz total for students in ERHS 581A3
will be calculated as:
$$
\mbox{Quiz grade} = 40 * \frac{\mbox{Number of correct quiz answers}}{40}
$$
#### Quiz grade calculations for ERHS 581A4
For students in ERHS 581A4, the **five quizzes** in weeks 6--10 count for **25
points** of the final grade. The final quiz total for students in ERHS 581A4
will be calculated as:
$$
\mbox{Quiz grade} = 25 * \frac{\mbox{Number of correct quiz answers}}{50}
$$
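
The three formulas above share one shape, so they can be sketched as a single function. This is only an illustration: the function and argument names are hypothetical, and the cap at the component's maximum is an assumption based on the statement that you can stop taking quizzes once you reach the maximum possible points.

```r
# Hypothetical helper: quiz points toward the final grade.
# `max_points` is the weight of the quiz component and `full_credit_correct`
# is the denominator from the course's formula above.
quiz_grade <- function(n_correct, max_points, full_credit_correct) {
  # Assumption: points are capped at the component's maximum
  min(max_points, max_points * n_correct / full_credit_correct)
}

quiz_grade(72, max_points = 25, full_credit_correct = 90)  # ERHS 535
quiz_grade(36, max_points = 40, full_credit_correct = 40)  # ERHS 581A3
quiz_grade(55, max_points = 25, full_credit_correct = 50)  # ERHS 581A4
```

In the last call, 55 correct answers exceed the 50 needed for full credit, so under the capping assumption the result stays at the 25-point maximum.
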
<!-- Because grading format for these quizzes allows for you to miss some questions -->
<!-- and still get the full quiz credit for the course, I will not ever re-consider -->
<!-- the score you got on a previous quiz, give points back for a wrong answer on a -->
<!-- poorly-worded question, etc. However, if a lot of people got a particular -->
<!-- question wrong, I will be sure to cover it in the next class period. Also, -->
<!-- especially if a question was poorly worded and caused confusion, I will work a -->
<!-- similar question into a future quiz-- in addition to the 10 guaranteed questions -->
<!-- for that quiz-- so every student will have the chance to get an extra 1/3 point -->
<!-- of credit for the question. -->
### Homework
There will be [homework
assignments](https://geanders.github.io/RProgrammingForResearch/appendix-b-homework.html)
due every two to three weeks during the course, starting the third week of the
course (see the detailed schedule in the online course book for exact due
dates).
The first two homeworks (HWs #1 and #2) should be done individually. For later
homeworks, you may be given the option to work in small groups of approximately
three students.
Homeworks will be graded for correctness, but some partial credit will be
given for questions you try but fail to answer correctly. Some of the exercises will
not have "correct" answers, but instead will be graded on completeness.
For later homeworks, a subset of the full set of questions will be selected for
which I will do a detailed grading of the code itself, with substantial feedback
on coding. All other questions in the homework will be graded for completeness
and based on the final answer produced.
Email your homework to me by 5:00 pm on the Friday it is due. Your grade
will be reduced by 10 points for each day the homework is late, and you will
receive no credit if it is more than a week late.
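
As a worked illustration of the late policy, the sketch below applies the penalty to a homework score. It rests on assumptions the policy does not state: that homework is scored out of 100 and that a score cannot fall below zero. The function name `late_adjusted_score` is hypothetical.

```r
# Hypothetical helper: homework score after the late penalty.
# Assumptions: scores are out of 100 and cannot go below zero.
late_adjusted_score <- function(score, days_late) {
  if (days_late > 7) {
    return(0)  # more than a week late: no credit
  }
  # 10 points off for each day late
  max(0, score - 10 * days_late)
}

late_adjusted_score(92, days_late = 2)  # two days late
late_adjusted_score(92, days_late = 8)  # more than a week late
```
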
<!-- ### Final group project -->
<!-- For the final project, you will work in small groups (3--4 people) on an R -->
<!-- programming challenge. The final grade will be based on the resulting R -->
<!-- software, as well as on a short group presentation and written report describing -->
<!-- your work. You will be given a lot of in-class time during the last third of the -->
<!-- semester to work with your group on this project, and you will also need to -->
<!-- spend some time working outside of class to complete the project. More details -->
<!-- on this project will be provided later in the semester. -->
### Final group project
You will do the final group project in groups of four. Come up with an interesting question that you'd love to answer and that you think the main project data can help you answer. The final product will be a report of 1,500 words or fewer (a Word or PDF document created from an RMarkdown file) and an accompanying flexdashboard (an HTML file created from an RMarkdown file).
<!-- Here are some articles to give you an idea of the style and content for this project: -->
<!-- - [Does Christmas come earlier each year?](http://www.statslife.org.uk/culture/1892-does-christmas-really-come-earlier-every-year) -->
<!-- - [Hilary: the most poisoned baby name in US history](http://hilaryparker.com/2013/01/30/hilary-the-most-poisoned-baby-name-in-us-history/) -->
<!-- - [Every Guest Jon Stewart Ever Had On "The Daily Show"](http://fivethirtyeight.com/datalab/every-guest-jon-stewart-ever-had-on-the-daily-show/) -->
<!-- - [Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?](http://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/) -->
<!-- - [Billion-Dollar Billy Beane](http://fivethirtyeight.com/features/billion-dollar-billy-beane/) -->
You will have in-class group work time to work on this. This project will also require some work with your group outside of class. You will be able to get feedback from me through weekly informal written reports in these weeks. I will also provide feedback and help during the in-class group work time.
The final group project will be graded with A through F, with the following point values (out of 30 possible):
- **30 points** for an A
- **25 points** for a B
- **20 points** for a C
- **15 points** for a D
- **10 points** for an F
If you turn nothing in, you will get **0 points**.
#### Final presentation
- In total, the group's presentation should last 20 minutes. There will then be 5 minutes for questions.
- Split the presentation up into two parts: (1) the main presentation and overview of your flexdashboard (about 12 minutes) and (2) a tutorial-style discussion of how you used R to do the project (about 8 minutes).
- The main presentation part should include the following sections:
    + **Research question**: In one sentence, what is the main thing you were trying to figure out?
    + **Introduction**: Why did you decide to ask this question?
    + **Methods**: How did you investigate the data to try to answer your question? This should not include R code (save that for the tutorial part), but rather should use language like "To determine if ... was associated with ..., we measured the correlation ...". It's fine for this project if the Methods are fairly simple ("We investigated the distribution of ... using boxplots ...", "We took the mean and interquartile range of ...", "We mapped state-level averages of ...", etc.). Why did you choose the methods you used? Why do you think they're appropriate and useful for your project?
    + **Results**: What did you find out? Most of these slides should be figures or tables. Discuss your interpretation of your results as you present them. Ideally, you should be able to show your main results in about 3 slides, with one figure or table per slide.
    + **Conclusions**: So what? How do your results compare with what other people have found out about your research question? Based on what you found, are there now other things you want to check out?
- The tutorial part should include the following sections:
    + **Overview of your approach in R**: Step us through a condensed version of how you did your project.
    + **Interesting packages / techniques**: Spend a bit more time on any parts that you found particularly interesting or exciting. Were there packages you used that were helpful that we haven't talked about in class? Did you find out how to do anything that you think other students could use in the future? Did you end up writing a lot of functions to use? Did you have an interesting way of sharing code and data among your group members?
    + **Lessons learned**: If you were to do this project again from scratch, what would you do differently? Were there any big wrong turns along the way? Did you find out how to do something late in the project that would have saved you time if you'd started using it earlier?
#### Final report
The final report should not exceed 1,500 words. You should aim for no more than three figures and tables.
<!-- In addition to the good examples linked above, you can find another example of the type of document we're looking for [here](http://fivethirtyeight.com/features/the-20-most-extreme-cases-of-the-book-was-better-than-the-movie/) from FiveThirtyEight. This example would have received an excellent score if it had been turned in for this class because it is clearly and engagingly written, it presents figures and tables that directly help to answer its main question and that are clearly explained and attractively presented, and its author has convinced me that this is an interesting question worth reading an article about. Notice that it is not very long, only has three figures and tables, and uses fairly simple analysis. -->
I will assess the final report on the following criteria:
- Is it written with correct spelling and grammar?
- Is it very clear what your over-arching research question is?
- Have you explained the way you analyzed the data clearly enough that I think that I could reproduce your analysis if I had your data? Have you explained a bit why your method of analyzing the data is appropriate for your question? Have you let me know about major caveats or limitations related to the methods of analysis you're using?
- Have you presented figures and / or tables with results that help answer your main research question? Is it clear what each is showing and how I should interpret it? (For a nice example of explaining how to interpret results, see footnote 4 [here](http://fivethirtyeight.com/features/the-20-most-extreme-cases-of-the-book-was-better-than-the-movie/).) Have you explained and interpreted your main results in the text? Have you pointed out any particularly interesting observations (interesting outliers, for example)?
- When I'm finished with your article, do I have more insight into your research question than when I started?
- If you include a quote or a figure from an outside source, you **must** include a full reference for it. Otherwise, I am okay with you doing referencing more in a blog-post style. That is, if you are repeating another person's ideas or findings, you must reference it, but you may use a web link rather than writing out full references. You do not need to include references of any type for standard analysis techniques (for example, you would not need to include a reference from a Stats book if you are fitting a regression model).
<!-- - Have you made a convincing case that this is an important or interesting problem? You could meet this criterion even by convincing me that this is a problem that just one of you is passionate about (as an example, see [here](http://hilaryparker.com/2013/01/30/hilary-the-most-poisoned-baby-name-in-us-history/)). -->
<!-- - Are the data that you chose to use reasonable for answering the question? Have you explained any caveats or limitations to the data that I should keep in mind when interpreting your results? As an example of how to do this for an analysis with secondary (imperfect) data, see how [this post](http://fivethirtyeight.com/features/the-20-most-extreme-cases-of-the-book-was-better-than-the-movie/) handles describing the data it uses, particularly in footnotes 1 and 3 and the sentences in the main text that correspond to them. -->
#### Flexdashboard
Finally, you should create a flexdashboard (https://rmarkdown.rstudio.com/flexdashboard/) to provide a visualization related to your research question. Expectations for that are:
- It should work.
- It should include text that is clearly written, without grammatical errors or typos. Any graphics should be easy to interpret and follow some of the principles of good graphics covered in class.
- It should include at least two rendered outputs (e.g., one plot and one table; two plots).
- It should be self-contained. In other words, a user shouldn't need to read your report or hear you explain the dashboard to understand it. It should include enough information for a user to figure out how to use the dashboard and interpret the output.
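
One possible way to start the dashboard is from the template that ships with the flexdashboard package. The sketch below assumes the `rmarkdown` and `flexdashboard` packages are installed and that pandoc is available (it is bundled with RStudio). The file name `dashboard.Rmd` and the helper name `start_dashboard` are placeholders, not course requirements.

```r
# Hypothetical helper: create a dashboard file from the flexdashboard
# template and render it to HTML. Requires the rmarkdown and flexdashboard
# packages; "dashboard.Rmd" is a placeholder file name.
start_dashboard <- function(file = "dashboard.Rmd") {
  if (!requireNamespace("rmarkdown", quietly = TRUE) ||
      !requireNamespace("flexdashboard", quietly = TRUE)) {
    stop("Install the 'rmarkdown' and 'flexdashboard' packages first.")
  }
  # Copy the package's template to `file` without opening an editor
  path <- rmarkdown::draft(file, template = "flex_dashboard",
                           package = "flexdashboard", edit = FALSE)
  # Knit the R Markdown file to an HTML dashboard
  rmarkdown::render(path)
}
```

Calling `start_dashboard()` from your project directory would leave a template you can edit to add your own plots and tables.
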
## Course set-up
Please download and install the latest versions of R and RStudio (Desktop version,
Open Source edition). Both are free for anyone to download.
Students in ERHS 535 and ERHS 581A4 will also need to download and install a
version of LaTeX (MiKTeX for Windows and MacTeX for Macs). They will also
need to download and install git software and create a GitHub account.
Here are useful links for this set-up:
- R: https://cran.r-project.org
- RStudio: https://www.rstudio.com/products/rstudio/#Desktop
- Install MiKTeX: https://miktex.org/ (only ERHS 535 / 581A4 with Windows)
- Install MacTeX: http://www.tug.org/mactex/ (only ERHS 535 / 581A4 with Macs)
- Install git: https://git-scm.com/downloads (only ERHS 535 / 581A4)
- Sign up for a GitHub account: https://github.com (only ERHS 535 / 581A4)
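
After installing, a quick check from the R console can confirm that the pieces are in place. This is a minimal sketch using base R only. It assumes that git and LaTeX were installed in a way that puts the `git` and `pdflatex` programs on your system's search path, which may not hold for every installation.

```r
# Version of R that is currently running
r_version <- R.version.string

# Look for command-line tools on the system search path.
# Sys.which() returns an empty string when a program is not found.
tool_paths <- Sys.which(c("git", "pdflatex"))
tools_found <- tool_paths != ""

print(r_version)
print(tools_found)
```

A `FALSE` for either tool suggests it is either not installed or not on the search path that R sees.
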
## Coursebook
This coursebook will serve as the only required textbook for this course. I am
still in the process of editing and adding to this book, so content may change
somewhat over the semester (particularly the material for later weeks, which
is currently in a rawer draft than the beginning of the book). We typically
cover about a chapter of the book each week of the course.
This coursebook includes:
- Links to the slides presented in class for each topic
- In-course exercises, typically including links to the data used in the exercise
- An appendix with homework assignments
- A list of vocabulary and concepts that should be mastered for each quiz
If you find any typos or bugs, or if you have any suggestions for how the book
can be improved, feel free to post them on the book's [GitHub Issues
page](https://github.com/geanders/RProgrammingForResearch/issues).
This book was developed using Yihui Xie's wonderful
[bookdown](https://bookdown.org) framework. The book is built using code that
combines R code, data, and text to create a book for which R code and examples
can be re-executed every time the book is re-built, which helps identify bugs
and broken code examples quickly. The online book is hosted using GitHub's free
[GitHub Pages](https://pages.github.com). All material for this book is
available and can be explored at [the book's GitHub
repository](https://github.com/geanders/RProgrammingForResearch).
### Other helpful books (not required)
The best book to supplement the coursebook and lectures for this course is [R
for Data Science](http://r4ds.had.co.nz), by Garrett Grolemund and Hadley
Wickham. The entire book is freely available online in the same format as
the coursebook. You can also purchase a paper version of the book (published by
O'Reilly) through Amazon, Barnes & Noble, etc., for around $40. This book is an
excellent and up-to-date reference by some of the best R programmers in the
world.
There are a number of other useful books available on general R programming, including:
- [R for Dummies](https://colostate-primo.hosted.exlibrisgroup.com/primo-explore/fulldisplay?docid=01COLSU_ALMA51267598310003361&context=L&vid=01COLSU&lang=en_US&search_scope=Everything&adaptor=Local%20Search%20Engine&tab=default_tab&query=any,contains,r%20for%20dummies&sortby=rank&offset=0)
- [R Cookbook](https://colostate-primo.hosted.exlibrisgroup.com/primo-explore/fulldisplay?docid=01COLSU_ALMA21203304500003361&context=L&vid=01COLSU&lang=en_US&search_scope=Everything&adaptor=Local%20Search%20Engine&tab=default_tab&query=any,contains,r%20cookbook&sortby=rank&offset=0)
- [R Graphics Cookbook](http://www.amazon.com/R-Graphics-Cookbook-Winston-Chang/dp/1449316956/ref=sr_1_1?ie=UTF8&qid=1440997472&sr=8-1&keywords=r+graphics+cookbook)
- [Roger Peng's Leanpub books](https://leanpub.com/u/rdpeng)
- Various books on [bookdown.org](https://bookdown.org)
The R programming language is used extensively within certain fields, including
statistics and bioinformatics. If you are using R for a specific type of
analysis, you will be able to find many books with advice on using R for both
general and specific statistical analysis, including many available in print or
online through the CSU library.