-
Notifications
You must be signed in to change notification settings - Fork 14
/
io100-working_dates.Rmd
304 lines (252 loc) · 12.9 KB
/
io100-working_dates.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
---
title: "Working with Dates and Times in R"
author: "Peter Higgins"
date: "1/05/2023"
output: html_document
editor_options:
markdown:
wrap: 72
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(medicaldata)
library(lubridate)
library(parsedate)
library(janitor)
library(webexercises)
```
# Dates and Times in R
The horrors of dates and times (especially times and time zones) is a
bane of computing. Dates have changed by government fiat (Gregorian to
Julian, etc.) several times in history, and time zones and daylight
savings time practices change more frequently. ![Time
zones](images/tzmath1.jpg) To add to the complications, there are many,
many date formats around the world, with different countries and
cultures mixing and matching days, months, and years willy-nilly to
indicate a particular date. For this reason, there is an international
standard, [ISO
8601](https://en.wikipedia.org/wiki/ISO_8601 "link to ISO8601 wikipedia page"),
for dates and times. The standard calls for listing all dates in
YYYY-MM-DD format, with numbers for the 4 digit year, 2 digit month, and
2 digit day, as in 2021-04-17 for April 17, 2021. All numbers are
zero-padded (necessary zeroes added to the left) to maintain 4 or two
digits as needed. **Use this format** whenever you can, for consistency
and to avoid errors that will be painful later. **Insist** that all of
your collaborators use ISO8601 format, and your life will be vastly
improved.
Time in ISO8601 uses a 24 hour clock, and is formatted in two ways:
the *basic format* is T[hh][mm][ss] and the *extended format* is
T[hh]:[mm]:[ss]. Each number in a time is two digits, and is zero-padded
when needed, so that one second after midnight is T00:00:01.
When possible, use only dates in your data analysis. Times are complex
and surprisingly painful, due to unusual cases that frequently cause
trouble. These include:
- GMT (the time zone for Greenwich mean time) is close to, but not the
same as UTC. Coordinated Universal Time (UTC) has no daylight
savings, while GMT does.
- There are leap years, and leap seconds, to keep track of.
- January 30 + 1 month = NA, because there is no Feb 30.
- There are more than 24 time zones, because of episodic local
changes. For example, tz = "America/Detroit" is not quite the same
as Eastern Standard Time, as Michigan had daylight savings from
1916-1968, not from 1969-72, then returned to DST in 1973 and
thereafter.
- With all of the odd geopolitical time zone and DST variations, there
are now more than 600 distinct valid time zones that R keeps track
of, and you can print these with the base R function *OlsonNames()*.
Arthur David Olson was the original maintainer of the time zone
database, which is now managed by the Internet Assigned Numbers
Authority (IANA), currently led by Paul Eggert at UCLA.
Unless you absolutely need time stamps to track intervals of less than a
day, I strongly urge you to convert all date-times to simple ISO8601
dates.
![TZ](images/tzmath2.jpg)
## Data Types for Dates and Times
Data for dates and times comes in several formats. - POSIXct - calendar
time (also called continuous time or count time, as it counts seconds) -
POSIXlt - local time (also called list time) - Date - date-time <dttm>
POSIX is probably unfamiliar to you - it stands for the Portable
Operating System Interface for uniX used to make standards for UNIX
across Operating systems.
POSIX Dates use the reference point of 1970-01-01 T 0:00:00 UTC
(midnight at the start of the year 1970 in coordinated universal time)
as zero seconds. All dates after that are counted in seconds in POSIXct.
Dates prior to 1970 are counted in negative seconds. POSIXct is very
memory-efficient for storing dates, but not very human-readable.
In the code chunk below, we will start with a ISO 8601 date, with a time
zone. We will then print it out, and then show it as it is stored, are
using the unclass function. Copy and run the code chunk to see the
results.
```{r POSIXct}
date1 <- as.POSIXct("2018-01-16 04:34:45",
tz = "America/Detroit")
date1
unclass(date1)
```
You get a nicely formatted ISO8601 date when you print the date, but
when it is unclassed, you get a large integer, which corresponds to the
number of seconds since midnight, UTC on January 1, 1970. POSIXct dates
are stored as integer seconds.
## Using POSIXlt
POSIXlt is another version of the POSIX standard, but in a more
human-readable format. The 'lt' stands for local time, but it is often
thought of as list time, as the data of POSIXlt is stored in a list
format of human-readable items.
POSIXlt stores separate items in a list. The list includes 10 distinct
items:
- seconds (sec)
- minutes (min)
- hours (hour)
- day of month (mday)
- month (mon) - numeric, with January as zero, and December as 11
- year (year) - since 1900 as zero, so the year 2018 = 118
- weekday numeric (wday), with Sunday = 0, Saturday = 6
- daylight savings (isdst), 0 for no, 1 for yes
- time zone (zone)
- Offset from UTC in seconds (gmtoff)
In the code chunk below, we will start with the same ISO 8601 date, with
a time zone. We will then print it out, and then show it as it is
stored, are using the unclass function. We will then extract the month
and year. Copy and run the code chunk to see the results.
```{r POSIXlt}
date2 <- as.POSIXlt(date1)
date2
unclass(date2)
date2$mon
date2$year
```
Again, we get a nicely formatted ISO8601 date when you print the date,
but when it is unclassed, you get a list of date items. When we extract
the month and year, these can be surprising (January = 0, Year 2018 =
118). POSIXlt dates are stored as a list of date items.
## Formatting dates
- as.Date
- strptime
- formatting codes
- the parsedate package
### Code Chunk Icons
+-------------------------------+-------------------------------------+
| Icon | Uses |
+===============================+=====================================+
| Settings Gear | Allows you to |
| | |
| | - Name the code chunk |
| | |
| | - Set options for this chunk |
+-------------------------------+-------------------------------------+
| Run Chunks Above (down arrow) | Runs all of the preceding code |
| | chunks, including the setup chunk |
+-------------------------------+-------------------------------------+
| Run Chunk (rightward arrow) | Runs the entire current chunk |
+-------------------------------+-------------------------------------+
: Code Chunk Icons
## Including Plots
```{r plot, echo=TRUE, warning=FALSE}
covid %>%
ggplot() +
aes(x = pan_day, y = ct_result) +
geom_point() +
labs(title = "COVID Testing in First 100 Days of Pandemic",
x = "Pandemic Day, 2020",
y = "Cycle Threshold \n45 is a Negative Test")
```
## Including Tables
```{r table, echo=TRUE, warning=FALSE}
covid %>%
count(demo_group, gender) %>%
gt() %>%
tab_header(title = "Demographics of COVID Testing",
subtitle = "By Group and Gender") %>%
tab_source_note(source_note = "From CHOP, 2020") %>%
cols_label(demo_group = "Group",
gender = "Gender",
n = "Count")
```
## Other languages in code chunks
## Code Chunk Options
+-------------+--------------------------+--------------------------+
| Option | Values | Output |
+=============+==========================+==========================+
| eval | TRUE/FALSE | Whether or not the code |
| | | is run. |
+-------------+--------------------------+--------------------------+
| echo | TRUE/FALSE | Show or hide the code |
+-------------+--------------------------+--------------------------+
| include | TRUE/FALSE | Whether or not the |
| | | resulting output of a |
| | | code chunk is displayed |
| | | in the document. `FALSE` |
| | | means that the code |
| | | *will* run, but will not |
| | | display results. |
| | | `include = FALSE` is |
| | | often used for the setup |
| | | chunk. |
+-------------+--------------------------+--------------------------+
| warning | TRUE/FALSE | Whether warnings |
| | | generated from your code |
| | | will be displayed in the |
| | | document. |
+-------------+--------------------------+--------------------------+
| message | TRUE/FALSE | Whether messages |
| | | generated from your code |
| | | will be displayed in the |
| | | document. |
+-------------+--------------------------+--------------------------+
| fig.align | default, left, right, | Where on the page the |
| | center | output figure should |
| | | align. Text options |
| | | should be in quotes, |
| | | like\ |
| | | `f ig.align = "right"` |
+-------------+--------------------------+--------------------------+
| fig.width | default = 7 | figure width in inches |
+-------------+--------------------------+--------------------------+
| fig .height | default = 7 | figure height in inches |
+-------------+--------------------------+--------------------------+
| error | TRUE/FALSE | If `TRUE`, will not stop |
| | | building the document if |
| | | there is an error in a |
| | | code chunk. |
+-------------+--------------------------+--------------------------+
| cache | TRUE/FALSE | If `TRUE`, will store |
| | | the results and not |
| | | re-run the chunk. |
| | | Helpful for long, slow |
| | | calculations. But watch |
| | | out for this if your |
| | | data change and your |
| | | results do not(!!). |
+-------------+--------------------------+--------------------------+
: Code Chunk Options
::: tryit
## Try Out Other Chunk Options
Try adding different chunk options, including
- `include = FALSE`
- `eval = FALSE, echo = TRUE`
- `eval=TRUE, echo = FALSE`
:::
## The `setup` chunk
## The Easy Button - Visual Markdown Editing
The mean cycle threshold in this study was
`r longmcq(c(answer = "r covid %>% mean(ct_result, na.rm = TRUE) %>% format(digits = 5)", "r covid %>% median(ct_result, na.rm = TRUE) %>% format(digits = 7)", "r covid %>% sd(ct_result, na.rm=TRUE) %>% format(digits =3)"))`
<br>
The standard deviation of the cycle threshold in this study was
`r longmcq(c("r covid %>% mean(ct_result, na.rm = TRUE) %>% format(digits = 5)", "r covid %>% median(ct_result, na.rm = TRUE) %>% format(digits = 7)", answer = "r covid %>% sd(ct_result, na.rm=TRUE) %>% format(digits =3)"))`
<br>
:::
## A Quick Quiz
1. Which code chunk option hides the code?
`r mcq(c("eval = TRUE", "warning = FALSE", answer = "echo = FALSE"))`
2. Which code chunk always comes first, and includes libraries and data
import steps? `r mcq(c("plot", answer = "setup", "top_chunk"))`
3. What is the name of the code block (and the markup language it is
written in), set off with 3 hyphens(---) at the very top of your
Rmarkdown document, that tells `pandoc` how to format the final
document? `r mcq(c("HTML", answer = "YAML", "SPAM", "formatter"))`
4. What symbols do you use to make text bold in Rmarkdown?
`r mcq(c("one asterisk", answer = "two asterisks", "underscores", "Exclamation points"))`
5. Which {knitr} function do you use to add images to your document
with a code chunk?
`r mcq(c("paste()", answer = "include_image()", "insert()", "insert_pic()"))`