# Creating Reports {#rmarkdown}
```{r setup, include=FALSE}
source("etc/common.R")
```
Every serious scholar dreams of recording their knowledge in a leather-bound tome
that will lie forgotten and dusty on the shelves of an out-of-the-way library
at a small college with a dubious reputation
until someone naïve enough to believe that they can control forces beyond mortal ken
stumbles upon it and unleashes ravenous horrors to prey upon the sanity of the innocent.
Sadly,
in these diminished times we must settle for dry expositions of trivia
typeset in two columns using a demure font and sequentially-numbered citations.
But there is yet hope.
One of R's greatest strengths is a package called `knitr`
that translates documents written in a format called R Markdown into HTML, PDF, and e-books.
R Markdown files are a kind of programmable document;
authors can interleave prose with chunks of R (or other languages)
and `knitr` will run that code as it processes the document
to create tables and diagrams.
To aid this,
the RStudio IDE includes tools to create new documents,
insert and run code chunks,
preview documents' structure and output,
and much more.
That's the good news.
The bad news is that Markdown isn't a standard:
it's more like a set of ad hoc implementations flying in loose formation.
While basic elements like headings and links are more or less the same in R Markdown
as they are in (for example) [GitHub Flavored Markdown](glossary.html#gfm),
people who have used other dialects may trip over small differences.
This chapter starts by introducing Markdown and publishing workflow,
then shows how to embed code and customize reports.
We close by showing how to publish reports on GitHub and Netlify,
which both offer free hosting for small websites.
## Learning Objectives
- Create a Markdown file that includes headings, links, and external images.
- Compile that file to produce an HTML page.
- Customize the page via its YAML header.
- Add R code chunks to a page.
- Format tabular output programmatically.
- Share common startup code between pages.
- Create and customize parameterized documents.
- Publish pages on GitHub by putting generated HTML in the `docs` directory.
- Publish pages by copying them to Netlify.
## How can I create and preview a simple page?
To begin,
create a file called `first.Rmd` and add the following text to it:
<!-- rmarkdown/first.Rmd -->
```markdown
## Methods {-}
Something *really important*.
An [external link](https://tidynomicon.tech).
[Another link][rstudio]
<!-- link table below -->
[rstudio]: https://rstudio.com
```
This shows several key features of Markdown:
- A level-1 heading (`h1` in HTML) is put on a line starting with `#`.
A level-2 heading (`h2`) uses two of these and so on.
- Putting `{-}` immediately after the heading title suppresses numbering.
Without this,
our examples would all be included in this book's table of contents,
which isn't what we want.
(This is one of R Markdown's extensions to Markdown.)
- Paragraphs are separated by blank lines.
- Text can be put in single asterisks for *italics*
or double asterisks for **bold**.
- To create a link,
put the visible text inside square brackets
and the URL inside parentheses immediately after it.
This is the reverse of HTML's order,
which puts the URL inside the opening `a` tag
*before* the contained text that is displayed.
- Links can also be written by putting text inside square brackets
and an identifier immediately after it,
also in square brackets.
Those identifiers can then be associated with links in a table at the bottom of the page.
- Comments are written as they are in HTML.
Here's what the HTML corresponding to our simple Markdown document looks like:
> ## Methods {-}
>
> Something *really important*.
>
> An [external link](https://tidynomicon.tech).
>
> [Another link][rstudio]
>
> <!-- link table below -->
>
> [rstudio]: https://rstudio.com
To preview it,
go to the mini-toolbar at the top of your document in the RStudio IDE
and click "knit"
to call the appropriate function from `knitr`
(or a function from one of the libraries built on top of it for generating books or slides).
```{r knit-button, echo=FALSE, fig.cap="The 'knit' Button"}
knitr::include_graphics("figures/rmarkdown/knit-button.png")
```
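The button is a convenience rather than a requirement.
As a minimal sketch,
assuming the `rmarkdown` package and Pandoc are installed,
the same page can be built from the R console.
The temporary file used here is only a stand-in for `first.Rmd`:

```r
# Write a cut-down copy of the example document to a temporary file
# so that this sketch is self-contained.
rmd <- file.path(tempdir(), "first.Rmd")
writeLines(c(
  "## Methods {-}",
  "",
  "Something *really important*.",
  "",
  "An [external link](https://tidynomicon.tech)."
), rmd)

# render() knits the file, hands the result to Pandoc,
# and returns the path of the page it created.
html <- rmarkdown::render(rmd, output_format = "html_document", quiet = TRUE)
basename(html)
```

The generated page is written next to the input file unless we ask for it to go somewhere else.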
## How can I run code and include its output in a page?
If this is all R Markdown could do,
it would be nothing more than an idiosyncratic way to create HTML pages.
What makes it powerful is the ability to include code chunks
that are evaluated as the document is knit,
and whose output is included in the final page.
Put this in a file called `second.Rmd`:
````markdown
Displaying the colors:
`r ''````{r}
colors <- c('red', 'green', 'blue')
colors
```
````
The triple back-quotes mark the start and end of a block of code;
putting `{r}` immediately after the back-quotes at the start
tells `knitr` to run the code
and include its output in the generated page,
which therefore looks like this:
<blockquote>
<p>Displaying the colors:</p>
<pre class="r"><code>colors &lt;- c('red', 'green', 'blue')
colors</code></pre>
<pre><code>## [1] "red"   "green" "blue"</code></pre>
</blockquote>
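To watch this happen without creating a file,
we can hand the same text to `knitr::knit` directly.
This is only a sketch:
the fence is assembled with `strrep` purely so that the example contains no literal back-quote fences of its own.

```r
# Build the chunk delimiters programmatically.
fence <- strrep("`", 3)
src <- c(
  "Displaying the colors:",
  "",
  paste0(fence, "{r}"),
  "colors <- c('red', 'green', 'blue')",
  "colors",
  fence
)

# knit() runs the chunk and returns Markdown that contains
# both the code and the output it produced.
md <- knitr::knit(text = src, quiet = TRUE, envir = new.env())
cat(md, sep = "\n")
```

The result is plain Markdown;
turning it into HTML is a separate step.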
We can put any code we want inside code blocks.
We don't have to execute it all at once:
the `Code` pulldown in RStudio's main menu offers a variety of ways
to run regions of code.
The IDE also gives us a keyboard shortcut to insert a new code chunk,
so there really is no excuse for not making notes as we go along.
We can control execution and formatting by putting options inside the curly braces
at the start of the code block:
- `{r label}` gives the chunk a label that we can cross-reference.
Labels must be unique within documents,
just like the `id` attributes of HTML elements.
- `{r include=FALSE}` tells `knitr` to run the code
but *not* to include either the code or its output in the finished document.
While the option name is confusing—the code is actually included in processing—this
is handy when we have setup code that loads libraries or does other things
that our readers probably don't care about.
- `{r eval=FALSE}` displays the code but doesn't run it,
and is often used for tutorials like this one.
- `{r echo=FALSE}` hides the code but includes the output.
This is most often used for displaying static images
as we will see below.
These options can be combined by separating them with commas.
In particular,
it's good style to give every chunk a unique label,
so a document might look like this:
````markdown
# My Thesis {-}
`r ''````{r setup, include=FALSE}
# Load tidyverse but don't display messages.
library(tidyverse)
```
`r ''````{r read-data, message=FALSE}
earthquakes <- read_csv('earthquakes.csv')
```
A profound quotation to set the scene.
And then some analysis:
`r ''````{r calculate-depth-by-magnitude}
depth_by_magnitude <- earthquakes %>%
  mutate(round_mag = round(Magnitude)) %>%
  group_by(round_mag) %>%
  summarize(depth = mean(Depth_Km))
depth_by_magnitude
```
Now let's visualize that:
`r ''````{r plot-depth-by-magnitude}
depth_by_magnitude %>%
  ggplot() +
  geom_point(mapping = aes(x = round_mag, y = depth))
```
````
In order:
- The document title is a level-1 header with suppressed numbering.
- The first code chunk is called `setup`
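and neither it nor its output is included in the output page.
- The second chunk is called `read-data`.
Its code is shown in the page,
but it produces no visible output:
the assignment prints nothing,
and `message=FALSE` suppresses the messages that `read_csv` would otherwise display.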
- There is then a (very) short paragraph.
- The third code chunk calculates the mean depth by rounded magnitude.
Both the code and its output are included;
the output is just R's textual display of the `depth_by_magnitude` table.
- After an even shorter paragraph,
there is another named chunk whose output is a plot rather than text.
`knitr` runs `ggplot2` to create the plot and includes it in the page.
When this page is knit,
the result is:
<blockquote>
<h1>My Thesis</h1>
<pre class="r"><code>earthquakes &lt;- read_csv('earthquakes.csv')</code></pre>
<p>A profound quotation to set the scene. And then some analysis:</p>
<pre class="r"><code>depth_by_magnitude &lt;- earthquakes %&gt;%
  mutate(round_mag = round(Magnitude)) %&gt;%
  group_by(round_mag) %&gt;%
  summarize(depth = mean(Depth_Km))
depth_by_magnitude</code></pre>
<pre><code>## # A tibble: 5 x 2
##   round_mag depth
##       &lt;dbl&gt; &lt;dbl&gt;
## 1         2  9.85
## 2         3  9.15
## 3         4  8.64
## 4         5  8
## 5         6  8.1</code></pre>
<p>Now let’s visualize that:</p>
<pre class="r"><code>depth_by_magnitude %&gt;%
  ggplot() +
  geom_point(mapping = aes(x = round_mag, y = depth))</code></pre>
<p><img src="figures/rmarkdown/earthquakes.png"/></p>
</blockquote>
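Readers who don't have `earthquakes.csv` to hand can still try the calculation.
The sketch below uses a tiny data frame whose values are invented purely for illustration;
only the column names match the real data.

```r
library(dplyr)

# Made-up stand-in for earthquakes.csv: these numbers are NOT real measurements.
earthquakes <- tibble(
  Magnitude = c(1.8, 2.2, 2.9, 3.4),
  Depth_Km  = c(10, 8, 9, 11)
)

# Same pipeline as in the document: bin by rounded magnitude, then average depth.
depth_by_magnitude <- earthquakes %>%
  mutate(round_mag = round(Magnitude)) %>%
  group_by(round_mag) %>%
  summarize(depth = mean(Depth_Km))
depth_by_magnitude
```

With these values the first two rows fall into the magnitude-2 bin and the last two into the magnitude-3 bin,
so the summary has two rows with mean depths of 9 and 10.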
Note that if we want to include a static image (such as a screenshot) in a report,
we can use Markdown's own syntax:
```markdown
![Summoning Ritual](figures/summoning-ritual.jpg)
```
or create an R code chunk with a call to `knitr::include_graphics`:
````markdown
`r ''````{r summoning-ritual, echo=FALSE, fig.cap="Summoning Ritual"}
knitr::include_graphics("figures/summoning-ritual.jpg")
```
````
The options for the latter give the code chunk an ID,
prevent it from being echoed in the final document,
and most importantly,
give the figure a caption.
While it requires a bit of extra typing,
it produces more predictable results across different output formats.
## How can I format tables in a page?
Tables are the undemonstrative yet reliable foundation on which data science is built.
While they are not as showy as their graphical counterparts,
they permit closer scrutiny,
and are accessible both to people with visual challenges
and to the machines whose inevitable triumph over us
shall usher in an algorithmic age free of superstition and mercy.
The simplest way to format tables is to use `knitr::kable`:
```{r load-earthquakes, include=FALSE}
earthquakes <- read_csv('rmarkdown/earthquakes.csv')
```
```{r simple-kable}
earthquakes %>%
  head(5) %>%
  kable()
```
Our output is more attractive if we install and load the `kableExtra` package
and use it to style the table.
We must call its functions *after* we call `kable()`,
just as we call the styling functions for plots after `ggplot()`.
Below,
we select four columns from our earthquake data and format them as a narrow table
with two decimal places for latitude and longitude,
one for magnitude and depth,
and some multi-column headers:
```{r kable-styling}
library(kableExtra)  # provides kable_styling() and add_header_above()
earthquakes %>%
  select(lat = Latitude, long = Longitude,
         mag = Magnitude, depth = Depth_Km) %>%
  head(5) %>%
  kable(digits = c(2, 2, 1, 1)) %>%
  kable_styling(full_width = FALSE) %>%
  add_header_above(c('Location' = 2, 'Details' = 2))
```
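To see what `kable` actually produces,
it can help to ask for plain-text output and print it.
This is a small sketch with an invented two-row data frame (not the earthquake data),
assuming a version of `knitr` recent enough to support the `"pipe"` format:

```r
library(knitr)

# Hypothetical values, chosen only to show per-column rounding.
small <- data.frame(lat = c(42.703, 42.771), mag = c(1.64, 2.46))

# One entry in `digits` per column; "pipe" requests a Markdown pipe table
# instead of HTML so that the result is readable in the console.
tbl <- kable(small, digits = c(2, 1), format = "pipe")
print(tbl)
```

The result is a character vector with one element per line of the table,
which is why it can be inspected or post-processed like any other text.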
## How can I share code between pages?
If you are working on several related reports,
you may want to share some code between them.
The best way to do this with R Markdown is to put that code in a separate `.R` file
and then load that at the start of each document using the `source` function.
For example,
all of the chapters in this book begin with:
````markdown
`r ''````{r setup, include=FALSE}
source('common.R')
```
````
The chunk is named `setup`, and neither it nor its output is displayed.
All the chunk does is load and run `common.R`,
which contains the following lines:
```{r common-R, eval=FALSE}
library(tidyverse)
library(reticulate)
library(rlang)
library(knitr)
knitr::opts_knit$set(width = 69)
```
The first few load libraries that various chapters depend on;
the last one tells `knitr` to set the line width option to 69 characters.
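The pattern itself is small enough to try in isolation.
In this sketch the shared file lives in a temporary directory
and `report_digits` is a hypothetical setting invented for the example;
neither is part of this book's real `common.R`.

```r
# Hypothetical shared file: in a real project this would be saved as common.R.
common <- file.path(tempdir(), "common.R")
writeLines(c(
  "library(stats)",
  "report_digits <- 2  # hypothetical setting shared by every report"
), common)

# Each document's setup chunk would then contain just this call.
source(common)
report_digits
```

Anything the sourced file defines is available to every later chunk in the document.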
## How can I parameterize documents?
`knitr` has many other options besides line width,
and the tools built on top of it,
like [Blogdown][blogdown] and [Bookdown][bookdown],
have many (many) more.
Rather than calling a function to set them,
you can and should add a header to each document.
If we use `File...New File...RMarkdown` to create a new R Markdown file,
its header looks like this:
```markdown
---
title: "fourth"
author: "Greg Wilson"
date: "18/09/2019"
output: html_document
---
```
1. The header starts with exactly three dashes on a line of their own and ends the same way.
A common mistake is to forget the closing dashes;
another is to use too many or too few,
or to include whitespace in the line.
2. The content of the header is formatted using [YAML][yaml],
which originally stood for "Yet Another Markup Language"
(and is now officially "YAML Ain't Markup Language").
In its simplest form it contains key-value pairs:
the keys are words,
the values can be numbers, quoted strings, or a variety of other things,
and the two are separated by a colon.
This header tells `knitr` what the document's title is,
who its author is,
when it was created
(which really ought to be written as an ISO-formatted date, but worse sins await us),
and what output format we want by default.
When we knit the document,
`knitr` reads the header but does *not* include it in the output.
Instead,
its values control `knitr`'s operation (e.g., select HTML as the output format)
or are inserted into the document itself (e.g., the title).
Let's edit the YAML header so that it looks like this:
```markdown
---
title: "fourth"
author: "Greg Wilson"
date: "2019-09-18"
output:
  html_document:
    theme: united
    toc: true
---
```
1. The date is now in an unambiguous, sortable format.
This doesn't impact our document,
but makes us feel better.
2. We have added two sub-keys under `html_document`
(which we have made a sub-key of `output` so that we can nest things beneath it).
The first tells `knitr` to use the `united` theme,
which gives us a different set of fonts and margins.
The second tells it to create a table of contents at the start of the document
with links to all of the section headers.
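Nesting in YAML is expressed purely through indentation,
so it is worth checking how a header is actually parsed.
A minimal sketch,
assuming the `yaml` package is installed (it normally arrives alongside `rmarkdown`):

```r
# The header from above, without the surrounding dashes.
header <- '
title: "fourth"
author: "Greg Wilson"
date: "2019-09-18"
output:
  html_document:
    theme: united
    toc: true
'

# Each level of indentation becomes a level of nesting in a named list.
parsed <- yaml::yaml.load(header)
parsed$output$html_document$theme   # "united"
parsed$output$html_document$toc     # TRUE
```

If the sub-keys are not indented,
they become top-level keys instead and the output settings are not applied as intended.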
YAML can [be quite complicated to understand][norway].
Luckily,
a package called `ymlthis` is being developed to create and check files' headers.
[Its documentation][ymlthis] and capabilities are both steadily growing,
and it's a great way to experiment with new or obscure options.
But YAML can do more than control the way `knitr` processes the document:
we can also use it to create [parameterized reports](glossary.html#parameterized-report).
````markdown
---
title: "Fifth Report"
params:
  country: Canada
---
This report looks at defenestration rates in `r knitr::inline_expr('params$country')`.
`r ''````{r load-data}
data <- readr::read_csv(here::here('data', glue::glue(params$country, '.csv')))
```
````
This document's YAML header contains the key `params`,
under which is a sub-key for each parameter we want to create.
When the document is knit,
these parameters are put in a [named list](glossary.html#named-list) called `params`
and can be referred to like any other variable.
If we want to display a parameter inline,
we use a back-ticked code fragment that starts with the letter 'r';
if we want to use it in a fenced code block,
we refer to it as we would any other variable.
Parameters don't have to be single values:
they can, for example, be lists of mysterious ailments whose inexorable spread you are vainly trying to halt.
Parameters can also be provided on the command line:
```shell
Rscript -e "rmarkdown::render('fifth.Rmd', params=list(country='Lesotho'))"
```
This command creates a page called `fifth.html` that reports defenestration rates in Lesotho.
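The same override works from inside an R session.
The sketch below writes a stripped-down, hypothetical version of the report to a temporary file
(it has no data-loading chunk)
and assumes `rmarkdown` and Pandoc are installed:

```r
# A minimal parameterized document whose default country is Canada.
rmd <- file.path(tempdir(), "fifth.Rmd")
writeLines(c(
  "---",
  "title: \"Fifth Report\"",
  "params:",
  "  country: Canada",
  "---",
  "",
  "Country: `r params$country`"
), rmd)

# Values passed via `params` replace the defaults in the YAML header.
html <- rmarkdown::render(rmd, params = list(country = "Lesotho"), quiet = TRUE)
any(grepl("Lesotho", readLines(html, warn = FALSE)))
```

Only the parameters named in the call are replaced;
any others keep the defaults from the header.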
## How can I publish pages on GitHub?
Many programmers use [GitHub Pages][github-pages] to publish websites
for project documentation, personal blogs,
and desperate entreaties to other-worldly forces
(Iä! Iä! Git rebase fhtagn!).
In its original incarnation,
GitHub Pages worked as follows:
1. Authors created Markdown and HTML files with content they wanted to publish.
They also created a configuration file called `_config.yml`
with settings for their site (such as its title).
2. All of this was put on a Git branch called `gh-pages`.
Whenever that branch was updated on GitHub,
a [static site generator](glossary.html#static-site-generator) called [Jekyll][jekyll]
would find and transform those files
using templates and inclusions taken from the `_layouts` and `_includes` directories respectively.
Keeping the `gh-pages` branch synchronized with other work proved to be a minor headache,
so GitHub now offers two other options
(which can be configured from a project's settings in the GitHub browser interface):
1. Publish directly from the root directory of the project's `master` branch.
2. Publish from the `docs` folder in the `master` branch.
One other piece of background is how Jekyll determines what to publish:
- It ignores files and directories whose names start with an underscore,
or that it is specifically told to exclude in `_config.yml`.
- If an HTML or Markdown file starts with a YAML header,
Jekyll translates it.
- If the file does not include such a header,
Jekyll simply copies it as-is.
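The last two rules can be approximated in a few lines of R.
This is a rough sketch of the idea,
not Jekyll's actual implementation,
and `has_yaml_header` is a hypothetical helper name:

```r
# Returns TRUE if the file's first line is three dashes
# (optionally followed by whitespace), which is how a YAML header begins.
has_yaml_header <- function(path) {
  first <- readLines(path, n = 1, warn = FALSE)
  length(first) == 1 && grepl("^---\\s*$", first)
}

# Two throwaway files to try it on.
with_header <- tempfile(fileext = ".md")
writeLines(c("---", "title: Example", "---", "Body text."), with_header)
without_header <- tempfile(fileext = ".html")
writeLines("<html><body>Body text.</body></html>", without_header)

has_yaml_header(with_header)      # TRUE: Jekyll would translate this file
has_yaml_header(without_header)   # FALSE: Jekyll would copy it as-is
```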
All of this gives us a simple way to publish an R Markdown website:
1. Make sure the project is configured to publish from the root directory of the `master` branch.
2. Compile our R Markdown files locally to create HTML in the root directory of our local copy.
3. Commit that HTML to the `master` branch.
4. Push.
This works because the HTML files generated by `knitr` don't contain YAML headers,
so they are copied as-is.
If we want to style those pages with our own CSS or add some JavaScript,
we can tell `knitr` to include files of our choice during translation:
```markdown
---
title: "Defenestration By Country"
output:
  html_document:
    includes:
      in_header: extra-header.html
      after_body: extra-footer.html
---
...content...
```
Behind the scenes,
`knitr` translates our R Markdown into plain Markdown,
which is then turned into HTML by yet another tool called [Pandoc][pandoc].
If we are brave,
we can create an entirely new Pandoc HTML template and format our files with that:
```markdown
---
title: "Defenestration by Season"
output:
  html_document:
    template: seasonal-report.html
---
...content...
```
This works,
but having the generated HTML in our root directory is messy.
Given that we can configure GitHub Pages to publish from the `docs` folder,
why don't we put our HTML there?
After all,
`knitr::knit` has an `output` parameter with which we can specify a location for the output file.
The answer is that `knitr` becomes rather vexed when the output directory is not the same as
the current working directory.
Programmers being programmers,
there are several ways around this:
1. Put up with it.
2. Write a small function that changes the current working directory to `docs`,
knits `../report.Rmd`,
then changes the working directory back to the project root.
3. Use something like [Make] to build everything on the command line
and then move all the generated files into `docs`.
None of this is made any less frustrating by the fact that other tools in the `knitr` family,
such as Bookdown,
*do* allow users to specify the output directory through a configuration parameter.
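The second workaround might look something like the sketch below.
`knit_to_docs` is a hypothetical helper,
and it assumes it is called from the project root with a `docs` directory already in place:

```r
# Knit a file that lives in the project root while the working directory
# is temporarily set to docs/, so that the output lands there.
knit_to_docs <- function(rmd, docs = "docs") {
  old <- setwd(docs)               # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)  # restore it even if knitting fails
  knitr::knit(file.path("..", rmd), quiet = TRUE, envir = new.env())
}
```

Calling `knit_to_docs("report.Rmd")` from the project root writes the knitted file into `docs`
and leaves the working directory as it was.
Note that `knitr::knit` on its own turns an `.Rmd` file into Markdown,
so a further conversion step is still needed to get HTML.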
## Key Points
```{r keypoints, child="keypoints/rmarkdown.md"}
```
```{r links, child="etc/links.md"}
```