---
layout: page
title: R for reproducible scientific analysis
subtitle: Writing data
minutes: 20
---
```{r, include=FALSE}
source("tools/chunk-options.R")
opts_chunk$set(fig.path = "fig/11-writing-data-")
# Silently load in the data so the rest of the lesson works
library(ggplot2)
gapminder <- read.csv("data/gapminder-FiveYearData.csv", header=TRUE)
# Temporarily create a cleaned-data directory so that the writing examples work
# The students should have created this in topic 2.
dir.create("cleaned-data")
```
> ## Learning objectives {.objectives}
>
> * To be able to write out plots and data from R
>

### Saving plots
You have already seen how to save the most recent plot you create in `ggplot2`,
using the command `ggsave`. As a refresher:
```{r, eval=FALSE}
ggsave("My_most_recent_plot.pdf")
```
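
`ggsave` can also be given the plot to save and its size explicitly, so it does not depend on whichever plot happened to be drawn last. This is a minimal sketch that assumes `ggplot2` is installed. It uses R's built-in `mtcars` data so it runs on its own, and the output path in R's temporary directory is only illustrative:

```{r, eval=FALSE}
library(ggplot2)

# Keep the plot in a variable instead of relying on the last plot drawn
mpg_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Illustrative output path inside R's temporary directory
plot_file <- file.path(tempdir(), "mpg_vs_weight.pdf")

# The .pdf extension selects the format; the size is given in inches
ggsave(plot_file, plot = mpg_plot, width = 6, height = 4, units = "in")
```

When `width` and `height` are supplied, `ggsave` uses them instead of the size of the current graphics device.
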
You can also save a plot from within RStudio using the 'Export' button
in the 'Plots' pane. This gives you the option of saving it as a
PDF, or as a PNG, JPEG or other image file.

Sometimes you will want to save plots without displaying them in the
'Plots' pane first. Perhaps you want to make a PDF document with
multiple pages, each showing a different plot. Or perhaps you're
looping through subsets of a dataset, plotting each one, and want to
save every plot. You obviously can't stop the loop to click 'Export'
for each one.

In this case you can use a more flexible approach. The function
`pdf` opens a new PDF graphics device, and every plot drawn while it
is open is written to the file. Because PDF is a vector format, there
is no resolution to set. The `width` and `height` arguments control
the page size, in inches.
```{r, eval=FALSE}
# Open a 12 x 4 inch pdf device; plots are drawn into this file
pdf("Life_Exp_vs_time.pdf", width=12, height=4)
ggplot(data=gapminder, aes(x=year, y=lifeExp, colour=country)) +
  geom_line()
# You then have to make sure to turn off the pdf device!
dev.off()
```
Open up this document and have a look.

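
The same device approach handles the looping case described earlier. Inside a loop or a function, `ggplot` objects are not printed automatically. Each one has to be passed to `print()` to be drawn on the open device, and each new plot starts a new page of the PDF. Here is a sketch under those assumptions, again using the built-in `mtcars` data. `save_pages` is a hypothetical helper, and `on.exit()` is one way to make sure the device is closed even if a plot fails:

```{r, eval=FALSE}
library(ggplot2)

# Hypothetical helper: write one PDF page per number of cylinders
save_pages <- function(data, file) {
  pdf(file, width = 8, height = 5)
  # Close the pdf device when the function exits, even after an error
  on.exit(dev.off())
  for (n_cyl in sort(unique(data$cyl))) {
    subset_data <- data[data$cyl == n_cyl, ]
    p <- ggplot(data = subset_data, aes(x = wt, y = mpg)) +
      geom_point() +
      ggtitle(paste(n_cyl, "cylinders"))
    # ggplot objects must be printed explicitly inside a loop
    print(p)
  }
  invisible(file)
}

pages_file <- file.path(tempdir(), "mpg_by_cylinders.pdf")
save_pages(mtcars, pages_file)
```

With `mtcars`, this should produce a three-page PDF, one page each for 4, 6 and 8 cylinders.
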
> #### Challenge 1 {.challenge}
>
> Rewrite your `pdf` command to print a second
> page in the PDF, showing a facet plot (hint: use `facet_grid`)
> of the same data with one panel per continent.
>

The functions `jpeg`, `png`, `tiff` and `bmp` are used in the same
way to save plots in raster image formats. For these devices, `width`
and `height` are given in pixels by default.

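
For example, a PNG version of a plot might be written as shown below. This is a sketch that assumes `ggplot2` is installed, with the built-in `mtcars` data and an illustrative temporary path. Raster devices measure `width` and `height` in pixels unless `units` is changed, in which case `res` (pixels per inch) sets the resolution:

```{r, eval=FALSE}
library(ggplot2)

# Illustrative output path inside R's temporary directory
png_file <- file.path(tempdir(), "mpg_vs_weight.png")

# 6 x 4 inches at 300 pixels per inch gives an 1800 x 1200 pixel image
png(png_file, width = 6, height = 4, units = "in", res = 300)
print(ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point())
dev.off()
```
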
### Writing data
At some point, you'll also want to write out data from R.
We can use the `write.table` function for this, which is
very similar to `read.table` from before.

Let's create a data-cleaning script. For this analysis, we
only want to focus on the gapminder data for Australia:
```{r}
# Keep only the rows for Australia
aust_subset <- gapminder[gapminder$country == "Australia",]
# Write them out as comma-separated text
write.table(aust_subset,
  file="cleaned-data/gapminder-aus.csv",
  sep=","
)
```
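
`write.table` will not create a missing folder. If `cleaned-data/` does not exist, the call above stops with an error because the file cannot be opened. Below is a minimal sketch of creating the folder first. It uses R's temporary directory and the built-in `iris` data so it runs on its own:

```{r}
# Illustrative output folder inside R's temporary directory
out_dir <- file.path(tempdir(), "cleaned-data")

# Create the folder if needed; stay quiet if it already exists
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

out_file <- file.path(out_dir, "iris-head.csv")
write.table(head(iris, 3), file = out_file, sep = ",", row.names = FALSE)
file.exists(out_file)
```
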
Let's switch back to the shell and take a look at the data to make sure it
looks OK:
```{r, engine='bash'}
head cleaned-data/gapminder-aus.csv
```
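
If you would rather stay in R, `readLines()` returns a file's raw text, one element per line. Here is a small self-contained sketch that writes two rows of the built-in `iris` data to an illustrative temporary file, using the same `write.table` defaults:

```{r}
# Write two rows of iris using write.table's default settings
raw_file <- file.path(tempdir(), "iris-default.csv")
write.table(head(iris, 2), file = raw_file, sep = ",")

# Show the lines of the file exactly as they were written
raw_lines <- readLines(raw_file)
raw_lines
```

The same quotation marks and row numbers appear here. The header line also has one field fewer than the data lines, because `write.table` writes no column name for the row names.
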
Hmm, that's not quite what we wanted. Where did all these
quotation marks come from? Also, the row numbers are
meaningless.

Let's look at the help file to work out how to change this
behaviour.
```{r, eval=FALSE}
?write.table
```
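By default, `write.table` wraps character and factor columns, as well
as the row and column names, in double quotation marks when writing
out to file (`quote = TRUE`). It also writes out both the row names
and the column names (`row.names = TRUE`, `col.names = TRUE`).
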
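
The quoting is there for a reason: it keeps a value that contains the separator in a single field. Here is a minimal sketch with a made-up two-row data frame (the names are illustrative) that counts the comma-separated fields on each line with `count.fields()`:

```{r}
# Toy data: one value contains a comma, the separator we will use
people <- data.frame(name = c("Smith, J", "Lee"), value = c(1, 2))

quoted_file <- file.path(tempdir(), "people-quoted.csv")
unquoted_file <- file.path(tempdir(), "people-unquoted.csv")
write.table(people, file = quoted_file, sep = ",", row.names = FALSE)
write.table(people, file = unquoted_file, sep = ",", row.names = FALSE,
            quote = FALSE)

# Fields per line: header, then one line per row
count.fields(quoted_file, sep = ",")
count.fields(unquoted_file, sep = ",")
```

Without quotes, `Smith, J` is split across two fields, so that line no longer matches the header. In the Australian subset no value contains a comma, so turning quoting off is safe there.
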
Let's fix this:
```{r}
# Write the Australian subset again, without quotes or row names
write.table(
  aust_subset,
  file="cleaned-data/gapminder-aus.csv",
  sep=",", quote=FALSE, row.names=FALSE
)
```
Now let's look at the data again using our shell skills:
```{r, engine='bash'}
head cleaned-data/gapminder-aus.csv
```
That looks better!

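
Because comma-separated files are so common, R also provides `write.csv`. It is a wrapper around `write.table` with `sep = ","` already set, mirroring `read.csv`. It still quotes strings by default, so `write.csv(aust_subset, file="cleaned-data/gapminder-aus.csv", row.names=FALSE, quote=FALSE)` should give the same file as the call above. Here is a self-contained sketch that uses the built-in `iris` data and an illustrative temporary path, and reads the result back to check its shape:

```{r}
# write.csv() is write.table() with sep = "," and dec = "." preset
csv_file <- file.path(tempdir(), "iris-head.csv")
write.csv(head(iris, 3), file = csv_file, row.names = FALSE)

# Read the file back in: expect 3 rows and the 5 original columns
iris_back <- read.csv(csv_file)
dim(iris_back)
```
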
> #### Challenge 2 {.challenge}
>
> Write a data-cleaning script file that subsets the gapminder
> data to include only data points collected since 1990.
>
> Use this script to write out the new subset to a file
> in the `cleaned-data/` directory.
>
```{r, echo=FALSE}
# Remove the directory after rendering the lesson, because we don't want it
# in the lesson repository
unlink("cleaned-data", recursive=TRUE)
```