-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathB5_plotting.Rmd
242 lines (173 loc) · 7.17 KB
/
B5_plotting.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
---
title: 'Plotting and Graphics'
---
Visit these sites for some ideas.
- http://www.sr.bham.ac.uk/~ajrs/R/r-gallery.html
- http://gallery.r-enthusiasts.com/
- http://cran.r-project.org/web/views/Graphics.html
## Base R graphics
The command `plot(x,y)` will plot vector x as the independent
variable and vector y as the dependent variable. Within the command
line, you can specify the title of the graph, the name of the x-axis,
and the name of the y-axis.
- main=’title’
- xlab=’name of x axis’
- ylab=’name of y axis’
The command `lines(x,y)` adds a line segment to an existing plot. The
command `points(x,y)` adds points to the plot. A legend can be created
using `legend`, though getting the legend right for base graphics can
be a bit challenging. To get a basic idea of what R offers, it has a
build-in demo that can be run with `demo(graphics)`.
Try this yourself:
```{r}
x = 1:100
y = rnorm(100,3,1) # 100 random normal deviates with mean=3, sd=1
plot(x,y)
```
```{r}
plot(x,y,main='My First Plot')
```
```{r}
# change point type
plot(x,y,pch=3)
```
```{r}
# change color
plot(x,y,pch=4,col=2)
# draw lines between points
lines(x,y,col=3)
```
```{r}
z=sort(y)
# plot a sorted variable vs x
plot(x,z,main='Random Normal Numbers',
xlab='Index',ylab='Random Number')
```
```{r}
# A basic histogram
hist(z, main="Histogram",
sub="Random normal")
# A "density" plot
plot(density(z), main="Density plot",
sub="Random normal")
# A smaller "bandwidth" to capture more detail
plot(density(z, adjust=0.5),
sub="smaller bandwidth")
```
## Plotting with ggplot2
The `ggplot2` package is a relatively novel approach to generating highly informative publication-quality graphics. The "gg" stands for "Grammar of Graphics". In short, instead of thinking about a single function that produces a plot, `ggplot2` uses a "grammar" approach, akin to building more and more complex sentences to layer on more information or nuance. See the [ggplot2 graphics gallery](http://www.r-graph-gallery.com/portfolio/ggplot2-package/) for some examples _with accompanying code_.
The `ggplot2` package assumes that data are in the form of a data.frame. In some cases, the data will need to be manipulated into a form that matches assumptions that `ggplot2` uses. In particular, if one has a *matrix* of numbers associated with different subjects (samples, people, etc.), the data will usually need to be transformed into a "long" data frame.
To use the `ggplot2` package, it must be installed and loaded. Assuming that installation has been done already, we can load the package directly:
```{r load}
library(ggplot2)
```
### mtcars data
We are going to use the mtcars dataset, included with R, to experiment with `ggplot2`.
```{r mtcarsLoad}
data(mtcars)
```
- Exercise: Explore the `mtcars` dataset using `View`, `summary`, `dim`, `class`, etc.
We can also take a quick look at the relationships between the variables using the `pairs` plotting function.
```{r mtcarsPlot,fig.show="hide"}
pairs(mtcars)
```
That is a useful view of the data. We want to use `ggplot2` to make an
informative plot, so let's approach this in a piecewise fashion. We
first need to decide what type of plot to produce and what our basic
variables will be. In this case, we have a number of choices.
```{r ggplotS1,eval=FALSE}
ggplot(mtcars,aes(x=disp,y=hp))
```
First, a little explanation is necessary. The `ggplot` function takes
as its first argument a `data.frame`. The second argument is the
"aesthetic", `aes`. The `x` and `y` take column names from the
`mtcars` `data.frame` and will form the basis of our scatter plot.
But why did we get that "Error: No layers in plot"? Remember that
*ggplot2* is a "grammar of graphics". We supplied a subject, but no
verb (called a *layer* by ggplot2). So, to generate a plot, we need to
supply a verb. There are many possibilities. Each "verb" or *layer*
typically starts with "geom" and then a descriptor. An example is
necessary.
```{r ggplotS2}
ggplot(mtcars,aes(x=disp,y=hp)) + geom_point()
```
We finally produced a plot. The power of *ggplot2*, though, is the
ability to make very rich plots by adding "grammar" to the "plot
sentence". We have a number of other variables in our `mtcars`
`data.frame`. How can we add another value to a two-dimensional plot?
```{r ggplotS3}
ggplot(mtcars,aes(x=disp,y=hp,color=cyl)) + geom_point()
```
The color of the points is a based on the numeric variable `wt`, the weight of the car. Can we do more? We can change the size of the points, also.
```{r ggplotS4}
ggplot(mtcars,aes(x=disp,y=hp,color=wt,size=mpg)) + geom_point()
```
So, on our 2D plot, we are now plotting four variables. Can we do
more? We can manipulate the shape of the points in addition to the
color and the size.
```{r ggplotS5,eval=FALSE}
ggplot(mtcars,aes(x=disp,y=hp)) + geom_point(aes(size=mpg,color=wt,shape=cyl))
```
Why did we get that error? Ggplot2 is trying to be helpful by telling
us that a "continuous varialbe cannot be mapped to 'shape'". Well, in
our `mtcars` `data.frame`, we can look at `cyl` in detail.
```{r cylCheck}
class(mtcars$cyl)
summary(mtcars$cyl)
table(mtcars$cyl)
```
The `cyl` variable is "kinda" continuous in that it is numeric, but it
could also be thought of as a "category" of engines. R has a specific
data type for "category" data, called a *factor*. We can easily
convert the `cyl` column to a factor like so:
```{r cylFactor}
mtcars$cyl = as.factor(mtcars$cyl)
```
Now, we can go ahead with our previous approach to make a
2-dimensional plot that displays the relationships between *five*
variables.
```{r ggplotS6}
ggplot(mtcars,aes(x=disp,y=hp)) + geom_point(aes(size=mpg,color=wt,shape=cyl))
```
- *Additional exercises*
- Use `geom_text` to add labels to your plot.
- Convert all your work to [plotly](https://plot.ly/ggplot2/) for interactive versions of the plots.
### NYC Flight data
I leave this section open-ended for you to explore further options
with the *ggplot2* package. The data represent the on-time data for
all flights that departed New York City in 2013.
```{r nycflights1}
# install.packages('nycflights13')
library(nycflights13)
data(flights)
head(flights)
```
Use ggplot and other plotting tools to explore the data and look for
features that might contribute to airport delays. Consider using other
"geoms" during your exploration.
## Graphics Devices and Saving Plots
To make a plot directly to a file use: `png()`, `postscript()`,
etc.
```{r}
png(file="myplot.png",width=480,height=480)
plot(density(z,adjust=2.0),sub="larger bandwidth")
dev.off()
```
On your own, save a pdf to a file. NOTE: The dimensions in `pdf()` are in *inches*.
To put multiple plots on a page, we can set the `mfrow` graphics
parameter.
```{r}
par(mfrow=c(2,1))
plot(density(z,adjust=2.0),sub="larger bandwidth")
hist(z)
```
# use dev.off() to turn off the two-row plotting
R can have multiple graphics “devices” open.
- To see a list of active devices: `dev.list()`
- To close the most recent device: `dev.off()`
- To close device 5: `dev.off(5)`
- To use device 5: `dev.set(5)`
```{r setup,echo=FALSE,results="hide",warning=FALSE,message=FALSE}
library(knitr)
opts_chunk$set(warning=FALSE,message=FALSE,cache=TRUE)
```