forked from columbia-data-club/intro-to-data
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathGraphics.Rmd
63 lines (52 loc) · 1.87 KB
/
Graphics.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
---
title: "Graphics"
author: "Haoyu Zhang(hz2563)"
date: "2/17/2019"
output: slidy_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
data = read.csv("taxi-data.csv")
head(data)
```
## Bargraphs
```{r,echo=FALSE}
table(data$payment)
names(table(data$payment))
levels(data$payment)
barplot(height = table(data$payment), names.arg = names(table(data$payment)))
```
```{r}
#hist(data$payment)
```
## histogram
```{r}
#hist(data$payment) #see the difference between barplot and histogram
hist(data$payment_type, main = "Histogram of Payment_type")
hist(data$tip, main = "Histogram of Tip")
hist(data$tip, breaks= max(data$tip), main = "Histogram of Tip", xlim = c(-5,20),col = rainbow(5))
```
## Boxplot
Covariation is the tendency for the values of two or more variables to vary together in a related way.
Visualize the distribution of a categorical variable and a continuous variable using a boxplot.
```{r}
data1=data[data$total!=2626.3,]
#InterQuartile Range,IQR = 75%quantile - 25%quantile
boxplot(total~payment_type, data = data1, range = 2,ylab="Total expenses", xlab="Payment_type",main="Total vs Payment_type",ylim=c(-5,100))
# range parameter help you identify the outliers (default setting range = 1.5*IQR)
boxplot(distance~payment_type, data = data, ylim=c(-5,10), varwidth=F, names=levels(data$payment))
```
## Scatter
```{r}
plot(data$distance, data$tip, xlab = "Distance", ylab = "Tip amount", main = "Tip vs Distance", type = "p")
```
```{r}
# colors using the functions rainbow(n), heat.colors(n), terrain.colors(n), topo.colors(n), and cm.colors(n)
levels(data$payment)
plot(data$distance, data$tip, xlab = "Distance", ylab = "Tip amount", main = "Tip vs Distance", col = terrain.colors(5)[data$payment], pch=16)
legend("topright", legend = levels(data$payment), fill = terrain.colors(5), cex = 1.0)
# colors() returns all color names
abline(20,2)
```