-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathInferentialReport.Rmd
138 lines (95 loc) · 4.52 KB
/
InferentialReport.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "Inferential Exercise"
author: "bbneo"
date: "03/18/2015"
output: pdf_document
---
# The Effect of Vitamin C on Tooth Growth in Guinea Pigs
## Introduction
The `ToothGrowth` dataset in `R` is a set of measurements on:
"length of odontoblasts (teeth) in each of **10 guinea pigs** at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid)."
```{r}
data(ToothGrowth)
```
See:
```{r, eval=FALSE}
help(ToothGrowth)
```
In this exercise, we will perform some basic exploratory and inferential data analysis on the `ToothGrowth` dataset.
## Basic Summary of the ToothGrowth Data
```{r, collapse=TRUE}
data(ToothGrowth)
summary(ToothGrowth)
head(ToothGrowth)
with(ToothGrowth, table(dose,supp))
```
## Statistical Effects of Dose & Supplement
### Overview
Summary boxplots from the data suggest some inferential questions to test.
```{r, echo=FALSE}
par(mfrow=c(1,2))
boxplot(len ~ supp, data = ToothGrowth,
main="By Supplement",
ylab="Length")
boxplot(len ~ dose, data = ToothGrowth,
main="By Dose level",
ylab="Length")
```
(The lower and upper ends of the [box-and-whisker plot](http://en.wikipedia.org/wiki/Box_plot) boxes represent the 25th and 75th quartiles of the data. The dashed lines represent the extent to the minimum and maximum values of the data, and the midline across the box represents the *median* of the data.)
There seems to be a clear dose response relationship between the supplements and toothgrowth, but no *clear* difference between the supplement types.
### Subsetting Growth by Dose and Supplement Type
Breaking the dose response relationship out *by* supplement type highlights some differences between the supplement types at different doses.
Although there does not seem to be a significant difference in growth between the supplements at the *highest* dose level, at the low and middle dose levels, the `OJ` supplement demonstrates more growth. I will present the statistical test results below.
```{r, echo=FALSE}
par(mfrow=c(1,2))
boxplot(len ~ dose,
data = ToothGrowth[ToothGrowth$supp=="OJ",],
ylim=c(0,35),
main="Suppl OJ",
xlab="Dose",
ylab="Length"
)
boxplot(len ~ dose,
data = ToothGrowth[ToothGrowth$supp=="VC",],
ylim=c(0,35),
main="Suppl VC",
xlab="Dose",
ylab="Length"
)
```
### Hypothesis Tests of growth vs. supplement *Type*
The underlying assumption to this analysis is that the growth measurements at given dose and supplement type are iid random variables. With small sample sizes of `10`, this makes the *t-test* appropriate. The sample groups are not paired (different animals). Based on the wide variation in the boxplot box sizes above, it seems prudent to use the default assumption of *unequal* variances between the groups.
#### Overall
Over the data for *all* dose levels, we cannot reject the hypothesis that the difference in tooth growth between the supplement types is zero, because the 95% CI includes zero.
```{r}
t.test( len ~ supp, ToothGrowth )
```
#### Comparing at the Same Dose Levels
Tooth growth between supplements at the *same* dose levels does show some significant differences though.
The table below summarizes *t*-test hypothesis testing results for the null hypothesis that the *mean difference in tooth growth between supplement types* is zero at different dose levels.
```{r, echo=FALSE}
#
# The `R` code below will build up a result dataframe with t-test results for
# the tooth growth between supplements, by dose levels:
#
dose <- c(0.5,1.0,2.0)
results <- data.frame()
for (d in dose) {
tdose <- t.test( len ~ supp,
subset(ToothGrowth,
dose==d))
row <- with(tdose,
cbind(d,statistic,parameter,
conf.int[1],conf.int[2],
p.value,estimate[1],estimate[2])
)
results <- rbind(results, row)
}
names(results) <- c("Dose","t.stat","df",
"95CI.lo", "95CI.hi",
"pval", "mean.OJ", "mean.VC")
round(results,3)
```
As the boxplots above suggest, the hypothesis testing results support the conclusions that:
1. Tooth growth is significantly greater at the 0.5 and 1.0 doses with `OJ`. The 95% CI does not include zero. (p-values = `r round(c(results[dose==0.5,]$pval,results[dose==1.0,]$pval),3)`, respectively).
2. At the highest dose level, 2.0, there is no significant difference in tooth growth. The 95% CI includes zero.