InferentialReport.Rmd

---
title: "Inferential Exercise"
author: "bbneo"
date: "03/18/2015"
output: pdf_document
---

# The Effect of Vitamin C on Tooth Growth in Guinea Pigs

## Introduction

The `ToothGrowth` dataset in `R` is a set of measurements on:

"length of odontoblasts (teeth) in each of **10   guinea pigs** at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid)."

```{r}
data(ToothGrowth)
```

See:

```{r, eval=FALSE}
help(ToothGrowth)
```

In this exercise, we will perform some basic exploratory and inferential data analysis on the `ToothGrowth` dataset.

## Basic Summary of the ToothGrowth Data

```{r, collapse=TRUE}
data(ToothGrowth)
summary(ToothGrowth)
head(ToothGrowth)
with(ToothGrowth, table(dose,supp))
```

## Statistical Effects of Dose & Supplement

### Overview 

Summary boxplots from the data suggest some inferential questions to test.


```{r, echo=FALSE}
par(mfrow=c(1,2))

boxplot(len ~ supp, data = ToothGrowth,
        main="By Supplement", 
        ylab="Length")

boxplot(len ~ dose, data = ToothGrowth,
        main="By Dose level", 
        ylab="Length")
```

(The lower and upper ends of the [box-and-whisker plot](http://en.wikipedia.org/wiki/Box_plot) boxes represent the 25th and 75th quartiles of the data.  The dashed lines represent the extent to the minimum and maximum values of the data, and the midline across the box represents the *median* of the data.)

There seems to be a clear dose response relationship between the supplements and toothgrowth, but no *clear* difference between the supplement types.

### Subsetting Growth by Dose and Supplement Type

Breaking the dose response relationship out *by* supplement type highlights some differences between the supplement types at different doses. 

Although there does not seem to be a significant difference in growth between the supplements at the *highest* dose level, at the low and middle dose levels, the `OJ` supplement demonstrates more growth.  I will present the statistical test results below.


```{r, echo=FALSE}
par(mfrow=c(1,2))

boxplot(len ~ dose, 
        data = ToothGrowth[ToothGrowth$supp=="OJ",],
        ylim=c(0,35),
        main="Suppl OJ",
        xlab="Dose",
        ylab="Length"
        )

boxplot(len ~ dose, 
        data = ToothGrowth[ToothGrowth$supp=="VC",],
        ylim=c(0,35),
        main="Suppl VC",
        xlab="Dose",
        ylab="Length"
        )
```


### Hypothesis Tests of growth vs. supplement *Type*

The underlying assumption to this analysis is that the growth measurements at given dose and supplement type are iid random variables.  With small sample sizes of `10`, this makes the *t-test* appropriate.  The sample groups are not paired (different animals).  Based on the wide variation in the boxplot box sizes above, it seems prudent to use the default assumption of *unequal* variances between the groups.  

#### Overall

Over the data for *all* dose levels, we cannot reject the hypothesis that the difference in tooth growth between the supplement types is zero, because the 95% CI includes zero.  

```{r}
t.test( len ~ supp, ToothGrowth )
```

#### Comparing at the Same Dose Levels

Tooth growth between supplements at the *same* dose levels does show some significant differences though.  

The table below summarizes *t*-test hypothesis testing results for the null hypothesis that the *mean difference in tooth growth between supplement types* is zero at different dose levels. 

```{r, echo=FALSE}
# 
# The `R` code below will build up a result dataframe with t-test results for 
# the tooth growth between supplements, by dose levels:
# 
dose <- c(0.5,1.0,2.0)

results <- data.frame()
for (d in dose) {
  tdose <- t.test( len ~ supp, 
                   subset(ToothGrowth,
                          dose==d))
  row <- with(tdose, 
              cbind(d,statistic,parameter,
                    conf.int[1],conf.int[2],
                    p.value,estimate[1],estimate[2])
              )
  results <- rbind(results, row)
}

names(results) <- c("Dose","t.stat","df",
                    "95CI.lo", "95CI.hi",
                    "pval", "mean.OJ", "mean.VC")

round(results,3)
```

As the boxplots above suggest, the hypothesis testing results support the conclusions that:

1. Tooth growth is significantly greater at the 0.5 and 1.0 doses with `OJ`.  The 95% CI does not include zero.  (p-values = `r round(c(results[dose==0.5,]$pval,results[dose==1.0,]$pval),3)`, respectively).  

2. At the highest dose level, 2.0, there is no significant difference in tooth growth.  The 95% CI includes zero.