---
title: 'Crosstabulation'
---
```{r, include=FALSE, message=F}
library(tidyverse)
library(pander)
```
## Crosstabulations and $\chi^2$ {- #crosstabs}
We saw in a previous section
[how to create a frequency table of one or more variables](#frequency-tables).
Using that previous example, assume we already have a crosstabulation of `age`
and `prefers`:
```{r, include=F}
lego.duplo.df <- readRDS("data/lego.RDS")
lego.table <- xtabs(~age+prefers, lego.duplo.df)
lego.duplo.df %>% glimpse
```
```{r}
lego.table
```
We can easily run the inferential $\chi^2$ test (spelled "chi", but pronounced
"kai"-squared) on this table:
```{r}
lego.test <- chisq.test(lego.table)
lego.test
```
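To make clear what the test computes, here is a minimal sketch that rebuilds the $\chi^2$ statistic by hand. It assumes the built-in `Titanic` dataset as a stand-in for `lego.table`; the names `titanic.table`, `expected` and `chi2.by.hand` are invented for this illustration.

```{r}
# built-in Titanic data, collapsed to a Class x Survived table
# (a stand-in for lego.table; this is not the lego data)
titanic.table <- margin.table(Titanic, c(1, 4))

# counts expected if Class and Survived were independent:
# (row total * column total) / grand total
expected <- outer(rowSums(titanic.table), colSums(titanic.table)) / sum(titanic.table)

# chi-squared is the sum of (observed - expected)^2 / expected over all cells
chi2.by.hand <- sum((titanic.table - expected)^2 / expected)
chi2.by.hand

# the same value as reported by chisq.test()
chisq.test(titanic.table)$statistic
```

The two values match here because the table is larger than 2x2. For a 2x2 table `chisq.test` applies a continuity correction by default, so the hand calculation would only agree with `correct = FALSE`.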
Note that we can access each number in this output individually because the
`chisq.test` function returns a list. We do this by using the `$` syntax:
```{r}
# access the chi2 value alone
lego.test$statistic
```
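The other parts of the result can be pulled out in the same way. As a sketch, again assuming the built-in `Titanic` data rather than the lego example (`sex.table` and `sex.test` are names made up for this illustration):

```{r}
# built-in Titanic data, collapsed to a Sex x Survived table
# (a stand-in for lego.table; this is not the lego data)
sex.table <- margin.table(Titanic, c(2, 4))
sex.test <- chisq.test(sex.table)

names(sex.test)      # every component stored in the result
sex.test$p.value     # the p value
sex.test$parameter   # the degrees of freedom
sex.test$expected    # the counts expected under independence
```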
Even nicer, you can use an R package to write up your results for you in APA
format!
```{r}
library(apa)
apa(lego.test, print_n=T)
```
[See more on automatically displaying statistics in APA format](#apa-output)
### Three-way tables {-}
You can also use `table()` or `xtabs()` to get 3-way tables of frequencies
(`xtabs` is probably better for this than `table`).
For example, using the `mtcars` dataset we create a 3-way table, and then
convert the result to a dataframe. This means we can print the table nicely in
RMarkdown using the `pander()` function, or process it further (e.g. by
[sorting](#sorting) or [reshaping](#reshaping) it).
```{r}
xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%  # as_data_frame() is deprecated in recent versions of tibble
  pander()
```
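As a rough check that `table()` and `xtabs()` give the same counts, the sketch below builds the three-way table both ways. The object names `tab.xtabs` and `tab.table` are invented here, and `ftable()` is shown only as one base R option for printing a three-way table compactly at the console.

```{r}
# the same 3-way table built two ways
tab.xtabs <- xtabs(~am+gear+cyl, mtcars)
tab.table <- with(mtcars, table(am, gear, cyl))

# check that the counts agree in every cell
all(tab.xtabs == tab.table)

# ftable() flattens the 3-way table into a single 2-d layout for printing
ftable(tab.xtabs)
```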
Often, you will want to present a table in a wider format than this, to aid
comparisons between categories. For example, we might want one column for each
number of cylinders, so that counts can be compared within each combination of
transmission type (`am`: 0 = automatic, 1 = manual) and number of gears:
```{r}
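xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%  # as_data_frame() is deprecated in recent versions of tibble
  # name the count column explicitly rather than letting dcast guess it
  reshape2::dcast(am+gear~paste(cyl, "Cylinders"), value.var = "n") %>%
  pander()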
```
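The same wide layout can probably also be produced without `reshape2`. This is a sketch only, assuming a version of `tidyr` recent enough to provide `pivot_wider()` and its `names_glue` argument; `wide.tab` is a name invented for the example.

```{r}
library(tidyverse)

wide.tab <- xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%
  # one column per value of cyl, filled with the counts in n
  pivot_wider(names_from = cyl, values_from = n, names_glue = "{cyl} Cylinders")
wide.tab
```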
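Or our primary question might be related to the effect of `am` (transmission
type: 0 = automatic, 1 = manual), in which case we might prefer to include
separate columns for automatic and manual cars:
```{r}
xtabs(~am+gear+cyl, mtcars) %>%
  as_tibble() %>%  # as_data_frame() is deprecated in recent versions of tibble
  # label the new columns by the value of am (0 = automatic, 1 = manual)
  reshape2::dcast(gear+cyl~paste0("am=", am), value.var = "n") %>%
  pander()
```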