-
Notifications
You must be signed in to change notification settings - Fork 9
/
quirks.Rmd
92 lines (69 loc) · 2.17 KB
/
quirks.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
---
title: 'Quirks'
---
```{r, include=FALSE}
library(tidyverse)
library(tufte)
library(pander)
```
# Dealing with quirks of R {#quirks}
## Rownames are evil {- #rownames}
Historically 'row names' were used on R to label individual rows in a dataframe.
It turned out that this is generally a bad idea, because sorting and some
summary functions would get very confused and mix up row names and the data
itself.
It's now considered best practice to avoid row names for this reason.
Consequently, the functions in the `dplyr` library remove row names when
processing dataframes. For example here we see the row names in the `mtcars`
data:
```{r}
mtcars %>%
head(3)
```
But here we don't because `arrange` has stripped them:
```{r}
mtcars %>%
arrange(mpg) %>%
head(3)
```
Converting the results of `psych::describe()` also returns rownames, which can
get lots if we sort the data.
We see row names here:
```{r}
psych::describe(mtcars) %>%
as_data_frame() %>%
head(3)
```
But not here (just numbers in their place):
```{r}
psych::describe(mtcars) %>%
as_data_frame() %>%
arrange(mean) %>%
head(3)
```
##### Preserving row names {-}
If you want to preserve row names, it's best to convert the names to an extra
colum in the data. So, the example below does what we probably want:
```{r}
# the var='car.name' argument is optional, but can be useful
mtcars %>%
rownames_to_column(var="car.name") %>%
arrange(mpg) %>%
head(3)
```
<!-- TODO ADD THIS BACK WHEN THIS BUG FIXED: https://github.com/tidyverse/broom/issues/231 -->
<!-- Another good way of preserving row names when converting R objects to dataframes is to use the `broom` library. Its `tidy()` function often does something sensible to convert an object to a dataframe, and has other benefits too, like extracting the relevant parts of the output, and naming columns consistently. -->
<!-- Some example of `broom` in action: -->
<!-- ```{r} -->
<!-- psych::describe(mtcars, fast=T) %>% as_data_frame() -->
<!-- broom::tidy() %>% -->
<!-- pander -->
<!-- select(column, mean, sd, n) %>% -->
<!-- head(3) %>% -->
<!-- pander -->
<!-- ``` -->
```{r}
lm(mpg~wt, data=mtcars) %>%
broom::tidy() %>%
pander
```