A vignette about labelled vectors
larmarange committed Sep 29, 2024
1 parent 82a49b7 commit c037a71
Showing 2 changed files with 113 additions and 0 deletions.
113 changes: 113 additions & 0 deletions vignettes/articles/labelled.Rmd
@@ -0,0 +1,113 @@
---
title: "Using labelled vectors (`haven_labelled` class) with REDCapTidieR"
output: rmarkdown::html_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, eval=!(Sys.getenv("NOT_CRAN") == "true"), include=FALSE}
knitr::knit_exit()
```

## Several options for importing categorical variables

When importing data from REDCap with `read_redcap()`, you have several options for how the coded values of categorical variables are imported.

```{r, include = FALSE}
# Load credentials
redcap_uri <- Sys.getenv("REDCAP_URI")
superheroes_token <- Sys.getenv("SUPERHEROES_REDCAP_API")
library(REDCapTidieR)
```

``` r
library(REDCapTidieR)
superheroes_token <- "123456789ABCDEF123456789ABCDEF04"
redcap_uri <- "https://my.institution.edu/redcap/api/"
```

If you use `raw_or_label = "raw"`, categorical variables are imported as their raw coded values, preserving the original coding of your data. However, you lose the information about what each code means. To interpret the codes, you will need to retrieve the data dictionary from REDCap.

```{r}
superheroes <-
  read_redcap(
    redcap_uri,
    superheroes_token,
    raw_or_label = "raw"
  ) |>
  extract_tibble("heroes_information")
superheroes
```

Alternatively, you can opt for `raw_or_label = "label"` (the default). Each code is then replaced by the corresponding label, and all categorical variables are converted into factors that are ready for analysis. However, you lose the original coding of the data. This can be a problem if you need to keep track of the original codes (e.g. for data cleaning). It can also be a problem if you intend to re-export the data at a later step (e.g. in Stata or SPSS format), where keeping the original coding would be relevant.

```{r}
superheroes <-
  read_redcap(
    redcap_uri,
    superheroes_token,
    raw_or_label = "label"
  ) |>
  extract_tibble("heroes_information")
superheroes
```
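
The loss of the original codes can be reproduced offline. The sketch below uses made-up codes (0 = No, 1 = Yes, 99 = Unknown). It only mimics the kind of factor a label-based import produces and is not how `read_redcap()` builds factors internally. It shows that the integers underlying a factor are level positions, so the original codes cannot be recovered from the factor alone.

``` r
# Hypothetical REDCap choices: 0 = No, 1 = Yes, 99 = Unknown
choices <- c(No = 0, Yes = 1, Unknown = 99)
codes <- c(1, 0, 99, 1)

# Mimic a label-based import: replace each code by its label, stored as a factor
smoker <- factor(names(choices)[match(codes, choices)], levels = names(choices))

# The factor's integer codes are level positions (2 1 3 2), not the codes 1 0 99 1
as.integer(smoker)
```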

A third and final option is `raw_or_label = "haven_labelled"`. Categorical variables are then imported as labelled vectors, using the `"haven_labelled"` class introduced by the `{haven}` package (cf. `vignette("semantics", package = "haven")`). They keep their original coding, and the corresponding value labels are attached to them as metadata.

```{r}
superheroes <-
  read_redcap(
    redcap_uri,
    superheroes_token,
    raw_or_label = "haven_labelled"
  ) |>
  extract_tibble("heroes_information")
superheroes
```
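
Without API access, a comparable vector can be built by hand with `haven::labelled()`. In this sketch the variable and its codes are hypothetical. It only illustrates that the stored values remain the original codes, while the code-to-label mapping is kept in the `labels` attribute.

``` r
library(haven)

# Hypothetical REDCap field with choices 0 = No, 1 = Yes, 99 = Unknown
smoker <- labelled(c(1, 0, 99, 1), labels = c(No = 0, Yes = 1, Unknown = 99))

class(smoker)

# The stored values are still the original codes...
as.vector(unclass(smoker))
# ...and the value labels travel with the vector as metadata
attr(smoker, "labels")
```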

## Pros & Cons of labelled vectors

The `"haven_labelled"` class was initially developed for importing data from SPSS, Stata or SAS, which use value labels to store categorical variables. This format stores both the original codes and the label attached to each value.

The `{labelled}` package provides several functions to manipulate value labels, such as `labelled::set_value_labels()`, `labelled::get_value_labels()`, `labelled::add_value_labels()` or `labelled::remove_value_labels()`.
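
As a hedged illustration, the data-frame versions of these functions can be chained with the native pipe. The example uses a small hypothetical data frame, and the column name and codes are made up. The underlying values are never modified; only the labels attached to them change.

``` r
library(labelled)

# Hypothetical raw-coded column (0 = No, 1 = Yes, 99 = Unknown)
dat <- data.frame(smoker = c(1, 0, 99, 1))

dat <- dat |>
  set_value_labels(smoker = c(No = 0, Yes = 1)) |>  # attach labels (column becomes haven_labelled)
  add_value_labels(smoker = c(Unknown = 99))        # add one more label
get_value_labels(dat)

# Drop the label for code 99; the value 99 itself stays in the data
dat <- dat |> remove_value_labels(smoker = 99)
get_value_labels(dat)$smoker
```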

You can search through variables and/or generate a variable dictionary with `labelled::look_for()` (cf. `vignette("look_for", package = "labelled")`).

```{r}
library(labelled)
superheroes |> look_for()
```
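
`look_for()` also accepts keywords. The offline sketch below uses a hypothetical data frame whose labels deliberately use "colour" while the variable names use "color". By default, `look_for()` searches variable labels as well as names, so the second call matches on the labels only.

``` r
library(labelled)

# Hypothetical toy data with variable labels
heroes <- data.frame(
  name = c("A", "B"),
  eye_color = c("green", "brown"),
  hair_color = c("green", "white")
) |>
  set_variable_labels(
    name = "Hero name",
    eye_color = "Eye colour",
    hair_color = "Hair colour"
  )

look_for(heroes, "hair")    # matches the variable name hair_color
look_for(heroes, "colour")  # matches the two variable labels
```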

However, labelled vectors are not intended for data analysis. For descriptive statistics, plots or modelling, categorical variables should be coded as factors. This conversion can easily be done with `labelled::to_factor()` or `labelled::unlabelled()`, both of which can be applied to an entire data frame. If you import your data as labelled vectors, you should therefore choose one of the two following approaches.
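
A minimal sketch of `to_factor()` on a single vector, assuming the same hypothetical codes as above. By default the factor levels are the value labels. The `levels = "prefixed"` option is shown as one way to keep the codes visible in the levels.

``` r
library(haven)
library(labelled)

# Hypothetical labelled vector (0 = No, 1 = Yes, 99 = Unknown)
smoker <- labelled(c(1, 0, 99, 1), labels = c(No = 0, Yes = 1, Unknown = 99))

# Factor whose levels are the value labels
f <- to_factor(smoker)
levels(f)

# Levels prefixed with the original codes
to_factor(smoker, levels = "prefixed")
```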

![Approaches A and B for working with labelled vectors](labelled_approaches.png)

In **approach A**, `haven_labelled` vectors are converted into factors or into numeric/character vectors just after data import, using `labelled::unlabelled()`, `labelled::to_factor()` or `unclass()`. Then, data cleaning, recoding and analysis are performed using classic **R** vector types.
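
A sketch of approach A on a hypothetical imported data frame. `unlabelled()` turns a labelled column into a factor when every non-missing value has a label. Otherwise it drops the value labels and returns the plain numeric vector. In this sketch, the special code 999 in `age` would still need recoding afterwards.

``` r
library(haven)
library(labelled)

# Hypothetical imported data
dat <- data.frame(id = 1:4)
# Every value has a label: expected to become a factor
dat$smoker <- labelled(c(1, 0, 99, 1), labels = c(No = 0, Yes = 1, Unknown = 99))
# Only a special code is labelled: expected to become a plain numeric vector
dat$age <- labelled(c(34, 51, 999, 28), labels = c(Refused = 999))

# Approach A: convert to classic R types right after import
dat <- unlabelled(dat)
str(dat)
```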

In **approach B**, `haven_labelled` vectors are kept during data cleaning and recoding, which preserves the original coding. This matters in particular if the data should be re-exported after that step. Functions provided by `{labelled}` are useful for managing value labels at this stage. However, as in approach A, `haven_labelled` vectors must be converted into classic factors or numeric vectors before data analysis (in particular modelling), since analysis functions expect categorical and continuous variables to be coded this way.
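
A sketch of approach B under the same hypothetical codes. The cleaning step works on the original codes and keeps the `haven_labelled` class. The data are then written to a temporary Stata file with `haven::write_dta()`, and only afterwards converted for analysis. The file path is purely illustrative.

``` r
library(haven)
library(labelled)

dat <- data.frame(id = 1:4)
dat$smoker <- labelled(c(1, 0, 99, 1), labels = c(No = 0, Yes = 1, Unknown = 99))

# Data cleaning on the original codes: treat 99 ("Unknown") as missing
dat$smoker[dat$smoker == 99] <- NA
dat <- remove_value_labels(dat, smoker = 99)

# Re-export with codes and value labels preserved (illustrative temporary path)
path <- tempfile(fileext = ".dta")
write_dta(dat, path)

# Convert to classic R types only right before analysis
analysis <- unlabelled(dat)
```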

## Variable labels

Variable labels should not be confused with value labels. A variable label is a textual description of a variable and does not change the class of the vector. Value labels, by contrast, are textual descriptions of specific values of a vector, and adding them changes the class of the vector to `"haven_labelled"`.
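
A small sketch of this distinction, using two hypothetical vectors. Setting a variable label only adds a `label` attribute, while setting value labels turns the vector into a `haven_labelled` vector.

``` r
library(labelled)

x <- c(1, 0, 1)
var_label(x) <- "Current smoker"     # variable label only
class(x)                             # still "numeric"

y <- c(1, 0, 1)
val_labels(y) <- c(No = 0, Yes = 1)  # value labels
class(y)                             # now includes "haven_labelled"
```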

The `{labelled}` package also provides functions to manipulate variable labels, such as `labelled::set_variable_labels()` or `labelled::get_variable_labels()`.
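
For example, on a hypothetical data frame with made-up values, variable labels can be set for several columns at once and read back as a named list.

``` r
library(labelled)

# Hypothetical toy data
dat <- data.frame(id = 1:2, height = c(180, 165))
dat <- set_variable_labels(dat, id = "Record ID", height = "Height (cm)")

get_variable_labels(dat)
```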

The function `REDCapTidieR::make_labelled()` adds variable labels to the data frames exported from REDCap.

```{r}
superheroes <-
  read_redcap(
    redcap_uri,
    superheroes_token,
    raw_or_label = "haven_labelled"
  ) |>
  make_labelled() |>
  extract_tibble("heroes_information")
superheroes |> look_for()
```
Binary file added vignettes/articles/labelled_approaches.png