diff --git a/pkgdown/_pkgdown.yml b/pkgdown/_pkgdown.yml index ede73fed..a188ddc1 100644 --- a/pkgdown/_pkgdown.yml +++ b/pkgdown/_pkgdown.yml @@ -31,6 +31,8 @@ navbar: - text: "Exporting to Excel" desc: "Convert Data Tibbles to XLSX Sheets" href: articles/export_to_xlsx.html + - text: "Using Labelled Vectors with REDCapTidieR" + href: articles/labelled.html search: exclude: ['news/index.html'] diff --git a/tests/testthat/_snaps/write.md b/tests/testthat/_snaps/write.md index d77388fc..c6943f93 100644 --- a/tests/testthat/_snaps/write.md +++ b/tests/testthat/_snaps/write.md @@ -16,7 +16,8 @@ 9 api_no_access_2 API No Access 2 10 survey Survey 11 repeat_survey Repeat Survey - 12 REDCap Metadata + 12 labelled_vignette Labelled Vignette + 13 REDCap Metadata Repeating or Nonrepeating? # of Rows in Data # of Columns in Data 2 structure data_rows data_cols 3 nonrepeating 4 4 @@ -28,19 +29,21 @@ 9 nonrepeating 4 5 10 nonrepeating 4 9 11 repeating 3 10 - 12 + 12 nonrepeating 4 7 + 13 Data size in Memory % of Data Missing NA Sheet # 2 data_size data_na_pct form_complete_pct Sheet # 3 2.28 kB 0.25 0 1 4 1.94 kB 0.5 0 2 5 2.58 kB 0 0 3 - 6 7.71 kB 0.293103448275862 0 4 + 6 7.71 kB 0.28448275862069 0 4 7 7.40 kB 0.75 0 5 8 1.78 kB 1 0 6 9 2.06 kB 1 0 7 10 3.73 kB 0.392857142857143 0 8 11 3.94 kB 0.142857142857143 0 9 - 12 10 + 12 3.04 kB 0 0 10 + 13 11 [[1]][[2]] Record ID Text Box Input Text Box Input REDCap Instrument Completed? 
@@ -77,7 +80,7 @@ 2 record_id text note calculated dropdown_single radio_single 3 1 text notes 2 one B 4 2 2 three C - 5 3 + 5 3 2 6 4 2 NA NA NA 2 radio_duplicate_label checkbox_multiple___1 checkbox_multiple___2 @@ -225,6 +228,20 @@ 5 2022-11-09 12:21:04 Complete [[1]][[11]] + Record ID Text Box Radio Buttons Checkbox: A Checkbox: B Checkbox: C + 2 record_id text_box_1 radio_buttons_1 checkbox___1 checkbox___2 checkbox___3 + 3 1 Record 1 A Checked Unchecked Unchecked + 4 2 Record 2 B Checked Checked Unchecked + 5 3 Record 3 C Unchecked Checked Checked + 6 4 Record 4 A Unchecked Unchecked Unchecked + REDCap Instrument Completed? + 2 form_status_complete + 3 Complete + 4 Complete + 5 Complete + 6 Complete + + [[1]][[12]] REDCap Instrument Name REDCap Instrument Description 2 redcap_form_name redcap_form_label 3 @@ -293,6 +310,11 @@ 66 repeat_survey Repeat Survey 67 repeat_survey Repeat Survey 68 repeat_survey Repeat Survey + 69 labelled_vignette Labelled Vignette + 70 labelled_vignette Labelled Vignette + 71 labelled_vignette Labelled Vignette + 72 labelled_vignette Labelled Vignette + 73 labelled_vignette Labelled Vignette Variable / Field Name 2 field_name 3 record_id @@ -361,6 +383,11 @@ 66 repeatsurvey_checkbox_v2___one 67 repeatsurvey_checkbox_v2___two 68 repeatsurvey_checkbox_v2___three + 69 text_box_1 + 70 radio_buttons_1 + 71 checkbox___1 + 72 checkbox___2 + 73 checkbox___3 Field Label Field Type 2 field_label field_type 3 Record ID text @@ -429,6 +456,11 @@ 66 Checkbox Field: Choice 1 checkbox 67 Checkbox Field: Choice 2 checkbox 68 Checkbox Field: Choice 3 checkbox + 69 Text Box text + 70 Radio Buttons radio + 71 Checkbox: A checkbox + 72 Checkbox: B checkbox + 73 Checkbox: C checkbox Section Header Prior to this Field 2 section_header 3 @@ -497,6 +529,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Choices, Calculations, or Slider Labels 2 select_choices_or_calculations 3 @@ -565,6 +602,11 @@ 66 one, Choice 1 | two, Choice 2 | three, Choice 3 67 one, Choice 1 | 
two, Choice 2 | three, Choice 3 68 one, Choice 1 | two, Choice 2 | three, Choice 3 + 69 + 70 1, A | 2, B | 3, C + 71 1, A | 2, B | 3, C + 72 1, A | 2, B | 3, C + 73 1, A | 2, B | 3, C Field Note Text Validation Type OR Show Slider Number 2 field_note text_validation_type_or_show_slider_number 3 @@ -633,6 +675,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Minimum Accepted Value for Text Validation 2 text_validation_min 3 @@ -701,6 +748,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Maximum Accepted Value for Text Validation Is this Field an Identifier? 2 text_validation_max identifier 3 @@ -769,6 +821,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Branching Logic (Show field only if...) Is this Field Required? 2 branching_logic required_field 3 @@ -837,6 +894,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Custom Alignment Question Number (surveys only) Matrix Group Name 2 custom_alignment question_number matrix_group_name 3 @@ -905,6 +967,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Matrix Ranking? Field Annotation Data Type Count of Missing Values 2 matrix_ranking field_annotation skim_type n_missing 3 @@ -916,7 +983,7 @@ 9 character 0 10 character 3 11 character 3 - 12 numeric 1 + 12 numeric 0 13 factor 2 14 factor 2 15 factor 4 @@ -973,6 +1040,11 @@ 66 logical 0 67 logical 0 68 logical 0 + 69 character 0 + 70 factor 0 + 71 logical 0 + 72 logical 0 + 73 logical 0 Proportion of Non-Missing Values Shortest Value (Fewest Characters) 2 complete_rate character.min 3 @@ -984,7 +1056,7 @@ 9 1 1 10 0.25 4 11 0.25 5 - 12 0.75 + 12 1 13 0.5 14 0.5 15 0 @@ -1041,6 +1113,11 @@ 66 1 67 1 68 1 + 69 1 8 + 70 1 + 71 1 + 72 1 + 73 1 Longest Value (Most Characters) Count of Empty Values Count of Unique Values 2 character.max character.empty character.n_unique 3 @@ -1109,6 +1186,11 @@ 66 67 68 + 69 8 0 4 + 70 + 71 + 72 + 73 Count of Values that are all Whitespace Mean Standard Deviation 2 character.whitespace numeric.mean numeric.sd 3 @@ -1177,6 +1259,11 @@ 66 67 68 + 69 0 + 70 + 71 + 72 + 73 Minimum 25th 
Percentile Median 75th Percentile Maximum 2 numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 3 @@ -1245,6 +1332,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Histogram Is the Categorical Value Ordered? Count of Unique Values 2 numeric.hist factor.ordered factor.n_unique 3 @@ -1313,6 +1405,11 @@ 66 67 68 + 69 + 70 FALSE 3 + 71 + 72 + 73 Most Frequent Values Proportion of TRUE Values Count of Logical Values 2 factor.top_counts logical.mean logical.count 3 @@ -1381,6 +1478,11 @@ 66 0.666666666666667 TRU: 2, FAL: 1 67 0.333333333333333 FAL: 2, TRU: 1 68 0.333333333333333 FAL: 2, TRU: 1 + 69 + 70 A: 2, B: 1, C: 1 + 71 0.5 FAL: 2, TRU: 2 + 72 0.5 FAL: 2, TRU: 2 + 73 0.25 FAL: 3, TRU: 1 Earliest Latest Median Count of Unique Values Earliest 2 Date.min Date.max Date.median Date.n_unique POSIXct.min 3 @@ -1449,6 +1551,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Latest Median Count of Unique Values Minimum 2 POSIXct.max POSIXct.median POSIXct.n_unique difftime.min 3 @@ -1517,6 +1624,11 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 Maximum Median Count of Unique Values 2 difftime.max difftime.median difftime.n_unique 3 @@ -1585,11 +1697,16 @@ 66 67 68 + 69 + 70 + 71 + 72 + 73 [[2]] tab_name tab_sheet tab_ref - 1 Table1 1 A2:I12 + 1 Table1 1 A2:I13 2 Table2 2 A2:D6 3 Table3 3 A2:D6 4 Table4 4 A2:E6 @@ -1599,9 +1716,10 @@ 8 Table8 8 A2:E6 9 Table9 9 A2:I6 10 Table10 10 A2:J5 - 11 Table11 11 A2:AZ68 + 11 Table11 11 A2:G6 + 12 Table12 12 A2:AZ73 tab_xml - 1
+ 1
2
3
4
@@ -1611,7 +1729,8 @@ 8
9
10
- 11
+ 11
+ 12
tab_act 1 1 2 1 @@ -1624,6 +1743,7 @@ 9 1 10 1 11 1 + 12 1 [[3]] [[3]]$fileVersion @@ -1661,7 +1781,8 @@ [8] "" [9] "" [10] "" - [11] "" + [11] "" + [12] "" [[3]]$functionGroups NULL @@ -1715,11 +1836,12 @@ [9] "" [10] "" [11] "" - [12] "" - [13] "" + [12] "" + [13] "" + [14] "" [[5]] - [1] 1 2 3 4 5 6 7 8 9 10 11 + [1] 1 2 3 4 5 6 7 8 9 10 11 12 [[6]] [1] "Table of Contents" "Nonrepeated" @@ -1727,6 +1849,6 @@ [5] "Data Field Types" "Text Input Validation Types" [7] "API No Access" "API No Access 2" [9] "Survey" "Repeat Survey" - [11] "REDCap Metadata" + [11] "Labelled Vignette" "REDCap Metadata" diff --git a/tests/testthat/test-utils.R b/tests/testthat/test-utils.R index 299be04e..fd5053fc 100644 --- a/tests/testthat/test-utils.R +++ b/tests/testthat/test-utils.R @@ -219,8 +219,10 @@ test_that("link_arms works", { expect_s3_class(out, "tbl") # output contains expected columns - expected_cols <- c("arm_num", "unique_event_name", "form", "arm_name", - "event_name", "custom_event_label", "event_id") + expected_cols <- c( + "arm_num", "unique_event_name", "form", "arm_name", + "event_name", "custom_event_label", "event_id" + ) expect_setequal(expected_cols, names(out)) # all arms are represented in output (test redcap has 2 arms) diff --git a/vignettes/articles/images/labelled_approaches.png b/vignettes/articles/images/labelled_approaches.png new file mode 100644 index 00000000..9fa127f4 Binary files /dev/null and b/vignettes/articles/images/labelled_approaches.png differ diff --git a/vignettes/articles/labelled.Rmd b/vignettes/articles/labelled.Rmd new file mode 100644 index 00000000..1f8a2991 --- /dev/null +++ b/vignettes/articles/labelled.Rmd @@ -0,0 +1,142 @@ +--- +title: "Using Labelled Vectors with REDCapTidieR" +output: rmarkdown::html_document +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r, eval=!(Sys.getenv("NOT_CRAN") == "true"), include=FALSE} +knitr::knit_exit() +``` + +## Options for Importing 
Categorical Variables
+
+When importing data from REDCap using `read_redcap()`, you have several options for handling coded categorical variables. These options determine how the coded values are represented in your R environment.
+
+For this vignette, we will use a sample [classic project](https://chop-cgtinformatics.github.io/REDCapTidieR/articles/glossary.html#classic-project) with a [form](https://chop-cgtinformatics.github.io/REDCapTidieR/articles/glossary.html#form) that contains several common REDCap field types: a text box, radio buttons, and a checkbox.
+
+```{r, include = FALSE}
+# Load credentials
+redcap_uri <- Sys.getenv("REDCAP_URI")
+token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")
+library(REDCapTidieR)
+```
+
+``` r
+library(REDCapTidieR)
+token <- "123456789ABCDEF123456789ABCDEF04"
+redcap_uri <- "https://my.institution.edu/redcap/api/"
+```
+
+Using `raw_or_label = "raw"` retrieves the raw coded values of categorical variables. This approach preserves the original coding, but you need to consult the project's data dictionary separately to interpret each code.
+
+```{r, warning=FALSE}
+redcap_form <-
+  read_redcap(
+    redcap_uri,
+    token,
+    raw_or_label = "raw"
+  ) |>
+  extract_tibble("labelled_vignette")
+
+redcap_form
+```
+
+The default option, `raw_or_label = "label"`, replaces each code with its corresponding label and converts categorical variables into factors. This is convenient for analysis, but it discards the original codes. Those codes may be needed for tasks like data cleaning or re-exporting to other formats (e.g., Stata or SPSS).
+
+```{r, warning=FALSE}
+redcap_form <-
+  read_redcap(
+    redcap_uri,
+    token,
+    raw_or_label = "label"
+  ) |>
+  extract_tibble("labelled_vignette")
+
+redcap_form
+```
+
+A third option, `raw_or_label = "haven"`, imports categorical variables as labelled vectors of class `"haven_labelled"` from the haven package (cf. `vignette("semantics", package = "haven")`). This method keeps the original coding of your categorical variables and attaches the corresponding value labels to them as metadata.
+
+```{r, warning=FALSE}
+redcap_form <-
+  read_redcap(
+    redcap_uri,
+    token,
+    raw_or_label = "haven"
+  ) |>
+  extract_tibble("labelled_vignette")
+
+redcap_form
+```
+
+## Pros & Cons of Labelled Vectors
+
+The `"haven_labelled"` class was originally developed to import data from statistical software such as SPSS, Stata, or SAS, which use value labels for categorical variables. This format lets you store both the original codes and the label attached to each value.
+
+### Advantages
+
+- **Preservation of Original Coding**: Both the original codes and their labels are retained, which is useful for data cleaning and re-exporting.
+- **Metadata Management**: The labelled package offers functions to inspect and modify value labels.
+
+You can manipulate value labels using functions such as:
+
+- `labelled::set_value_labels()`
+- `labelled::get_value_labels()`
+- `labelled::add_value_labels()`
+- `labelled::remove_value_labels()`
+
+Additionally, you can search through variables or generate a variable dictionary with `labelled::look_for()` (cf. `vignette("look_for", package = "labelled")`):
+
+```{r}
+library(labelled)
+redcap_form |>
+  look_for()
+```
+
+### Disadvantages
+
+Labelled vectors are not designed for analysis tasks such as descriptive statistics, plotting, or modeling. For these purposes, convert categorical variables to factors or numeric vectors.
+
+### Recommended Approaches
+
+![Recommended approaches for working with labelled vectors](images/labelled_approaches.png)
+
+**Approach A**: Convert `haven_labelled` vectors to factors or numeric/character vectors immediately after import, using functions such as `labelled::unlabelled()`, `labelled::to_factor()`, or `unclass()`. Then proceed with data cleaning, recoding, and analysis using standard R vector types.
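+
+Here is a minimal sketch of Approach A. It builds a small labelled vector by hand to stand in for a radio-button field imported with `raw_or_label = "haven"`. The codes (1 to 3), the labels (A to C), and the object names `radio_example` and `example_df` are illustrative assumptions, not values read from the REDCap project. The chunk then converts the vector with `labelled::to_factor()`, `labelled::unlabelled()`, and `unclass()`.
+
+```{r}
+library(labelled)
+
+# Hand-built stand-in for a labelled radio field (illustrative codes and labels)
+radio_example <- labelled(c(1, 2, 3, 1), labels = c(A = 1, B = 2, C = 3))
+
+# to_factor(): a factor whose levels are the value labels A, B, C
+radio_factor <- to_factor(radio_example)
+radio_factor
+
+# unlabelled(): converts labelled columns whose values all have labels to factors
+example_df <- data.frame(record_id = 1:4)
+example_df$radio_buttons_1 <- radio_example
+str(unlabelled(example_df))
+
+# unclass(): drops the haven_labelled class and leaves the numeric codes
+# (the "labels" attribute is still attached)
+unclass(radio_example)
+```
+
+`to_factor()` acts on a single vector. `unlabelled()` walks a whole data frame and converts only the columns in which every value carries a label.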
+
+**Approach B**: Keep `haven_labelled` vectors during data cleaning and recoding to preserve the original codes and labels, especially if you plan to re-export the data. Use functions from the labelled package to manage value labels, but convert the vectors to factors or numeric types before analysis or modeling.
+
+## Managing Variable Labels
+
+It's important to distinguish between value labels and variable labels:
+
+- **Value Labels**: Describe the meaning of specific values within a vector and change the vector's class to `"haven_labelled"`.
+- **Variable Labels**: Provide a textual description of the entire variable without altering its class.
+
+The labelled package offers functions to handle variable labels, such as:
+
+- `labelled::set_variable_labels()`
+- `labelled::get_variable_labels()`
+
+`REDCapTidieR::make_labelled()` adds variable labels, based on the REDCap field labels, to the supertibble returned by `read_redcap()` and to the data tibbles it contains:
+
+```{r, warning=FALSE}
+redcap_form <-
+  read_redcap(
+    redcap_uri,
+    token,
+    raw_or_label = "haven"
+  ) |>
+  make_labelled() |>
+  extract_tibble("labelled_vignette")
+
+redcap_form |>
+  look_for()
+```
+
+Your data then retains its value labels and also carries a descriptive label for each variable, which makes the dataset easier to read and document.
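+
+You can check the difference between value labels and variable labels directly. The chunk below is a minimal sketch on a hand-made numeric vector; the values, labels, and the name `codes` are illustrative assumptions, not REDCap data. It attaches a variable label with `labelled::var_label()` and value labels with `labelled::val_labels()`, and prints the class after each step.
+
+```{r}
+library(labelled)
+
+codes <- c(1, 2, 3, 1)
+
+# Variable label: describes the whole vector; the class stays "numeric"
+var_label(codes) <- "Radio Buttons"
+class(codes)
+
+# Value labels: describe individual codes; the vector becomes "haven_labelled"
+val_labels(codes) <- c(A = 1, B = 2, C = 3)
+class(codes)
+
+# Read the value labels back
+val_labels(codes)
+```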