Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Missing Data Codes #182

Merged
merged 5 commits into from
Mar 25, 2024
Merged

Handle Missing Data Codes #182

merged 5 commits into from
Mar 25, 2024

Conversation

ezraporter
Copy link
Collaborator

Description

This PR updates our handling of redcap projects using missing data codes. The new behavior is:

  • If raw_or_label = "raw" missing data codes kept in the data as it comes back from the API
  • Otherwise:
    • Non-logical values are converted to NA in yesno, truefalse, and checkbox fields with a warning if this occurs
    • Values with no associated label in the metadata are converted to NA in dropdown and radio fields with a warning if this occurs
    • Missing data codes are retained in all other fields
    • options(redcaptidier.allow.mdc = TRUE) can be set to silence these warnings

Benchmarks initially looked like we had increased run time for two of our redcaps but that turned out to be a false positive after increasing the number of microbenchmark iterations.

Proposed Changes

  • Add check_field_is_logical() check and refactor multi_choice_to_labels() to incorporate it. This handles checking and parsing to logical
  • Add check_extra_field_values() check and add to multi_choice_to_labels()

Screenshots
Logical field warning:
Screenshot 2024-03-21 at 5 13 22 PM

Categorical field warning:
Screenshot 2024-03-21 at 5 13 35 PM

Issue Addressed

Relates to #181

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

  • New/revised functions have associated tests
  • New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
  • New/revised functions use appropriate naming conventions
  • New/revised functions don't repeat code
  • Code changes are less than 250 lines total
  • Issues linked to the PR using GitHub's list of keywords
  • The appropriate reviewer is assigned to the PR
  • The appropriate developers are assigned to the PR
  • Pre-release package version incremented using usethis::use_version()

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

  • I checked that new files follow naming conventions and are in the right place
  • I checked that documentation is complete, clear, and without typos
  • I added/edited comments to explain "why" not "how"
  • I checked that all new variable and function names follow naming conventions
  • I checked that new tests have been written for key business logic and/or bugs that this PR fixes
  • I checked that new tests address important edge cases

@ezraporter ezraporter added the bug Something isn't working label Mar 21, 2024
@ezraporter ezraporter requested review from skadauke and rsh52 March 21, 2024 21:16
@ezraporter ezraporter self-assigned this Mar 21, 2024
@rsh52
Copy link
Collaborator

rsh52 commented Mar 22, 2024

A few things here, in general I think the actual code and handling is sound and I'll separate out my thoughts below.

Warning Redundancy

Would it be possible to consolidate the warnings so that one warning encompasses all fields? Or at least each type (the logical versus extra fields checks)? Right now the setup makes a separate warning for each field, which would make for a lot of warnings in most cases where this comes up. I added a second yesno field with an UNK MDC to show below, you can remove it if you like:

read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"),
+                   token = Sys.getenv("REDCAPTIDIER_MDC_API"), raw_or_label = "label")$redcap_data[[1]]
# A tibble: 3 × 9                                                                                                                                                                           
  record_id yesno yesno2 text  checkbox___1 checkbox___2 checkbox___3 dropdown form_status_complete
      <dbl> <lgl> <lgl>  <chr> <lgl>        <lgl>        <lgl>        <fct>    <fct>               
1         1 TRUE  NA     text  TRUE         TRUE         FALSE        C        Complete            
2         2 NA    NA     UNK   FALSE        FALSE        FALSE        NA       Complete            
3         3 NA    NA     NA    FALSE        FALSE        FALSE        NA       Incomplete          
Warning messages:
1: In read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_MDC_API"),  :
  ! `yesno` is type 'yesno' but contains non-logical values: UNKThese were converted to `NA` resulting in possible data lossDoes your REDCap project utilize missing data codes?
ℹ Silence this warning with `options(redcaptidier.allow.mdc = TRUE)` or set `raw_or_label = 'raw'` to access missing data codes
2: In read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_MDC_API"),  :
  ! `yesno2` is type 'yesno' but contains non-logical values: UNKThese were converted to `NA` resulting in possible data lossDoes your REDCap project utilize missing data codes?
ℹ Silence this warning with `options(redcaptidier.allow.mdc = TRUE)` or set `raw_or_label = 'raw'` to access missing data codes
3: In read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_MDC_API"),  :
  ! `dropdown` contains values with no labels: UNKThese were converted to `NA` resulting in possible data lossDoes your REDCap project utilize missing data codes?
ℹ Silence this warning with `options(redcaptidier.allow.mdc = TRUE)` or set `raw_or_label = 'raw'` to access missing data codes

I think aesthetically something similar to the warning we had for mixed-structure data would be what to aim for:

> read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"),
+             token = Sys.getenv("REDCAPTIDIER_MIXED_STRUCTURE_API"))
Error in `clean_redcap_long()` at REDCapTidieR/R/read_redcap.R:278:5:Instruments detected that have both repeating and nonrepeating instances defined in the project: mixed_structure_1 and
  mixed_structure_form_completeSet `allow_mixed_structure` to `TRUE` to override. See Mixed Structure Instruments for more information.
Run `rlang::last_trace()` to see where the error occurred.

Raw vs Label Discrepancy

I might be wrong here, but we may have the raw/label order backwards? See below for the output from REDCapR:

> redcap_read_oneshot(redcap_uri = Sys.getenv("REDCAP_URI"),
+                              token = Sys.getenv("REDCAPTIDIER_MDC_API"), raw_or_label = "label")$data
3 records and 10 columns were read from REDCap in 1.7 seconds.  The http status code was 200.                                                                                               
  record_id   yesno  yesno2    text checkbox___1 checkbox___2 checkbox___3 checkbox___unk dropdown form_1_complete
1         1     Yes    <NA>    text      Checked      Checked    Unchecked      Unchecked        C        Complete
2         2 Unknown Unknown Unknown    Unchecked    Unchecked    Unchecked        Checked  Unknown        Complete
3         3    <NA>    <NA>    <NA>    Unchecked    Unchecked    Unchecked      Unchecked     <NA>      Incomplete

Currently the MDC is set as:
image

Documentation Support

Would you also mind adding something to this PR that mentions this in one of the vignettes/articles? At the moment the only way a user would know about the options() is to encounter them in the wild from the warning since there's nothing in our supporting documentation. Something short in the "Get Started" at the end should be fine. This will also be a good place to note what isn't supported (text fields, etc.).

@ezraporter
Copy link
Collaborator Author

@rsh52 give this another look once the CI passes. I consolidated the warnings and updated the vignette. As discussed earlier, there isn't a way of getting "Unknown" instead of UNK without reading the project metadata so I left that.

Copy link
Collaborator

@rsh52 rsh52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks for making those changes and docs additions. Looks good from my end, I still think at some point we may want to pivot to using reading project info from REDCapR but this will solve the issue now.

@ezraporter ezraporter merged commit 85244b4 into main Mar 25, 2024
6 checks passed
@ezraporter ezraporter deleted the missing-data-codes branch March 25, 2024 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants