-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] yesno
fields are converted to NA
when the field contains missing data codes
#181
Comments
Thanks for the bug report! I wasn't able to reproduce this issue with the test REDCap projects we have which include
REDCapR::redcap_metadata_read(redcap_uri = [YOUR_URI], token = [YOUR_TOKEN])
REDCapR::redcap_read_oneshot(
redcap_uri = [YOUR_URI],
token = [YOUR_TOKEN],
export_survey_fields = TRUE,
export_data_access_groups = TRUE,
guess_max = .Machine$integer.max
) It would also be helpful if you could provide the results of running |
Thanks for your quick response! I am not able to share the token and URL since there is PHI in the database. When I run this, it looks like those variables do accurately get pulled, unlike when using read_redcap. |
Are you able to share what those results look like after manually removing PHI? Alternatively, if you have the ability to create REDCap projects at your institution, are you able to create an example that demonstrates the bug without PHI? Since it sounds like downloading the data from REDCap isn't where the bug is originating, it would be helpful to see your |
Here’s what it looks like with read_redcap. mri_relapse and gad_relapse are N/A for all record IDs.
[A screenshot of a computer Description automatically generated]
Here’s what it looks like when using REDCapR directly. data_mri_relapse and data_gad_relapse are accurately pulled.
[A screenshot of a computer Description automatically generated]
Here’s sessionInfo
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Hmisc_5.1-2 redcapAPI_2.8.4 DemographicTable_0.1.6 scales_1.3.0 ggrepel_0.9.5
[6] officer_0.6.5 flextable_0.9.5 broom_1.0.5 kableExtra_1.4.0 skimr_2.1.5
[11] labelled_2.12.0 knitr_1.45 REDCapR_1.1.0 REDCapTidieR_1.0.0 lubridate_1.9.3
[16] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[21] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gridExtra_2.3 rlang_1.1.3 magrittr_2.0.3 e1071_1.7-14 compiler_4.3.3
[6] getPass_0.2-4 systemfonts_1.0.6 vctrs_0.6.5 httpcode_0.3.0 pkgconfig_2.0.3
[11] crayon_1.5.2 fastmap_1.1.1 backports_1.4.1 ellipsis_0.3.2 utf8_1.2.4
[16] promises_1.2.1 rmarkdown_2.26 tzdb_0.4.0 haven_2.5.4 ragg_1.3.0
[21] bit_4.0.5 xfun_0.42 jsonlite_1.8.8 later_1.3.2 uuid_1.2-0
[26] parallel_4.3.3 cluster_2.1.6 R6_2.5.1 stringi_1.8.3 RColorBrewer_1.1-3
[31] rpart_4.1.23 Rcpp_1.0.12 assertthat_0.2.1 lobstr_1.1.2 base64enc_0.1-3
[36] nnet_7.3-19 httpuv_1.6.14 timechange_0.3.0 tidyselect_1.2.1 rstudioapi_0.15.0
[41] yaml_2.3.8 curl_5.2.1 shiny_1.8.0 withr_3.0.0 askpass_1.2.0
[46] evaluate_0.23 foreign_0.8-86 proxy_0.4-27 zip_2.3.1 xml2_1.3.6
[51] formattable_0.2.1 pillar_1.9.0 checkmate_2.3.1 generics_0.1.3 vroom_1.6.5
[56] hms_1.1.3 munsell_0.5.0 chron_2.3-61 xtable_1.8-4 class_7.3-22
[61] glue_1.7.0 gdtools_0.3.7 tools_4.3.3 gfonts_0.2.0 data.table_1.15.2
[66] grid_4.3.3 colorspace_2.1-0 repr_1.1.6 htmlTable_2.4.2 Formula_1.2-5
[71] cli_3.6.2 labelVector_0.1.2 textshaping_0.3.7 fontBitstreamVera_0.1.1 fansi_1.0.6
[76] viridisLite_0.4.2 svglite_2.1.3 keyring_1.3.2 gtable_0.3.4 digest_0.6.35
[81] fontquiver_0.2.1 crul_1.4.0 htmlwidgets_1.6.4 htmltools_0.5.7 lifecycle_1.0.4
[86] httr_1.4.7 mime_0.12 fontLiberation_0.1.0 openssl_2.1.1 bit64_4.0.5
From: Ezra Porter ***@***.***>
Date: Monday, March 18, 2024 at 4:37 PM
To: CHOP-CGTInformatics/REDCapTidieR ***@***.***>
Cc: pbilodeau94 ***@***.***>, Author ***@***.***>
Subject: Re: [CHOP-CGTInformatics/REDCapTidieR] [BUG] yesno variables are pulled as `NA` (Issue #181)
Are you able to share what those results look like after manually removing PHI? Alternatively, if you have the ability to create REDCap projects at your institution, are you able to create an example that demonstrates the bug without PHI?
Since it sounds like downloading the data from REDCap isn't where the bug is originating, it would be helpful to see your sessionInfo() so I can check if this is specific to particular package versions.
—
Reply to this email directly, view it on GitHub<#181 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BGXP2ZQ575T4AN5RNZY4H63YY5GANAVCNFSM6AAAAABE4FK7W6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBUHE2DCOBQGE>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Thanks! I don't think the screenshots made it through. Can you try to resend them? You may need to use the github.com UI instead of email for it to work |
Ah okay the issue is with those "UNK" values, which I guess your project is getting through the Missing Data Codes customization. I have to think a little more about how we should handle these but agree the current behavior isn't right so we'll implement some fix. If you have a second to think about it, what would your expectation as a user be for a |
NA
yesno
fields are converted to NA
when the field contains missing data codes
Ah yes that makes sense. Yes, I think converting UNK to NA and keeping it as TRUE/FALSE makes the most sense.
From: Ezra Porter ***@***.***>
Date: Monday, March 18, 2024 at 5:54 PM
To: CHOP-CGTInformatics/REDCapTidieR ***@***.***>
Cc: pbilodeau94 ***@***.***>, Author ***@***.***>
Subject: Re: [CHOP-CGTInformatics/REDCapTidieR] [BUG] yesno variables are pulled as `NA` (Issue #181)
Ah okay the issue is with those "UNK" values, which I guess your project is getting through the Missing Data Codes<https://kb.wisc.edu/smph/informatics/page.php?id=108107> customization. I have to think a little more about how we should handle these but agree the current behavior isn't right so we'll implement some fix.
If you have a second to think about it, what would your expectation as a user be for a yesno field that has UNK values? Convert the field to a factor with levels Yes, No, UNK? Convert UNK to NA and have the field remain TRUE/FALSE? Something else?
—
Reply to this email directly, view it on GitHub<#181 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BGXP2ZTCAQIFKTYXCBGLQRDYY5PCTAVCNFSM6AAAAABE4FK7W6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBVGEYTEOBTGU>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Since Missing Data Codes (MDCs) can be applied to any field type, I think we need to consider what we want to happen for each of the various field types if a MDC is enabled here. The trickiest would be the boolean-type fields (yes/no, true/false, multiple choice??). The two options I see are 1) we no longer handle these fields as boolean but turn them into factors or 2) keep them boolean and recode MDC values as NA. I guess the default behavior could #2. However since that is also how an empty value is coded that could cause data loss which shouldn't happen silently. So I think we might want to have Thoughts? |
IMO this sounds better than (1), I don't think the default should be to change the data types. Sort of similar to how we went about #177, this isn't exactly a typical setup so if a user wants to change something it's good to do so consciously. Handling this is going to be non-trivial, to my knowledge there's no easy way to detect that missing data codes exist (unless I'm totally overlooking something!). I made a test database to verify, below is a sample data and metadata output for a yesno and a text field with an Browse[2]> db_data
record_id yesno text form_1_complete
1 1 1 text 2
2 2 0 text 2
3 3 UNK UNK 0
Browse[2]> db_metadata
# A tibble: 3 × 18
field_name form_name section_header field_type field_label select_choices_or_calculat…¹ field_note
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 record_id NA NA text Record ID NA NA
2 yesno form_1 NA yesno YesNo NA NA
3 text form_1 NA text Text NA NA
# ℹ abbreviated name: ¹select_choices_or_calculations
# ℹ 11 more variables: text_validation_type_or_show_slider_number <chr>, text_validation_min <chr>,
# text_validation_max <chr>, identifier <chr>, branching_logic <chr>, required_field <chr>,
# custom_alignment <chr>, question_number <chr>, matrix_group_name <chr>, matrix_ranking <chr>,
# field_annotation <chr> With our current setup, I suppose the best bet is to check each variable type in the metadata, then check for values that don't coincide with expected values. I'm not sure how we would do this for plain text fields. This may also add more processing time, which supports not having this as a default behavior. |
There’s no way to detect project-level MDCs using the API. The codebook in the metadata has information on which MDCs were enabled, but not at the variable level. Not sure exactly how your API call is set up (e.g whether there’s a missing data function), but the easiest way may be to ask the user to specify which MDCs were used in the specific project as a custom function.
From: Rich Hanna ***@***.***>
Date: Tuesday, March 19, 2024 at 9:10 AM
To: CHOP-CGTInformatics/REDCapTidieR ***@***.***>
Cc: pbilodeau94 ***@***.***>, Author ***@***.***>
Subject: Re: [CHOP-CGTInformatics/REDCapTidieR] [BUG] `yesno` fields are converted to `NA` when the field contains missing data codes (Issue #181)
So I think we might want to have read_redcap() throw a warning when this is detected suggesting that the user define the desired behavior. This could be done with an optional arg to read_redcap(), e.g. boolean_mdc_fields_as_factor?
IMO this sounds better than (1), I don't think the default should be to change the data types. Sort of similar to how we went about #177<#177>, this isn't exactly a typical setup so if a user wants to change something it's good to do so consciously.
Handling this is going to be non-trivial, to my knowledge there's no easy way to detect that missing data codes exist (unless I'm totally overlooking something!). I made a test database to verify, below is a sample data and metadata output for a yesno and a text field with an UNK MDC:
Browse[2]> db_data
record_id yesno text form_1_complete
1 1 1 text 2
2 2 0 text 2
3 3 UNK UNK 0
Browse[2]> db_metadata
# A tibble: 3 × 18
field_name form_name section_header field_type field_label select_choices_or_calculat…¹ field_note
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 record_id NA NA text Record ID NA NA
2 yesno form_1 NA yesno YesNo NA NA
3 text form_1 NA text Text NA NA
# ℹ abbreviated name: ¹select_choices_or_calculations
# ℹ 11 more variables: text_validation_type_or_show_slider_number <chr>, text_validation_min <chr>,
# text_validation_max <chr>, identifier <chr>, branching_logic <chr>, required_field <chr>,
# custom_alignment <chr>, question_number <chr>, matrix_group_name <chr>, matrix_ranking <chr>,
# field_annotation <chr>
With our current setup, I suppose the best bet is to check each variable type in the metadata, then check for values that don't coincide with expected values. I'm not sure how we would do this for plain text fields. This may also add more processing time, which supports not having this as a default behavior.
—
Reply to this email directly, view it on GitHub<#181 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BGXP2ZSM4CXIAFIGW34OG53YZA2L5AVCNFSM6AAAAABE4FK7W6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBXGEZTONRVHE>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
It looks like the missing data codes are returned in the Export Project Info API method so we theoretically could get them. There's really only 3 cases we currently have with processing fields of different types. For the first 2 it's pretty trivial to remove missing data codes without knowing what they are. For the third it's basically impossible. That leaves us in a bind if we want a solution that:
I think the least of all evils is to convert to Case 1: Current behavior: Convert to Possible new behavior: Check for non- Case 2: Current behavior: Convert to Possible new behavior: Check for data values not in the levels read from metadata and warn if found. Convert to factor, implicitly converting missing codes to Case 3: Everything else Current behavior: Do nothing Possible new behavior: For now, do nothing. When We can add an option to silence warnings so users can shut them off when they don't have control of the REDCap project. |
After further discussion some additional behavioral things to note for MDCs:
|
Considering that I suspect only a small subset of REDCap databases use MDCs, possibly breaking changes to the supertibble data model, and likely performance hit with robustly supporting MDCs, I think what Ezra outlines above is a reasonable solution for now (i.e. implement cases 1 and 2). We can consider implementing case (3) after someone demonstrates the need for this by opening a feature request. We can write a warning message that states that we are not currently supporting MDCs but if this is needed they should file an issue on our GitHub page. |
I noticed when working on this that we currently don't convert fields to |
Like how |
Yeah okay I agree and it actually will keep our code simpler to keep as is. PR coming soon but I'm leaning towards also keeping the missing data codes when |
@pbilodeau94 when you have a chance let us know if #182 solved your issue. You can try it out by installing the dev version |
Yes, looks like it's all set now! Thank you!! |
Current Behavior
When using read_redcap, all yes/no variables are all listed as N/A. Dummy factor variables for multiple checkboxes get pulled correctly as TRUE/FALSE.
How to Reproduce the Bug:
nmosd <- read_redcap(url, token) %>%
make_labelled(format_labels = fmt_strip_trailing_colon) %>%
bind_tibbles()
The text was updated successfully, but these errors were encountered: