Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing Data Fix, Partial Keys Fix #205

Merged
merged 20 commits into from
Oct 15, 2024
Merged

Missing Data Fix, Partial Keys Fix #205

merged 20 commits into from
Oct 15, 2024

Conversation

rsh52
Copy link
Collaborator

@rsh52 rsh52 commented Sep 20, 2024

Description

This PR no longer implements join_data_tibbles(), that is being moved out into a new branch. Instead, this PR seeks to adress a few bugs, one minor (documented below), and one major where we are mistakenly reporting mixed data structure outputs.

I will be posting comments on the code itself below to explain some of the changes.

The Missing Data Issue

Because we weren't making use of redcap_event_instance correctly, we were not accurately reporting all of the data associated with each REDCap event. See below from the Mixed Structure REDCap for reference:

image

And here is what currently comes out of the REDCapTidieR data tibbles from main:

Current Output with Missing Data
> sprtbl_mixed
# A REDCapTidieR Supertibble with 3 instruments
  redcap_form_name     redcap_form_label    redcap_data redcap_metadata   redcap_events structure data_rows data_cols
  <chr>                <chr>                <list>      <list>            <list>        <chr>         <int>     <int>
1 nonrepeat_form       Nonrepeat Form       <tibble>    <tibble [2 × 17]> <tibble>      nonrepea4         5
2 repeat_form          Repeat Form          <tibble>    <tibble [2 × 17]> <tibble>      mixed             2         5
3 mixed_structure_form Mixed Structure Form <tibble>    <tibble [2 × 17]> <tibble>      mixed             3         5
# ℹ 3 more variables: data_size <lbstr_by>, data_na_pct <formttbl>, form_complete_pct <formttbl>
> sprtbl_mixed$redcap_data
[[1]]
# A tibble: 4 × 5
  record_id redcap_event       redcap_event_instance nonrepeat_1 form_status_complete
      <dbl> <chr>                              <dbl> <chr>       <fct>               
1         1 non_repeating                         NA Nonrepeat 1 Incomplete          
2         1 repeating_separate                    NA Nonrepeat 2 Incomplete          
3         1 repeating_together                     1 A           Incomplete          
4         1 repeating_together                     2 B           Incomplete          

[[2]]
# A tibble: 2 × 5
  record_id redcap_event  redcap_form_instance repeat_1 form_status_complete
      <dbl> <chr>                        <dbl> <chr>    <fct>               
1         1 non_repeating                    1 Repeat 1 Incomplete          
2         1 non_repeating                    2 Repeat 2 Incomplete          

[[3]]
# A tibble: 3 × 5
  record_id redcap_event       redcap_form_instance mixed_structure_1 form_status_complete
      <dbl> <chr>                             <dbl> <chr>             <fct>               
1         1 non_repeating                         1 Mixed Nonrepeat 1 Incomplete          
2         1 repeating_separate                    1 Mixed Repeat 1    Incomplete          
3         1 repeating_separate                    2 Mixed Repeat 2    Incomplete     

In the second data tibble (the repeat form), we should expect to see 4 entries, 2 for the non repeating event due to 2 repeating form instances, and 2 for the repeating together event entries.

In the third data tibble (the mixed structure form), we should expect to see 5 entries and are again missing the 2 repeating together event entries.

Here is the output with the proposed code changes:

Proposed Output with Missing Data Added
> sprtbl_mixed
# A REDCapTidieR Supertibble with 3 instruments
  redcap_form_name   redcap_form_label redcap_data redcap_metadata redcap_events structure data_rows data_cols data_size data_na_pct form_complete_pct
  <chr>              <chr>             <list>      <list>          <list>        <chr>         <int>     <int> <lbstr_b> <formttbl>  <formttbl>       
1 nonrepeat_form     Nonrepeat Form    <tibble>    <tibble>        <tibble>      nonrepea4         5 2.59 kB   0%          0%               
2 repeat_form        Repeat Form       <tibble>    <tibble>        <tibble>      mixed             4         6 2.67 kB   0%          0%               
3 mixed_structure_fMixed Structure<tibble>    <tibble>        <tibble>      mixed             5         6 2.94 kB   0%          0%               
> sprtbl_mixed$redcap_data
[[1]]
# A tibble: 4 × 5
  record_id redcap_event       redcap_event_instance nonrepeat_1 form_status_complete
      <dbl> <chr>                              <dbl> <chr>       <fct>               
1         1 non_repeating                         NA Nonrepeat 1 Incomplete          
2         1 repeating_separate                    NA Nonrepeat 2 Incomplete          
3         1 repeating_together                     1 A           Incomplete          
4         1 repeating_together                     2 B           Incomplete          

[[2]]
# A tibble: 4 × 6
  record_id redcap_event       redcap_form_instance redcap_event_instance repeat_1 form_status_complete
      <dbl> <chr>                             <dbl>                 <dbl> <chr>    <fct>               
1         1 non_repeating                         1                    NA Repeat 1 Incomplete          
2         1 non_repeating                         2                    NA Repeat 2 Incomplete          
3         1 repeating_together                   NA                     1 A        Incomplete          
4         1 repeating_together                   NA                     2 B        Incomplete          

[[3]]
# A tibble: 5 × 6
  record_id redcap_event       redcap_form_instance redcap_event_instance mixed_structure_1 form_status_complete
      <dbl> <chr>                             <dbl>                 <dbl> <chr>             <fct>               
1         1 non_repeating                         1                    NA Mixed Nonrepeat 1 Incomplete          
2         1 repeating_separate                    1                    NA Mixed Repeat 1    Incomplete          
3         1 repeating_separate                    2                    NA Mixed Repeat 2    Incomplete          
4         1 repeating_together                   NA                     1 A                 Incomplete          
5         1 repeating_together                   NA                     2 B                 Incomplete        

Proposed Changes

List changes below in bullet format:

  • Update add_event_mapping() with output from new function get_repeat_event_types() and make repeat_type available in the redcap_events column of the supertibble
  • Update clean_redcap_()
    • Update convert_mixed_instrument() to detect RT data and shift instances over to redcap_event_instance
  • Update add_partial_keys() to handle regex for arms that don't end in an integer (See [BUG] add_partial_keys() Doesn't fully capture arm regex #206)
  • Update test suite accordingly
  • Add new join_data_tibbles() function and tests <- in new branch

Remaining TODO's

Issue Addressed

Closes #206
Closes #204

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

  • New/revised functions have associated tests
  • [NA] New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
  • New/revised functions use appropriate naming conventions
  • New/revised functions don't repeat code
  • Code changes are less than 250 lines total
  • Issues linked to the PR using GitHub's list of keywords
  • The appropriate reviewer is assigned to the PR
  • The appropriate developers are assigned to the PR
  • [NA] Pre-release package version incremented using usethis::use_version()
    • A part of the upcoming 1.2 release

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

  • I checked that new files follow naming conventions and are in the right place
  • I checked that documentation is complete, clear, and without typos
  • I added/edited comments to explain "why" not "how"
  • I checked that all new variable and function names follow naming conventions
  • I checked that new tests have been written for key business logic and/or bugs that this PR fixes
  • I checked that new tests address important edge cases

@rsh52 rsh52 added bug Something isn't working enhancement New feature or request labels Sep 20, 2024
@rsh52 rsh52 self-assigned this Sep 20, 2024
@@ -84,7 +84,9 @@ clean_redcap_long <- function(db_data_long,
# Retrieve mixed structure fields and forms in reference df
mixed_structure_ref <- get_mixed_structure_fields(db_data_long) %>%
filter(.data$rep_and_nonrep & !str_ends(.data$field_name, "_form_complete")) %>%
left_join(db_metadata_long %>% select("field_name", "form_name"),
left_join(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change, just lintr.

@@ -292,7 +295,7 @@ read_redcap <- function(redcap_uri,
out <- add_metadata(out, db_metadata, redcap_uri, token, suppress_redcapr_messages)

if (is_longitudinal) {
out <- add_event_mapping(out, linked_arms)
out <- add_event_mapping(out, linked_arms, repeat_event_types)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add repeat_event_types reference table to add_event_mapping()

#'
add_event_mapping <- function(supertbl, linked_arms) {

add_event_mapping <- function(supertbl, linked_arms, repeat_event_types) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_event_mapping() now accepts a reference table for the repeating event types designations. This table is joined onto the linked_arms data to make repeat_type available in the eventual redcap_events supertibble column.

@@ -18,12 +18,13 @@
add_partial_keys <- function(db_data,
var = NULL) {
if (!is.null(enexpr(var))) {
pattern <- "^(\\w+?)_arm_(\\d)$"
# Include handling for instances where REDCap appends with "_1b" or similar
pattern <- "^(\\w+?)_arm_(\\d+\\w?)$"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix regex to address #206

R/utils.R Outdated

db_data <- db_data %>%
mutate(
redcap_event = sub(pattern, "\\1", {{ var }}),
redcap_arm = as.integer(sub(pattern, "\\2", {{ var }}))
redcap_arm = as.factor(sub(pattern, "\\2", {{ var }}))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert to factor to capture non-integer arm values as mentioned in #206

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think character is probably better here since these will usually be numbers and aren't the type of categorical variables you normally expect as factor to represent

@@ -416,14 +421,38 @@ convert_mixed_instrument <- function(db_data_long, mixed_structure_ref) {
)
)

repeat_together_present <- any(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now want to separate RT events so their instances get captured under redcap_event_instance, we first need to check for the right behavior for RTs (i.e. no repeating instrument but a repeat instance detected).

!is.na(db_data_long$redcap_repeat_instance)
)

if (!"redcap_event_instance" %in% names(db_data_long) && repeat_together_present) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If RT detected but no column supplied for redcap_event_instance, add it as an empty column.

relocate(.data$redcap_event_instance, .after = .data$redcap_repeat_instance)
}

if (repeat_together_present) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, shift the redcap repeat instance values over to the redcap event instance column.

@rsh52 rsh52 changed the title join_data_tibbles, Enable Repeat Event Type Supertibble Defintion join_data_tibbles, Enable Repeat Event Type Supertibble Definion Sep 23, 2024
(
!is.na(.data$redcap_form_instance) |
if_any(starts_with("redcap_event_instance"), ~ !is.na(.))
) &
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter for when either redcap_form_instance is NA or redcap_event_instance is NA (if it exists).

R/join_data_tibbles.R Outdated Show resolved Hide resolved
@rsh52 rsh52 changed the title join_data_tibbles, Enable Repeat Event Type Supertibble Definion join_data_tibbles, Enable Repeat Event Type Supertibble Definition Oct 14, 2024
@rsh52 rsh52 changed the title join_data_tibbles, Enable Repeat Event Type Supertibble Definition Missing Data Fix, Partial Keys Fix Oct 14, 2024
@rsh52 rsh52 marked this pull request as ready for review October 14, 2024 19:34
@rsh52 rsh52 requested a review from ezraporter October 14, 2024 19:35
Copy link
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny comment but looks good otherwise

R/utils.R Outdated

db_data <- db_data %>%
mutate(
redcap_event = sub(pattern, "\\1", {{ var }}),
redcap_arm = as.integer(sub(pattern, "\\2", {{ var }}))
redcap_arm = as.factor(sub(pattern, "\\2", {{ var }}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think character is probably better here since these will usually be numbers and aren't the type of categorical variables you normally expect as factor to represent

@rsh52
Copy link
Collaborator Author

rsh52 commented Oct 15, 2024

I think character is probably better here since these will usually be numbers and aren't the type of categorical variables you normally expect as factor to represent

Good call, thanks. I thought a factor would be better for how redcap_arm gets handled later but it doesn't seem to really impact anything and you're right this is likely more representative.

@ezraporter ezraporter self-requested a review October 15, 2024 17:54
@rsh52 rsh52 merged commit 5c4a7de into main Oct 15, 2024
4 checks passed
@rsh52 rsh52 deleted the join-data-tibbles branch October 15, 2024 19:35
@rsh52 rsh52 mentioned this pull request Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] add_partial_keys() Doesn't fully capture arm regex [BUG] Misleading Mixed Data Structure Outputs
2 participants