-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Misleading Mixed Data Structure Outputs #204
Comments
Hm, going to need to think more about this. After doing some debugging, the solution I proposed essentially is just leading us back to re-implementing db_data_long
# A tibble: 5 × 8
record_id redcap_event_name redcap_repeat_instrument redcap_repeat_instance mixed_structure_1 mixed_structure_1_complete mixed_structure_2 mixed_structure_2_complete
<dbl> <chr> <chr> <dbl> <chr> <fct> <chr> <fct>
1 1 repeating_together_arm_1 NA 1 RT1 Complete RT1 Complete
2 1 repeating_together_arm_1 NA 2 RT2 Complete RT2 Complete
3 1 repeating_separate_arm_1 mixed_structure_1 1 RS1 Complete NA NA
4 1 repeating_separate_arm_1 mixed_structure_1 2 RS2 Complete NA NA
5 1 repeating_separate_arm_1 mixed_structure_2 1 NA NA RS1 Complete In trying to separate rows based on RS, what we're looking for is sparsity that we nixed on principle when consolidating repeating instances under |
This ensures RSs are joined correctly, may repeat information from the table being joined on, but that this is technically a "correct" join. Versus a "stacked" table where the observations are staggered, which is what the REDCapR/sparse output gives you. What we need to do: update how we determine repeat events and move those values over to |
After banging my head against this for a while, I don't think this is actually correct either and am worried that getting this to work would create an extremely opaque output. Consider this output, with a newly-implemented Browse[1]> x
# A tibble: 4 × 7
record_id redcap_event redcap_form_instance redcap_event_instance mixed_structure_1 form_status_complete .repeat_type
<dbl> <chr> <dbl> <dbl> <chr> <fct> <chr>
1 1 repeating_together NA 1 RT1 Complete repeat_together
2 1 repeating_together NA 2 RT2 Complete repeat_together
3 1 repeating_separate 1 NA RS1 Complete repeat_separate
4 1 repeating_separate 2 NA RS2 Complete repeat_separate
Browse[1]> y
# A tibble: 3 × 7
record_id redcap_event redcap_form_instance redcap_event_instance mixed_structure_2 form_status_complete .repeat_type
<dbl> <chr> <dbl> <dbl> <chr> <fct> <chr>
1 1 repeating_together NA 1 RT1 Complete repeat_together
2 1 repeating_together NA 2 RT2 Complete repeat_together
3 1 repeating_separate 1 NA RS1 Complete repeat_separate
Browse[1]> dplyr::full_join(x,y,by)
# A tibble: 4 × 10
record_id redcap_event redcap_form_instance redcap_event_instance mixed_structure_1 form_status_complete.x .repeat_type.x mixed_structure_2 form_status_complete.y .repeat_type.y
<dbl> <chr> <dbl> <dbl> <chr> <fct> <chr> <chr> <fct> <chr>
1 1 repeating_together NA 1 RT1 Complete repeat_together RT1 Complete repeat_together
2 1 repeating_together NA 2 RT2 Complete repeat_together RT2 Complete repeat_together
3 1 repeating_separate 1 NA RS1 Complete repeat_separate RS1 Complete repeat_separate
4 1 repeating_separate 2 NA RS2 Complete repeat_separate NA NA NA Before we said that the desired output would be for those last three I'll think some more on this, but think this warrants a larger discussion. In the end we need to make sure we "intelligently join" on columns that are transparent to the user and handling this succinctly with the various behaviors an individual form/table gives proves very challenging. |
Expected Behavior
When joining mixed data structures with
read_redcap(..., allow_mixed_structure = TRUE)
, we expect tables to be made with meaningful, unique primary keys.Current Behavior
During development of #199, we found that joining tables that both "repeat together" (i.e. as events, RT) and "repeat separate" (i.e. as instruments, RS) don't provide the appropriate primary keys to allow for distinction of RS rows during joins.
How to Reproduce the Bug:
Using the REDCap for a single record as set up below:
Currently
read_redcap()
gives us the following output:Using
join_data_tibbles()
with a "full join" we get this:The issue here is that in row 3, data for
mixed_structure_1
andmixed_structure_2
should exist on separate rows because they are RS instances. Asread_redcap()
is currently set up, it is impossible to separate these because the primary keys for both are identical (record_id
,redcap_event
,redcap_form_instance
). This is a by product of how we decided to mixredcap_form_instance
s meaning between repeat events and instrumentsSolution Proposal
To fix this we will need to do the following:
redcap_events
column of the supertibble. This will need to identify if an event is a RS/RT for us to reference in other functions likejoin_data_tibbles()
redcap_form_instance
is used in these examples for RS rows. This may involve revising howredcap_form_instance
/redcap_event_instance
is defined but boils down to needing an additional primary key to help identify separate repeating instances. @ezraporter after making an example I thinkredcap_form_instance
/redcap_event_instance
might still not address this fully since these share the same event information. I think instead we need to identify RS data and then give it an additional column with the form name it came from. This might be able to get shifted back to the join function's responsibilities.Checklist
Before submitting this issue, please check and verify below that the submission meets the below criteria:
The text was updated successfully, but these errors were encountered: