[FEATURE] extract_tibble should allow users to join specified tables #111

rsh52 · 2022-12-12T19:00:49Z

Feature Request Description

In addition to extracting selected tibbles, extract_tibbles should give users the option to join them into a single tibble instead of returning a list. In recent projects, joining has often been the next logical step after calling extract_tibbles.

Proposed Solution

Prototyped logic is available in our internal Prodigy Reporter. The new argument (suggest: join_tibbles = TRUE/FALSE) should kick off join operations. Since we abstract some column names, e.g. form_status_complete, we need to account for column names that are duplicated across the tibbles.

# Load Libraries ===============================================================
library(REDCapTidieR)
library(tidyverse)
library(tidyselect)
library(rlang)

# tibble List Selection Function ===============================================
tibble_list_select <- function(supertibble, tbls) {
  tbls <- eval_select(data = supertibble, expr = enquo(tbls))
  supertibble[tbls]
}

# Join Operation ===============================================================
join_tibbles <- function(extracted_tibbles, record_id) {
  # First: compile all names related to tibbles
  # Second: Identify names that exist in multiple tibbles (not record_id)
  # Third: Append identified names with name of the tibble they belong to
  
  duplicate_colnames <- extracted_tibbles %>%
    map(names) %>%
    unlist() %>%
    tibble(name = .) %>%
    count(name) %>%
    # don't append table name to the primary key (the record_id argument)
    filter(n > 1 & name != record_id) %>% # <-- Need to functionally call out record_id in case of name change -->
    pull(name)
  
  extracted_tibbles <- map2(
    extracted_tibbles,
    names(extracted_tibbles),
    .f = function(df, df_name) {
      # [duplicate_col] -> [duplicate_col].[table_name]
      rename_with(
        df,
        .cols = any_of(duplicate_colnames),
        .fn = function(col) paste0(col, ".", df_name)
      )
    }
  )
  
  # Multi-table left_join using reduce(), joining on the record ID column
  out <- reduce(
    extracted_tibbles,
    dplyr::left_join,
    by = record_id # <-- Need to functionally update this -->
  )
  
  out
}

Here's how I envision this being implemented, but imagine the external functions as internal to extract_tibbles instead:

# Example ======================================================================
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")

supertibble <- read_redcap(redcap_uri, token)

extracted_tibbles <- supertibble |>
  extract_tibbles() 

extracted_tibbles |> 
  tibble_list_select(tbls = c(contains("nonrepeat"), repeated)) |>
  join_tibbles(record_id = "record_id")

You should be able to copy and paste all of this into a script and use REDCapTidieR 0.2.0 to view the proposed output. Open to suggestions on naming conventions for identified duplicate columns (currently [duplicate_col].[table_name]).
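For anyone without API access, here is a minimal sketch of the same renaming convention on toy data. The tibbles, their column values, and the names toy_tibbles and key are made up for illustration; they are not REDCapTidieR output, and this is not a proposed implementation.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-ins for extracted tibbles (hypothetical data, not REDCap output)
toy_tibbles <- list(
  demographics = tibble(
    record_id = c(1, 2),
    age = c(34, 51),
    form_status_complete = c("Complete", "Incomplete")
  ),
  labs = tibble(
    record_id = c(1, 2),
    hgb = c(13.1, 11.8),
    form_status_complete = c("Complete", "Complete")
  )
)

key <- "record_id"

# Column names that occur in more than one tibble, excluding the join key
all_names <- unlist(map(toy_tibbles, names), use.names = FALSE)
duplicate_colnames <- setdiff(unique(all_names[duplicated(all_names)]), key)

# Suffix each duplicated column with the name of its source tibble
renamed <- imap(toy_tibbles, function(df, df_name) {
  rename_with(df, ~ paste0(.x, ".", df_name), .cols = any_of(duplicate_colnames))
})

# Join everything on the key
joined <- reduce(renamed, left_join, by = key)
names(joined)
# "record_id" "age" "form_status_complete.demographics" "hgb" "form_status_complete.labs"
```

Only form_status_complete is suffixed here because it is the only non-key column shared by both toy tibbles.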

Checklist

The issue is atomic
The issue description is documented
The issue title describes the problem succinctly
Developers are assigned to the issue
Labels are assigned to the issue


rsh52 · 2022-12-12T19:01:22Z

@skadauke @ezraporter tagged for posterity (and critiques) ✨

skadauke · 2022-12-14T21:58:30Z

Discussed an alternative, higher-level API for this using the existing extract_tibble() function. The following would return a single tibble with demographics and disease_response instruments joined together appropriately.

supertbl |>
  extract_tibble(demographics, disease_response)

One question is what "appropriately" means. Another question is how to make this syntax concise and expressive while at the same time not limiting flexibility. We will see use cases for table joins during development of the Prodigy reporter and aim to implement a solution with 0.3 in a few months.
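To make the "appropriately" question concrete, here is a small sketch on toy data of what a plain left join does when a nonrepeating instrument is joined to a repeating one. The tibbles and the instance column are hypothetical stand-ins, not actual REDCapTidieR output, and a left join is just one possible choice.

```r
library(dplyr)
library(tibble)

# Hypothetical nonrepeating instrument: one row per record
demographics <- tibble(record_id = c(1, 2), age = c(34, 51))

# Hypothetical repeating instrument: one row per record per instance
disease_response <- tibble(
  record_id = c(1, 1, 2),
  instance = c(1, 2, 1),
  response = c("PR", "CR", "SD")
)

# A left join keeps every repeating instance, so the nonrepeating
# values (age) are duplicated on each instance row
joined <- left_join(demographics, disease_response, by = "record_id")
nrow(joined)
# 3
```

The output has one row per repeating instance rather than one row per record, which may or may not be what a user expects from a joined table.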

skadauke · 2022-12-16T15:08:41Z

I had one more thought, not sure if it's possible or even a good idea. What if

supertbl |>
  extract_tibble(everything())

returns a tibble that's (mostly) the same as the block matrix? The use case here might be that people could make changes inside the supertibble and then send those changes back to the REDCap instance. I know I said we don't want to touch writing, but it's a thought. And this could guide how we plan the structure of a table in which nonrepeating and repeating instruments are combined.
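A rough sketch of what a block-matrix-like shape could look like, assuming it is approximated by stacking instruments row-wise so that each row only fills the columns of its own instrument. The toy tibbles, the instance column, and the instrument identifier column are hypothetical, and this is only an approximation of the real REDCap export layout.

```r
library(dplyr)
library(tibble)

# Hypothetical nonrepeating and repeating instruments
demographics <- tibble(record_id = c(1, 2), age = c(34, 51))
disease_response <- tibble(
  record_id = c(1, 1, 2),
  instance = c(1, 2, 1),
  response = c("PR", "CR", "SD")
)

# Stack the instruments; columns missing from one instrument are filled with NA
block <- bind_rows(
  demographics = demographics,
  disease_response = disease_response,
  .id = "instrument"
) |>
  arrange(record_id)

block
```

Unlike the joined output, nothing is duplicated here: nonrepeating values appear once per record, and repeating rows carry NA in the nonrepeating columns.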

rsh52 added the enhancement New feature or request label Dec 12, 2022

rsh52 self-assigned this Dec 12, 2022

rsh52 added this to the 0.3 milestone Dec 12, 2022

rsh52 added the backlog not to be worked on now label Dec 13, 2022

skadauke changed the title ~~[FEATURE] extract_tibbles should allow users to join specified tables~~ [FEATURE] extract_tibble should allow users to join specified tables Dec 16, 2022

rsh52 mentioned this issue Dec 16, 2022 in Implement argument checking for exported functions #114 (merged, 15 tasks)

rsh52 modified the milestones: 0.3, 0.4 Feb 20, 2023

rsh52 removed this from the 0.4 milestone May 23, 2023
