Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine_checkboxes #196

Merged
merged 39 commits into from
Aug 13, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
9b0471b
Reduce function initial draft
Jul 10, 2024
859926f
Reduce function fixes
Jul 11, 2024
c42399d
Small fixes
Jul 11, 2024
8862ac1
Draft tests, add no_val param
Jul 12, 2024
96c3309
Add keep param
Jul 12, 2024
218fca4
Fix `keep` param
Jul 15, 2024
54e3a99
Update documentation and API
Jul 15, 2024
ec5c19d
Update combine_checkbox api and docs
Jul 15, 2024
6b1fb48
Add check for if no fields exist in selection
Jul 15, 2024
d55dc00
Add check_fields_are_checkboxes function
Jul 15, 2024
261342d
Minor cleaning
Jul 15, 2024
e1d4eb8
Update version, test recheck workflow
Jul 15, 2024
7207f09
Test recheck workflow file
Jul 15, 2024
4f861e1
Fix linting
Jul 15, 2024
eb11152
Add combine_checkboxes() to pkgdown
Jul 15, 2024
7348324
Remove revdepcheck, update renv
Jul 15, 2024
62080af
Add standard checks for params
Jul 15, 2024
522d01d
Filename update
Jul 16, 2024
3a395cf
Filename change
Jul 16, 2024
347d2a3
Rename test file
rsh52 Jul 17, 2024
cce0d12
Fix record_id_field assign, remove rowwise call
rsh52 Jul 17, 2024
c250eda
Remove instrument_identifiers, use bind_cols
rsh52 Jul 17, 2024
b0a8564
Implement parse_labels, clean code, fix tests
rsh52 Jul 17, 2024
21f8879
Remove record_id field, lint
rsh52 Jul 17, 2024
31797c6
Apply additional cleanup suggestions
rsh52 Jul 18, 2024
2dfac9a
Add extract_metadata fnctn, tests
rsh52 Jul 18, 2024
ed55292
Support multiple values_to, logicals, new checks
Jul 24, 2024
c0b3885
Linting
Jul 24, 2024
7789a22
Update API, clean up, new methods, new docs
Jul 29, 2024
c185e39
Add check_metadata_fields_exist, update details
Aug 2, 2024
abdc512
Consoldiate and rework checkbox value conversion
Aug 2, 2024
50d47d6
Add names_repair strategy support
Aug 5, 2024
a6d150d
Remove names_suffix, restructure prefix/sep
Aug 5, 2024
0f868b8
Add names_glue spec
Aug 5, 2024
abefbee
Add glue support with names_glue
Aug 5, 2024
06d1337
Make glue dependency, remove install check
rsh52 Aug 12, 2024
dcb1029
Update glue spec handling
rsh52 Aug 12, 2024
0295650
check_equal_col_summaries() implementation
rsh52 Aug 12, 2024
127dd46
Update error message check_equal_col_summaries()
rsh52 Aug 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ importFrom(purrr,map_lgl)
importFrom(purrr,pluck)
importFrom(purrr,pmap)
importFrom(purrr,pmap_chr)
importFrom(purrr,reduce)
importFrom(purrr,some)
importFrom(readr,parse_character)
importFrom(readr,parse_date)
Expand Down
2 changes: 1 addition & 1 deletion R/REDCapTidieR-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
#' @importFrom lobstr obj_size
#' @importFrom lubridate is.difftime is.period is.POSIXt is.Date
#' @importFrom purrr compose map map2 map_int map_lgl pluck pmap_chr some pmap
#' discard flatten_chr map2_chr
#' discard flatten_chr map2_chr reduce
#' @importFrom REDCapR redcap_arm_export redcap_event_instruments redcap_instruments
#' redcap_metadata_read redcap_read_oneshot sanitize_token
#' @importFrom rlang .data !!! abort as_closure caller_arg caller_env catch_cnd
Expand Down
2 changes: 1 addition & 1 deletion R/checks.R
Original file line number Diff line number Diff line change
Expand Up @@ -664,7 +664,7 @@ check_extra_field_values_message <- function(extra_field_values, call = caller_e
#' Check fields exist for checkbox combination
#'
#' @param fields Vector of character strings to check the length of
#' @param expr A quosure expression
#' @param expr An expression
#' @param call The calling environment to use in the error message
#'
#' @keywords internal
Expand Down
208 changes: 138 additions & 70 deletions R/combine_checkboxes.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,12 @@
#'
#' @param supertbl A supertibble generated by [read_redcap()]. Required.
#' @param tbl The `redcap_form_name` of the data tibble to extract. Required.
#' @param cols <[`tidy-select`][tidyr_tidy_select]> Checbox columns to combine to
#' @param cols <[`tidy-select`][tidyr_tidy_select]> Checkbox columns to combine to
#' single column. Required.
#' @param values_to A string specifying the name of the column to combine checkbox
#' values under. Required.
#' @param names_prefix String added to the start of every variable name.
#' @param names_suffix String added to the end of every variable name.
#' @param names_sep String to separate new column names from `names_prefix` and/or
#' `names_suffix`.
#' @param multi_value_label A string specifying the value to be used when multiple
#' checkbox fields are selected. Default "Multiple".
#' @param values_fill Value to use when no checkboxes are selected. Default `NA`.
Expand All @@ -25,11 +27,10 @@
#' @examples
#' \dontrun{
#' supertbl <- read_redcap(redcap_uri, token)
#' combined_tbl <- combine_checkboxes(
#' combine_checkboxes(
#' supertbl = supertbl,
#' tbl = "demographics",
#' cols = starts_with("race"),
#' values_to = "race_combined",
#' multi_value_label = "Multiple",
#' values_fill = NA
#' )
Expand All @@ -40,87 +41,84 @@
combine_checkboxes <- function(supertbl,
tbl,
cols,
values_to,
names_prefix = "",
names_suffix = NULL,
names_sep = "_",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that these parameters control the names of the new columns and not how variable names (ex. race___0) are parsed into names and values?

A couple comments:

  1. If we go with this I think we need to make it clearer that we're assuming the checkbox fields are in name___value format and not giving the user any control over how that's parsed.
  2. I don't think the defaults are right. In this example the output col is called _race. The default should probably just produce race:
db_label |>
  combine_checkboxes(
    "demographics",
    starts_with("race")
  )
  1. I see your "Maybe" in the PR comment and also think names_glue would be super valuable if we can do it 😊

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that these parameters control the names of the new columns and not how variable names (ex. race___0) are parsed into names and values?

They control the structure of the names, but the names themselves come from .value in get_metadata_spec(), i.e. the field name prior to the ___ checkbox changes.

If we go with this I think we need to make it clearer that we're assuming the checkbox fields are in name___value format and not giving the user any control over how that's parsed.

How about we just add a check_metadata_fields_exist() in get_metadata_spec() similar to what I have in the parent function for check_fields_exist()? If checkbox names are changed and they don't appear in the metadata field_name column, we can throw an error and suggestion. We should expect users don't manipulate the metadata tibbles, but we need the connection between the metadata tibble and the data tibble to remain intact. I think this supports our "if you change things, you need to take some responsibility for them" mindset.

I don't think the defaults are right. In this example the output col is called _race. The default should probably just produce race

I'm open to changing it, but when I was thinking through outputs I was worried about clashing with other possibly existing column names. See this example, if we're ok with that being the default in the event of a clash then I can rework this.

Click me
data <- tibble(
  id = 1,
  prefix = "prefix",
  x___1 = TRUE,
  x___2 = FALSE,
  y___1 = TRUE,
  y___2 = TRUE,
  z___1 = FALSE,
  x = "val"
)

metadata <- tibble(
  field_name = c("id", "prefix", "x___1", "x___2", "y___1", "y___2", "z___1", "x"),
  field_type = c("text", "text", rep("checkbox", 5), "text"),
  select_choices_or_calculations = c(NA, NA, rep("1, A | 2, B", 4), "3, C", NA),
)

suprtbl <- tibble(
  redcap_form_name = "tbl",
  redcap_data = list(data),
  redcap_metadata = list(metadata)
) |>
  as_supertbl()

combine_checkboxes(supertbl = suprtbl,
                   tbl = "tbl",
                   cols = c(starts_with("x__"), starts_with("y"), "z___1"),
                   names_sep = "") %>%
  pull(redcap_data)

New names:`x` -> `x...8``x` -> `x...9`
[[1]]
# A tibble: 1 × 11
     id prefix x___1 x___2 y___1 y___2 z___1 x...8 x...9 y        z    
  <dbl> <chr>  <lgl> <lgl> <lgl> <lgl> <lgl> <chr> <fct> <fct>    <fct>
1     1 prefix TRUE  FALSE TRUE  TRUE  FALSE val   A     Multiple NA   

I see your "Maybe" in the PR comment and also think names_glue would be super valuable if we can do it 😊

ugh... Fine. I knew it was coming but figured I'll get the rest of this ironed out first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just add a check_metadata_fields_exist() in get_metadata_spec() similar to what I have in the parent function for check_fields_exist()? If checkbox names are changed and they don't appear in the metadata field_name column, we can throw an error and suggestion. We should expect users don't manipulate the metadata tibbles, but we need the connection between the metadata tibble and the data tibble to remain intact. I think this supports our "if you change things, you need to take some responsibility for them" mindset.

Okay I'm fine with this. I would also add something to the documentation noting the pattern we're looking for.

I'm open to changing it, but when I was thinking through outputs I was worried about clashing with other possibly existing column names. See this example, if we're ok with that being the default in the event of a clash then I can rework this.

I would resolve this with a warning or possibly error if the fields already exist. pivot_longer() for example errors and directs the user to the names_repair parameter to provide a repair strategy:

tibble(x=1:3, y=4:6, value = 10) |>
 pivot_longer(x:y)

#>Error in `tidyr::pivot_longer()`:
#>! Names must be unique.
#>✖ These names are duplicated:
#>  * "value" at locations 1 and 3.
#>ℹ Use argument `names_repair` to specify repair strategy.

That may be too sophisticated for us but we may actually be able to recreate that behavior pretty easily with vctrs::vec_as_names() which is referenced in the docs for the names_repair parameter.

ugh... Fine. I knew it was coming but figured I'll get the rest of this ironed out first.

Haha if it ends up being too tricky that's fine!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright added support for names_repair and names_glue.

names_glue I'm still iffy on, the use case in the pivot_wider() documentation is a bit more complicated than I believe we can support here, but try the current set up out and let me know what you think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that pivot_wider() supports more but I think we're still providing a lot value with what we have.

Imagine a user has a meals instrument with some checkboxes like this:

field_name
breakfast___apple
breakfast___orange
breakfast___spinach
lunch___apple
lunch___orange
lunch___spinach
dinner___apple
dinner___orange
dinner___spinach

They could do:

supertbl |>
  combine_checkboxes(
    "meals",
    matches("breakfast|lunch|dinner"),
    names_glue = "checkbox_{.value}_all"
  ) |>
  combine_checkboxes(
    "meals",
    matches("breakfast|lunch|dinner") & matches("apple|orange"),
    names_glue = "checkbox_{.value}_fruit"
  )

How slick is that?

multi_value_label = "Multiple",
values_fill = NA,
raw_or_label = "label",
keep = TRUE) {
# Check args ---
check_arg_is_supertbl(supertbl, req_cols = c("redcap_data", "redcap_metadata"))
check_arg_is_character(names_prefix, len = 1)
check_arg_is_character(names_suffix, len = 1, null.ok = TRUE)
check_arg_is_character(names_sep, len = 1, any.missing = TRUE)
check_arg_is_character(tbl, len = 1, any.missing = FALSE)
check_arg_is_character(values_to, len = 1, any.missing = FALSE)
check_arg_is_character(multi_value_label, len = 1, any.missing = TRUE)
check_arg_is_character(values_fill, len = 1, any.missing = TRUE)
check_arg_choices(raw_or_label, choices = c("label", "raw"))
check_arg_is_logical(keep, len = 1, any.missing = FALSE)

# Save user cols to enquosure
cols_exp <- enquo(cols)

# Extract tbl from supertbl
data_tbl <- supertbl %>%
extract_tibble(tbl)

# Get field names from cols_exp, check that fields exist
field_names <- names(eval_select(cols_exp, data = data_tbl))
check_fields_exist(fields = field_names, expr = cols_exp)
# Save user cols to quo
cols_exp <- enquo(cols)

# Define values_to as the count of TRUEs/1s for the given checkbox field
# Assign TRUE if multiple selections made, and FALSE if one or zero made
data_tbl_mod <- data_tbl %>%
mutate(
!!values_to := case_when(
rowSums(select(., eval_tidy(cols_exp))) > 1 ~ TRUE,
TRUE ~ FALSE
)
)
# Evaluate the cols expression to get the selected column names
selected_cols <- names(eval_select(cols_exp, data = data_tbl))
check_fields_exist(fields = selected_cols, expr = cols_exp) # Check supplied fields exist

# Get metadata reference table, check that chosen fields are checkboxes
metadata <- get_metadata_ref(data_tbl_mod, supertbl, tbl, field_names)
metadata_tbl <- supertbl$redcap_metadata[supertbl$redcap_form_name == tbl][[1]]
metadata_spec <- get_metadata_spec(metadata_tbl, selected_cols, names_prefix, names_suffix, names_sep)
ezraporter marked this conversation as resolved.
Show resolved Hide resolved

# Replace TRUEs/1s with raw/label values from metadata
data_tbl_mod <- data_tbl_mod %>%
mutate(across(
field_names,
~ replace_true(.x,
cur_column(),
metadata = metadata,
raw_or_label = raw_or_label
# Define .new_col as the count of TRUEs/1s for the given checkbox field
# Assign TRUE if multiple selections made, and FALSE if one or zero made
data_tbl_mod <- data_tbl
.new_col <- unique(metadata_spec$.new_value)

for (i in seq_along(.new_col)) {
ezraporter marked this conversation as resolved.
Show resolved Hide resolved
cols_to_sum <- metadata_spec$field_name[metadata_spec$.new_value == .new_col[i]] # nolint: object_usage_linter

data_tbl_mod <- data_tbl_mod %>%
mutate(
!!.new_col[i] := case_when(
rowSums(select(., cols_to_sum)) > 1 ~ TRUE,
.default = FALSE
)
)
))

}

# Convert values_to from TRUE/FALSE to multi_value_label or identified single val
# Replace TRUEs/1s with raw/label values from metadata
data_tbl_mod <- data_tbl_mod %>%
mutate(
across(field_names, as.character), # enforce to character strings
across(!!values_to, ~ as.character(.))
) %>%
mutate(
!!values_to := ifelse(!!sym(values_to) == "TRUE",
multi_value_label,
coalesce(!!!syms(field_names))
across(
selected_cols,
~ replace_true(.x,
cur_column(),
metadata = metadata_spec,
raw_or_label = raw_or_label
)
),
!!values_to := ifelse(is.na(!!sym(values_to)),
values_fill,
!!sym(values_to)
)
) %>%
mutate(
!!values_to := factor(!!sym(values_to),
levels = c(metadata[[raw_or_label]], multi_value_label, values_fill)
)
across(selected_cols, as.character) # enforce to character strings
)

# Use the metadata_spec table to fill values in .new_col
data_tbl_mod <- reduce(.new_col, function(tbl, col_item) {
ezraporter marked this conversation as resolved.
Show resolved Hide resolved
convert_metadata_spec(col_item, metadata_spec, tbl, raw_or_label, multi_value_label, values_fill)
}, .init = data_tbl_mod)

final_tbl <- bind_cols(
data_tbl,
data_tbl_mod %>% select(!!values_to)
data_tbl_mod %>% select(!!.new_col)
)

# Keep or remove original multi columns
if (!keep) {
final_tbl <- final_tbl %>%
select(-field_names)
select(-selected_cols)
}

# Update the supertbl data tibble
Expand All @@ -129,40 +127,60 @@ combine_checkboxes <- function(supertbl,
supertbl
}

#' @title Utility function for getting metadata raw and label values for checkboxes
#' @title Get metadata specification table
#'
#' @param data a data tibble
#' @param supertbl A supertibble generated by [read_redcap()].
#' @param tbl The name of the REDCap form (instrument) containing the checkbox
#' fields.
#' @param field_names Character string vector of field names for checkbox combination
#' @inheritParams combine_checkboxes
#' @param metadata_tbl A metadata tibble from the supertibble generated by [read_redcap()].
#' @param selected_cols Character string vector of field names for checkbox combination
#'
#' @returns a tibble
#'
#' @keywords internal
get_metadata_ref <- function(data,
supertbl,
tbl,
field_names) {
get_metadata_spec <- function(metadata_tbl,
selected_cols,
names_prefix,
names_suffix,
names_sep) {
# Create a metadata reference table linking field name to raw and label values
out <- supertbl$redcap_metadata[supertbl$redcap_form_name == tbl][[1]] %>%
filter(.data$field_name %in% field_names) %>%
# TODO: original_field a temporary placeholder for future multi-field and mapping dev
out <- metadata_tbl %>%
filter(.data$field_name %in% selected_cols) %>%
mutate(
original_field = sub("___.*$", "", .data$field_name)
.value = sub("___.*$", "", .data$field_name),
.new_value = case_when(!is.null(names_suffix) ~ paste(names_prefix, .value, names_suffix, sep = names_sep),
.default = paste(names_prefix, .data$.value, sep = names_sep)
)
)

# Make sure selection is checkbox metadata field type
check_fields_are_checkboxes(out)

# TODO: Make more robust for multi-field and mapping, using original_field above
parsed_vals <- parse_labels(first(out$select_choices_or_calculations))
# Bind raw/label values per original field grouping
parsed_vals <- tibble()

for (i in seq_along(unique(out$.value))) {
index <- unique(out$.value)[i]
out_filtered <- out %>% filter(.data$.value == index)

parsed_vals <- rbind(parsed_vals, parse_labels(first(out_filtered$select_choices_or_calculations)))
}

bind_cols(out, parsed_vals) %>%
select(.data$field_name, .data$raw, .data$label)
select(.data$field_name, .data$raw, .data$label, .data$.value, .data$.new_value) %>%
relocate(c(.data$.value, .data$.new_value), .after = .data$field_name)
}

#' @noRd
#' @title Replace checkbox TRUEs with raw_or_label values
#'
#' @inheritParams combine_checkboxes
#' @param col A vector
#' @param col_name A string
#' @param metadata A metadata tibble from the original supertibble
#'
#' @description
#' Simple utility function for replacing checkbox field values.
#'
#' @returns A character string
#'
#' @keywords internal
replace_true <- function(col, col_name, metadata, raw_or_label) {
# Replace TRUEs/1s with the appropriate raw or label value from the metadata
Expand All @@ -173,3 +191,53 @@ replace_true <- function(col, col_name, metadata, raw_or_label) {
# Convert non-TRUEs to NA, since values can be either "FALSE" or "0" for unchecked values
return(col)
}

#' @title Use metadata_spec to convert new column values
#'
#' @description
#' [convert_metadata_spec()] uses the `metadata_spec` table provided by [get_metadata_spec()]
#' to automatically convert new column values to either:
#'
#' - A `raw_or_label` checkbox value when only a single value is detected
#' - `mult_value_label` when multiple values are detected
#' - `values_fill` when `NA` is detected
#'
#' @inheritParams combine_checkboxes
#' @param .new_col_item A character string
#' @param metadata_spec A tibble output from [convert_metadata_spec()]
#' @param data_tbl_mod A modified data tibble
#'
#' @returns a tibble
#'
#' @keywords internal
convert_metadata_spec <- function(.new_col_item,
metadata_spec,
data_tbl_mod,
raw_or_label,
multi_value_label,
values_fill) {
.col_group <- metadata_spec$field_name[metadata_spec$.new_value == .new_col_item]

metadata_overwrite <- metadata_spec %>%
filter(.data$field_name %in% .col_group) %>%
pull(raw_or_label)

data_tbl_mod <- data_tbl_mod %>%
mutate(
!!.new_col_item := ifelse(!!sym(.new_col_item),
multi_value_label,
coalesce(!!!syms(.col_group))
),
!!.new_col_item := ifelse(is.na(!!sym(.new_col_item)),
values_fill,
!!sym(.new_col_item)
)
) %>%
mutate(
!!.new_col_item := factor(!!sym(.new_col_item),
levels = c(metadata_overwrite, multi_value_label, values_fill)
)
)

return(data_tbl_mod)
}
20 changes: 20 additions & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -954,3 +954,23 @@ safe_set_variable_labels <- function(data, labs) {
labs_to_keep <- intersect(names(labs), colnames(data))
labelled::set_variable_labels(data, !!!labs[labs_to_keep])
}

#' @title
#' Extract a specific metadata tibble from a supertibble
#'
#' @description
#' Utility function to extract a specific metadata tibble from a supertibble
#' given a `redcap_form_name`
#'
#' @param supertbl A supertibble generated by [read_redcap()].
#' @param redcap_form_name A character string identifying the `redcap_form_name`
#' the metadata tibble is associated with.
#'
#' @return
#' A tibble
#'
#' @keywords internal

extract_metadata_tibble <- function(supertbl, redcap_form_name) {
supertbl$redcap_metadata[supertbl$redcap_form_name == redcap_form_name][[1]]
}
2 changes: 1 addition & 1 deletion man/check_fields_exist.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 11 additions & 6 deletions man/combine_checkboxes.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading