Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine_checkboxes #196

Merged
merged 39 commits into from
Aug 13, 2024
Merged

combine_checkboxes #196

merged 39 commits into from
Aug 13, 2024

Conversation

rsh52
Copy link
Collaborator

@rsh52 rsh52 commented Jul 15, 2024

Description

This PR seeks to add out first "analytics" function as a tool for data analysts working with the supertibble after it's been created.

This first function, combine_checkboxes(), seeks to take the wide-form output of checkbox fields in a data tibble and do the following:

  • Combine under a single column
  • The single column is named by the user
  • The single column changes from TRUE/FALSE / 1/0 to showing:
    • raw or label values associated with the checkbox if a single value is selected
    • A user specified string if multiple values are selected (multi_value_label)
    • A user specified string of no values are selected (values_fill)

I implemented some additional things like check_* style functions, but interested in thoughts, feedback, etc. I figure this will go through a few iterations before being finalized.

Proposed Changes

List changes below in bullet format:

  • Create analytics function combine_checkboxes()
  • Implement check for when no fields supplied by the user are detected (check_fields_exist()) and provide helpful error message
  • Implement check for when fields supplied detected but are not checkbox field types (check_fields_are_checkboxes()) and provide helpful error message
  • Add documentation and tests for above functions
  • Add combine_checkboxes() to pkgdown site

Additional Changes:

  • revdepcheck is no longer maintained, so I'm trying out a new workflow from r-devel called recheck. Another option is to use tools::check_packages_in_dir().
  • Removed revdepcheck folder to stop renv issues
  • renv lockfile updated as well

Issue Addressed

Closes #194

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

  • New/revised functions have associated tests
  • [NA] New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
  • New/revised functions use appropriate naming conventions
  • New/revised functions don't repeat code
  • [NA] Code changes are less than 250 lines total
  • Issues linked to the PR using GitHub's list of keywords
  • The appropriate reviewer is assigned to the PR
  • The appropriate developers are assigned to the PR
  • Pre-release package version incremented using usethis::use_version()

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

  • I checked that new files follow naming conventions and are in the right place
  • I checked that documentation is complete, clear, and without typos
  • I added/edited comments to explain "why" not "how"
  • I checked that all new variable and function names follow naming conventions
  • I checked that new tests have been written for key business logic and/or bugs that this PR fixes
  • I checked that new tests address important edge cases

@rsh52 rsh52 added the enhancement New feature or request label Jul 15, 2024
@rsh52 rsh52 self-assigned this Jul 15, 2024
@rsh52 rsh52 marked this pull request as ready for review July 15, 2024 20:49
@rsh52 rsh52 requested a review from ezraporter July 15, 2024 20:49
Copy link
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clearly does what we want but I think the code could be clearer. Maybe we can discuss a bit more together. Generally:

  1. Carrying around the instrument identifiers creates a lot of complication I don't think is needed. Since we're not changing the number of rows we should just be able to bind_cols().
  2. There's a lot of NSE where I think you'd be better off creating temporary columns with a set name and renaming at the end.
  3. The label parsing code seems duplicative with parse_labels().

API Thoughts

I could see a user wanting to apply this function to many checkboxes across many instruments. I think we need to give them a way of doing this that's better than calling this function over and over.

Can we generalize to allow multiple checkbox fields in a single instrument to be done in one call?

Do we want a function (map_supertbl()?) that can apply a transformation iteratively to rows of a supertibble? Possibly with the ability to select specific data tibbles and vary arguments across them.

R/combine_checkboxes.R Outdated Show resolved Hide resolved
R/combine_checkboxes.R Outdated Show resolved Hide resolved
R/combine_checkboxes.R Outdated Show resolved Hide resolved
R/combine_checkboxes.R Outdated Show resolved Hide resolved
R/combine_checkboxes.R Outdated Show resolved Hide resolved
@rsh52
Copy link
Collaborator Author

rsh52 commented Jul 17, 2024

API Thoughts

I could see a user wanting to apply this function to many checkboxes across many instruments. I think we need to give them a way of doing this that's better than calling this function over and over.

Can we generalize to allow multiple checkbox fields in a single instrument to be done in one call?

Agreed and figured that would be the next part of this PR. Should be just a matter of opening up some internal mutates with some grouping/.by specifiers.

Do we want a function (map_supertbl()?) that can apply a transformation iteratively to rows of a supertibble? Possibly with the ability to select specific data tibbles and vary arguments across them.

I think that's a great idea, and can be potentially useful for other functions we come up with (or maybe that users come up with).

@rsh52
Copy link
Collaborator Author

rsh52 commented Jul 18, 2024

I could see a user wanting to apply this function to many checkboxes across many instruments. I think we need to give them a way of doing this that's better than calling this function over and over.

Can we generalize to allow multiple checkbox fields in a single instrument to be done in one call?

@ezraporter Coming back to this I have a couple of thoughts and questions.

I could see giving a custom "tidyselect" option at default for cols called all_checkboxes() (or just default this to everything() and have documentation specify it will be applied to all checkboxes) and, if enabled, have that look for all unique checkbox fields in a given form and combine them. I still think the core function needs to be done on a single form/row of the supertibble, but the mapping function can be what we use to iterate over the supertibble.

For times when users don't want to combine all checkboxes but want to combine more than one, they're most likely to use starts_with() to grab all checkbox___* fields (unless we were to look solely at the original field name before the ___). What are your thoughts for the API for how users would implement multiple tidyselect calls for a single parameter?

@rsh52
Copy link
Collaborator Author

rsh52 commented Jul 18, 2024

For times when users don't want to combine all checkboxes but want to combine more than one, they're most likely to use starts_with() to grab all checkbox___* fields (unless we were to look solely at the original field name before the ___). What are your thoughts for the API for how users would implement multiple tidyselect calls for a single parameter?

Actually I think we can easily update the internals so that users could just give cols something like starts_with("race") | starts_with("ethnicity"). I didn't piece together that eval_select() was already smart enough to handle it. Still curious about doing a default "everything".

@ezraporter
Copy link
Collaborator

ezraporter commented Jul 18, 2024

I could see giving a custom "tidyselect" option at default for cols called all_checkboxes() (or just default this to everything() and have documentation specify it will be applied to all checkboxes) and, if enabled, have that look for all unique checkbox fields in a given form and combine them.

Love this idea and I think all_checkboxes() is the way to go. Better to keep it distinct from everything() IMO.

I still think the core function needs to be done on a single form/row of the supertibble, but the mapping function can be what we use to iterate over the supertibble.

Agree

Actually I think we can easily update the internals so that users could just give cols something like starts_with("race") | starts_with("ethnicity").

My concern here would be that tidyselect deals with sets of variables and I haven't seen examples where groupings within that set are meaningful. For example, in usages I've seen starts_with("x") | starts_with("y") is equivalent to matches("^x|y"). I think we'd be breaking that in our case. (Although maybe that's okay!)

My idea would be to keep the tidyselect in cols as just a selector of a set of columns and add other parameters to specify how the column names should be parsed into a name-value pair:

# Some dummy data

data <- tibble(
  id = 1,
  x___1 = TRUE,
  x___2 = NA,
  y___1 = NA,
  y___2 = TRUE
)

metadata <- tibble(
  field_name = c("x___1", "x___2", "y___1", "y___2"),
  field_type = "checkbox",
  select_choices_or_calculations = "1, A | 2, B"
)

suprtbl <- tibble(
  redcap_form_name = "tbl",
  redcap_data = list(data),
  redcap_metadata = list(metadata)
) |>
  as_supertbl()

# Proposed API

combine_checkboxes(
  supertbl = suprtbl,
  tbl = "tbl",
  cols = c(starts_with("x"), starts_with("y")),
  sep = "___", # make this the default?
  values_to = c("x", "y")
) |>
  extract_tibble("tbl")

## A tibble: 1 × 7
#     id x___1 x___2 y___1 y___2 x     y    
#  <dbl> <lgl> <lgl> <lgl> <lgl> <fct> <fct>
#1     1 TRUE  NA    NA    TRUE  A     B    

# possibly give a way to infer names (inspired by across(.names = ))
combine_checkboxes(
  supertbl = suprtbl,
  tbl = "tbl",
  cols = c(starts_with("x"), starts_with("y")),
  sep = "___",
  values_to = "{.col}"
)

# possibly allow for more complex specifications (inspired by separate_wider_regex)
combine_checkboxes(
  supertbl = suprtbl,
  tbl = "tbl",
  cols = c(starts_with("x"), starts_with("y")),
  patterns = c(name = ".+", "___", value = ".+"),
  values_to = "{.col}"
)

@ezraporter
Copy link
Collaborator

One more thought. In messing around with this I'm not sure how I feel about the values_to name. The analogy to pivot_longer() seems more tenuous than I initially thought and I'm wondering if something like names would be better.

@rsh52
Copy link
Collaborator Author

rsh52 commented Jul 18, 2024

Actually, I think we kind of need to support both logicals and c() for cols. Below these amount to the same:

> suprtbl$redcap_data[[1]] %>% select(starts_with("x") | starts_with("y"))
# A tibble: 1 × 4
  x___1 x___2 y___1 y___2
  <lgl> <lgl> <lgl> <lgl>
1 TRUE  NA    NA    TRUE 
> suprtbl$redcap_data[[1]] %>% select(c(starts_with("x"), starts_with("y")))
# A tibble: 1 × 4
  x___1 x___2 y___1 y___2
  <lgl> <lgl> <lgl> <lgl>
1 TRUE  NA    NA    TRUE 

Fortunately only the code below is (currently) really concerned with what gets passed to cols, and it already works with c(), it just needs to be updated so that the eval_tidy() works with the logicals:

# Get field names from cols_exp, check that fields exist
field_names <- names(eval_select(cols_exp, data = data_tbl))
check_fields_exist(fields = field_names, expr = cols_exp)
# Define values_to as the count of TRUEs/1s for the given checkbox field
# Assign TRUE if multiple selections made, and FALSE if one or zero made
data_tbl_mod <- data_tbl %>%
mutate(
!!values_to := case_when(
rowSums(select(., eval_tidy(cols_exp))) > 1 ~ TRUE,
TRUE ~ FALSE
)
)

But if you feel strongly about banning use of logicals, we'd need to implement some enforced block to users doing so.


I like the API suggestions, I need to give some more thought as to how they would get implemented in practice.

  • values_to = c("x", "y"): I assume we would need to set up some check to make sure values_to and cols are of equal size?
  • values_to = "{.col}": This one I like a lot, essentially converting back to the originally defined names before the REDCap changes?

One more thought. In messing around with this I'm not sure how I feel about the values_to name. The analogy to pivot_longer() seems more tenuous than I initially thought and I'm wondering if something like names would be better.

Hm, not sold either way. If we're trying to align with pivot_longer()-inspiration then names also doesn't really replicate what names_* does in that function right?

@ezraporter
Copy link
Collaborator

Actually, I think we kind of need to support both logicals and c() for cols.

Sorry, my point wasn't that we shouldn't support this. Just that both methods of selecting columns should be equivalent for the user (as I think they are now).

I like the API suggestions, I need to give some more thought as to how they would get implemented in practice.

  • values_to = c("x", "y"): I assume we would need to set up some check to make sure values_to and cols are of equal size?
  • values_to = "{.col}": This one I like a lot, essentially converting back to the originally defined names before the REDCap changes?

Some more thoughts that might help with implementation. I would break out what we need to do into 3 steps. The important point is that the result of each step basically gives you everything you need to carry out the rest of the data transformation.

Step 1: Select a bunch of checkbox fields from the data.

x___1, x___2, y___1, y___2

Step 2: Convert the result into a representation that maps checkbox field names to checkbox field values.

col value name
x 1 x___1
x 2 x___2
y 1 y___1
y 2 y___2

Step 3: Apply our existing logic to each group created by col

API-wise, you can sum up my position as: separate parameters should be responsible for doing step 1 and step 2. That is, cols just does step 1 and tells you nothing about step 2.

I think the representation in step 2 is kind of nice because all the validation can happen there:

  • Everything in name is a checkbox field
  • values_to has the same number of elements as groups of col

In this light, the parameters I have in my API suggestion above are just helpers to let the user define how we go from step 1 to step 2.

Hm, not sold either way. If we're trying to align with pivot_longer()-inspiration then names also doesn't really replicate what names_* does in that function right?

Yeah, this comment was just to say I'm not sold either now so don't stick with values_to because you think I love it 😄. The pivot_longer() connection seems less strong now that I've thought about the API more.

@rsh52
Copy link
Collaborator Author

rsh52 commented Jul 29, 2024

@ezraporter Check out the newer API when you have a moment. Probably still a bit more to go, but I think this addresses the 3 steps you outlined above. Changes made:

  • Dropped values_to in favor of a combination of names_prefix, names_suffix, and names_sep (which also draw from pivot_*()). I think this is a better approach and does a better job of ensuring there's no mismatch in the number of elements between the supplied checkbox fields and what they get converted to.
  • Updated some internal functions for code cleanup and changed some syntax/naming that I think is more understandable
  • Docs, tests, etc.

Things still to do pending your thoughts:

  • Give option for users to specify all_checkboxes() as an option to the cols param
  • (Maybe) add option for names_glue parameter similar to pivot_*()
    • I had started working on this but decided to forgo. It's more complicated than I thought it would be because of how we have to call metadata and data values separately, whereas pivot_*() doesn't have to do this and it's much simpler to work with. Not saying no to it, but want to make sure the value is there for including it.

@rsh52
Copy link
Collaborator Author

rsh52 commented Jul 31, 2024

Also confirming that the supertibble output works as expected when using make_labelled().

When applying make_labelled() on a modified supertibble using combine_checkboxes() all labels are created as usual, there just aren't any new ones for the newly created columns.

@ezraporter ezraporter self-requested a review August 1, 2024 19:54
Copy link
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are great improvements! Some optional refactor ideas in there. I do think we need to change the default names_* settings and should try to do names_glue if we can!

Comment on lines 44 to 46
names_prefix = "",
names_suffix = NULL,
names_sep = "_",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that these parameters control the names of the new columns and not how variable names (ex. race___0) are parsed into names and values?

A couple comments:

  1. If we go with this I think we need to make it clearer that we're assuming the checkbox fields are in name___value format and not giving the user any control over how that's parsed.
  2. I don't think the defaults are right. In this example the output col is called _race. The default should probably just produce race:
db_label |>
  combine_checkboxes(
    "demographics",
    starts_with("race")
  )
  1. I see your "Maybe" in the PR comment and also think names_glue would be super valuable if we can do it 😊

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that these parameters control the names of the new columns and not how variable names (ex. race___0) are parsed into names and values?

They control the structure of the names, but the names themselves come from .value in get_metadata_spec(), i.e. the field name prior to the ___ checkbox changes.

If we go with this I think we need to make it clearer that we're assuming the checkbox fields are in name___value format and not giving the user any control over how that's parsed.

How about we just add a check_metadata_fields_exist() in get_metadata_spec() similar to what I have in the parent function for check_fields_exist()? If checkbox names are changed and they don't appear in the metadata field_name column, we can throw an error and suggestion. We should expect users don't manipulate the metadata tibbles, but we need the connection between the metadata tibble and the data tibble to remain intact. I think this supports our "if you change things, you need to take some responsibility for them" mindset.

I don't think the defaults are right. In this example the output col is called _race. The default should probably just produce race

I'm open to changing it, but when I was thinking through outputs I was worried about clashing with other possibly existing column names. See this example, if we're ok with that being the default in the event of a clash then I can rework this.

Click me
data <- tibble(
  id = 1,
  prefix = "prefix",
  x___1 = TRUE,
  x___2 = FALSE,
  y___1 = TRUE,
  y___2 = TRUE,
  z___1 = FALSE,
  x = "val"
)

metadata <- tibble(
  field_name = c("id", "prefix", "x___1", "x___2", "y___1", "y___2", "z___1", "x"),
  field_type = c("text", "text", rep("checkbox", 5), "text"),
  select_choices_or_calculations = c(NA, NA, rep("1, A | 2, B", 4), "3, C", NA),
)

suprtbl <- tibble(
  redcap_form_name = "tbl",
  redcap_data = list(data),
  redcap_metadata = list(metadata)
) |>
  as_supertbl()

combine_checkboxes(supertbl = suprtbl,
                   tbl = "tbl",
                   cols = c(starts_with("x__"), starts_with("y"), "z___1"),
                   names_sep = "") %>%
  pull(redcap_data)

New names:`x` -> `x...8``x` -> `x...9`
[[1]]
# A tibble: 1 × 11
     id prefix x___1 x___2 y___1 y___2 z___1 x...8 x...9 y        z    
  <dbl> <chr>  <lgl> <lgl> <lgl> <lgl> <lgl> <chr> <fct> <fct>    <fct>
1     1 prefix TRUE  FALSE TRUE  TRUE  FALSE val   A     Multiple NA   

I see your "Maybe" in the PR comment and also think names_glue would be super valuable if we can do it 😊

ugh... Fine. I knew it was coming but figured I'll get the rest of this ironed out first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we just add a check_metadata_fields_exist() in get_metadata_spec() similar to what I have in the parent function for check_fields_exist()? If checkbox names are changed and they don't appear in the metadata field_name column, we can throw an error and suggestion. We should expect users don't manipulate the metadata tibbles, but we need the connection between the metadata tibble and the data tibble to remain intact. I think this supports our "if you change things, you need to take some responsibility for them" mindset.

Okay I'm fine with this. I would also add something to the documentation noting the pattern we're looking for.

I'm open to changing it, but when I was thinking through outputs I was worried about clashing with other possibly existing column names. See this example, if we're ok with that being the default in the event of a clash then I can rework this.

I would resolve this with a warning or possibly error if the fields already exist. pivot_longer() for example errors and directs the user to the names_repair parameter to provide a repair strategy:

tibble(x=1:3, y=4:6, value = 10) |>
 pivot_longer(x:y)

#>Error in `tidyr::pivot_longer()`:
#>! Names must be unique.
#>✖ These names are duplicated:
#>  * "value" at locations 1 and 3.
#>ℹ Use argument `names_repair` to specify repair strategy.

That may be too sophisticated for us but we may actually be able to recreate that behavior pretty easily with vctrs::vec_as_names() which is referenced in the docs for the names_repair parameter.

ugh... Fine. I knew it was coming but figured I'll get the rest of this ironed out first.

Haha if it ends up being too tricky that's fine!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright added support for names_repair and names_glue.

names_glue I'm still iffy on, the use case in the pivot_wider() documentation is a bit more complicated than I believe we can support here, but try the current set up out and let me know what you think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that pivot_wider() supports more but I think we're still providing a lot value with what we have.

Imagine a user has a meals instrument with some checkboxes like this:

field_name
breakfast___apple
breakfast___orange
breakfast___spinach
lunch___apple
lunch___orange
lunch___spinach
dinner___apple
dinner___orange
dinner___spinach

They could do:

supertbl |>
  combine_checkboxes(
    "meals",
    matches("breakfast|lunch|dinner"),
    names_glue = "checkbox_{.value}_all"
  ) |>
  combine_checkboxes(
    "meals",
    matches("breakfast|lunch|dinner") & matches("apple|orange"),
    names_glue = "checkbox_{.value}_fruit"
  )

How slick is that?

R/combine_checkboxes.R Outdated Show resolved Hide resolved
R/combine_checkboxes.R Outdated Show resolved Hide resolved
R/combine_checkboxes.R Outdated Show resolved Hide resolved
@rsh52 rsh52 requested a review from ezraporter August 5, 2024 20:58
Copy link
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! This is basically there. Just a couple comments on how to make our names_glue implementation a little safer

DESCRIPTION Outdated
@@ -38,6 +38,7 @@ Imports:
stats
Suggests:
covr,
glue,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can import glue. It's already a dependency of packages we import (tidyr, stringr at least) so we're not really changing anything by bumping it up from Suggests

.new_value = case_when(!is.null(names_suffix) ~ paste(names_prefix, .value, names_suffix, sep = names_sep),
.default = paste(names_prefix, .data$.value, sep = names_sep)
if (!is.null(names_glue)) {
check_installed("glue", reason = "to use `names_glue` in `combine_checkboxes()`")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove if we import glue

Suggested change
check_installed("glue", reason = "to use `names_glue` in `combine_checkboxes()`")

Comment on lines 152 to 154
mutate(
.value = sub("___.*$", "", .data$field_name),
.new_value = as.character(glue::glue(names_glue))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be more careful with this implementation of names_glue.

Right now we're giving names_glue access to everything in the metadata which can cause weird results:

db_label |>
  combine_checkboxes("demographics", starts_with("race"), names_glue = "xyz_{field_name}") |>
  extract_tibble("demographics")

In this example we actually create the wrong number of output columns silently.

I think we should:

  1. Use glue_data() rather than glue() to scope what the user actually has access to glue with:
# Could include more things we want to give the user access to here
glue_env <- select(out, .value)

out <- out |>
  mutate(.new_value = glue_data(glue_env, names_glue))
  1. Enforce the constraint that .new_value is the same within each level of .value. This ensures the user always gets the expected number of output columns.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezraporter quick q. Happy to use glue_data() as discussed, but in this example the output is still going to be the same since field_name is still accessible from out. The intention here is to have it fail instead of falsely grabbing the field_name from the metadata (for now) right? See if this is the change we're looking for for (1):

get_metadata_spec <- function(metadata_tbl,
                              selected_cols,
                              names_prefix,
                              names_sep,
                              names_glue) {
  check_metadata_fields_exist(metadata_tbl, selected_cols)

  # Create a metadata reference table linking field name to raw and label values
  out <- metadata_tbl %>%
    filter(.data$field_name %in% selected_cols) %>%
    mutate(
      .value = sub("___.*$", "", .data$field_name)
    )

  if (!is.null(names_glue)) {
    # Similar to pivot_*, use of `names_glue` overrides use of names_prefix/sep
    glue_env <- select(out, .value) %>%
      mutate(.new_value = as.character(glue::glue_data(., names_glue))) %>%
      select(.new_value)

    out <- cbind(out, glue_env)
  } else {
    out <- out %>%
      mutate(
        .new_value = case_when(names_prefix != "" ~ paste(names_prefix, .value, sep = names_sep),
          .default = paste(names_prefix, .data$.value, sep = "")
        )
      )
  }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Enforce the constraint that .new_value is the same within each level of .value. This ensures the user always gets the expected number of output columns.

See what you think of the most recent commit for this (1).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why we're converting things to factors. The validation I was talking about was something like:

check <- out |>
  group_by(.value) |>
  summarize(n=n_distinct(.new_value)) |>
  pull(n)

if (!all(check == 1)) {
  # Throw an error
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I thought you wanted more of an enforcement not a check. Can implement.

Out of curiosity what's an example with this set up of how someone would still trigger this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check function implemented, but unsure what the message should be so let me know how you'd like to tweak.

  error_data <- tibble::tribble(
    ~"id", ~"col1", ~"col2",
    1, "A", "A1",
    2, "B", "B1",
    3, "B", "B2"
  )

  check_equal_col_summaries(error_data, col1, col2)

Error in check_equal_col_summaries():
✖ Encountered unequal naming outputs.
! combine_checkboxes() call resulted in column output: A, B, and B and new column output: A1, B1, and B2.
Run rlang::last_trace() to see where the error occurred.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity what's an example with this set up of how someone would still trigger this?

Mmm I see. I think in our current set-up it can't get triggered so maybe it's redundant at this point? There are some weird cases like this but it's pretty contrived:

data <- tibble::tibble(.value = c("A", "A", "B"))
vector_in_env <- 1:3

data |>
  mutate(.new_value = glue_data(data, "{vector_in_env}_{.value}")

All that is to say: maybe we should just drop it. If we keep it we probably want it to say something like "Checkbox field B resulted in multiple output columns, B1 and B2. Check that names_glue defines only 1 output column for each checkbox field."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yea I couldn't think of a way from the UI to trigger this but there's no real harm in keeping it in for now. I can put a comment in that says we may forgo it in the future. I'll update the error message to be closer to your suggestion.

@rsh52 rsh52 requested a review from ezraporter August 13, 2024 18:29
Copy link
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🚀 🚀

@rsh52 rsh52 merged commit a6c8602 into main Aug 13, 2024
4 checks passed
@rsh52 rsh52 deleted the reduce_multi_to_single branch August 13, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Multiple Choice to Single Column Function
2 participants