Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect issue #564

Open
wants to merge 23 commits into
base: dev
Choose a base branch
from

Conversation

brookslogan
Copy link
Contributor

@brookslogan brookslogan commented Nov 8, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch 4 times, most recently from 7a8ec75 to 85503c1 Compare November 8, 2024 19:35
@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch 5 times, most recently from 1627870 to e6f7264 Compare November 12, 2024 02:18
@brookslogan
Copy link
Contributor Author

brookslogan commented Nov 12, 2024

  • resolve styler vs linter opinions on line length
  • check whether naming overrides forwarded through _mean and _sum
  • [-] add .prefix and .suffix naming features to base epi[x]_slide(?) --- but they are not the same feature... .prefix and .suffix for epi[x]_slide would be on the output columns, not on the input columns, and would not need to be glue syntax... might be confusing.
  • Update NEWS.md
  • [-] vignette updates?
    • Just did minimal changes to keep them running for now. Really should present epi_slide_{sum,mean} first, then the general version.
  • auto-grouping&ungrouping by epikey, and ensure epi_slide also auto-ungroups if auto-grouping?
  • Update epipredict to not break

@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from e6f7264 to 6a22c4b Compare November 12, 2024 19:55
- BREAKING CHANGE: adjust default output column naming scheme, disallow
  overwriting columns.
@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch 2 times, most recently from 882e960 to 9669773 Compare November 12, 2024 20:59
@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from fdc9ee7 to 79ed2f3 Compare November 12, 2024 22:42
- Show naming options, including with multi-column selections when we have
  accommodating example data sets
- Select away the pre-existing 7d aggregations in the example data set
- Ungroup output
@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from 845f4c0 to beb434c Compare November 14, 2024 20:51
@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from 45170b1 to f9a8356 Compare November 14, 2024 21:29
@brookslogan brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from ed23df0 to 6d8bcf6 Compare November 14, 2024 21:43
@brookslogan brookslogan marked this pull request as ready for review November 14, 2024 21:44
@brookslogan brookslogan changed the title Lcb/epi slide opt output renaming Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect bug Nov 14, 2024
@brookslogan brookslogan changed the title Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect bug Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect issue Nov 14, 2024
@nmdefries nmdefries self-requested a review November 14, 2024 21:58
@brookslogan
Copy link
Contributor Author

@nmdefries sorry for all the noise here; I tried to implement just 1 time utility from #494 but forgot it actually relied on the concept of a "unit" time delta/step; fixed that up and added the unit time step helper. Think that's fixed now. Also fixed some CHECK issues.

The core changes, and now these helpers, should hopefully be ready now.

R/slide.R Outdated Show resolved Hide resolved
R/slide.R Outdated Show resolved Hide resolved
Comment on lines +595 to +621
#' @param .prefix Optional [`glue::glue`] format string; name the slide result
#' column(s) by attaching this prefix to the corresponding input column(s).
#' Some shorthand is supported for basing the output names on `.window_size`
#' or other arguments; see "Prefix and suffix shorthand" below.
#' @param .suffix Optional [`glue::glue`] format string; like `.prefix`. The
#' default naming behavior is equivalent to `.suffix =
#' "_{.n}{.time_unit_abbr}{.align_abbr}{.f_abbr}"`. Can be used in combination
#' with `.prefix`.
#' @param .new_col_names Optional character vector with length matching the
#' number of input columns from `.col_names`; name the slide result column(s)
#' with these names. Cannot be used in combination with `.prefix` and/or
#' `.suffix`.
#'
#' @section Prefix and suffix shorthand:
#'
#' [`glue::glue`] format strings specially interpret content within curly
#' braces. E.g., `glue::glue("ABC{2 + 2}")` evaluates to `"ABC4"`. For `.prefix`
#' and `.suffix`, we provide `glue` with some additional variable bindings:
#'
#' - `{.n}` will be the number of time steps in the computation
#' corresponding to the `.window_size`.
#' - `{.time_unit_abbr}` will be a lower-case letter corresponding to the
#' `time_type` of `.x`
#' - `{.align_abbr}` will be `""` if `.align` is the default of `"right"`;
#' otherwise, it will be the first letter of `.align`
#' - `{.f_abbr}` will be a character vector containing a short abbreviation
#' for `.f` factoring in the input column type(s) for `.col_names`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: please move this section to man-roxygen/basic-slide-params.R so it can be included here and for the epi_slide_[mean/opt] docs too.

man-roxygen/basic-slide-params.R also has grouping/ungrouping language that might need to be updated.

Copy link
Contributor Author

@brookslogan brookslogan Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be in basic-slide-params.R, because it's not shared by epi_slide. It's already shared by {opt,mean,sum} because they are all in one topic. I'm not adding .prefix and .suffix to epi[x]_slide (yet at least), since it's actually a bit distinct: epi_slide_{opt,mean,sum} are applying these to input columns to name associated output columns, while epi[x]_slide would be applying these to names of columns generated by the user computation. #534 suggests this feature might be appreciated though.

Because {opt,mean,sum} are all in one topic, the examples are a bit redundant + the mean & sum examples are smashed next to each other without spacing/separation.

  • todo: fix smashing or just put all the examples in epi_slide_opt and remove redundant ones.

  • todo: update basic-slide-params.R grouping language

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basic-slide-params.R already had the appropriate language regarding groupedness of .x (that was probably a doc bug before this PR, but this PR makes it true). I did update it to mention groupedness/ungroupedness of the output.

I reworked the examples, deleting some, reordering others, tweaking discussion, maybe added another. Could benefit from a sanity check.

R/slide.R Show resolved Hide resolved
R/slide.R Outdated Show resolved Hide resolved
R/slide.R Show resolved Hide resolved
)
}

# Grouping should ensure that we don't have duplicate time values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: How does grouping ensure lack of duplicates?

It's not clear to me that this is true (or at least, we're handling duplicates differently in different places, e.g. epi_slide does check for duplicated time_values, despite now grouping by the same keys as ...opt does).

I know we check for duplicate keys in as_epi_df.tbl_df, but I don't see it anywhere else for epi_dfs. dplyr_reconstruct.epi_df checks for duplicated column names, but not anything else. Notably, new_epi_df doesn't check for duplicates. It is described as being a "lower-level constructor", but I think a user could conceivably find and use it, and end up with an incorrect result here.

Maybe the issue is just that new_epi_df is exported when it needn't be?

Copy link
Contributor Author

@brookslogan brookslogan Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, right, it currently doesn't completely ensure it. Dmitry tested out placing a check in dplyr_reconstruct but it was too slow for how often it runs. I've looked into a couple of ways to address this:

  • Fixing our dplyr_extending usage which is pretty wonky, so we wouldn't run checks as often. (experimentation here, incomplete)

  • Making the check faster (Speed up duplicate detection in as_epi_df() #560)
    I didn't complete those, so

  • todo: add back duplicated time values check for now

new_epi_df shouldn't validate something complicated like this if we try to follow Hadley recommendations. Related: #312.

I'm not sure about not exporting new_epi_df; #94 suggests it's needed downstream, but I don't actually see it anywhere in epipredict right now. We should probably at least provide a helper epi_df that acts like tibble, which would be convenient and make new_epi_df not stand out as much. Filed off into #572.

tests/testthat/test-epi_slide.R Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make epi_slide_opt work with !!varname, maybe !!varnames, maybe !!!varnames
2 participants