Adjust `epi_slide_opt` output naming; adjust `epi_slide*` autogrouping; fix tidyselect issue #564

brookslogan · 2024-11-08T01:08:31Z

Checklist

Please:

Make sure this PR is against "dev", not "main" (unless this is a release
PR).
Request a review from one of the current main reviewers:
brookslogan, nmdefries.
Makes sure to bump the version number in DESCRIPTION. Always increment
the patch version number (the third number), unless you are making a
release PR from dev to main, in which case increment the minor version
number (the second number).
Describe changes made in NEWS.md, making sure breaking changes
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
1.7.2, then write your changes under the 1.8 heading).
See DEVELOPMENT.md for more information on the development
process.

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

brookslogan · 2024-11-12T02:21:11Z

resolve styler vs linter opinions on line length
check whether naming overrides forwarded through _mean and _sum
[-] add .prefix and .suffix naming features to base epi[x]_slide(?) --- but they are not the same feature... .prefix and .suffix for epi[x]_slide would be on the output columns, not on the input columns, and would not need to be glue syntax... might be confusing.
Update NEWS.md
[-] vignette updates?
- Just did minimal changes to keep them running for now. Really should present epi_slide_{sum,mean} first, then the general version.
auto-grouping&ungrouping by epikey, and ensure epi_slide also auto-ungroups if auto-grouping?
Update epipredict to not break

- BREAKING CHANGE: adjust default output column naming scheme, disallow overwriting columns.

- Show naming options, including with multi-column selections when we have accommodating example data sets - Select away the pre-existing 7d aggregations in the example data set - Ungroup output

We may later need it to make default args, in which case we should probably export to be less "magical".

brookslogan · 2024-11-15T18:19:16Z

@nmdefries sorry for all the noise here; I tried to implement just 1 time utility from #494 but forgot it actually relied on the concept of a "unit" time delta/step; fixed that up and added the unit time step helper. Think that's fixed now. Also fixed some CHECK issues.

The core changes, and now these helpers, should hopefully be ready now.

R/slide.R

nmdefries · 2024-11-21T22:27:13Z

R/slide.R

+#' @param .prefix Optional [`glue::glue`] format string; name the slide result
+#'   column(s) by attaching this prefix to the corresponding input column(s).
+#'   Some shorthand is supported for basing the output names on `.window_size`
+#'   or other arguments; see "Prefix and suffix shorthand" below.
+#' @param .suffix Optional [`glue::glue`] format string; like `.prefix`. The
+#'   default naming behavior is equivalent to `.suffix =
+#'   "_{.n}{.time_unit_abbr}{.align_abbr}{.f_abbr}"`. Can be used in combination
+#'   with `.prefix`.
+#' @param .new_col_names Optional character vector with length matching the
+#'   number of input columns from `.col_names`; name the slide result column(s)
+#'   with these names. Cannot be used in combination with `.prefix` and/or
+#'   `.suffix`.
+#'
+#' @section Prefix and suffix shorthand:
+#'
+#' [`glue::glue`] format strings specially interpret content within curly
+#' braces. E.g., `glue::glue("ABC{2 + 2}")` evaluates to `"ABC4"`. For `.prefix`
+#' and `.suffix`, we provide `glue` with some additional variable bindings:
+#'
+#' - `{.n}` will be the number of time steps in the computation
+#'    corresponding to the `.window_size`.
+#' - `{.time_unit_abbr}` will be a lower-case letter corresponding to the
+#'     `time_type` of `.x`
+#' - `{.align_abbr}` will be `""` if `.align` is the default of `"right"`;
+#'    otherwise, it will be the first letter of `.align`
+#' - `{.f_abbr}` will be a character vector containing a short abbreviation
+#'    for `.f` factoring in the input column type(s) for `.col_names`


todo: please move this section to man-roxygen/basic-slide-params.R so it can be included here and for the epi_slide_[mean/opt] docs too.

man-roxygen/basic-slide-params.R also has grouping/ungrouping language that might need to be updated.

This can't be in basic-slide-params.R, because it's not shared by epi_slide. It's already shared by {opt,mean,sum} because they are all in one topic. I'm not adding .prefix and .suffix to epi[x]_slide (yet at least), since it's actually a bit distinct: epi_slide_{opt,mean,sum} are applying these to input columns to name associated output columns, while epi[x]_slide would be applying these to names of columns generated by the user computation. #534 suggests this feature might be appreciated though.

Because {opt,mean,sum} are all in one topic, the examples are a bit redundant + the mean & sum examples are smashed next to each other without spacing/separation.

todo: fix smashing or just put all the examples in epi_slide_opt and remove redundant ones.

todo: update basic-slide-params.R grouping language

basic-slide-params.R already had the appropriate language regarding groupedness of .x (that was probably a doc bug before this PR, but this PR makes it true). I did update it to mention groupedness/ungroupedness of the output.

I reworked the examples, deleting some, reordering others, tweaking discussion, maybe added another. Could benefit from a sanity check.

R/slide.R

nmdefries · 2024-11-21T23:48:02Z

R/slide.R

-        )
-      }
-
+      # Grouping should ensure that we don't have duplicate time values.


question: How does grouping ensure lack of duplicates?

It's not clear to me that this is true (or at least, we're handling duplicates differently in different places, e.g. epi_slide does check for duplicated time_values, despite now grouping by the same keys as ...opt does).

I know we check for duplicate keys in as_epi_df.tbl_df, but I don't see it anywhere else for epi_dfs. dplyr_reconstruct.epi_df checks for duplicated column names, but not anything else. Notably, new_epi_df doesn't check for duplicates. It is described as being a "lower-level constructor", but I think a user could conceivably find and use it, and end up with an incorrect result here.

Maybe the issue is just that new_epi_df is exported when it needn't be?

Ooh, right, it currently doesn't completely ensure it. Dmitry tested out placing a check in dplyr_reconstruct but it was too slow for how often it runs. I've looked into a couple of ways to address this:

Fixing our dplyr_extending usage which is pretty wonky, so we wouldn't run checks as often. (experimentation here, incomplete)

Making the check faster (Speed up duplicate detection in as_epi_df() #560)
I didn't complete those, so

todo: add back duplicated time values check for now

new_epi_df shouldn't validate something complicated like this if we try to follow Hadley recommendations. Related: #312.

I'm not sure about not exporting new_epi_df; #94 suggests it's needed downstream, but I don't actually see it anywhere in epipredict right now. We should probably at least provide a helper epi_df that acts like tibble, which would be convenient and make new_epi_df not stand out as much. Filed off into #572.

tests/testthat/test-epi_slide.R

Co-authored-by: nmdefries <[email protected]>

refactor(epi_slide_opt): simplify tidyselect

ac12ec1

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch 4 times, most recently from 7a8ec75 to 85503c1 Compare November 8, 2024 19:35

Fix typo + adjust naming of sliding median example

c75de38

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch 5 times, most recently from 1627870 to e6f7264 Compare November 12, 2024 02:18

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from e6f7264 to 6a22c4b Compare November 12, 2024 19:55

feat(epi_slide_opt)!: add .prefix =, .suffix =, .new_col_names =

fab1c08

- BREAKING CHANGE: adjust default output column naming scheme, disallow overwriting columns.

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch 2 times, most recently from 882e960 to 9669773 Compare November 12, 2024 20:59

Use different automatic names for slides on logical columns

79ed2f3

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from fdc9ee7 to 79ed2f3 Compare November 12, 2024 22:42

Update epi_slide_{sum,mean} examples w/ naming options, cleanup

05a0ca8

- Show naming options, including with multi-column selections when we have accommodating example data sets - Select away the pre-existing 7d aggregations in the example data set - Ungroup output

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from 845f4c0 to beb434c Compare November 14, 2024 20:51

Tweak&fix epi_slide docs regarding packing, nesting

5ffaf2f

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from 45170b1 to f9a8356 Compare November 14, 2024 21:29

brookslogan added 2 commits November 14, 2024 13:43

Make epi_slide* all use autogrouping + make autogrouping temporary

e099fd0

Update NEWS.md with slide naming & autogrouping changes, bugfix

6d8bcf6

brookslogan force-pushed the lcb/epi_slide_opt-output-renaming branch from ed23df0 to 6d8bcf6 Compare November 14, 2024 21:43

brookslogan marked this pull request as ready for review November 14, 2024 21:44

brookslogan changed the title ~~Lcb/epi slide opt output renaming~~ Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect bug Nov 14, 2024

brookslogan changed the title ~~Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect bug~~ Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect issue Nov 14, 2024

nmdefries self-requested a review November 14, 2024 21:58

docs(time_delta_to_n_steps): copyediting

03155dd

brookslogan added 5 commits November 14, 2024 14:05

docs(time_delta_to_n_steps): fix inaccurate equivalency statement

cac02c2

Remove invalid link to undocumented helper function

9cf5229

Address CHECK lints

632c8d9

Add unit_time_delta to make time_delta_to_n_steps make sense

ad15b8f

Don't export unit_time_delta (yet)

2752903

We may later need it to make default args, in which case we should probably export to be less "magical".

nmdefries reviewed Nov 22, 2024

View reviewed changes

Address CHECKS+workflow: don't document epidatasets re-exports

ed9524e

brookslogan mentioned this pull request Nov 22, 2024

Simplify workflow for updating documentation cmu-delphi/epidatasets#14

Open

brookslogan and others added 8 commits November 22, 2024 11:02

Fix error class name that referred to different function

5cc0d15

Co-authored-by: nmdefries <[email protected]>

Fix partial rename: aggregate_epi_df -> sum_groups_epi_df in messages

9d64bda

Describe output grouping in basic-slide-params.R

5c68195

docs: roughly describe grouping of epix_slide output

d369527

Rework epi_slide_{opt,mean,sum} examples as they are 1 help topic

6191b61

feat(epi_slide_opt): guard against multiple .f matches

6ab73d5

Improve slide output packing example comments

c78bcca

Co-authored-by: nmdefries <[email protected]>

docs: document (GHA)

53505d3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjust `epi_slide_opt` output naming; adjust `epi_slide*` autogrouping; fix tidyselect issue #564

Adjust `epi_slide_opt` output naming; adjust `epi_slide*` autogrouping; fix tidyselect issue #564

brookslogan commented Nov 8, 2024 •

edited

Loading

brookslogan commented Nov 12, 2024 •

edited

Loading

brookslogan commented Nov 15, 2024

nmdefries Nov 21, 2024

brookslogan Nov 22, 2024 •

edited

Loading

brookslogan Nov 23, 2024

nmdefries Nov 21, 2024

brookslogan Nov 22, 2024 •

edited

Loading

Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect issue #564

Are you sure you want to change the base?

Adjust epi_slide_opt output naming; adjust epi_slide* autogrouping; fix tidyselect issue #564

Conversation

brookslogan commented Nov 8, 2024 • edited Loading

Checklist

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

brookslogan commented Nov 12, 2024 • edited Loading

brookslogan commented Nov 15, 2024

nmdefries Nov 21, 2024

Choose a reason for hiding this comment

brookslogan Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

brookslogan Nov 23, 2024

Choose a reason for hiding this comment

nmdefries Nov 21, 2024

Choose a reason for hiding this comment

brookslogan Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Adjust `epi_slide_opt` output naming; adjust `epi_slide*` autogrouping; fix tidyselect issue #564

Adjust `epi_slide_opt` output naming; adjust `epi_slide*` autogrouping; fix tidyselect issue #564

brookslogan commented Nov 8, 2024 •

edited

Loading

brookslogan commented Nov 12, 2024 •

edited

Loading

brookslogan Nov 22, 2024 •

edited

Loading

brookslogan Nov 22, 2024 •

edited

Loading