Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix (cutData.R): don't overwrite existing "year" columns w/ yearseason #406

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jack-davison
Copy link
Collaborator

@jack-davison jack-davison commented Jan 22, 2025

This PR fixes #404. cc @mooibroekd.

In short, when using cutData() w/ "yearseason" openair would overwrite existing year columns.

I've renamed the interim columns to be openair__year and openair__month, which are then deleted before the data is returned.

Using interim columns feels dodgy; a safer bet would be working on the vectors themselves I think. But this fix should work for the time being.

I note the function does leave a floating season column in the output which the user has not requested, but I don't want to remove it too in case the user already has a season column.

There may be broader improvements needed to how cutData() works in future - it can be very destructive to the user's own data if they're not careful.

devtools::load_all()
#> ℹ Loading openair

# create some date data for illustration
test <- data.frame(date = seq.Date(
  lubridate::date("2022-01-01"),
  lubridate::date("2022-12-31"),
  by = 1
))

# cut the data into years
test <- cutData(test, type = "year")

# check if we have any dates that are associated with 2023
test |>
  dplyr::filter(year == 2023)
#> [1] date year
#> <0 rows> (or 0-length row.names)

# now cut the data to yearseason
test <- cutData(test, type = "yearseason")

# perform the test again:
test |> filter(year == 2023)
#> [1] date       year       season     yearseason
#> <0 rows> (or 0-length row.names)

Created on 2025-01-22 with reprex v2.1.1

@davidcarslaw
Copy link
Collaborator

I agree on cutData being too destructive and that isn't good. I think this is probably more of a concern with in-built types such as "year", "wd". However, even for any columns this is the case e.g. for "ws". I think a better solution would be to have a system that a new column is returned with a suffix e.g. ws_cut or ws_cat? That would make the outcome predictable for all variables - what do you think? It would require some different handling in openair functions that use cutData but that would be OK...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cutData can overwrite earlier created columns with unexpected data
2 participants