Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new functions #54

Open
Nic-Chr opened this issue Apr 19, 2021 · 7 comments
Open

Adding new functions #54

Nic-Chr opened this issue Apr 19, 2021 · 7 comments

Comments

@Nic-Chr
Copy link
Contributor

Nic-Chr commented Apr 19, 2021

Is your feature request related to a problem? Please describe.
No problems!

Describe the solution you'd like
I have quite a few helper functions that I think might be useful to phsmethods, happy to discuss in more detail.

@rmccreath
Copy link
Member

A lot of really great work here and a lot that can be used in phsmethods, thanks for getting in touch @Nic-Chr
Some general comments before I take each function in turn:

  • There should probably be more error handling and testing for all of the functions.
  • The work on documentation/vignettes is really good to see but it might be worth adding a little more to these if the functions are going to be available in a general package.
  • It seems like a lot of work (so I'm happy to help) but I'm keen for each function to be added separately, potentially with its own issue raised.

No real comment required - good to go:

  • add_thousands_sep
  • format_perc
  • grand_total_row
  • missing_dates
  • nlr
  • npv
  • nth_elements
  • plr
  • ppv
  • prevalence_threshold
  • replace_nan 👵

Query or small change required - hold:

  • add_worksheet_with_table - not sure how often something like this would be required across PHS. Maybe the total row could be made optional or other functionality added around creating tables in Excel?
  • detach_all_packages - if using R (projects) properly, is there a need for using this?
  • df_count - this could be useful, just think the name could be more meaningful?
  • extract_files - I can see this being really useful but just want to review to ensure it can be generalised across PHS.
  • gcd - could be useful to have some limits/error handling around this? As it's recursive, memory issues could occur as the size of the vectors increase.
  • lcm - uses gcd so same comment as above
  • str_extract_ca - any scope to be more robust, e.g. casting to lower-case for matches?
  • str_extract_country - any scope to be more robust, e.g. casting to lower-case for matches?
  • str_extract_date - any scope to be more robust, e.g. dates with non-numeric elements?
  • str_extract_hb - any scope to be more robust, e.g. casting to lower-case for matches?
  • sub_lab_code_to_name - I'm being quite petty here, but could the function name be clearer?
  • test_characteristics - very specific output, would need to review usability across PHS.
  • ts_df - again, more around the usability across PHS.

These are all related to the match_area function and, while the functionality for these functions doesn't exist, some work to integrate may be required (e.g. changing the name of the functions):

  • ca_name_to_code
  • hb_name_to_code
  • pc_to_ca - uses lookup on Freddy, could this be stored on the package?
  • pc_to_hb - uses lookup on Freddy, could this be stored on the package?
  • pc_to_intzone - uses lookup on Freddy, could this be stored on the package?

Not needed/right for this package - stop:

  • age_to_band - functionality already exists in age_group function
  • ca_code_to_name - functionality already exists in match_area function
  • hb_code_to_name - functionality already exists in match_area function

Wider impact on phsmethods and future thinking

  • Some of these functions are simplistic in how they would be done, e.g. nth_elements or even nlr. However, this is a way we can standardise the approach to these problems and where they are common in our R programs, improve readability.
  • Adding a volume of new functions will also require some reconfiguration of the package documentation, e.g. splitting the overview of functions in the README into sections (data management, statistical)?
  • Similarly, the phsmethods training app will need to be updated.
  • Work to generalise the package for use outwith NHS Scotland and hosting on CRAN is underway. Does this impact on any of that?

I'll let @davidc92 come in on this now and we can make a plan of action! 🥳

@davidc92
Copy link
Contributor

Before I wade in, Nick, can you please open a separate issue for each function with a short description of what it is intended to do? It is not immediately clear from a lot of the names :) With separate issues we can manage their implementation properly. Let me know once they have all been opened and I will review them.

D

@Nic-Chr
Copy link
Contributor Author

Nic-Chr commented May 22, 2021

Hi Russell and David, thanks for the feedback, I've begun opening issues for each function separately.
I agree with you, the functionality does already exist for both ca_code_to_name and hb_code_to_name so may not be needed for the package but I do think there are a few additional features of age_to_band that build upon phsmethods::age_group.
One of the main added functionalities is that you can specify irregularly spaced age breaks. Another is the ability to pass arguments to factor, controlling things like factor labels, order and excluded levels (see the below example).

mylabels <- c("0 to 10", "11 to 19", "20 to 39", "40 to 59", "60 to 79", "80 and older")
age_to_band(1:100, age_breaks = c(0, 11, 20, 40, 60, 80), stringsAsFactors = TRUE, labels = mylabels, ordered = TRUE)

Maybe not a critical improvement but happy to hear your thoughts!

@Moohan
Copy link
Member

Moohan commented May 24, 2021

I think non-even age breaks would be a great addition. This is something I and others on my team do regularly. I think the challenge will be to implement it without changing the existing functionality and keeping the API simple... lots of tests needed!

@davidc92
Copy link
Contributor

Very happy for age_group to be amended to allow irregular bandings :)

@r6lm
Copy link

r6lm commented Apr 19, 2023

I created an issue for allowing custom bin lengths for the create_age_groups function here. As the discussion didn't continue on this thread, it would be great if we could continue it there – #93 (comment).

@Nic-Chr
Copy link
Contributor Author

Nic-Chr commented Mar 19, 2024

Thanks @davidc92 and agree with @r6lm in that updating age_group to accept custom breaks would be very useful.

I have a method here age_band.R which I have tried to make user friendly by creating a list of common age groupings and using one of these as the default breaks argument.

For example, the default breaks are set to age_breaks$by_20_to_80 which corresponds to the breakpoint vector: 0 20 40 60 80
The age breaks vector could look something like this and would be exported and visible to the user:

#' @export
age_breaks <- list("by_5_to_80" = seq.int(from = 0L, to = 80L, by = 5L),
                   "by_10_to_80" = seq.int(from = 0L, to = 80L, by = 10L),
                   "by_20_to_80" = seq.int(from = 0L, to = 80L, by = 20L),
                   "by_5_to_90" = seq.int(from = 0L, to = 90L, by = 5L),
                   "by_10_to_90" = seq.int(from = 0L, to = 90L, by = 10L),
                   "by_20_to_90" = seq.int(from = 0L, to = 90L, by = 20L))
> age_breaks
$by_5_to_80
 [1]  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80

$by_10_to_80
[1]  0 10 20 30 40 50 60 70 80

$by_20_to_80
[1]  0 20 40 60 80

$by_5_to_90
 [1]  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90

$by_10_to_90
 [1]  0 10 20 30 40 50 60 70 80 90

$by_20_to_90
[1]  0 20 40 60 80

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants