Adding new functions #54

Nic-Chr · 2021-04-19T09:52:27Z

Is your feature request related to a problem? Please describe.
No problems!

Describe the solution you'd like
I have quite a few helper functions that I think might be useful to phsmethods, happy to discuss in more detail.

rmccreath · 2021-05-14T15:07:01Z

A lot of really great work here and a lot that can be used in phsmethods, thanks for getting in touch @Nic-Chr
Some general comments before I take each function in turn:

There should probably be more error handling and testing for all of the functions.
The work on documentation/vignettes is really good to see but it might be worth adding a little more to these if the functions are going to be available in a general package.
It seems like a lot of work (so I'm happy to help) but I'm keen for each function to be added separately, potentially with its own issue raised.

No real comment required - good to go:

add_thousands_sep
format_perc
grand_total_row
missing_dates
nlr
npv
nth_elements
plr
ppv
prevalence_threshold
replace_nan 👵

Query or small change required - hold:

add_worksheet_with_table - not sure how often something like this would be required across PHS. Maybe the total row could be made optional or other functionality added around creating tables in Excel?
detach_all_packages - if using R (projects) properly, is there a need for using this?
df_count - this could be useful, just think the name could be more meaningful?
extract_files - I can see this being really useful but just want to review to ensure it can be generalised across PHS.
gcd - could be useful to have some limits/error handling around this? As it's recursive, memory issues could occur as the size of the vectors increase.
lcm - uses gcd so same comment as above
str_extract_ca - any scope to be more robust, e.g. casting to lower-case for matches?
str_extract_country - any scope to be more robust, e.g. casting to lower-case for matches?
str_extract_date - any scope to be more robust, e.g. dates with non-numeric elements?
str_extract_hb - any scope to be more robust, e.g. casting to lower-case for matches?
sub_lab_code_to_name - I'm being quite petty here, but could the function name be clearer?
test_characteristics - very specific output, would need to review usability across PHS.
ts_df - again, more around the usability across PHS.

These are all related to the match_area function and, while the functionality for these functions doesn't exist, some work to integrate may be required (e.g. changing the name of the functions):

ca_name_to_code
hb_name_to_code
pc_to_ca - uses lookup on Freddy, could this be stored on the package?
pc_to_hb - uses lookup on Freddy, could this be stored on the package?
pc_to_intzone - uses lookup on Freddy, could this be stored on the package?

Not needed/right for this package - stop:

age_to_band - functionality already exists in age_group function
ca_code_to_name - functionality already exists in match_area function
hb_code_to_name - functionality already exists in match_area function

Wider impact on phsmethods and future thinking

Some of these functions are simplistic in how they would be done, e.g. nth_elements or even nlr. However, this is a way we can standardise the approach to these problems and where they are common in our R programs, improve readability.
Adding a volume of new functions will also require some reconfiguration of the package documentation, e.g. splitting the overview of functions in the README into sections (data management, statistical)?
Similarly, the phsmethods training app will need to be updated.
Work to generalise the package for use outwith NHS Scotland and hosting on CRAN is underway. Does this impact on any of that?

I'll let @davidc92 come in on this now and we can make a plan of action! 🥳

davidc92 · 2021-05-14T15:14:58Z

Before I wade in, Nick, can you please open a separate issue for each function with a short description of what it is intended to do? It is not immediately clear from a lot of the names :) With separate issues we can manage their implementation properly. Let me know once they have all been opened and I will review them.

D

Nic-Chr · 2021-05-22T17:58:20Z

Hi Russell and David, thanks for the feedback, I've begun opening issues for each function separately.
I agree with you, the functionality does already exist for both ca_code_to_name and hb_code_to_name so may not be needed for the package but I do think there are a few additional features of age_to_band that build upon phsmethods::age_group.
One of the main added functionalities is that you can specify irregularly spaced age breaks. Another is the ability to pass arguments to factor, controlling things like factor labels, order and excluded levels (see the below example).

mylabels <- c("0 to 10", "11 to 19", "20 to 39", "40 to 59", "60 to 79", "80 and older")
age_to_band(1:100, age_breaks = c(0, 11, 20, 40, 60, 80), stringsAsFactors = TRUE, labels = mylabels, ordered = TRUE)

Maybe not a critical improvement but happy to hear your thoughts!

Moohan · 2021-05-24T08:56:47Z

I think non-even age breaks would be a great addition. This is something I and others on my team do regularly. I think the challenge will be to implement it without changing the existing functionality and keeping the API simple... lots of tests needed!

davidc92 · 2021-05-24T12:42:34Z

Very happy for age_group to be amended to allow irregular bandings :)

r6lm · 2023-04-19T09:01:53Z

I created an issue for allowing custom bin lengths for the create_age_groups function here. As the discussion didn't continue on this thread, it would be great if we could continue it there – #93 (comment).

Nic-Chr · 2024-03-19T10:54:43Z

Thanks @davidc92 and agree with @r6lm in that updating age_group to accept custom breaks would be very useful.

I have a method here age_band.R which I have tried to make user friendly by creating a list of common age groupings and using one of these as the default breaks argument.

For example, the default breaks are set to age_breaks$by_20_to_80 which corresponds to the breakpoint vector: 0 20 40 60 80
The age breaks vector could look something like this and would be exported and visible to the user:

#' @export
age_breaks <- list("by_5_to_80" = seq.int(from = 0L, to = 80L, by = 5L),
                   "by_10_to_80" = seq.int(from = 0L, to = 80L, by = 10L),
                   "by_20_to_80" = seq.int(from = 0L, to = 80L, by = 20L),
                   "by_5_to_90" = seq.int(from = 0L, to = 90L, by = 5L),
                   "by_10_to_90" = seq.int(from = 0L, to = 90L, by = 10L),
                   "by_20_to_90" = seq.int(from = 0L, to = 90L, by = 20L))
> age_breaks
$by_5_to_80
 [1]  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80

$by_10_to_80
[1]  0 10 20 30 40 50 60 70 80

$by_20_to_80
[1]  0 20 40 60 80

$by_5_to_90
 [1]  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90

$by_10_to_90
 [1]  0 10 20 30 40 50 60 70 80 90

$by_20_to_90
[1]  0 20 40 60 80

Moohan mentioned this issue Jul 2, 2024

create_age_groups with bins of custom lengths #93

Open

Nic-Chr mentioned this issue Sep 19, 2024

Improve create_age_groups #135

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding new functions #54

Adding new functions #54

Nic-Chr commented Apr 19, 2021

rmccreath commented May 14, 2021

davidc92 commented May 14, 2021

Nic-Chr commented May 22, 2021

Moohan commented May 24, 2021

davidc92 commented May 24, 2021

r6lm commented Apr 19, 2023

Nic-Chr commented Mar 19, 2024

Adding new functions #54

Adding new functions #54

Comments

Nic-Chr commented Apr 19, 2021

rmccreath commented May 14, 2021

No real comment required - good to go:

Query or small change required - hold:

Not needed/right for this package - stop:

Wider impact on phsmethods and future thinking

davidc92 commented May 14, 2021

Nic-Chr commented May 22, 2021

Moohan commented May 24, 2021

davidc92 commented May 24, 2021

r6lm commented Apr 19, 2023

Nic-Chr commented Mar 19, 2024