Skip to content

Commit

Permalink
Merge March 24 update into Master (#933)
Browse files Browse the repository at this point in the history
* Remove redundant code

* Update documentation

* Style code

* Reorder when we match on client variables
This was causing NSUs to show a social care id. This now resolves this.

* Update documentation

* Style code

* Revert "Update logic to use end of Quarter"

This reverts commit 004e831.

* Style code

* Update documentation

* add check comment (TO DO for this PR)

* Remove `check_quarter_format` function

* Remove `check_quarter_format`

* Add chi parameter to `create_demog_test_flags`

* Style code

* Use CHI parameter for ep/indiv tests

* Use CHI parameter for extract tests (chi)

* Change test sheet names to lowercase

* Change date to lowercase

* Update documentation

* Update documentation

* Update documentation

* Style code

* Fix pick variables
This was not taking the correct variables, leading to NSUs being assigned psychiatry

* SC Demographics and SDS (#900)

* Style code

* # read in sc demographics

different variables - removed extract date as not accurate, using chi over upi after discussion with social care data management. Added in date of death just for fun.

* social care demographics first draft

removed a lot of the submitted variables and instead using chi variables from chi seeding. Other changes:
- Fill in missing values,
- create flag for latest social care id (one from database is not accurate), this makes sure that each chi only has ONE sc id as the latest to stop it creating duplicates
- change postcode to choose chi over submitted

* Style code

* had a github error? Not sure what happened but commiting first draft of sc demographics

* Style code

* first draft sds.
No major changes - only how demographics is matched on and how latest social care id is selected

* Update documentation

* demographics - add sending location to group by

* Style code

* Update documentation

* Added ungroup()

* Remove comments

* Remove comments

* Style code

---------

Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>

* Sc all at speedup (#904)

* speed up process_sc_all_alarms_telecare function with data.table package

* Update documentation

---------

Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: Jennit07 <[email protected]>

* Add case_when statement for `high_cc` cohort

* Bug - `high_cc` in demographic cohort showing `NAs` instead of `TRUE/FALSE` (#911)

Add case_when statement for `high_cc` cohort

* added a casewhen to update property type description for homelessness

* Update documentation

* Style code

* Bug - deal with missing variables (#914)

* Add missing sc variables for no sc data

* Fix code for including `_inc_dna` variables

* Remove commented line

* Bug - Fix get pop path failing and preventing the indiv file from running.  (#913)

Fix bug - pop file paths breaking indiv file

* correct file hscp file path

* Update process_sc_all_home_care.R

A small issue was identified when running targets. Linked with changes to the function `fix_sc_end_dates()`

* Update process_sc_all_alarms_telecare.R

* remove duplicate columns

* Fix targets (#892)

* fix sc_client_lookup sc_send_lca

* fix an issue of get_pop_path

* Style code

* fix the rest of get_pop_path from get_datazone_pop_path

* Update documentation

* fix sc_send_lca

* add missing year column

* explicitly specify the argument year to avoid corruption of targets

* Update documentation

* new data pipeline with targets
remove create_individual_files from targets and append it to run_targets script

* minor changes

* Style code

* undo sc_send_lca bit

* Update targets scripts

* Remove top level targets scripts

---------

Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Jennifer Thom <[email protected]>

* remove cases that start date is later than end date

* Update Refs for March24 SLF update

* 758 investigate extracts to identify areas of code which can be cut down for processing times (#899)

* re-writing process_sc_all sds and alarm_telecare with data.table to improve the speed

* Update documentation

* Style code

* changes in line with new process_sc_all_sds dplyr version

* Style code

* remove duplicate columns

* remove duplicated columns

---------

Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>

* Update homelessness completeness path

* Update check_year_valid function

* 920 issues with file permissions need constant monitoring (#921)

* set a correct file permission

* update descriptions in process_tests function

* Update documentation

---------

Co-authored-by: lizihao-anu <[email protected]>

* change joining with sc_demog_lookup to right_join and move person_id down

* Archive social care extracts (#927)

* Set up `get_sandpit_extract_path`

* Update documentation

* Update sc `all` data paths

* Write sandpit extract if file does not exist

* Style code

---------

Co-authored-by: Jennit07 <[email protected]>

* Update excel sg completeness tabs

---------

Co-authored-by: Jennit07 <[email protected]>
Co-authored-by: Megan McNicol <[email protected]>
Co-authored-by: SwiftySalmon <[email protected]>
Co-authored-by: marjom02 <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
Co-authored-by: lizihao-anu <[email protected]>
Co-authored-by: rchlv <[email protected]>
Co-authored-by: Zihao Li <[email protected]>
  • Loading branch information
9 people authored Mar 26, 2024
1 parent 70c4382 commit a570e6a
Show file tree
Hide file tree
Showing 114 changed files with 805 additions and 496 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -73,4 +73,5 @@ Encoding: UTF-8
Language: en-GB
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1

1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ export(get_nsu_path)
export(get_pop_path)
export(get_practice_details_path)
export(get_readcode_lookup_path)
export(get_sandpit_extract_path)
export(get_sc_at_episodes_path)
export(get_sc_ch_episodes_path)
export(get_sc_client_lookup_path)
Expand Down
4 changes: 2 additions & 2 deletions R/00-update_refs.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
#'
#' @family initialisation
latest_update <- function() {
"Dec_2023"
"Mar_2024"
}

#' Previous update
Expand Down Expand Up @@ -61,7 +61,7 @@ previous_update <- function(months_ago = 3L, override = NULL) {
#'
#' @family initialisation
get_dd_period <- function() {
"Jul16_Sep23"
"Jul16_Dec23"
}

#' The latest financial year for Cost uplift setting
Expand Down
2 changes: 1 addition & 1 deletion R/add_hri_variables.R
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ add_hri_variables <- function(
"mh_episodes",
"gls_episodes",
"op_newcons_attendances",
# op_newcons_dnas,
"op_newcons_dnas",
"ae_attendances",
"pis_paid_items",
"ooh_cases"
Expand Down
2 changes: 1 addition & 1 deletion R/add_keep_population_flag.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ add_keep_population_flag <- function(individual_file, year) {
} else {
## Obtain the population estimates for Locality AgeGroup and Gender.
pop_estimates <-
readr::read_rds(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds")) %>%
readr::read_rds(get_pop_path(type = "datazone")) %>%
dplyr::select(year, datazone2011, sex, age0:age90plus)

# Step 1: Obtain the population estimates for Locality, AgeGroup, and Gender
Expand Down
7 changes: 4 additions & 3 deletions R/aggregate_by_chi.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
#' @importFrom data.table .SD
#'
#' @inheritParams create_individual_file
aggregate_by_chi <- function(episode_file, exclude_sc_var = FALSE) {
aggregate_by_chi <- function(episode_file, year, exclude_sc_var = FALSE) {
cli::cli_alert_info("Aggregate by CHI function started at {Sys.time()}")

# Convert to data.table
Expand Down Expand Up @@ -89,6 +89,7 @@ aggregate_by_chi <- function(episode_file, exclude_sc_var = FALSE) {
"episodes",
"beddays",
"cost",
"_dnas",
"attendances",
"attend",
"contacts",
Expand All @@ -109,8 +110,7 @@ aggregate_by_chi <- function(episode_file, exclude_sc_var = FALSE) {
vars_start_with(
episode_file,
"sds_option"
),
"health_net_cost_inc_dnas"
)
)
cols4 <- cols4[!(cols4 %in% "ch_cis_episodes")]
if (exclude_sc_var) {
Expand Down Expand Up @@ -187,6 +187,7 @@ aggregate_by_chi <- function(episode_file, exclude_sc_var = FALSE) {
individual_file_cols5[, chi := NULL],
individual_file_cols6[, chi := NULL]
)
individual_file <- individual_file[, year := year]

# convert back to tibble
return(dplyr::as_tibble(individual_file))
Expand Down
4 changes: 1 addition & 3 deletions R/calculate_stay.R
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,6 @@ calculate_stay <- function(year, start_date, end_date, sc_qtr = NULL) {
if (anyNA(sc_qtr)) {
cli::cli_abort("Some of the submitted quarters are missing")
}
# else {
# sc_qtr <- check_quarter_format(sc_qtr)
# }

# Set Quarters
qtr_end <- lubridate::add_with_rollback(
Expand All @@ -51,6 +48,7 @@ calculate_stay <- function(year, start_date, end_date, sc_qtr = NULL) {
lubridate::period(1L, "days")
)

# check logic here for care home methodology
dummy_end_date <- dplyr::case_when(
# If end_date is not missing use the end date
!is.na(end_date) ~ end_date,
Expand Down
2 changes: 1 addition & 1 deletion R/check_year_valid.R
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ check_year_valid <- function(
return(FALSE)
} else if (year >= "2425" && type %in% "sparra") {
return(FALSE)
} else if (year >= "2324" && type %in% c("ch", "hc", "sds", "at")) {
} else if (year >= "2425" && type %in% c("ch", "hc", "sds", "at")) {
return(FALSE)
}

Expand Down
10 changes: 5 additions & 5 deletions R/create_demog_test_flags.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,19 +3,19 @@
#' @description Create the demographic flags for testing
#'
#' @param data a dataframe containing demographic variables e.g. chi
#' @param chi Specify chi or anon_chi.
#'
#' @return a dataframe with flag (1 or 0) for each demographic variable.
#' Missing value flag from [is_missing()]
#'
#' @family flag functions
create_demog_test_flags <- function(data) {
create_demog_test_flags <- function(data, chi = c(chi, anon_chi)) {
data %>%
dplyr::arrange(.data$chi) %>%
dplyr::arrange({{ chi }}) %>%
# create test flags
dplyr::mutate(
valid_chi = phsmethods::chi_check(.data$chi) == "Valid CHI",
unique_chi = dplyr::lag(.data$chi) != .data$chi,
n_missing_chi = is_missing(.data$chi),
unique_chi = dplyr::lag({{ chi }}) != {{ chi }},
n_missing_chi = is_missing({{ chi }}),
n_males = .data$gender == 1L,
n_females = .data$gender == 2L,
n_postcode = !is.na(.data$postcode) | !.data$postcode == "",
Expand Down
11 changes: 7 additions & 4 deletions R/create_demographic_lookup.R
Original file line number Diff line number Diff line change
Expand Up @@ -344,18 +344,21 @@ assign_d_cohort_high_cc <- function(dementia,
liver,
cancer,
spec) {
high_cc <-
high_cc <- dplyr::case_when(
spec == "G5" ~ TRUE,
# FOR FUTURE: PhysicalandSensoryDisabilityClientGroup or LearningDisabilityClientGroup = "Y",
# then high_cc_cohort = TRUE
# FOR FUTURE: Care home removed, here's the code: .data$recid = "CH" & age < 65
rowSums(dplyr::pick(c(
(rowSums(dplyr::pick(c(
"dementia",
"hefailure",
"refailure",
"liver",
"cancer"
)), na.rm = TRUE) >= 1L |
spec == "G5"
)), na.rm = TRUE) >= 1L) ~ TRUE,
.default = FALSE
)

return(high_cc)
}

Expand Down
10 changes: 7 additions & 3 deletions R/create_episode_file.R
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,8 @@ create_episode_file <- function(
"mar_beddays"
)
) %>%
# match on sc client variables
join_sc_client(year, sc_client = sc_client, file_type = "episode") %>%
# Check chi is valid using phsmethods function
# If the CHI is invalid for whatever reason, set the CHI to NA
dplyr::mutate(
Expand Down Expand Up @@ -135,15 +137,15 @@ create_episode_file <- function(
year,
slf_deaths_lookup
) %>%
join_sc_client(year, sc_client = sc_client, file_type = "episode") %>%
load_ep_file_vars(year)

if (!check_year_valid(year, type = c("ch", "hc", "at", "sds"))) {
episode_file <- episode_file %>%
dplyr::mutate(
ch_chi_cis = NA,
sc_id_cis = NA,
ch_sc_id_cis = NA,
ch_name = NA,
ch_postcode = NA,
ch_adm_reason = NA,
ch_provider = NA,
ch_nursing = NA,
Expand All @@ -158,7 +160,9 @@ create_episode_file <- function(
hc_cost_q4 = NA,
hc_provider = NA,
hc_reablement = NA,
sds_option_4 = NA,
person_id = NA,
sc_latest_submission = NA,
sc_send_lca = NA,
sc_living_alone = NA,
sc_support_from_unpaid_carer = NA,
sc_social_worker = NA,
Expand Down
12 changes: 7 additions & 5 deletions R/create_individual_file.R
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ create_individual_file <- function(
))) %>%
remove_blank_chi() %>%
add_cij_columns() %>%
add_all_columns()
add_all_columns(year = year)

if (!check_year_valid(year, type = c("ch", "hc", "at", "sds"))) {
individual_file <- individual_file %>%
Expand All @@ -82,7 +82,7 @@ create_individual_file <- function(
individual_file <- individual_file %>%
aggregate_ch_episodes() %>%
clean_up_ch(year) %>%
aggregate_by_chi(exclude_sc_var = FALSE)
aggregate_by_chi(year = year, exclude_sc_var = FALSE)
}

individual_file <- individual_file %>%
Expand Down Expand Up @@ -202,7 +202,7 @@ add_cij_columns <- function(episode_file) {
#' of prefixed column names created based on some condition.
#' @family individual_file
#' @inheritParams create_individual_file
add_all_columns <- function(episode_file) {
add_all_columns <- function(episode_file, year) {
cli::cli_alert_info("Add all columns function started at {Sys.time()}")

episode_file <- episode_file %>%
Expand Down Expand Up @@ -483,8 +483,10 @@ add_ch_columns <- function(episode_file, prefix, condition) {
ch_ep_end = dplyr::if_else(
eval(condition),
.data$record_keydate2,
lubridate::NA_Date_ ),
# If end date is missing use the first day of next FY quarter
lubridate::NA_Date_
),
# check logic here for care home methodology
# If end date is missing use the end of the FY quarter
ch_ep_end = dplyr::if_else(
eval(condition) & is.na(.data$ch_ep_end),
start_next_fy_quarter(.data$sc_latest_submission),
Expand Down
8 changes: 7 additions & 1 deletion R/create_service_use_lookup.R
Original file line number Diff line number Diff line change
Expand Up @@ -908,7 +908,13 @@ assign_cohort_names <- function(data) {
# Situation where no cost is greater than another,
# so the maximum is the same as the mean
.data$cost_max == rowSums(
dplyr::pick("psychiatry_cost":"residential_care_cost")
dplyr::pick(c(
"psychiatry_cost", "maternity_cost", "geriatric_cost",
"elective_inpatient_cost", "limited_daycases_cost",
"routine_daycase_cost", "single_emergency_cost",
"multiple_emergency_cost", "prescribing_cost",
"outpatient_cost", "ae2_cost", "residential_care_cost"
))
) / 12.0 ~ "Unassigned",
.data$cost_max == .data$psychiatry_cost ~ "Psychiatry",
.data$cost_max == .data$maternity_cost ~ "Maternity",
Expand Down
8 changes: 4 additions & 4 deletions R/fix_sc_dates.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
#' @return A date vector with replaced end dates
fix_sc_start_dates <- function(start_date, period_start) {
# Fix sds_start_date is missing by setting start_date to be the start of
# financial year
# financial period
start_date <- dplyr::if_else(
is.na(start_date),
period_start,
Expand All @@ -30,12 +30,12 @@ fix_sc_start_dates <- function(start_date, period_start) {
#' @param period Social care latest submission period.
#'
#' @return A date vector with replaced end dates
fix_sc_end_dates <- function(start_date, end_date, period) {
fix_sc_end_dates <- function(start_date, end_date, period_end_date) {
# Fix sds_end_date is earlier than sds_start_date by setting end_date to be
# the end of financial year
end_date <- dplyr::if_else(
start_date > end_date,
end_fy(year = stringr::str_sub(period, 1L, 4L), "alternate"),
period_end_date,
end_date
)

Expand All @@ -57,7 +57,7 @@ fix_sc_end_dates <- function(start_date, end_date, period) {
#' @return A date vector with replaced end dates
fix_sc_missing_end_dates <- function(end_date, period_end) {
# Fix sds_end_date is earlier than sds_start_date by setting end_date to be
# the end of financial year
# the end of financial period
end_date <- dplyr::if_else(
is.na(end_date),
period_end,
Expand Down
33 changes: 0 additions & 33 deletions R/get_fy_quarter_dates.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,6 @@
start_fy_quarter <- function(quarter) {
quarter_unique <- unique(quarter)

#check_quarter_format(quarter)

cal_quarter_date_unique <- lubridate::yq(quarter_unique)

fy_quarter_date_unique <- lubridate::add_with_rollback(
Expand Down Expand Up @@ -47,8 +45,6 @@ start_fy_quarter <- function(quarter) {
end_fy_quarter <- function(quarter) {
quarter_unique <- unique(quarter)

#check_quarter_format(quarter)

cal_quarter_date_unique <- lubridate::yq(quarter_unique)

fy_quarter_date_unique <- lubridate::add_with_rollback(
Expand Down Expand Up @@ -80,8 +76,6 @@ end_fy_quarter <- function(quarter) {
start_next_fy_quarter <- function(quarter) {
quarter_unique <- unique(quarter)

#check_quarter_format(quarter)

cal_quarter_date_unique <- lubridate::yq(quarter_unique)

fy_quarter_date_unique <- lubridate::add_with_rollback(
Expand Down Expand Up @@ -112,8 +106,6 @@ start_next_fy_quarter <- function(quarter) {
end_next_fy_quarter <- function(quarter) {
quarter_unique <- unique(quarter)

#check_quarter_format(quarter)

cal_quarter_date_unique <- lubridate::yq(quarter_unique)

fy_quarter_date_unique <- lubridate::add_with_rollback(
Expand All @@ -128,28 +120,3 @@ end_next_fy_quarter <- function(quarter) {

return(end_next_fy_quarter)
}

#' Check quarter format
#'
#' @inheritParams start_fy_quarter
#'
#' @return `quarter` invisibly if no issues were found
#'
#' @family date functions
# check_quarter_format <- function(quarter) {
# stopifnot(typeof(quarter) == "character")
#
# if (any(
# stringr::str_detect(quarter, "^\\d{4}Q[1-4]$", negate = TRUE),
# na.rm = TRUE
# )) {
# cli::cli_abort(
# c("{.var quarter} must be in the format {.val YYYYQx}
# where {.val x} is the quarter number.",
# "v" = "For example {.val 2019Q1}."
# )
# )
# }
#
# return(invisible(quarter))
# }
2 changes: 1 addition & 1 deletion R/get_lookup_paths.R
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,7 @@ get_pop_path <- function(file_name = NULL,
"intzone" ~ stringr::str_glue("IntZone_pop_est_2011_\\d+?\\.{ext}")
)

datazone_pop_path <- get_file_path(
pop_path <- get_file_path(
directory = pop_dir,
file_name = file_name,
ext = ext,
Expand Down
Loading

0 comments on commit a570e6a

Please sign in to comment.