December 2024 #1029

Merged
merged 41 commits into from
Dec 10, 2024
65984dd
Move `cli` warning messages before `return()` (#1016)
Jennit07 Oct 14, 2024
d516ee2
Update check_year_valid.R (#1017)
lizihao-anu Oct 15, 2024
82a42b0
Remove person_id. Matched in later process
Jennit07 Sep 18, 2024
9fd8dfa
Remove redundant #TODO comments
Jennit07 Sep 18, 2024
898e33c
remove redundant #TODO comments
Jennit07 Sep 18, 2024
e61f1af
Update news - sep release date
Jennit07 Sep 18, 2024
0be1daa
Write temp data (#1014)
lizihao-anu Oct 16, 2024
74947ce
sequence writing tests to excel (#1013)
lizihao-anu Oct 16, 2024
7787668
Sc latest quarter (#1012)
lizihao-anu Oct 16, 2024
ffa3d0c
death join and distinct refined death (#1015)
lizihao-anu Oct 22, 2024
4107720
1018 moving dd hl1 (#1019)
lizihao-anu Oct 25, 2024
ab3f2b1
Update process_extract_ae.R (#1020)
lizihao-anu Oct 28, 2024
4c716ae
Organise pre processing scripts (#1023)
Jennit07 Nov 13, 2024
4a277fc
Clean test folder (#1021)
Jennit07 Nov 19, 2024
93f71be
Update homelessness completeness code (#1026)
Jennit07 Nov 19, 2024
75391f5
Update documentation
Jennit07 Nov 20, 2024
43a8828
update namespace
Jennit07 Nov 20, 2024
2e5fa00
Update references
Jennit07 Nov 20, 2024
97cb97d
update process_tests_sc_demographics
lizihao-anu Nov 25, 2024
97346f7
Update - write temp file on `create_episode_file`
Jennit07 Nov 25, 2024
63372ee
get_chi for data
lizihao-anu Nov 25, 2024
ed25916
Style code
lizihao-anu Nov 25, 2024
86aa29c
IT deaths changes
Jennit07 Nov 25, 2024
d791aa6
Style code
Jennit07 Nov 25, 2024
9e1b2ce
remove get_chi
Jennit07 Nov 25, 2024
7dcc78b
Specify year in episode file tests
Jennit07 Nov 26, 2024
56f0ffb
specify year in indiv tests
Jennit07 Nov 27, 2024
2d6b9b9
Update `check_year_valid`
Jennit07 Nov 27, 2024
d2d584b
Fix `ch_provider` new coding guidance
Jennit07 Nov 27, 2024
df5ec44
Add `full.names` parameter to `write_temp_data`
Jennit07 Nov 27, 2024
a99df69
Revert "Fix `ch_provider` new coding guidance"
lizihao-anu Nov 28, 2024
6ac721e
filter na episodes by filtering period
lizihao-anu Nov 28, 2024
814b885
Update running scripts indiv file
Jennit07 Nov 29, 2024
b875530
Update `check_year_valid`
Jennit07 Nov 29, 2024
6cceead
update end_date
lizihao-anu Dec 4, 2024
d330ddc
Update NEWS.md
lizihao-anu Dec 10, 2024
4502403
Update create_individual_file.R
lizihao-anu Dec 10, 2024
a12ead9
update test-check_year_valid
lizihao-anu Dec 10, 2024
6e18735
fix binding issue and remove redundance
lizihao-anu Dec 10, 2024
edb16fd
Style code
lizihao-anu Dec 10, 2024
5a46d4f
R cmd check over v4.1.2
lizihao-anu Dec 10, 2024
2 changes: 1 addition & 1 deletion .github/workflows/R-CMD-check.yaml
@@ -17,7 +17,7 @@ jobs:
strategy:
fail-fast: false
matrix:
-r_version: ['4.0.2', '4.1.2', 'release']
+r_version: ['4.1.2', 'release']

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
8 changes: 6 additions & 2 deletions NAMESPACE
@@ -1,12 +1,12 @@
# Generated by roxygen2: do not edit by hand

export("%>%")
export(add_deceased_flag)
export(add_homelessness_date_flags)
export(add_homelessness_flag)
export(add_hri_variables)
export(add_nsu_cohort)
export(check_year_format)
export(clean_temp_data)
export(clean_up_free_text)
export(compute_mid_year_age)
export(convert_ca_to_lca)
@@ -21,10 +21,12 @@ export(create_episode_file)
export(create_homelessness_lookup)
export(create_individual_file)
export(create_service_use_cohorts)
export(end_date)
export(end_fy)
export(end_fy_quarter)
export(end_next_fy_quarter)
export(find_latest_file)
export(fy)
export(fy_interval)
export(get_boxi_extract_path)
export(get_ch_costs_path)
@@ -89,7 +91,6 @@ export(midpoint_fy)
export(next_fy)
export(phs_db_connection)
export(previous_update)
export(process_combined_deaths_lookup)
export(process_costs_ch_rmd)
export(process_costs_dn_rmd)
export(process_costs_gp_ooh_rmd)
@@ -156,6 +157,7 @@ export(produce_episode_file_tests)
export(produce_sc_sandpit_tests)
export(produce_source_extract_tests)
export(produce_test_comparison)
export(qtr)
export(read_dev_slf_file)
export(read_extract_acute)
export(read_extract_ae)
@@ -178,12 +180,14 @@ export(read_sc_all_alarms_telecare)
export(read_sc_all_care_home)
export(read_sc_all_home_care)
export(read_sc_all_sds)
export(read_temp_data)
export(rename_hscp)
export(setup_keyring)
export(start_fy)
export(start_fy_quarter)
export(start_next_fy_quarter)
export(write_file)
export(write_temp_data)
export(years_to_run)
importFrom(data.table,.N)
importFrom(data.table,.SD)
10 changes: 9 additions & 1 deletion NEWS.md
@@ -1,4 +1,12 @@
-# September 2024 Update - Unreleased
+# December 2024 Update - released 10-Dec-24
+* 24/25 files have been updated, containing data up to September 2024.
+* 17/18 - 23/24 files have been updated.
+* Homelessness completeness flag is now available in 23/24 files.
+* Substance misuse flag updated.
+* Mid-2023 & Mid-2022 population estimates for Scotland have been updated.
+* Mid-2022 Small Area Population Estimates for 2011 Data Zones have been updated.
+
+# September 2024 Update - released 13-Sep-24
* New 24/25 files created
* New NSU cohort for 23/24 available
* New SPARRA scores calculated from April 24/25
@@ -75,7 +83,7 @@
* Homelessness Flags.
* Bug fixes:
* Blank `datazone` in A&E. This has been fixed and was due to PC8 postcode format matching onto SLF pc lookup.
* Large increase in preventable beddays. This was caused due to an SPSS vs R logic difference. Uses SPSS logic which
brings the difference down to `3.3%`.
* Issue with `locality` which showed `locality` in each row instead of its true `locality`. This has now been fixed.
* Duplicated CHI in the individual file. The issue was identified when trying to include HRIs. This has now been corrected.
@@ -94,7 +102,7 @@
* Removal of `keydate1_dateformat` and `keydate2_dateformat`.
* `dd_responsible_lca` – This variable now uses CA2019 codes instead of the 2-digit ‘old’ LCA code.
* Preventable beddays - not able to calculate these correctly. * Death fixes not included.
* Variables not ordered in R like they used to be in SPSS.
* End of HHG.
* New variable `ch_postcode`.
* rename of variables `cost_total_net_incdnas`, `ooh_outcome.1`, `ooh_outcome.2`, `ooh_outcome.3`, `ooh_outcome.4`, `totalnodncontacts`.
@@ -155,7 +163,7 @@
* Fixed a bug where CH costs was not referring to end of year.
* e.g. 2018 costs relates to 2017/18
* The changes to Homelessness described in the March update have been properly implemented.
* We now use [`{haven}`](https://haven.tidyverse.org/news/index.html) to compress the SPSS files which compresses them better than SPSS does 🤷‍♂️
* `cij_marker` is now a numeric instead of a string which changes empty strings to missing instead of blank using sysmis.
* Check code of the form `cij_marker = "x"`. `x` now needs to be a numeric.
File renamed without changes.
File renamed without changes.
70 changes: 70 additions & 0 deletions Pre_processing_scripts/write_anon_chi_files.R
@@ -0,0 +1,70 @@
################################################################################
# Name of file - Write_anon_chi_files.R
#
# Original Authors - Jennifer Thom, Zihao Li
# Original Date - July 2024
# Written/run on - R Posit
# Version of R - 4.1.2
#
# Description: Run this script in stages to convert chi to anon chi and save files.
# By default this is set up to take the delayed discharges file
# convert the chi to anon_chi and save to disk. Important for
# ensuring we do not save chi anywhere on disk.
#
################################################################################

## Stage 1 - Setup environment
#-------------------------------------------------------------------------------

# Set up directory
source_dir <- "/conf/hscdiip/SLF_Extracts/Delayed_Discharges"

# Specify type of files e.g. parquet, rds, csv
# `pattern` is a regular expression: escape the dot and anchor the extension
pattern <- "\\.parquet$"
cat(stringr::str_glue("Looking in '{source_dir}' for parquet files."))

# List all files in the directory
parquet_files <- list.files(source_dir, pattern = pattern, full.names = TRUE)
print(stringr::str_glue("Found {length(parquet_files)} parquet files to process."))

# Create a function to read variable names and check if CHI is in the file
is_chi_in_file <- function(filename) {
data <- arrow::read_parquet(filename, nrow = 5)
return(grepl("chi", names(data)) %>% any())
}
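
A note on the `grepl("chi", ...)` check above: it is a substring match, so a column that is already named `anon_chi` also counts as a hit. The base R sketch below contrasts it with an exact match. The column names are made up and `has_chi_column()` is a hypothetical helper, not part of the package; whether the stricter check suits these files is an assumption.

```r
# Hypothetical column names, for illustration only
already_anon <- c("anon_chi", "keydate1_dateformat")
raw_chi <- c("chi", "keydate1_dateformat")

# Substring match, as in is_chi_in_file(): TRUE for both vectors
any(grepl("chi", already_anon))
any(grepl("chi", raw_chi))

# Hypothetical stricter helper: TRUE only for a column named exactly "chi"
has_chi_column <- function(col_names) {
  "chi" %in% col_names
}

has_chi_column(already_anon) # FALSE
has_chi_column(raw_chi) # TRUE
```

If files that already hold `anon_chi` can appear in the source directory, an exact match would avoid passing them through the conversion a second time.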


# Stage 2 - In each file, convert chi to anon_chi and save to disk
#-------------------------------------------------------------------------------

# create a loop for converting to anon chi in all listed files
for (data_file in parquet_files) {
# specify new name and new file path
save_file_path <- file.path(source_dir, paste0("anon-", basename(data_file)))
chi_in_file <- is_chi_in_file(data_file)

# If chi is in the file, convert to anon_chi
if (chi_in_file) {
read_file(data_file) %>%
slfhelper::get_anon_chi() %>%
write_file(save_file_path)

cat("Replaced chi with anon chi:", data_file, "to", save_file_path, "\n")
} else {
read_file(data_file) %>%
write_file(save_file_path)
cat("No chi column found, copied file:", data_file, "to", save_file_path, "\n")
}
}


# Stage 3 - Remove files with CHI
#-------------------------------------------------------------------------------

# Create a loop for removing the old files with CHI
for (data_file in parquet_files) {
file.remove(data_file)
cat("Removed chi files:", data_file, "in", source_dir, "\n")
}

# End of Script #
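
Stage 3 removes every original file unconditionally. A more defensive variant might delete an original only once its `anon-` copy is confirmed on disk. The sketch below uses base R only; `remove_if_anonymised()` is a hypothetical helper, and the `anon-` prefix is taken from Stage 2 of the script.

```r
# Hypothetical helper, not part of the PR: delete an original file only when
# the matching "anon-" copy exists in the same directory and is non-empty.
remove_if_anonymised <- function(files) {
  anon_files <- file.path(dirname(files), paste0("anon-", basename(files)))
  # file.size() is NA for missing files; FALSE & NA evaluates to FALSE
  safe <- file.exists(anon_files) & file.size(anon_files) > 0
  removed <- file.remove(files[safe])
  # Return the paths that were actually removed, invisibly
  invisible(files[safe][removed])
}
```

Called as `remove_if_anonymised(parquet_files)`, it would leave any original without a matching copy in place, so a failed write in Stage 2 does not lose data.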
137 changes: 129 additions & 8 deletions R/00-update_refs.R
@@ -1,3 +1,109 @@
################################################################################
# Name of file - 00-update_refs.R
# Original Authors - Jennifer Thom, Zihao Li
# Original Date - August 2021
# Update - Oct 2024
#
# Written/run on - RStudio Server
# Version of R - 4.1.2
#
# Description - Use this script to update references needed for the SLF update.
#
# Manual changes needed to the following Essential Functions:
# # End_date
# # Check_year_valid
# # Delayed_discharges_period
# # Latest_update
#
################################################################################

#' End date
#'
#' @return Get the end date of the latest update period
#' @export
#'
end_date <- function() {
## UPDATE ##
# Specify update by indicating end of quarter date
# Q1 June = 30062024
# Q2 September = 30092024
# Q3 December = 31122024
# Q4 March = 31032024
lubridate::dmy(31122024)
}
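
The quarter-end dates listed in the comment follow a fixed pattern for a financial year that starts on 1 April. As a rough illustration, they could be derived rather than typed by hand. `fiscal_quarter_end()` is a hypothetical base R helper, not part of the package.

```r
# Hypothetical helper: last day of fiscal quarter `q` (1 to 4) in the
# financial year that starts in April of `start_year`.
fiscal_quarter_end <- function(start_year, q) {
  stopifnot(all(q %in% 1:4))
  # Month that follows the quarter, counted from January of start_year:
  # Q1 -> 7 (July), Q2 -> 10, Q3 -> 13 (January next year), Q4 -> 16 (April)
  month_after <- 4 + 3 * q
  year <- start_year + (month_after - 1) %/% 12
  month <- (month_after - 1) %% 12 + 1
  # First day of the following month, minus one day
  as.Date(sprintf("%d-%02d-01", year, month)) - 1
}

fiscal_quarter_end(2024, 3) # 2024-12-31
fiscal_quarter_end(2024, 4) # 2025-03-31
```

Under this convention Q4 of the 2024/25 financial year ends in March 2025.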


#' Check data exists for a year
#'
#' @description Check there is data available for a given year
#' as some extracts are year dependent. E.g. Homelessness
#' is only available from 2016/17 onwards.
#'
#' @param year Financial year
#' @param type name of extract
#'
#' @return A logical TRUE/FALSE
check_year_valid <- function(
year,
type = c(
"acute",
"ae",
"at",
"ch",
"client",
"cmh",
"cost_dna",
"dd",
"deaths",
"dn",
"gpooh",
"hc",
"homelessness",
"hhg",
"maternity",
"mh",
"nsu",
"outpatients",
"pis",
"sds",
"sparra"
)) {
if (year <= "1415" && type %in% c("dn", "sparra")) {
return(FALSE)
} else if (year <= "1516" && type %in% c("cmh", "homelessness", "dd")) {
return(FALSE)
} else if (year <= "1617" && type %in% c("ch", "hc", "sds", "at", "client", "cost_dna")) {
return(FALSE)
} else if (year <= "1718" && type %in% "hhg") {
return(FALSE)
} else if (year >= "2122" && type %in% c("cmh", "dn")) {
return(FALSE)
} else if (year >= "2324" && type %in% "hhg") {
return(FALSE)
} else if (year >= "2425" && type %in% c("nsu", "sds")) {
return(FALSE)
} else if (year >= "2526" && type %in% c("ch", "hc", "sds", "at", "sparra")) {
return(FALSE)
}

return(TRUE)
}
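
One possible way to flatten the if/else chain is a lookup table of first and last valid years, which works because four-character financial-year strings of equal length compare in the intended order. The sketch below covers only three extract types, with bounds transcribed from the conditions above. `year_bounds` and `check_year_valid_sketch()` are hypothetical names, and this is not the package's implementation.

```r
# Bounds transcribed from the if/else chain for three extract types only.
# NA means there is no bound on that side.
year_bounds <- data.frame(
  type = c("dn", "hhg", "homelessness"),
  first = c("1516", "1819", "1617"),
  last = c("2021", "2223", NA),
  stringsAsFactors = FALSE
)

check_year_valid_sketch <- function(year, type) {
  row <- year_bounds[year_bounds$type == type, ]
  # Types without an entry are treated as always valid
  if (nrow(row) == 0) {
    return(TRUE)
  }
  after_start <- is.na(row$first) || year >= row$first
  before_end <- is.na(row$last) || year <= row$last
  after_start && before_end
}

check_year_valid_sketch("1415", "dn") # FALSE
check_year_valid_sketch("1920", "dn") # TRUE
check_year_valid_sketch("2324", "hhg") # FALSE
```

Adding a new cut-off then becomes a one-row change to the table rather than another branch.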


#' Delayed Discharge period
#'
#' @description Get the period for Delayed Discharge
#'
#' @return The period for the Delayed Discharge file
#' as MMMYY_MMMYY
#' @export
#'
#' @family initialisation
get_dd_period <- function() {
"Jul16_Sep24"
}


#' Latest update
#'
#' @description Get the date of the latest update, e.g 'Jun_2022'
@@ -7,9 +113,10 @@
#'
#' @family initialisation
latest_update <- function() {
-"Sep_2024"
+"Dec_2024"
}


#' Previous update
#'
#' @param months_ago Number of months since the previous update
@@ -34,11 +141,11 @@

latest_update_date <- lubridate::my(latest_update())

previous_update_year <- lubridate::year(
latest_update_date - lubridate::period(months_ago, "months")
)

previous_update_month <- lubridate::month(
latest_update_date - lubridate::period(months_ago, "months"),
label = TRUE,
abbr = TRUE
@@ -51,19 +158,33 @@
return(previous_update)
}
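
To make the month arithmetic in `previous_update()` concrete, here is a base R sketch of the same idea. It assumes the result is a label in the same `Mon_YYYY` form that `latest_update()` returns, which this diff does not show, and it relies on English month abbreviations from the session locale. `previous_update_sketch()` is a hypothetical name.

```r
# Hypothetical base R equivalent; assumes English month abbreviations.
previous_update_sketch <- function(latest = "Dec_2024", months_ago = 3L) {
  # Parse "Mon_YYYY" as the first day of that month
  latest_date <- as.Date(paste0("01_", latest), format = "%d_%b_%Y")
  # Step back by whole months; seq() keeps the day of month at the 1st
  previous_date <- seq(
    latest_date,
    by = paste0("-", months_ago, " months"),
    length.out = 2
  )[2]
  format(previous_date, "%b_%Y")
}
```

With `latest = "Dec_2024"` and `months_ago = 3` the previous date is 1 September 2024, and the step also crosses year boundaries, for example from March 2024 back to December 2023.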

-#' Delayed Discharge period
+
+#' Extract latest FY from end_date
#'
-#' @description Get the period for Delayed Discharge
+#' @return fy in format "2024"
+#' @export
#'
-#' @return The period for the Delayed Discharge file
-#' as MMMYY_MMMYY
+fy <- function() {
+# Latest FY
+fy <- phsmethods::extract_fin_year(end_date()) %>% substr(1, 4)
+}
+
+
+#' Extract latest quarter from end_date
+#'
+#' @return qtr in format "Q1"
#' @export
#'
-#' @family initialisation
-get_dd_period <- function() {
-"Jul16_Jun24"
+qtr <- function() {
+# Latest Quarter
+qtr <- lubridate::quarter(end_date(), fiscal_start = 4)
+
+qtr <- stringr::str_glue("Q{qtr}")
+
+return(qtr)
}
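
For readers unfamiliar with the fiscal conventions behind `fy()` and `qtr()`, the base R sketch below shows the arithmetic for a financial year starting in April. The helper names are hypothetical and take the date as an argument instead of calling `end_date()`. Note also that the last expression in `fy()` is an assignment, so R returns its value invisibly; callers still receive it.

```r
# Hypothetical base R helpers mirroring qtr() and fy() for a given date.
fiscal_quarter_label <- function(date) {
  month <- as.integer(format(date, "%m"))
  # Shift so April is month 0, then bucket into blocks of three months
  paste0("Q", ((month - 4) %% 12) %/% 3 + 1)
}

fin_year_start <- function(date) {
  month <- as.integer(format(date, "%m"))
  year <- as.integer(format(date, "%Y"))
  # January to March belong to the financial year that began the year before
  as.character(ifelse(month >= 4, year, year - 1L))
}

fiscal_quarter_label(as.Date("2024-12-31")) # "Q3"
fin_year_start(as.Date("2024-12-31")) # "2024"
fin_year_start(as.Date("2025-03-31")) # "2024"
```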


#' The year list for slf to update
#'
#' @description Get the vector of years to update slf