From 99020b606704eed2c5135829c79aebfeb5d4cb69 Mon Sep 17 00:00:00 2001
From: Jennit07 <67372904+Jennit07@users.noreply.github.com>
Date: Mon, 2 Oct 2023 10:21:39 +0100
Subject: [PATCH 01/17] Rename function to `convert_sc_sending_location_to_lca`
 (#839)

* Bump `{slfhelper}` version

The new version is needed to read the SLFs now. We use this in `get_existing_data_for_tests()`.

* Remove unnecessary code from `get_anon_chi` (#759)

* remove unnecessary code from `get_anon_chi`

`get_anon_chi` was updated in slfhelper v0.10

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/5669528966/attempts/1
Accepted in https://github.com/Public-Health-Scotland/source-linkage-files/pull/759#issuecomment-1651842662

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>

---------

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Co-authored-by: marjom02 <megan.mcnicol2@nhs.scot>
Co-authored-by: Megan McNicol <SwiftySalmon@users.noreply.github.com>

* Set the default reporter for `tar_outdated()` and friends

* Comment out dataset writing targets

These take a very long time to run, so were skipped at the last update. They need to be revisited.

* Make sure `year` is added as the first variable

* Correct some documentation (#769)

* Correct some documentation

This resolves a build warning.

* Style code

---------

Co-authored-by: Moohan <Moohan@users.noreply.github.com>

* Make some changes suggested by lintr

Lots of layout changes, as well as lots of implicit to explicit integer / double changes.

* Document

* Fix documentation typo

* Investigate missing datazone from episode file (#773)

* Format postcode into `pc7` format

* Style code

* Style code

* Update documentation

* Update comment in R/process_extract_ae.R

* Implement catch-all for PC7 format

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>
Co-authored-by: Moohan <Moohan@users.noreply.github.com>

* Remove some obsolete code (#770)

* Remove some obsolete code

Renaming and removing some functions.

* Style code

---------

Co-authored-by: Moohan <Moohan@users.noreply.github.com>
Co-authored-by: Zihao Li <zihao.li@phs.scot>

* Simplify `create_hscp_test_flags` (#772)

* Simplify `create_hscp_test_flags`

* Update documentation

* Style code

* simplify `create_hb_test_flags`

* implement hscp test flags into tests

* Simplify `create_demog_test_flags`

---------

Co-authored-by: James McMahon <james.mcmahon@phs.scot>
Co-authored-by: Moohan <Moohan@users.noreply.github.com>

* Rewrite case when statements (#780)

* Update code from `case_when` to `case_match`, as it's a bit easier to read

* Style code

* changed some more `case_when` to `case_match`

* Style code

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/5752014211/attempts/1
Accepted in https://github.com/Public-Health-Scotland/source-linkage-files/pull/780#issuecomment-1664201334

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>

* Add tests for `convert_sending_location_to_lca`

---------

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Co-authored-by: marjom02 <megan.mcnicol2@nhs.scot>
Co-authored-by: SwiftySalmon <SwiftySalmon@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>

* Update R-CMD-check.yaml (#781)

Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Add a fix for HSCP names (#789)

In the processed extract the variable is called `hscp`, and in the final SLF it's called `hscp2018`.

Fixed with a nested if statement.

Co-authored-by: marjom02 <megan.mcnicol2@nhs.scot>

* Fix locality (#802)

Tiny error and a simple fix.

Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Add simple scripts for running targets as a workbench job (#767)

* Fix duplicate CHIs in individual file (#791)

* Fix duplicated CHI matches in SC data

* Update R/create_individual_file.R

* update on join_sc_client

* Create a test checking if individual files have duplicated chi

* add duplicated chi number to the tests in process_tests_individual_file

---------

Co-authored-by: lizihao-anu <lizihao-anu@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>

* Update NSU code for new 22/23 cohort (#784)

Update `check_year_valid` for NSUs

* Amend `get_boxi_extract_path` function for archiving DN and CMH data  (#785)

* Update `get_boxi_extract_path` for DN/CMH data

* Remove extra function

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/5856792420/attempts/1
Accepted in https://github.com/Public-Health-Scotland/source-linkage-files/pull/785#issuecomment-1677400900

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>

---------

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>

* Fix increase in total preventable beddays (#779)

* further obsolete code change

* fix the preventable_beddays

Co-authored-by: James McMahon <james.mcmahon@phs.scot>

---------

Co-authored-by: James McMahon <james.mcmahon@phs.scot>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* fix warning on `:=` (#797)

* fix warning on `:=`

* Update R/aggregate_by_chi.R

Co-authored-by: James McMahon <james.mcmahon@phs.scot>

* Style code

---------

Co-authored-by: James McMahon <james.mcmahon@phs.scot>
Co-authored-by: lizihao-anu <lizihao-anu@users.noreply.github.com>

* Add 2324 targets/workbench job file

* Use `get_source_extract_path` in homelessness (#796)

This was already set up, just not used for some reason. Note that this will switch from using a `.rds` to `.parquet` (unless you do `get_source_extract_path(year, "Homelessness", ext = "rds")`).

Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Correct tests for NSU

* Update script for extracting NSU from SMRA space

* Update year in 99_NSU extract script

* Update news for September 23 update (#811)

* Update News for March and June updates

* Update release date

* WIP - update news for Sep update

* Update NEWS.md

Fix some typos / grammar

---------

Co-authored-by: James McMahon <james.mcmahon@phs.scot>

* Apply styling

* Fix issue with `case_match` types (#810)

* Fix issue with `case_match` types

It seems that `case_match()` is stricter about types than `case_when()`. See the below code:

```r
library(dplyr)
# Breaks: `height` is an integer, so `case_match()` rejects the character LHS
mutate(starwars,
  new_height = case_when(
    height == "172" ~ "170"
  ),
  new_height2 = case_match(
    height,
    "172" ~ "170"
  ),
  .after = "height"
)

# Works: the `case_match()` LHS is now an integer
mutate(starwars,
  new_height = case_when(
    height == "172" ~ "170"
  ),
  new_height2 = case_match(
    height,
    172L ~ "170"
  ),
  .after = "height"
)
```

Since `sending_location` is an integer, the LHS of `case_match` must be numeric too, not a character string. It was slightly incorrect previously but `case_when` let us get away with it!

I also updated and added to the tests.
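
As a minimal sketch of the same point applied to `sending_location` (assuming dplyr 1.1.0 or later; the single `100L ~ "01"` mapping is only an illustration, not the full lookup):

```r
library(dplyr)

sending_location <- c(100L, 120L)

# Character LHS against an integer vector: `case_match()` is expected to
# raise a type error, which we capture instead of stopping the script.
mismatch <- tryCatch(
  case_match(sending_location, "100" ~ "01"),
  error = function(e) e
)
inherits(mismatch, "error")

# Integer LHS: 100 is matched, and values without a mapping become NA.
matched <- case_match(sending_location, 100L ~ "01")
matched
```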

* Style code

* Style code

---------

Co-authored-by: Moohan <Moohan@users.noreply.github.com>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>
Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>

* Bug - Outpatients tests failing due to missing HSCP (#816)

* Update `produce_source_extract_tests`

* Update outpatients tests with hscp_var = FALSE

* Revert "Style code"

This reverts commit 8e73d4abc042986a76754c2acc1d197292a1c245.

* Style code

* simplify code

* Update documentation

* Rename `hscp_var` to `add_hscp_count`

* Update documentation

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>
Co-authored-by: Moohan <Moohan@users.noreply.github.com>

* Fix incorrect `period` format in `read_sc_all_alarms_telecare` (#814)

* Fix incorrect `period` format in `read_sc_all_alarms_telecare`

---------

Co-authored-by: lizihao-anu <lizihao-anu@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>

* Fix `convert_sending_location_to_lca` example

* Use `col_select` instead of `columns` in tests

* Add tests for `compute_mid_year_age` (#809)

* Add tests for `compute_mid_year_age`

* Remove redundant code

* Update documentation

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Add a new function to set up keyring (#800)

* Add a new function to set up keyring

I've tested this by deleting my `.Renviron` and deleting my keyring (`keyring::keyring_delete("createslf")`), and it seems to work. It would be great to have it tested both by someone with an existing set-up (Jen) and by someone who doesn't have it set up yet.

The code looks complicated but I've just tried to catch every scenario, so the process should be smooth and clear (from the user's point of view).

I've also expanded the code relating to the username, which will now hopefully work in more cases.

* [check-spelling] Update metadata

Update for https://github.com/Public-Health-Scotland/source-linkage-files/actions/runs/5824423711/attempts/1
Accepted in https://github.com/Public-Health-Scotland/source-linkage-files/pull/800#issuecomment-1673658357

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>

* Update documentation

---------

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Add additional tests for `get_file_path` (#808)

* Add additional tests for `get_file_path`

* Style code

---------

Co-authored-by: Moohan <Moohan@users.noreply.github.com>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Rename `run_episode_file()` -> `create_episode_file()` (#803)

* Rename `run_episode_file()` -> `create_episode_file()`

This improves consistency! When speaking to Megan we noted that having the two 'main' functions with different names was needlessly confusing!

* Delete run_targets_tests.R

* Update documentation

---------

Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>
Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>

* Remove incorrect references to rds (#798)

* Remove incorrect references to rds

Since we (mostly) don't use rds anymore these references are incorrect and potentially confusing.

I've updated lots of documentation to remove the reference to rds.

I've also updated many comments that mentioned rds (these were probably the most confusing).

* Update documentation

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>

* Make targets and tarchetypes required packages (#799)

Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>

* Update episode file functions to pass data through (#754)

* Update `read_file` to return an empty tibble if passed the dummy path

This is needed for some other bits, notably NSUs
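
A rough sketch of the idea, not the package's actual code: `dummy_path` and `read_file_sketch()` are hypothetical names, and only `.rds` files are handled here to keep the example short.

```r
library(tibble)

# Hypothetical stand-in for the dummy path; the real value is not shown here.
dummy_path <- file.path(tempdir(), "dummy_extract.rds")

read_file_sketch <- function(path) {
  # Short-circuit: the dummy path means "no data for this year", so return
  # an empty tibble rather than trying to read a file that doesn't exist.
  if (identical(path, dummy_path)) {
    return(tibble())
  }
  as_tibble(readRDS(path))
}

empty_result <- read_file_sketch(dummy_path)
nrow(empty_result)
```

Returning an empty tibble lets downstream steps run unchanged for years with no data, instead of each caller needing its own special case.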

* Update SPARRA and HHG paths to return dummy if the year is invalid

* Extract all data as a parameter

* Style code

* Update documentation

* Style code

* Update documentation

* rename `run` to `create_episode_file`

* Update documentation

---------

Co-authored-by: Moohan <Moohan@users.noreply.github.com>
Co-authored-by: Jennifer Thom <jennifer.thom@phs.scot>
Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>

* Tests/it extract path (#807)

* Add additional tests for `check_it_reference()`

* Make the check on the IT reference stricter

* Update documentation

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Update workflow to run against the development branch (#795)

* Make test-coverage.yaml run against development

* Make lint-changed-files.yaml run against development

---------

Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Remove package wide imports of `readr` (#792)

* Update documentation

* Use `readr::` where possible

* Update documentation

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>

* Handle OpenData extracts better (#794)

* Refactor the LA Code OpenData

This should now run as its own target and then be passed to the homelessness data.

I also added some tests.

* Also add some tests for the GP prac clusters OpenData

* Update documentation

---------

Co-authored-by: Moohan <Moohan@users.noreply.github.com>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>

* Fix the pkgdown site/job (#804)

* Fix the pkgdown site/job

The job generates this site: https://public-health-scotland.github.io/source-linkage-files/. It hasn't been working for a while, because any new function needs to be added to (or captured by) the `_pkgdown.yml` file.

This PR is a pretty minimal fix to get the site working again.

* Update documentation

* Update documentation

* Update `create_episode_file`

* Remove `run_episode_file`

* update documentation

---------

Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>
Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Jennifer Thom <jennifer.thom@phs.scot>

* Add new 'final' file path functions (#787)

* New function for SLF final file paths

* Implement final file path functions

* Style code

* Update documentation

* Update final file paths to use `...`

* Fix conflicts caused by `run_episode_file` being renamed to `create_episode_file`

* Update documentation

* Update documentation

* Style code

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: marjom02 <megan.mcnicol2@nhs.scot>
Co-authored-by: SwiftySalmon <SwiftySalmon@users.noreply.github.com>
Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>

* Check scripts are in snake case (#793)

* Update `get_boxi_extract_path` for DN/CMH data

* Remove extra function

* Update documentation

* change `get_boxi_extract_path` to snake_case

* change `get_source_extract_path` to snake_case

* Update documentation

* Update targets with snake_case

* Fix typo

* Style code

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>
Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>
Co-authored-by: SwiftySalmon <SwiftySalmon@users.noreply.github.com>

* Convert the Python script for sorting BI extracts to R (#833)

* Convert the Python script for sorting BI extracts to R

* Style code

* Delete 00-Sort_BI_Extracts.py

---------

Co-authored-by: lizihao-anu <lizihao-anu@users.noreply.github.com>

* Use `get_slf_episode_path` in right place

* fix pipe

* Fix typo in string

* Update documentation

* Rename to `convert_sc_sending_location_to_lca`

* Update documentation

* Style code

* Update documentation

---------

Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com>
Co-authored-by: James McMahon <james.mcmahon@phs.scot>
Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>
Co-authored-by: marjom02 <megan.mcnicol2@nhs.scot>
Co-authored-by: Megan McNicol <SwiftySalmon@users.noreply.github.com>
Co-authored-by: Moohan <Moohan@users.noreply.github.com>
Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Zihao Li <zihao.li@phs.scot>
Co-authored-by: lizihao-anu <lizihao-anu@users.noreply.github.com>
---
 00-Sort_BI_Extracts.py                        |  85 -----------
 00_Sort_BI_Extracts.R                         |  50 +++++++
 NAMESPACE                                     |  17 +--
 R/add_nsu_cohort.R                            |   2 +-
 R/add_ppa_flag.R                              |   2 +-
 R/convert_ca_to_lca.R                         |   2 +-
 ...R => convert_sc_sending_location_to_lca.R} |   4 +-
 R/cost_uplift.R                               |   2 +
 R/create_episode_file.R                       |   9 +-
 R/create_individual_file.R                    |  46 ++++--
 R/createslf-package.R                         |   2 -
 R/get_boxi_extract_path.R                     |  54 +++----
 R/get_existing_data_for_tests.R               |   3 +
 R/get_final_file_paths.R                      |  34 +++++
 ...lookup.R => get_la_code_opendata_lookup.R} |   5 +-
 R/get_source_extract_path.R                   |  76 +++++-----
 R/link_delayed_discharge_eps.R                |   4 +-
 R/process_extract_acute.R                     |   2 +-
 R/process_extract_ae.R                        |  16 +-
 R/process_extract_alarms_telecare.R           |   2 +-
 R/process_extract_care_home.R                 |   4 +-
 R/process_extract_cmh.R                       |   2 +-
 R/process_extract_delayed_discharges.R        |   2 +-
 R/process_extract_district_nursing.R          |   2 +-
 R/process_extract_gp_ooh.R                    |   2 +-
 R/process_extract_home_care.R                 |   2 +-
 R/process_extract_homelessness.R              |   3 +-
 R/process_extract_maternity.R                 |   2 +-
 R/process_extract_mental_health.R             |   2 +-
 R/process_extract_nrs_deaths.R                |   2 +-
 R/process_extract_outpatients.R               |   2 +-
 R/process_extract_prescribing.R               |   2 +-
 R/process_extract_sds.R                       |   2 +-
 R/process_lookup_deaths.R                     |   2 +-
 R/process_sc_all_alarms_telecare.R            |   2 +-
 R/process_sc_all_home_care.R                  |   2 +-
 R/process_sc_all_sds.R                        |   2 +-
 R/process_tests_episode_file.R                |   1 +
 R/read_extract_acute.R                        | 140 +++++++++---------
 R/read_extract_ae.R                           |  74 ++++-----
 R/read_extract_cmh.R                          |  38 ++---
 R/read_extract_district_nursing.R             |  40 ++---
 R/read_extract_gp_ooh.R                       |   6 +-
 R/read_extract_homelessness.R                 |  48 +++---
 R/read_extract_maternity.R                    | 110 +++++++-------
 R/read_extract_mental_health.R                | 118 +++++++--------
 R/read_extract_nrs_deaths.R                   |  56 +++----
 R/read_extract_ooh_consultations.R            |   2 +-
 R/read_extract_ooh_diagnosis.R                |   2 +-
 R/read_extract_ooh_outcomes.R                 |   2 +-
 R/read_extract_outpatients.R                  |  74 ++++-----
 R/read_extract_prescribing.R                  |  16 +-
 R/read_it_chi_deaths.R                        |   8 +-
 R/read_lookup_ltc.R                           |  44 +++---
 _pkgdown.yml                                  |  60 ++++++--
 _targets.R                                    |  32 ++--
 man/add_acute_columns.Rd                      |  30 ++++
 man/add_ae_columns.Rd                         |  30 ++++
 man/add_all_columns.Rd                        |  30 ++++
 man/add_at_columns.Rd                         |  30 ++++
 man/add_ch_columns.Rd                         |  30 ++++
 man/add_cij_columns.Rd                        |  30 ++++
 man/add_cmh_columns.Rd                        |  30 ++++
 man/add_dd_columns.Rd                         |  30 ++++
 man/add_dn_columns.Rd                         |  30 ++++
 man/add_gls_columns.Rd                        |  30 ++++
 man/add_hc_columns.Rd                         |  30 ++++
 man/add_hl1_columns.Rd                        |  30 ++++
 man/add_ipdc_cols.Rd                          |  33 ++++-
 man/add_mat_columns.Rd                        |  30 ++++
 man/add_mh_columns.Rd                         |  30 ++++
 man/add_nrs_columns.Rd                        |  30 ++++
 man/add_nsu_cohort.Rd                         |   8 +-
 man/add_nsu_columns.Rd                        |  30 ++++
 man/add_ooh_columns.Rd                        |  30 ++++
 man/add_op_columns.Rd                         |  30 ++++
 man/add_pis_columns.Rd                        |  30 ++++
 man/add_ppa_flag.Rd                           |   8 +-
 man/add_sds_columns.Rd                        |  30 ++++
 man/add_standard_cols.Rd                      |  33 ++++-
 man/apply_cost_uplift.Rd                      |   8 +
 man/clean_up_ch.Rd                            |  30 ++++
 man/condition_cols.Rd                         |  30 ++++
 man/convert_ca_to_lca.Rd                      |   4 +-
 man/convert_hb_to_hbnames.Rd                  |   2 +-
 man/convert_hscp_to_hscpnames.Rd              |   2 +-
 ... => convert_sc_sending_location_to_lca.Rd} |  10 +-
 man/create_individual_file.Rd                 |  30 ++++
 man/get_boxi_extract_path.Rd                  |   4 +-
 man/get_la_code_opendata_lookup.Rd            |  16 ++
 man/get_slf_episode_path.Rd                   |  19 +++
 man/get_slf_individual_path.Rd                |  19 +++
 man/get_source_extract_path.Rd                |   4 +-
 man/la_code_lookup.Rd                         |  20 ---
 man/link_delayed_discharge_eps.Rd             |  10 +-
 man/lookup_uplift.Rd                          |   8 +
 man/max_no_inf.Rd                             |   5 +
 man/min_no_inf.Rd                             |   5 +
 man/process_extract_homelessness.Rd           |   1 +
 man/process_slf_deaths_lookup.Rd              |   2 +-
 man/read_extract_acute.Rd                     |   2 +-
 man/read_extract_ae.Rd                        |   2 +-
 man/read_extract_cmh.Rd                       |   2 +-
 man/read_extract_district_nursing.Rd          |   2 +-
 man/read_extract_gp_ooh.Rd                    |   6 +-
 man/read_extract_homelessness.Rd              |   2 +-
 man/read_extract_maternity.Rd                 |   2 +-
 man/read_extract_mental_health.Rd             |   2 +-
 man/read_extract_nrs_deaths.Rd                |   2 +-
 man/read_extract_ooh_consultations.Rd         |   2 +-
 man/read_extract_ooh_diagnosis.Rd             |   2 +-
 man/read_extract_ooh_outcomes.Rd              |   2 +-
 man/read_extract_outpatients.Rd               |   2 +-
 man/recode_gender.Rd                          |  30 ++++
 man/remove_blank_chi.Rd                       |  30 ++++
 .../_snaps/convert_sending_location_to_lca.md |   2 +-
 .../_snaps/get_la_code_opendata_lookup.md     |  20 +++
 .../test-convert_sending_location_to_lca.R    |   6 +-
 tests/testthat/test-get_gpprac_opendata.R     |  18 +++
 .../test-get_la_code_opendata_lookup.R        |  13 ++
 120 files changed, 1685 insertions(+), 700 deletions(-)
 delete mode 100644 00-Sort_BI_Extracts.py
 create mode 100644 00_Sort_BI_Extracts.R
 rename R/{convert_sending_location_to_lca.R => convert_sc_sending_location_to_lca.R} (92%)
 create mode 100644 R/get_final_file_paths.R
 rename R/{la_code_lookup.R => get_la_code_opendata_lookup.R} (84%)
 rename man/{convert_sending_location_to_lca.Rd => convert_sc_sending_location_to_lca.Rd} (69%)
 create mode 100644 man/get_la_code_opendata_lookup.Rd
 create mode 100644 man/get_slf_episode_path.Rd
 create mode 100644 man/get_slf_individual_path.Rd
 delete mode 100644 man/la_code_lookup.Rd
 create mode 100644 tests/testthat/_snaps/get_la_code_opendata_lookup.md
 create mode 100644 tests/testthat/test-get_gpprac_opendata.R
 create mode 100644 tests/testthat/test-get_la_code_opendata_lookup.R

diff --git a/00-Sort_BI_Extracts.py b/00-Sort_BI_Extracts.py
deleted file mode 100644
index 52bdb4d3d..000000000
--- a/00-Sort_BI_Extracts.py
+++ /dev/null
@@ -1,85 +0,0 @@
-import os
-from collections import defaultdict
-import re
-import gzip
-
-if __name__ == "__main__":
-    compress_files = False
-
-    base_dir = r"\\stats\sourcedev\Source_Linkage_File_Updates\Extracts Temp"
-
-    print("Looking in '{}' for csv files.".format(base_dir))
-
-    # Create a list of all the csv files
-    all_extracts = [file for file in os.listdir(base_dir) if file.endswith(".csv")]
-
-    # Set up a default dict
-    files_by_year = defaultdict(list)
-
-    # Set up the regEx
-    # Look for files ending "-20...."
-    pattern = re.compile(r"-20(\d\d\d\d).csv")
-
-    # Create a dictionary as {'Year':[file1, file2]} etc.
-    # match.group(1) will be the year e.g. 1718
-    for file in all_extracts:
-        match = pattern.search(file)
-        if match:
-            files_by_year[match.group(1)].append(file)
-
-    n_files = files_by_year.__len__()
-
-    if n_files == 0:
-        print("No correctly named csv files found.")
-    else:
-        print("Found {} csv files to process.".format(n_files))
-
-    # Loop through the dictionary by year
-    for year in files_by_year.keys():
-        # Create a string for the relevant year's directory
-        year_dir = os.path.join(
-            r"\\stats\sourcedev\Source_Linkage_File_Updates\{}\Extracts".format(year)
-        )
-
-        # First check if the year folder exists
-        # if not create it
-        if os.path.exists(year_dir) != True:
-            os.makedirs(year_dir)
-            print("Creating new folder for {}".format(year))
-
-        for file in files_by_year[year]:
-            # Create string for the 'old' and 'new' locations
-            unsorted_file = os.path.join(base_dir, file)
-            sorted_file = os.path.join(year_dir, file)
-
-            # If a file already exists remove the old one first
-            if os.path.exists(sorted_file):
-                try:
-                    os.remove(sorted_file)
-                except PermissionError:
-                    print(
-                        "Tried to remove {} from the {} Extracts folder but couldn't.\nCheck if the file is open then re-run this script.".format(
-                            file, year
-                        )
-                    )
-                else:
-                    print(
-                        "Removed the existing {} from the {} Extracts folder.".format(
-                            file, year
-                        )
-                    )
-
-            # Move to the sorted location
-            os.rename(unsorted_file, sorted_file)
-            print("Moved {} to the {} Extracts folder.".format(file, year))
-
-            if compress_files:
-                with open(sorted_file, "rb") as uncompressed_csv:
-                    with gzip.open(sorted_file + ".gz", "wb") as gzip_csv:
-                        print("Compressing {} ...".format(file))
-                        gzip_csv.writelines(uncompressed_csv)
-                os.remove(sorted_file)
-
-    input(
-        "\n---------------------------------------------\nThe script has finished, press enter to exit."
-    )
diff --git a/00_Sort_BI_Extracts.R b/00_Sort_BI_Extracts.R
new file mode 100644
index 000000000..888ede5b2
--- /dev/null
+++ b/00_Sort_BI_Extracts.R
@@ -0,0 +1,50 @@
+# Define the source directory and financial year pattern
+compress_files <- FALSE
+source_dir <- "/conf/sourcedev/Source_Linkage_File_Updates/Extracts Temp"
+pattern <- "-20(\\d{4})\\.csv"
+
+
+# List all the CSV files in the source directory
+cat(stringr::str_glue("Looking in '{source_dir}' for csv files."))
+csv_files <- list.files(source_dir, pattern = "\\.csv$", full.names = TRUE)
+print(stringr::str_glue("Found {length(csv_files)} csv files to process."))
+
+# Create a function to extract the financial year from a filename
+extract_financial_year <- function(filename) {
+  match <- regexpr(pattern, basename(filename))
+  if (match[[1]][1] > 0) {
+    financial_year <- substr(basename(filename), match[[1]][1] + 3, match[[1]][1] + 6)
+    return(financial_year)
+  } else {
+    return(NULL)
+  }
+}
+
+# Create directories for each financial year and move files
+for (csv_file in csv_files) {
+  financial_year <- extract_financial_year(csv_file)
+  # check if year directory exists
+  if (!is.null(financial_year)) {
+    financial_year_dir <- file.path("/conf/sourcedev/Source_Linkage_File_Updates", financial_year, "Extracts")
+    # if not, create the year directory
+    if (!dir.exists(financial_year_dir)) {
+      dir.create(financial_year_dir, recursive = TRUE)
+    }
+
+    # compress file
+    if (compress_files) {
+      cat("Compressing:", basename(csv_file), "\n")
+      system2(
+        command = "gzip",
+        args = shQuote(csv_file)
+      )
+      csv_file <- paste0(csv_file, ".gz")
+    }
+
+    # move file
+    new_file_path <- file.path(financial_year_dir, basename(csv_file))
+    file.copy(csv_file, new_file_path)
+    file.remove(csv_file)
+    cat("Moved:", csv_file, "to", new_file_path, "\n")
+  }
+}
diff --git a/NAMESPACE b/NAMESPACE
index c5dca28bd..c9ffc03d2 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -14,7 +14,7 @@ export(convert_fyyear_to_year)
 export(convert_hb_to_hbnames)
 export(convert_hscp_to_hscpnames)
 export(convert_numeric_to_date)
-export(convert_sending_location_to_lca)
+export(convert_sc_sending_location_to_lca)
 export(convert_year_to_fyyear)
 export(create_episode_file)
 export(create_homelessness_lookup)
@@ -47,6 +47,7 @@ export(get_homelessness_completeness_path)
 export(get_it_deaths_path)
 export(get_it_ltc_path)
 export(get_it_prescribing_path)
+export(get_la_code_opendata_lookup)
 export(get_locality_path)
 export(get_lookups_dir)
 export(get_ltcs_path)
@@ -66,7 +67,9 @@ export(get_slf_ch_name_lookup_path)
 export(get_slf_chi_deaths_path)
 export(get_slf_deaths_lookup_path)
 export(get_slf_dir)
+export(get_slf_episode_path)
 export(get_slf_gpprac_path)
+export(get_slf_individual_path)
 export(get_slf_postcode_path)
 export(get_source_extract_path)
 export(get_sparra_path)
@@ -75,7 +78,6 @@ export(get_year_dir)
 export(gzip_files)
 export(is_date_in_fyyear)
 export(is_missing)
-export(la_code_lookup)
 export(last_date_month)
 export(latest_cost_year)
 export(latest_update)
@@ -172,17 +174,6 @@ export(write_file)
 importFrom(data.table,.N)
 importFrom(data.table,.SD)
 importFrom(magrittr,"%>%")
-importFrom(readr,col_character)
-importFrom(readr,col_date)
-importFrom(readr,col_datetime)
-importFrom(readr,col_double)
-importFrom(readr,col_factor)
-importFrom(readr,col_integer)
-importFrom(readr,col_logical)
-importFrom(readr,col_number)
-importFrom(readr,col_time)
-importFrom(readr,cols)
-importFrom(readr,cols_only)
 importFrom(rlang,":=")
 importFrom(rlang,.data)
 importFrom(tibble,tibble)
diff --git a/R/add_nsu_cohort.R b/R/add_nsu_cohort.R
index 00260bb8e..9a3032259 100644
--- a/R/add_nsu_cohort.R
+++ b/R/add_nsu_cohort.R
@@ -7,7 +7,7 @@
 #' @return A data frame containing the Non-Service Users as additional rows
 #' @export
 #'
-#' @family episode file
+#' @family episode_file
 #' @seealso [get_nsu_path()]
 add_nsu_cohort <- function(
     data,
diff --git a/R/add_ppa_flag.R b/R/add_ppa_flag.R
index d0d0c4395..bb99f0543 100644
--- a/R/add_ppa_flag.R
+++ b/R/add_ppa_flag.R
@@ -6,7 +6,7 @@
 #' @param data A data frame
 #'
 #' @return A data frame to use as a lookup of PPAs
-#' @family episode file
+#' @family episode_file
 add_ppa_flag <- function(data) {
   check_variables_exist(
     data,
diff --git a/R/convert_ca_to_lca.R b/R/convert_ca_to_lca.R
index 518d7e8fb..1bb803a5f 100644
--- a/R/convert_ca_to_lca.R
+++ b/R/convert_ca_to_lca.R
@@ -12,7 +12,7 @@
 #' convert_ca_to_lca(ca)
 #'
 #' @family code functions
-#' @seealso convert_sending_location_to_lca
+#' @seealso convert_sc_sending_location_to_lca
 convert_ca_to_lca <- function(ca_var) {
   lca <- dplyr::case_match(
     ca_var,
diff --git a/R/convert_sending_location_to_lca.R b/R/convert_sc_sending_location_to_lca.R
similarity index 92%
rename from R/convert_sending_location_to_lca.R
rename to R/convert_sc_sending_location_to_lca.R
index ff7e51db1..c78cfa602 100644
--- a/R/convert_sending_location_to_lca.R
+++ b/R/convert_sc_sending_location_to_lca.R
@@ -10,12 +10,12 @@
 #'
 #' @examples
 #' sending_location <- c(100, 120)
-#' convert_sending_location_to_lca(sending_location)
+#' convert_sc_sending_location_to_lca(sending_location)
 #'
 #' @family code functions
 #'
 #' @seealso convert_ca_to_lca
-convert_sending_location_to_lca <- function(sending_location) {
+convert_sc_sending_location_to_lca <- function(sending_location) {
   lca <- dplyr::case_match(
     sending_location,
     100L ~ "01", # Aberdeen City
diff --git a/R/cost_uplift.R b/R/cost_uplift.R
index 2bb1d4c1f..e554c2505 100644
--- a/R/cost_uplift.R
+++ b/R/cost_uplift.R
@@ -3,6 +3,7 @@
 #' @param data episode data
 #'
 #' @return episode data with uplifted costs
+#' @family episode_file
 apply_cost_uplift <- function(data) {
   data <- data %>%
     # attach a uplift scale as the last column
@@ -34,6 +35,7 @@ apply_cost_uplift <- function(data) {
 #' @param data episode data
 #'
 #' @return episode data with a uplift scale
+#' @family episode_file
 lookup_uplift <- function(data) {
   # We have set uplifts to use for 2020/21, 2021/22 and 2022/23,
   # provided by Paul Leak.
diff --git a/R/create_episode_file.R b/R/create_episode_file.R
index 1e2319836..f909defef 100644
--- a/R/create_episode_file.R
+++ b/R/create_episode_file.R
@@ -171,14 +171,7 @@ create_episode_file <- function(
   }
 
   if (write_to_disk) {
-    # TODO make the slf_path a function
-    slf_episode_path <- get_file_path(
-      get_year_dir(year),
-      stringr::str_glue(
-        "source-episode-file-{year}.parquet"
-      ),
-      check_mode = "write"
-    )
+    slf_episode_path <- get_slf_episode_path(year, check_mode = "write")
 
     write_file(episode_file, slf_episode_path)
   }
diff --git a/R/create_individual_file.R b/R/create_individual_file.R
index 664e69ad2..cbf1777a3 100644
--- a/R/create_individual_file.R
+++ b/R/create_individual_file.R
@@ -8,6 +8,7 @@
 #' @inheritParams create_episode_file
 #'
 #' @return The processed individual file
+#' @family individual_file
 #' @export
 create_individual_file <- function(
     episode_file,
@@ -134,13 +135,7 @@ create_individual_file <- function(
   }
 
   if (write_to_disk) {
-    slf_indiv_path <- get_file_path(
-      get_year_dir(year),
-      stringr::str_glue(
-        "source-individual-file-{year}.parquet"
-      ),
-      check_mode = "write"
-    )
+    slf_indiv_path <- get_slf_individual_path(year, check_mode = "write")
 
     write_file(individual_file, slf_indiv_path)
   }
@@ -151,7 +146,7 @@ create_individual_file <- function(
 #' Remove blank CHI
 #'
 #' @description Convert blank strings to NA and remove NAs from CHI column
-#'
+#' @family individual_file
 #' @inheritParams create_individual_file
 remove_blank_chi <- function(episode_file) {
   cli::cli_alert_info("Remove blank CHI function started at {Sys.time()}")
@@ -165,7 +160,7 @@ remove_blank_chi <- function(episode_file) {
 #' Add CIJ-related columns
 #'
 #' @description Add new columns related to CIJ
-#'
+#' @family individual_file
 #' @inheritParams create_individual_file
 add_cij_columns <- function(episode_file) {
   cli::cli_alert_info("Add cij columns function started at {Sys.time()}")
@@ -204,7 +199,7 @@ add_cij_columns <- function(episode_file) {
 #'
 #' @description Add new columns based on SMRType and recid which follow a pattern
 #' of prefixed column names created based on some condition.
-#'
+#' @family individual_file
 #' @inheritParams create_individual_file
 add_all_columns <- function(episode_file) {
   cli::cli_alert_info("Add all columns function started at {Sys.time()}")
@@ -261,6 +256,7 @@ add_all_columns <- function(episode_file) {
 #' @inheritParams create_individual_file
 #' @param prefix Prefix to add to related columns, e.g. "Acute"
 #' @param condition Condition to create new columns based on
+#' @family individual_file
 add_acute_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -271,6 +267,7 @@ add_acute_columns <- function(episode_file, prefix, condition) {
 #' Add Mat columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_mat_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -281,6 +278,7 @@ add_mat_columns <- function(episode_file, prefix, condition) {
 #' Add MH columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_mh_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -291,6 +289,7 @@ add_mh_columns <- function(episode_file, prefix, condition) {
 #' Add GLS columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_gls_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -301,6 +300,7 @@ add_gls_columns <- function(episode_file, prefix, condition) {
 #' Add OP columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_op_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file <- episode_file %>%
@@ -323,6 +323,7 @@ add_op_columns <- function(episode_file, prefix, condition) {
 #' Add AE columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_ae_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -333,6 +334,7 @@ add_ae_columns <- function(episode_file, prefix, condition) {
 #' Add PIS columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_pis_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -343,6 +345,7 @@ add_pis_columns <- function(episode_file, prefix, condition) {
 #' Add OoH columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_ooh_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file <- episode_file %>%
@@ -377,6 +380,7 @@ add_ooh_columns <- function(episode_file, prefix, condition) {
 #' Add DN columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_dn_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   if ("total_no_dn_contacts" %in% names(episode_file)) {
@@ -399,6 +403,7 @@ add_dn_columns <- function(episode_file, prefix, condition) {
 #' Add CMH columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_cmh_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -409,6 +414,7 @@ add_cmh_columns <- function(episode_file, prefix, condition) {
 #' Add DD columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_dd_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   condition_delay <- substitute(condition & primary_delay_reason != "9")
@@ -429,6 +435,7 @@ add_dd_columns <- function(episode_file, prefix, condition) {
 #' Add NSU columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_nsu_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -439,6 +446,7 @@ add_nsu_columns <- function(episode_file, prefix, condition) {
 #' Add NRS columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_nrs_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -449,6 +457,7 @@ add_nrs_columns <- function(episode_file, prefix, condition) {
 #' Add HL1 columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_hl1_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -458,6 +467,7 @@ add_hl1_columns <- function(episode_file, prefix, condition) {
 #' Add CH columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_ch_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -486,6 +496,7 @@ add_ch_columns <- function(episode_file, prefix, condition) {
 #' Add HC columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_hc_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file <- episode_file %>%
@@ -528,6 +539,7 @@ add_hc_columns <- function(episode_file, prefix, condition) {
 #' Add AT columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_at_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -541,6 +553,7 @@ add_at_columns <- function(episode_file, prefix, condition) {
 #' Add SDS columns
 #'
 #' @inheritParams add_acute_columns
+#' @family individual_file
 add_sds_columns <- function(episode_file, prefix, condition) {
   condition <- substitute(condition)
   episode_file %>%
@@ -560,7 +573,9 @@ add_sds_columns <- function(episode_file, prefix, condition) {
 #'
 #' @inheritParams add_acute_columns
 #' @param ipdc_d Whether to create columns based on IPDC = "D" (lgl)
-#' @param elective Whether to create columns based on Elective/Non-Elective cij_pattype (lgl)
+#' @param elective Whether to create columns based on Elective/Non-Elective
+#' cij_pattype (lgl)
+#' @family individual_file
 add_ipdc_cols <- function(episode_file, prefix, condition, ipdc_d = TRUE, elective = TRUE) {
   condition_i <- substitute(eval(condition) & ipdc == "I")
   episode_file <- episode_file %>%
@@ -598,11 +613,13 @@ add_ipdc_cols <- function(episode_file, prefix, condition, ipdc_d = TRUE, electi
 
 #' Add standard columns
 #'
-#' @description Add standard columns (DoB, postcode, gpprac, episodes, cost) to episode file.
+#' @description Add standard columns (DoB, postcode, gpprac, episodes, cost)
+#' to episode file.
 #'
 #' @inheritParams add_acute_columns
 #' @param episode Whether to create prefix_episodes col, e.g. "Acute_episodes"
 #' @param cost Whether to create prefix_cost col, e.g. "Acute_cost"
+#' @family individual_file
 add_standard_cols <- function(episode_file, prefix, condition, episode = FALSE, cost = FALSE) {
   if (episode) {
     episode_file <- dplyr::mutate(episode_file, "{prefix}_episodes" := dplyr::if_else(eval(condition), 1L, NA_integer_))
@@ -618,6 +635,7 @@ add_standard_cols <- function(episode_file, prefix, condition, episode = FALSE,
 #' @description Clean up CH-related columns.
 #'
 #' @inheritParams create_individual_file
+#' @family individual_file
 clean_up_ch <- function(episode_file, year) {
   cli::cli_alert_info("Clean up CH function started at {Sys.time()}")
 
@@ -660,6 +678,7 @@ clean_up_ch <- function(episode_file, year) {
 #' @description Recode gender to 1.5 if 0 or 9.
 #'
 #' @inheritParams create_individual_file
+#' @family individual_file
 recode_gender <- function(episode_file) {
   cli::cli_alert_info("Recode Gender function started at {Sys.time()}")
 
@@ -678,6 +697,7 @@ recode_gender <- function(episode_file) {
 #' @description Returns chr vector of column names
 #' which follow format "condition" and "condition_date" e.g.
 #' "dementia" and "dementia_date"
+#' @family individual_file
 condition_cols <- function() {
   conditions <- slfhelper::ltc_vars
   date_cols <- paste0(conditions, "_date")
@@ -692,6 +712,7 @@ condition_cols <- function() {
 #' are missing (instead returns NA)
 #'
 #' @param x Vector to return max of
+#' @family helper_funs
 max_no_inf <- function(x) {
   dplyr::if_else(all(is.na(x)), NA, max(x, na.rm = TRUE))
 }
@@ -703,6 +724,7 @@ max_no_inf <- function(x) {
 #' are missing (instead returns NA)
 #'
 #' @param x Vector to return min of
+#' @family helper_funs
 min_no_inf <- function(x) {
   dplyr::if_else(all(is.na(x)), NA, min(x, na.rm = TRUE))
 }
diff --git a/R/createslf-package.R b/R/createslf-package.R
index acf9154b6..cdd7d1a01 100644
--- a/R/createslf-package.R
+++ b/R/createslf-package.R
@@ -1,6 +1,4 @@
 ## usethis namespace: start
-#' @importFrom readr cols cols_only col_character col_date col_datetime
-#' col_double col_factor col_integer col_logical col_number col_time
 #' @importFrom tibble tibble
 #' @importFrom rlang := .data
 ## usethis namespace: end
diff --git a/R/get_boxi_extract_path.R b/R/get_boxi_extract_path.R
index 6096525e5..a4c2e4abc 100644
--- a/R/get_boxi_extract_path.R
+++ b/R/get_boxi_extract_path.R
@@ -13,23 +13,23 @@
 get_boxi_extract_path <- function(
     year,
     type = c(
-      "AE",
-      "AE_CUP",
-      "Acute",
-      "CMH",
-      "Deaths",
-      "DN",
-      "GP_OoH-c",
-      "GP_OoH-d",
-      "GP_OoH-o",
-      "Homelessness",
-      "Maternity",
-      "MH",
-      "Outpatients"
+      "ae",
+      "ae_cup",
+      "acute",
+      "cmh",
+      "deaths",
+      "dn",
+      "gp_ooh-c",
+      "gp_ooh-d",
+      "gp_ooh-o",
+      "homelessness",
+      "maternity",
+      "mh",
+      "outpatients"
     )) {
   type <- match.arg(type)
 
-  if (type %in% c("DN", "CMH")) {
+  if (type %in% c("dn", "cmh")) {
     dir <- fs::path(get_slf_dir(), "Archived_data")
   } else {
     dir <- get_year_dir(year, extracts_dir = TRUE)
@@ -41,19 +41,19 @@ get_boxi_extract_path <- function(
 
   file_name <- dplyr::case_match(
     type,
-    "AE" ~ "A&E-episode-level-extract",
-    "AE_CUP" ~ "A&E-UCD-CUP-extract",
-    "Acute" ~ "Acute-episode-level-extract",
-    "CMH" ~ "Community-MH-contact-level-extract",
-    "DN" ~ "District-Nursing-contact-level-extract",
-    "GP_OoH-c" ~ "GP-OoH-consultations-extract",
-    "GP_OoH-d" ~ "GP-OoH-diagnosis-extract",
-    "GP_OoH-o" ~ "GP-OoH-outcomes-extract",
-    "Homelessness" ~ "Homelessness-extract",
-    "Maternity" ~ "Maternity-episode-level-extract",
-    "MH" ~ "Mental-Health-episode-level-extract",
-    "Deaths" ~ "NRS-death-registrations-extract",
-    "Outpatients" ~ "Outpatients-episode-level-extract"
+    "ae" ~ "a&e-episode-level-extract",
+    "ae_cup" ~ "a&e-ucd-cup-extract",
+    "acute" ~ "acute-episode-level-extract",
+    "cmh" ~ "community-mh-contact-level-extract",
+    "dn" ~ "district-nursing-contact-level-extract",
+    "gp_ooh-c" ~ "gp-ooh-consultations-extract",
+    "gp_ooh-d" ~ "gp-ooh-diagnosis-extract",
+    "gp_ooh-o" ~ "gp-ooh-outcomes-extract",
+    "homelessness" ~ "homelessness-extract",
+    "maternity" ~ "maternity-episode-level-extract",
+    "mh" ~ "mental-health-episode-level-extract",
+    "deaths" ~ "nrs-death-registrations-extract",
+    "outpatients" ~ "outpatients-episode-level-extract"
   )
 
   boxi_extract_path_csv_gz <- fs::path(
diff --git a/R/get_existing_data_for_tests.R b/R/get_existing_data_for_tests.R
index 9e7d06dcd..ae3c07e16 100644
--- a/R/get_existing_data_for_tests.R
+++ b/R/get_existing_data_for_tests.R
@@ -51,6 +51,9 @@ get_existing_data_for_tests <- function(new_data, file_version = "episode", anon
       recids = recids,
       col_select = variable_names
     ))
+    if ("hscp2018" %in% variable_names) {
+      slf_data <- dplyr::rename(slf_data, "hscp" = "hscp2018")
+    }
   } else {
     slf_data <- suppressMessages(slfhelper::read_slf_individual(
       year = year,
diff --git a/R/get_final_file_paths.R b/R/get_final_file_paths.R
new file mode 100644
index 000000000..f47250621
--- /dev/null
+++ b/R/get_final_file_paths.R
@@ -0,0 +1,34 @@
+#' Get the SLF episode file path
+#'
+#' @param year Financial year
+#' @param ... additional arguments passed to [get_file_path()]
+#'
+#' @return Path to the final episode file.
+#' @export
+#'
+get_slf_episode_path <- function(year, ...) {
+  slf_episode_path <- get_file_path(
+    directory = get_year_dir(year),
+    file_name = stringr::str_glue("source-episode-file-{year}.parquet"),
+    ...
+  )
+
+  return(slf_episode_path)
+}
+
+#' Get the SLF individual file path
+#'
+#' @param year Financial year
+#' @param ... additional arguments passed to [get_file_path()]
+#'
+#' @return Path to the final individual file.
+#' @export
+#'
+get_slf_individual_path <- function(year, ...) {
+  slf_indiv_path <- get_file_path(
+    directory = get_year_dir(year),
+    file_name = stringr::str_glue("source-individual-file-{year}.parquet"),
+    ...
+  )
+  return(slf_indiv_path)
+}
diff --git a/R/la_code_lookup.R b/R/get_la_code_opendata_lookup.R
similarity index 84%
rename from R/la_code_lookup.R
rename to R/get_la_code_opendata_lookup.R
index 09f0a9f1a..1b1e38e90 100644
--- a/R/la_code_lookup.R
+++ b/R/get_la_code_opendata_lookup.R
@@ -1,14 +1,13 @@
 #' Download the LA code lookup
 #'
-#' @inheritParams phsopendata::get_resource
-#'
 #' @description Download and process the Local Authority lookup from the Open
 #' Data platform
 #'
 #' @return a [tibble][tibble::tibble-package] with the Local Authority names
 #' and codes.
 #' @export
-la_code_lookup <- function(res_id = "967937c4-8d67-4f39-974f-fd58c4acfda5") {
+get_la_code_opendata_lookup <- function() {
+  res_id <- "967937c4-8d67-4f39-974f-fd58c4acfda5"
   la_code_lookup <- phsopendata::get_resource(
     res_id = res_id,
     col_select = c("CA", "CAName")
diff --git a/R/get_source_extract_path.R b/R/get_source_extract_path.R
index cbd3fd46e..6be47d61a 100644
--- a/R/get_source_extract_path.R
+++ b/R/get_source_extract_path.R
@@ -10,27 +10,27 @@
 #' @export
 #'
 #' @family extract file paths
-get_source_extract_path <- function(
-    year,
-    type = c(
-      "Acute",
-      "AE",
-      "AT",
-      "CH",
-      "CMH",
-      "DD",
-      "Deaths",
-      "DN",
-      "GPOoH",
-      "HC",
-      "Homelessness",
-      "Maternity",
-      "MH",
-      "Outpatients",
-      "PIS",
-      "SDS"
-    ),
-    ...) {
+get_source_extract_path <- function(year,
+                                    type = c(
+                                      "acute",
+                                      "ae",
+                                      "at",
+                                      "ch",
+                                      "client",
+                                      "cmh",
+                                      "dd",
+                                      "deaths",
+                                      "dn",
+                                      "gp_ooh",
+                                      "hc",
+                                      "homelessness",
+                                      "maternity",
+                                      "mh",
+                                      "outpatients",
+                                      "pis",
+                                      "sds"
+                                    ),
+                                    ...) {
   if (year %in% type) {
     cli::cli_abort("{.val {year}} was supplied to the {.arg year} argument.")
   }
@@ -45,22 +45,24 @@ get_source_extract_path <- function(
 
   file_name <- dplyr::case_match(
     type,
-    "Acute" ~ "acute_for_source",
-    "AE" ~ "a_and_e_for_source",
-    "AT" ~ "alarms-telecare-for-source",
-    "CH" ~ "care_home_for_source",
-    "CMH" ~ "cmh_for_source",
-    "DD" ~ "delayed_discharge_for_source",
-    "Deaths" ~ "deaths_for_source",
-    "DN" ~ "district_nursing_for_source",
-    "GPOoH" ~ "gp_ooh_for_source",
-    "HC" ~ "home_care_for_source",
-    "Homelessness" ~ "homelessness_for_source",
-    "Maternity" ~ "maternity_for_source",
-    "MH" ~ "mental_health_for_source",
-    "Outpatients" ~ "outpatients_for_source",
-    "PIS" ~ "prescribing_for_source",
-    "SDS" ~ "sds_for_source"
+    "acute" ~ "acute_for_source",
+    "ae" ~ "a_and_e_for_source",
+    "at" ~ "alarms-telecare-for-source",
+    "ch" ~ "care_home_for_source",
+    "cmh" ~ "cmh_for_source",
+    "client" ~ "client_for_source",
+    "dd" ~ "delayed_discharge_for_source",
+    "deaths" ~ "deaths_for_source",
+    "dn" ~ "district_nursing_for_source",
+    "gp_ooh" ~ "gp_ooh_for_source",
+    "hc" ~ "home_care_for_source",
+    "homelessness" ~ "homelessness_for_source",
+    "maternity" ~ "maternity_for_source",
+    "mh" ~ "mental_health_for_source",
+    "dd" ~ "dd_for_source",
+    "outpatients" ~ "outpatients_for_source",
+    "pis" ~ "prescribing_file_for_source",
+    "sds" ~ "sds-for-source"
   ) %>%
     stringr::str_glue("-{year}.parquet")
 
diff --git a/R/link_delayed_discharge_eps.R b/R/link_delayed_discharge_eps.R
index fd9b2ea60..b4c3b2f5b 100644
--- a/R/link_delayed_discharge_eps.R
+++ b/R/link_delayed_discharge_eps.R
@@ -7,11 +7,11 @@
 #' @return A data frame with the delayed discharge cohort added and linked
 #' using the `cij_marker`
 #'
-#' @family episode file
+#' @family episode_file
 link_delayed_discharge_eps <- function(
     episode_file,
     year,
-    dd_data = read_file(get_source_extract_path(year, "DD"))) {
+    dd_data = read_file(get_source_extract_path(year, "dd"))) {
   episode_file <- episode_file %>%
     dplyr::mutate(
       # remember to revoke the cij_end_date with dummy_cij_end
diff --git a/R/process_extract_acute.R b/R/process_extract_acute.R
index 70ff29370..c327f4b66 100644
--- a/R/process_extract_acute.R
+++ b/R/process_extract_acute.R
@@ -113,7 +113,7 @@ process_extract_acute <- function(data, year, write_to_disk = TRUE) {
   if (write_to_disk) {
     write_file(
       acute_processed,
-      get_source_extract_path(year, "Acute", check_mode = "write")
+      get_source_extract_path(year, "acute", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_ae.R b/R/process_extract_ae.R
index 95dfd99be..785797395 100644
--- a/R/process_extract_ae.R
+++ b/R/process_extract_ae.R
@@ -192,13 +192,13 @@ process_extract_ae <- function(data, year, write_to_disk = TRUE) {
   # Read in data---------------------------------------
 
   ae_cup_file <- read_file(
-    path = get_boxi_extract_path(year, "AE_CUP"),
-    col_type = cols(
-      "ED Arrival Date" = col_date(format = "%Y/%m/%d %T"),
-      "ED Arrival Time" = col_time(""),
-      "ED Case Reference Number [C]" = col_character(),
-      "CUP Marker" = col_double(),
-      "CUP Pathway Name" = col_character()
+    path = get_boxi_extract_path(year, "ae_cup"),
+    col_type = readr::cols(
+      "ED Arrival Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "ED Arrival Time" = readr::col_time(""),
+      "ED Case Reference Number [C]" = readr::col_character(),
+      "CUP Marker" = readr::col_double(),
+      "CUP Pathway Name" = readr::col_character()
     )
   ) %>%
     # rename variables
@@ -294,7 +294,7 @@ process_extract_ae <- function(data, year, write_to_disk = TRUE) {
   if (write_to_disk) {
     write_file(
       ae_processed,
-      get_source_extract_path(year, "AE", check_mode = "write")
+      get_source_extract_path(year, "ae", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_alarms_telecare.R b/R/process_extract_alarms_telecare.R
index 9a0745a04..0ef686881 100644
--- a/R/process_extract_alarms_telecare.R
+++ b/R/process_extract_alarms_telecare.R
@@ -64,7 +64,7 @@ process_extract_alarms_telecare <- function(
   if (write_to_disk) {
     at_data %>%
       write_file(
-        get_source_extract_path(year, type = "AT", check_mode = "write")
+        get_source_extract_path(year, type = "at", check_mode = "write")
       )
   }
 
diff --git a/R/process_extract_care_home.R b/R/process_extract_care_home.R
index cbf6d417c..f6b3bca15 100644
--- a/R/process_extract_care_home.R
+++ b/R/process_extract_care_home.R
@@ -62,7 +62,7 @@ process_extract_care_home <- function(
     ) %>%
     # compute lca variable from sending_location
     dplyr::mutate(
-      sc_send_lca = convert_sending_location_to_lca(.data$sending_location)
+      sc_send_lca = convert_sc_sending_location_to_lca(.data$sending_location)
     ) %>%
     # bed days
     create_monthly_beddays(year,
@@ -143,7 +143,7 @@ process_extract_care_home <- function(
   if (write_to_disk) {
     write_file(
       ch_processed,
-      get_source_extract_path(year, type = "CH", check_mode = "write")
+      get_source_extract_path(year, type = "ch", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_cmh.R b/R/process_extract_cmh.R
index a2adad75e..bbce59f0f 100644
--- a/R/process_extract_cmh.R
+++ b/R/process_extract_cmh.R
@@ -73,7 +73,7 @@ process_extract_cmh <- function(data,
   if (write_to_disk) {
     write_file(
       cmh_processed,
-      get_source_extract_path(year, "CMH", check_mode = "write")
+      get_source_extract_path(year, "cmh", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_delayed_discharges.R b/R/process_extract_delayed_discharges.R
index 3c56807f9..c16748a2d 100644
--- a/R/process_extract_delayed_discharges.R
+++ b/R/process_extract_delayed_discharges.R
@@ -110,7 +110,7 @@ process_extract_delayed_discharges <- function(
   if (write_to_disk) {
     write_file(
       dd_final,
-      get_source_extract_path(year, "DD", check_mode = "write")
+      get_source_extract_path(year, "dd", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_district_nursing.R b/R/process_extract_district_nursing.R
index 9d1df62a6..02f23719f 100644
--- a/R/process_extract_district_nursing.R
+++ b/R/process_extract_district_nursing.R
@@ -135,7 +135,7 @@ process_extract_district_nursing <- function(
 
   if (write_to_disk) {
     dn_episodes %>%
-      write_file(get_source_extract_path(year, "DN", check_mode = "write"))
+      write_file(get_source_extract_path(year, "dn", check_mode = "write"))
   }
 
   return(dn_episodes)
diff --git a/R/process_extract_gp_ooh.R b/R/process_extract_gp_ooh.R
index 2b536878a..3503888b6 100644
--- a/R/process_extract_gp_ooh.R
+++ b/R/process_extract_gp_ooh.R
@@ -127,7 +127,7 @@ process_extract_gp_ooh <- function(year, data_list, write_to_disk = TRUE) {
 
   if (write_to_disk) {
     final_data %>%
-      write_file(get_source_extract_path(year, "GPOoH", check_mode = "write"))
+      write_file(get_source_extract_path(year, "gp_ooh", check_mode = "write"))
   }
 
   return(final_data)
diff --git a/R/process_extract_home_care.R b/R/process_extract_home_care.R
index 874ad899c..857f3006f 100644
--- a/R/process_extract_home_care.R
+++ b/R/process_extract_home_care.R
@@ -104,7 +104,7 @@ process_extract_home_care <- function(
   if (write_to_disk) {
     write_file(
       hc_processed,
-      get_source_extract_path(year, type = "HC", check_mode = "write")
+      get_source_extract_path(year, type = "hc", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_homelessness.R b/R/process_extract_homelessness.R
index f4fb7d3e5..c1afff837 100644
--- a/R/process_extract_homelessness.R
+++ b/R/process_extract_homelessness.R
@@ -20,6 +20,7 @@ process_extract_homelessness <- function(
     year,
     write_to_disk = TRUE,
     update = latest_update(),
+    la_code_lookup = get_la_code_opendata_lookup(),
     sg_pub_path = get_sg_homelessness_pub_path()) {
   # Only run for a single year
   stopifnot(length(year) == 1L)
@@ -100,7 +101,7 @@ process_extract_homelessness <- function(
       )
     ) %>%
     dplyr::left_join(
-      la_code_lookup(),
+      la_code_lookup,
       by = dplyr::join_by("sending_local_authority_code_9" == "CA")
     ) %>%
     # Filter out duplicates
diff --git a/R/process_extract_maternity.R b/R/process_extract_maternity.R
index 64fa4e205..7bb016243 100644
--- a/R/process_extract_maternity.R
+++ b/R/process_extract_maternity.R
@@ -112,7 +112,7 @@ process_extract_maternity <- function(data, year, write_to_disk = TRUE) {
   if (write_to_disk) {
     write_file(
       maternity_processed,
-      get_source_extract_path(year, "Maternity", check_mode = "write")
+      get_source_extract_path(year, "maternity", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_mental_health.R b/R/process_extract_mental_health.R
index ffea63d28..b8d89377d 100644
--- a/R/process_extract_mental_health.R
+++ b/R/process_extract_mental_health.R
@@ -117,7 +117,7 @@ process_extract_mental_health <- function(data, year, write_to_disk = TRUE) {
   if (write_to_disk) {
     write_file(
       mh_processed,
-      get_source_extract_path(year, "MH", check_mode = "write")
+      get_source_extract_path(year, "mh", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_nrs_deaths.R b/R/process_extract_nrs_deaths.R
index ecb10d2e0..71e19d456 100644
--- a/R/process_extract_nrs_deaths.R
+++ b/R/process_extract_nrs_deaths.R
@@ -27,7 +27,7 @@ process_extract_nrs_deaths <- function(data, year, write_to_disk = TRUE) {
 
   if (write_to_disk) {
     deaths_clean %>%
-      write_file(get_source_extract_path(year, "Deaths", check_mode = "write"))
+      write_file(get_source_extract_path(year, "deaths", check_mode = "write"))
   }
 
   return(deaths_clean)
diff --git a/R/process_extract_outpatients.R b/R/process_extract_outpatients.R
index 341ee0f1a..86262e6b3 100644
--- a/R/process_extract_outpatients.R
+++ b/R/process_extract_outpatients.R
@@ -87,7 +87,7 @@ process_extract_outpatients <- function(data, year, write_to_disk = TRUE) {
   if (write_to_disk) {
     write_file(
       outpatients_processed,
-      get_source_extract_path(year, "Outpatients", check_mode = "write")
+      get_source_extract_path(year, "outpatients", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_prescribing.R b/R/process_extract_prescribing.R
index 68c388b83..c54a55b65 100644
--- a/R/process_extract_prescribing.R
+++ b/R/process_extract_prescribing.R
@@ -52,7 +52,7 @@ process_extract_prescribing <- function(data, year, write_to_disk = TRUE) {
   if (write_to_disk) {
     write_file(
       pis_clean,
-      get_source_extract_path(year, "PIS", check_mode = "write")
+      get_source_extract_path(year, "pis", check_mode = "write")
     )
   }
 
diff --git a/R/process_extract_sds.R b/R/process_extract_sds.R
index bd9e93a3f..d8c43507c 100644
--- a/R/process_extract_sds.R
+++ b/R/process_extract_sds.R
@@ -58,7 +58,7 @@ process_extract_sds <- function(
 
   if (write_to_disk) {
     outfile %>%
-      write_file(get_source_extract_path(year, type = "SDS", check_mode = "write"))
+      write_file(get_source_extract_path(year, type = "sds", check_mode = "write"))
   }
 
   return(outfile)
diff --git a/R/process_lookup_deaths.R b/R/process_lookup_deaths.R
index 6485d4e7f..1150059a7 100644
--- a/R/process_lookup_deaths.R
+++ b/R/process_lookup_deaths.R
@@ -16,7 +16,7 @@
 process_slf_deaths_lookup <- function(
     year,
     nrs_deaths_data = read_file(
-      get_source_extract_path(year, "Deaths"),
+      get_source_extract_path(year, "deaths"),
       col_select = c("chi", "record_keydate1")
     ),
     chi_deaths_data = read_file(get_slf_chi_deaths_path()),
diff --git a/R/process_sc_all_alarms_telecare.R b/R/process_sc_all_alarms_telecare.R
index 620b14cee..628bd7165 100644
--- a/R/process_sc_all_alarms_telecare.R
+++ b/R/process_sc_all_alarms_telecare.R
@@ -55,7 +55,7 @@ process_sc_all_alarms_telecare <- function(
       # Create person id variable
       person_id = stringr::str_glue("{sending_location}-{social_care_id}"),
       # Use function for creating sc send lca variables
-      sc_send_lca = convert_sending_location_to_lca(.data$sending_location)
+      sc_send_lca = convert_sc_sending_location_to_lca(.data$sending_location)
     ) %>%
     # when multiple social_care_id from sending_location for single CHI
     # replace social_care_id with latest
diff --git a/R/process_sc_all_home_care.R b/R/process_sc_all_home_care.R
index 5f2b4db49..2a990a386 100644
--- a/R/process_sc_all_home_care.R
+++ b/R/process_sc_all_home_care.R
@@ -199,7 +199,7 @@ process_sc_all_home_care <- function(
     create_person_id(type = "SC") %>%
     # compute lca variable from sending_location
     dplyr::mutate(
-      sc_send_lca = convert_sending_location_to_lca(.data$sending_location)
+      sc_send_lca = convert_sc_sending_location_to_lca(.data$sending_location)
     )
 
   if (write_to_disk) {
diff --git a/R/process_sc_all_sds.R b/R/process_sc_all_sds.R
index c17f74f28..09ce430b8 100644
--- a/R/process_sc_all_sds.R
+++ b/R/process_sc_all_sds.R
@@ -80,7 +80,7 @@ process_sc_all_sds <- function(
       # Create person id variable
       person_id = stringr::str_glue("{sending_location}-{social_care_id}"),
       # Use function for creating sc send lca variables
-      sc_send_lca = convert_sending_location_to_lca(.data$sending_location)
+      sc_send_lca = convert_sc_sending_location_to_lca(.data$sending_location)
     ) %>%
     # when multiple social_care_id from sending_location for single CHI
     # replace social_care_id with latest
diff --git a/R/process_tests_episode_file.R b/R/process_tests_episode_file.R
index fc31727ed..bb04cdfc7 100644
--- a/R/process_tests_episode_file.R
+++ b/R/process_tests_episode_file.R
@@ -84,6 +84,7 @@ produce_episode_file_tests <- function(
     ) %>%
     create_hb_test_flags(.data$hbtreatcode) %>%
     create_hb_cost_test_flags(.data$hbtreatcode, .data$cost_total_net) %>%
+    create_hscp_test_flags(.data$hscp2018) %>%
     # Flags to count stay types
     dplyr::mutate(
       cij_elective = dplyr::if_else(
diff --git a/R/read_extract_acute.R b/R/read_extract_acute.R
index 6a0d23b11..7a227db73 100644
--- a/R/read_extract_acute.R
+++ b/R/read_extract_acute.R
@@ -6,78 +6,78 @@
 #' @return a [tibble][tibble::tibble-package].
 #'
 #' @export
-read_extract_acute <- function(year, file_path = get_boxi_extract_path(year = year, type = "Acute")) {
+read_extract_acute <- function(year, file_path = get_boxi_extract_path(year = year, type = "acute")) {
   # Read BOXI extract
   extract_acute <- read_file(file_path,
-    col_type = cols(
-      "Costs Financial Year (01)" = col_integer(),
-      "Costs Financial Month Number (01)" = col_double(),
-      "GLS Record" = col_character(),
-      "Date of Admission(01)" = col_date(format = "%Y/%m/%d %T"),
-      "Date of Discharge(01)" = col_date(format = "%Y/%m/%d %T"),
-      "Pat UPI" = col_character(),
-      "Pat Gender Code" = col_double(),
-      "Pat Date Of Birth [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Practice Location Code" = col_character(),
-      "Practice NHS Board Code - current" = col_character(),
-      "Geo Postcode [C]" = col_character(),
-      "NHS Board of Residence Code - current" = col_character(),
-      "Geo Council Area Code" = col_character(),
-      "Geo HSCP of Residence Code - current" = col_character(),
-      "Geo Data Zone 2011" = col_character(),
-      "Treatment Location Code" = col_character(),
-      "Treatment NHS Board Code - current" = col_character(),
-      "Occupied Bed Days (01)" = col_double(),
-      "Inpatient Day Case Identifier Code" = col_character(),
-      "Specialty Classificat. 1/4/97 Code" = col_character(),
-      "Significant Facility Code" = col_character(),
-      "Lead Consultant/HCP Code" = col_character(),
-      "Management of Patient Code" = col_character(),
-      "Patient Category Code" = col_character(),
-      "Admission Type Code" = col_character(),
-      "Admitted Trans From Code" = col_character(),
-      "Location Admitted Trans From Code" = col_character(),
-      "Old SMR1 Type of Admission Code" = col_integer(),
-      "Discharge Type Code" = col_character(),
-      "Discharge Trans To Code" = col_character(),
-      "Location Discharged Trans To Code" = col_character(),
-      "Diagnosis 1 Code (6 char)" = col_character(),
-      "Diagnosis 2 Code (6 char)" = col_character(),
-      "Diagnosis 3 Code (6 char)" = col_character(),
-      "Diagnosis 4 Code (6 char)" = col_character(),
-      "Diagnosis 5 Code (6 char)" = col_character(),
-      "Diagnosis 6 Code (6 char)" = col_character(),
-      "Operation 1A Code (4 char)" = col_character(),
-      "Operation 1B Code (4 char)" = col_character(),
-      "Date of Operation 1 (01)" = col_date(format = "%Y/%m/%d %T"),
-      "Operation 2A Code (4 char)" = col_character(),
-      "Operation 2B Code (4 char)" = col_character(),
-      "Date of Operation 2 (01)" = col_date(format = "%Y/%m/%d %T"),
-      "Operation 3A Code (4 char)" = col_character(),
-      "Operation 3B Code (4 char)" = col_character(),
-      "Date of Operation 3 (01)" = col_date(format = "%Y/%m/%d %T"),
-      "Operation 4A Code (4 char)" = col_character(),
-      "Operation 4B Code (4 char)" = col_character(),
-      "Date of Operation 4 (01)" = col_date(format = "%Y/%m/%d %T"),
-      "Age at Midpoint of Financial Year (01)" = col_integer(),
-      "Continuous Inpatient Stay(SMR01) (inc GLS)" = col_integer(),
-      "Continuous Inpatient Journey Marker (01)" = col_integer(),
-      "CIJ Planned Admission Code (01)" = col_integer(),
-      "CIJ Inpatient Day Case Identifier Code (01)" = col_character(),
-      "CIJ Type of Admission Code (01)" = col_character(),
-      "CIJ Admission Specialty Code (01)" = col_character(),
-      "CIJ Discharge Specialty Code (01)" = col_character(),
-      "CIJ Start Date (01)" = col_date(format = "%Y/%m/%d %T"),
-      "CIJ End Date (01)" = col_date(format = "%Y/%m/%d %T"),
-      "Total Net Costs (01)" = col_double(),
-      "NHS Hospital Flag (01)" = col_character(),
-      "Community Hospital Flag (01)" = col_character(),
-      "Alcohol Related Admission (01)" = col_character(),
-      "Substance Misuse Related Admission (01)" = col_character(),
-      "Falls Related Admission (01)" = col_character(),
-      "Self Harm Related Admission (01)" = col_character(),
-      "Unique Record Identifier" = col_character(),
-      "Line Number (01)" = col_character()
+    col_type = readr::cols(
+      "Costs Financial Year (01)" = readr::col_integer(),
+      "Costs Financial Month Number (01)" = readr::col_double(),
+      "GLS Record" = readr::col_character(),
+      "Date of Admission(01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Date of Discharge(01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Pat UPI" = readr::col_character(),
+      "Pat Gender Code" = readr::col_double(),
+      "Pat Date Of Birth [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Practice Location Code" = readr::col_character(),
+      "Practice NHS Board Code - current" = readr::col_character(),
+      "Geo Postcode [C]" = readr::col_character(),
+      "NHS Board of Residence Code - current" = readr::col_character(),
+      "Geo Council Area Code" = readr::col_character(),
+      "Geo HSCP of Residence Code - current" = readr::col_character(),
+      "Geo Data Zone 2011" = readr::col_character(),
+      "Treatment Location Code" = readr::col_character(),
+      "Treatment NHS Board Code - current" = readr::col_character(),
+      "Occupied Bed Days (01)" = readr::col_double(),
+      "Inpatient Day Case Identifier Code" = readr::col_character(),
+      "Specialty Classificat. 1/4/97 Code" = readr::col_character(),
+      "Significant Facility Code" = readr::col_character(),
+      "Lead Consultant/HCP Code" = readr::col_character(),
+      "Management of Patient Code" = readr::col_character(),
+      "Patient Category Code" = readr::col_character(),
+      "Admission Type Code" = readr::col_character(),
+      "Admitted Trans From Code" = readr::col_character(),
+      "Location Admitted Trans From Code" = readr::col_character(),
+      "Old SMR1 Type of Admission Code" = readr::col_integer(),
+      "Discharge Type Code" = readr::col_character(),
+      "Discharge Trans To Code" = readr::col_character(),
+      "Location Discharged Trans To Code" = readr::col_character(),
+      "Diagnosis 1 Code (6 char)" = readr::col_character(),
+      "Diagnosis 2 Code (6 char)" = readr::col_character(),
+      "Diagnosis 3 Code (6 char)" = readr::col_character(),
+      "Diagnosis 4 Code (6 char)" = readr::col_character(),
+      "Diagnosis 5 Code (6 char)" = readr::col_character(),
+      "Diagnosis 6 Code (6 char)" = readr::col_character(),
+      "Operation 1A Code (4 char)" = readr::col_character(),
+      "Operation 1B Code (4 char)" = readr::col_character(),
+      "Date of Operation 1 (01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Operation 2A Code (4 char)" = readr::col_character(),
+      "Operation 2B Code (4 char)" = readr::col_character(),
+      "Date of Operation 2 (01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Operation 3A Code (4 char)" = readr::col_character(),
+      "Operation 3B Code (4 char)" = readr::col_character(),
+      "Date of Operation 3 (01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Operation 4A Code (4 char)" = readr::col_character(),
+      "Operation 4B Code (4 char)" = readr::col_character(),
+      "Date of Operation 4 (01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Age at Midpoint of Financial Year (01)" = readr::col_integer(),
+      "Continuous Inpatient Stay(SMR01) (inc GLS)" = readr::col_integer(),
+      "Continuous Inpatient Journey Marker (01)" = readr::col_integer(),
+      "CIJ Planned Admission Code (01)" = readr::col_integer(),
+      "CIJ Inpatient Day Case Identifier Code (01)" = readr::col_character(),
+      "CIJ Type of Admission Code (01)" = readr::col_character(),
+      "CIJ Admission Specialty Code (01)" = readr::col_character(),
+      "CIJ Discharge Specialty Code (01)" = readr::col_character(),
+      "CIJ Start Date (01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "CIJ End Date (01)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Total Net Costs (01)" = readr::col_double(),
+      "NHS Hospital Flag (01)" = readr::col_character(),
+      "Community Hospital Flag (01)" = readr::col_character(),
+      "Alcohol Related Admission (01)" = readr::col_character(),
+      "Substance Misuse Related Admission (01)" = readr::col_character(),
+      "Falls Related Admission (01)" = readr::col_character(),
+      "Self Harm Related Admission (01)" = readr::col_character(),
+      "Unique Record Identifier" = readr::col_character(),
+      "Line Number (01)" = readr::col_character()
     )
   ) %>%
     # Rename variables
diff --git a/R/read_extract_ae.R b/R/read_extract_ae.R
index 6cddd1cb6..e426a167c 100644
--- a/R/read_extract_ae.R
+++ b/R/read_extract_ae.R
@@ -6,44 +6,44 @@
 #'
 read_extract_ae <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "AE")) {
+    file_path = get_boxi_extract_path(year = year, type = "ae")) {
   extract_ae <- read_file(file_path,
-    col_type = cols(
-      "Arrival Date" = col_date(format = "%Y/%m/%d %T"),
-      "DAT Date" = col_date(format = "%Y/%m/%d %T"),
-      "Pat UPI [C]" = col_character(),
-      "Pat Date Of Birth [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Pat Gender Code" = col_double(),
-      "NHS Board of Residence Code - current" = col_character(),
-      "Treatment NHS Board Code - current" = col_character(),
-      "Treatment Location Code" = col_character(),
-      "GP Practice Code" = col_character(),
-      "Council Area Code" = col_character(),
-      "Postcode (epi) [C]" = col_character(),
-      "Postcode (CHI) [C]" = col_character(),
-      "HSCP of Residence Code - current" = col_character(),
-      "Arrival Time" = col_time(""),
-      "DAT Time" = col_time(""),
-      "Arrival Mode Code" = col_character(),
-      "Referral Source Code" = col_character(),
-      "Attendance Category Code" = col_character(),
-      "Discharge Destination Code" = col_character(),
-      "Patient Flow Code" = col_double(),
-      "Place of Incident Code" = col_character(),
-      "Reason for Wait Code" = col_character(),
-      "Disease 1 Code" = col_character(),
-      "Disease 2 Code" = col_character(),
-      "Disease 3 Code" = col_character(),
-      "Bodily Location Of Injury Code" = col_character(),
-      "Alcohol Involved Code" = col_character(),
-      "Alcohol Related Admission" = col_character(),
-      "Substance Misuse Related Admission" = col_character(),
-      "Falls Related Admission" = col_character(),
-      "Self Harm Related Admission" = col_character(),
-      "Total Net Costs" = col_double(),
-      "Age at Midpoint of Financial Year" = col_double(),
-      "Case Reference Number" = col_character(),
-      "Significant Facility Code" = col_character()
+    col_type = readr::cols(
+      "Arrival Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "DAT Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Pat UPI [C]" = readr::col_character(),
+      "Pat Date Of Birth [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Pat Gender Code" = readr::col_double(),
+      "NHS Board of Residence Code - current" = readr::col_character(),
+      "Treatment NHS Board Code - current" = readr::col_character(),
+      "Treatment Location Code" = readr::col_character(),
+      "GP Practice Code" = readr::col_character(),
+      "Council Area Code" = readr::col_character(),
+      "Postcode (epi) [C]" = readr::col_character(),
+      "Postcode (CHI) [C]" = readr::col_character(),
+      "HSCP of Residence Code - current" = readr::col_character(),
+      "Arrival Time" = readr::col_time(""),
+      "DAT Time" = readr::col_time(""),
+      "Arrival Mode Code" = readr::col_character(),
+      "Referral Source Code" = readr::col_character(),
+      "Attendance Category Code" = readr::col_character(),
+      "Discharge Destination Code" = readr::col_character(),
+      "Patient Flow Code" = readr::col_double(),
+      "Place of Incident Code" = readr::col_character(),
+      "Reason for Wait Code" = readr::col_character(),
+      "Disease 1 Code" = readr::col_character(),
+      "Disease 2 Code" = readr::col_character(),
+      "Disease 3 Code" = readr::col_character(),
+      "Bodily Location Of Injury Code" = readr::col_character(),
+      "Alcohol Involved Code" = readr::col_character(),
+      "Alcohol Related Admission" = readr::col_character(),
+      "Substance Misuse Related Admission" = readr::col_character(),
+      "Falls Related Admission" = readr::col_character(),
+      "Self Harm Related Admission" = readr::col_character(),
+      "Total Net Costs" = readr::col_double(),
+      "Age at Midpoint of Financial Year" = readr::col_double(),
+      "Case Reference Number" = readr::col_character(),
+      "Significant Facility Code" = readr::col_character()
     )
   ) %>%
     # rename variables
diff --git a/R/read_extract_cmh.R b/R/read_extract_cmh.R
index 16151bd43..0beb4ea4a 100644
--- a/R/read_extract_cmh.R
+++ b/R/read_extract_cmh.R
@@ -5,7 +5,7 @@
 #' @export
 read_extract_cmh <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "CMH")) {
+    file_path = get_boxi_extract_path(year = year, type = "cmh")) {
   # Specify years available for running
   if (file_path == get_dummy_boxi_extract_path()) {
     return(tibble::tibble())
@@ -13,24 +13,24 @@ read_extract_cmh <- function(
 
   # Read BOXI extract
   extract_cmh <- read_file(file_path,
-    col_types = cols_only(
-      "UPI Number [C]" = col_character(),
-      "Patient DoB Date [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Gender" = col_double(),
-      "Patient Postcode [C]" = col_character(),
-      "NHS Board of Residence Code 9" = col_character(),
-      "Patient HSCP Code - current" = col_character(),
-      "Practice Code" = col_integer(),
-      "Treatment NHS Board Code 9" = col_character(),
-      "Contact Date" = col_date(format = "%Y/%m/%d %T"),
-      "Contact Start Time" = col_time(format = "%T"),
-      "Duration of Contact" = col_integer(),
-      "Location of Contact" = col_character(),
-      "Main Aim of Contact" = col_character(),
-      "Other Aim of Contact (1)" = col_character(),
-      "Other Aim of Contact (2)" = col_character(),
-      "Other Aim of Contact (3)" = col_character(),
-      "Other Aim of Contact (4)" = col_character()
+    col_types = readr::cols_only(
+      "UPI Number [C]" = readr::col_character(),
+      "Patient DoB Date [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Gender" = readr::col_double(),
+      "Patient Postcode [C]" = readr::col_character(),
+      "NHS Board of Residence Code 9" = readr::col_character(),
+      "Patient HSCP Code - current" = readr::col_character(),
+      "Practice Code" = readr::col_integer(),
+      "Treatment NHS Board Code 9" = readr::col_character(),
+      "Contact Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Contact Start Time" = readr::col_time(format = "%T"),
+      "Duration of Contact" = readr::col_integer(),
+      "Location of Contact" = readr::col_character(),
+      "Main Aim of Contact" = readr::col_character(),
+      "Other Aim of Contact (1)" = readr::col_character(),
+      "Other Aim of Contact (2)" = readr::col_character(),
+      "Other Aim of Contact (3)" = readr::col_character(),
+      "Other Aim of Contact (4)" = readr::col_character()
     )
   ) %>%
     # rename
diff --git a/R/read_extract_district_nursing.R b/R/read_extract_district_nursing.R
index 607f9b47e..59b1142e5 100644
--- a/R/read_extract_district_nursing.R
+++ b/R/read_extract_district_nursing.R
@@ -5,32 +5,32 @@
 #' @export
 read_extract_district_nursing <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "DN")) {
+    file_path = get_boxi_extract_path(year = year, type = "dn")) {
   if (file_path == get_dummy_boxi_extract_path()) {
     return(tibble::tibble())
   }
 
   # Read BOXI extract
   extract_district_nursing <- read_file(file_path,
-    col_types = cols_only(
-      `Treatment NHS Board Code 9` = col_character(),
-      `Age at Contact Date` = col_integer(),
-      `Contact Date` = col_date(format = "%Y/%m/%d %T"),
-      `Primary Intervention Category` = col_character(),
-      `Other Intervention Category (1)` = col_character(),
-      `Other Intervention Category (2)` = col_character(),
-      `UPI Number [C]` = col_character(),
-      `Patient DoB Date [C]` = col_date(format = "%Y/%m/%d %T"),
-      `Patient Postcode [C] (Contact)` = col_character(),
-      `Duration of Contact (measure)` = col_double(),
-      Gender = col_double(),
-      `Location of Contact` = col_character(),
-      `Practice NHS Board Code 9 (Contact)` = col_character(),
-      `Patient Council Area Code (Contact)` = col_character(),
-      `Practice Code (Contact)` = col_character(),
-      `NHS Board of Residence Code 9 (Contact)` = col_character(),
-      `HSCP of Residence Code (Contact)` = col_character(),
-      `Patient Data Zone 2011 (Contact)` = col_character()
+    col_types = readr::cols_only(
+      `Treatment NHS Board Code 9` = readr::col_character(),
+      `Age at Contact Date` = readr::col_integer(),
+      `Contact Date` = readr::col_date(format = "%Y/%m/%d %T"),
+      `Primary Intervention Category` = readr::col_character(),
+      `Other Intervention Category (1)` = readr::col_character(),
+      `Other Intervention Category (2)` = readr::col_character(),
+      `UPI Number [C]` = readr::col_character(),
+      `Patient DoB Date [C]` = readr::col_date(format = "%Y/%m/%d %T"),
+      `Patient Postcode [C] (Contact)` = readr::col_character(),
+      `Duration of Contact (measure)` = readr::col_double(),
+      Gender = readr::col_double(),
+      `Location of Contact` = readr::col_character(),
+      `Practice NHS Board Code 9 (Contact)` = readr::col_character(),
+      `Patient Council Area Code (Contact)` = readr::col_character(),
+      `Practice Code (Contact)` = readr::col_character(),
+      `NHS Board of Residence Code 9 (Contact)` = readr::col_character(),
+      `HSCP of Residence Code (Contact)` = readr::col_character(),
+      `Patient Data Zone 2011 (Contact)` = readr::col_character()
     )
   ) %>%
     # rename
diff --git a/R/read_extract_gp_ooh.R b/R/read_extract_gp_ooh.R
index 3a711c2f8..ca7d32b51 100644
--- a/R/read_extract_gp_ooh.R
+++ b/R/read_extract_gp_ooh.R
@@ -13,9 +13,9 @@
 #' @export
 #' @family process extracts
 read_extract_gp_ooh <- function(year,
-                                diagnosis_path = get_boxi_extract_path(year = year, type = "GP_OoH-d"),
-                                outcomes_path = get_boxi_extract_path(year = year, type = "GP_OoH-o"),
-                                consultations_path = get_boxi_extract_path(year = year, type = "GP_OoH-c")) {
+                                diagnosis_path = get_boxi_extract_path(year = year, type = "gp_ooh-d"),
+                                outcomes_path = get_boxi_extract_path(year = year, type = "gp_ooh-o"),
+                                consultations_path = get_boxi_extract_path(year = year, type = "gp_ooh-c")) {
   ooh_extracts <- list(
     "diagnosis" = read_extract_ooh_diagnosis(year, diagnosis_path),
     "outcomes" = read_extract_ooh_outcomes(year, outcomes_path),
diff --git a/R/read_extract_homelessness.R b/R/read_extract_homelessness.R
index 32b7d6e86..58888c5b8 100644
--- a/R/read_extract_homelessness.R
+++ b/R/read_extract_homelessness.R
@@ -5,7 +5,7 @@
 #' @export
 read_extract_homelessness <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "Homelessness")) {
+    file_path = get_boxi_extract_path(year = year, type = "homelessness")) {
   # Specify years available for running
   if (file_path == get_dummy_boxi_extract_path()) {
     return(tibble::tibble())
@@ -13,29 +13,29 @@ read_extract_homelessness <- function(
 
   extract_homelessness <- read_file(file_path,
     col_types = cols(
-      "Assessment Decision Date" = col_date(format = "%Y/%m/%d %T"),
-      "Case Closed Date" = col_date(format = "%Y/%m/%d %T"),
-      "Sending Local Authority Code 9" = col_character(),
-      "Client Unique Identifier" = col_character(),
-      "UPI Number [C]" = col_character(),
-      "Client DoB Date [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Age at Assessment Decision Date" = col_integer(),
-      "Gender Code" = col_integer(),
-      "Client Postcode [C]" = col_character(),
-      "Main Applicant Flag" = col_character(),
-      "Application Reference Number" = col_character(),
-      "Property Type Code" = col_integer(),
-      "Financial Difficulties / Debt / Unemployment" = col_integer(),
-      "Physical Health Reasons" = col_integer(),
-      "Mental Health Reasons" = col_integer(),
-      "Unmet Need for Support from Housing / Social Work / Health Services" = col_integer(),
-      "Lack of Support from Friends / Family" = col_integer(),
-      "Difficulties Managing on Own" = col_integer(),
-      "Drug / Alcohol Dependency" = col_integer(),
-      "Criminal / Anti-Social Behaviour" = col_integer(),
-      "Not to do with Applicant Household" = col_integer(),
-      "Refused" = col_integer(),
-      "Person in Receipt of Universal Credit" = col_integer()
+      "Assessment Decision Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Case Closed Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Sending Local Authority Code 9" = readr::col_character(),
+      "Client Unique Identifier" = readr::col_character(),
+      "UPI Number [C]" = readr::col_character(),
+      "Client DoB Date [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Age at Assessment Decision Date" = readr::col_integer(),
+      "Gender Code" = readr::col_integer(),
+      "Client Postcode [C]" = readr::col_character(),
+      "Main Applicant Flag" = readr::col_character(),
+      "Application Reference Number" = readr::col_character(),
+      "Property Type Code" = readr::col_integer(),
+      "Financial Difficulties / Debt / Unemployment" = readr::col_integer(),
+      "Physical Health Reasons" = readr::col_integer(),
+      "Mental Health Reasons" = readr::col_integer(),
+      "Unmet Need for Support from Housing / Social Work / Health Services" = readr::col_integer(),
+      "Lack of Support from Friends / Family" = readr::col_integer(),
+      "Difficulties Managing on Own" = readr::col_integer(),
+      "Drug / Alcohol Dependency" = readr::col_integer(),
+      "Criminal / Anti-Social Behaviour" = readr::col_integer(),
+      "Not to do with Applicant Household" = readr::col_integer(),
+      "Refused" = readr::col_integer(),
+      "Person in Receipt of Universal Credit" = readr::col_integer()
     )
   ) %>%
     dplyr::rename(
diff --git a/R/read_extract_maternity.R b/R/read_extract_maternity.R
index 49bda2fb5..e03b50e12 100644
--- a/R/read_extract_maternity.R
+++ b/R/read_extract_maternity.R
@@ -5,63 +5,63 @@
 #' @export
 read_extract_maternity <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "Maternity")) {
+    file_path = get_boxi_extract_path(year = year, type = "maternity")) {
   # Read BOXI extract
   extract_maternity <- read_file(file_path,
-    col_type = cols(
-      "Costs Financial Year" = col_double(),
-      "Date of Admission Full Date" = col_date(format = "%Y/%m/%d %T"),
-      "Date of Discharge Full Date" = col_date(format = "%Y/%m/%d %T"),
-      "Pat UPI [C]" = col_character(),
-      "Pat Date Of Birth [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Practice Location Code" = col_character(),
-      "Practice NHS Board Code - current" = col_character(),
-      "Geo Postcode [C]" = col_character(),
-      "NHS Board of Residence Code - current" = col_character(),
-      "HSCP of Residence Code - current" = col_character(),
-      "Geo Council Area Code" = col_character(),
-      "Treatment Location Code" = col_character(),
-      "Treatment NHS Board Code - current" = col_character(),
-      "Occupied Bed Days" = col_double(),
-      "Specialty Classification 1/4/97 Code" = col_character(),
-      "Significant Facility Code" = col_character(),
-      "Consultant/HCP Code" = col_character(),
-      "Management of Patient Code" = col_character(),
-      "Admission Reason Code" = col_character(),
-      "Admitted/Transfer from Code (new)" = col_character(),
-      "Admitted/transfer from - Location Code" = col_character(),
-      "Discharge Type Code" = col_character(),
-      "Discharge/Transfer to Code (new)" = col_character(),
-      "Discharged to - Location Code" = col_character(),
-      "Condition On Discharge Code" = col_double(),
-      "Continuous Inpatient Journey Marker" = col_double(),
-      "CIJ Planned Admission Code" = col_double(),
-      "CIJ Inpatient Day Case Identifier Code" = col_character(),
-      "CIJ Type of Admission Code" = col_character(),
-      "CIJ Admission Specialty Code" = col_character(),
-      "CIJ Discharge Specialty Code" = col_character(),
-      "CIJ Start Date" = col_date(format = "%Y/%m/%d %T"),
-      "CIJ End Date" = col_date(format = "%Y/%m/%d %T"),
-      "Total Net Costs" = col_double(),
-      "Diagnosis 1 Discharge Code" = col_character(),
-      "Diagnosis 2 Discharge Code" = col_character(),
-      "Diagnosis 3 Discharge Code" = col_character(),
-      "Diagnosis 4 Discharge Code" = col_character(),
-      "Diagnosis 5 Discharge Code" = col_character(),
-      "Diagnosis 6 Discharge Code" = col_character(),
-      "Operation 1A Code" = col_character(),
-      "Operation 2A Code" = col_character(),
-      "Operation 3A Code" = col_character(),
-      "Operation 4A Code" = col_character(),
-      "Date of Main Operation Full Date" = col_date(format = "%Y/%m/%d %T"),
-      "Age at Midpoint of Financial Year" = col_double(),
-      "NHS Hospital Flag" = col_character(),
-      "Community Hospital Flag" = col_character(),
-      "Alcohol Related AdmissioN" = col_character(),
-      "Substance Misuse Related Admission" = col_character(),
-      "Falls Related Admission" = col_character(),
-      "Self Harm Related Admission" = col_character(),
-      "Maternity Unique Record Identifier [C]" = col_character()
+    col_type = readr::cols(
+      "Costs Financial Year" = readr::col_double(),
+      "Date of Admission Full Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Date of Discharge Full Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Pat UPI [C]" = readr::col_character(),
+      "Pat Date Of Birth [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Practice Location Code" = readr::col_character(),
+      "Practice NHS Board Code - current" = readr::col_character(),
+      "Geo Postcode [C]" = readr::col_character(),
+      "NHS Board of Residence Code - current" = readr::col_character(),
+      "HSCP of Residence Code - current" = readr::col_character(),
+      "Geo Council Area Code" = readr::col_character(),
+      "Treatment Location Code" = readr::col_character(),
+      "Treatment NHS Board Code - current" = readr::col_character(),
+      "Occupied Bed Days" = readr::col_double(),
+      "Specialty Classification 1/4/97 Code" = readr::col_character(),
+      "Significant Facility Code" = readr::col_character(),
+      "Consultant/HCP Code" = readr::col_character(),
+      "Management of Patient Code" = readr::col_character(),
+      "Admission Reason Code" = readr::col_character(),
+      "Admitted/Transfer from Code (new)" = readr::col_character(),
+      "Admitted/transfer from - Location Code" = readr::col_character(),
+      "Discharge Type Code" = readr::col_character(),
+      "Discharge/Transfer to Code (new)" = readr::col_character(),
+      "Discharged to - Location Code" = readr::col_character(),
+      "Condition On Discharge Code" = readr::col_double(),
+      "Continuous Inpatient Journey Marker" = readr::col_double(),
+      "CIJ Planned Admission Code" = readr::col_double(),
+      "CIJ Inpatient Day Case Identifier Code" = readr::col_character(),
+      "CIJ Type of Admission Code" = readr::col_character(),
+      "CIJ Admission Specialty Code" = readr::col_character(),
+      "CIJ Discharge Specialty Code" = readr::col_character(),
+      "CIJ Start Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "CIJ End Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Total Net Costs" = readr::col_double(),
+      "Diagnosis 1 Discharge Code" = readr::col_character(),
+      "Diagnosis 2 Discharge Code" = readr::col_character(),
+      "Diagnosis 3 Discharge Code" = readr::col_character(),
+      "Diagnosis 4 Discharge Code" = readr::col_character(),
+      "Diagnosis 5 Discharge Code" = readr::col_character(),
+      "Diagnosis 6 Discharge Code" = readr::col_character(),
+      "Operation 1A Code" = readr::col_character(),
+      "Operation 2A Code" = readr::col_character(),
+      "Operation 3A Code" = readr::col_character(),
+      "Operation 4A Code" = readr::col_character(),
+      "Date of Main Operation Full Date" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Age at Midpoint of Financial Year" = readr::col_double(),
+      "NHS Hospital Flag" = readr::col_character(),
+      "Community Hospital Flag" = readr::col_character(),
+      "Alcohol Related AdmissioN" = readr::col_character(),
+      "Substance Misuse Related Admission" = readr::col_character(),
+      "Falls Related Admission" = readr::col_character(),
+      "Self Harm Related Admission" = readr::col_character(),
+      "Maternity Unique Record Identifier [C]" = readr::col_character()
     )
   ) %>%
     # Rename variables in line with SLF variable names
diff --git a/R/read_extract_mental_health.R b/R/read_extract_mental_health.R
index 248316975..687e656d0 100644
--- a/R/read_extract_mental_health.R
+++ b/R/read_extract_mental_health.R
@@ -5,67 +5,67 @@
 #' @export
 read_extract_mental_health <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "MH")) {
+    file_path = get_boxi_extract_path(year = year, type = "mh")) {
   # Read BOXI extract
   extract_mental_health <- read_file(file_path,
-    col_types = cols_only(
-      "Costs Financial Year (04)" = col_double(),
-      "Costs Financial Month Number (04)" = col_double(),
-      "Date of Admission(04)" = col_date(format = "%Y/%m/%d %T"),
-      "Date of Discharge(04)" = col_date(format = "%Y/%m/%d %T"),
-      "Pat UPI" = col_character(),
-      "Pat Gender Code" = col_integer(),
-      "Pat Date Of Birth [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Practice Location Code" = col_character(),
-      "Practice NHS Board Code - current" = col_character(),
-      "Geo Postcode [C]" = col_character(),
-      "NHS Board of Residence Code - current" = col_character(),
-      "Geo Council Area Code" = col_character(),
-      "Geo HSCP of Residence Code - current" = col_character(),
-      "Geo Data Zone 2011" = col_character(),
-      "Treatment Location Code" = col_character(),
-      "Treatment NHS Board Code - current" = col_character(),
-      "Occupied Bed Days (04)" = col_double(),
-      "Specialty Classificat. 1/4/97 Code" = col_character(),
-      "Significant Facility Code" = col_character(),
-      "Lead Consultant/HCP Code" = col_character(),
-      "Management of Patient Code" = col_character(),
-      "Patient Category Code" = col_character(),
-      "Admission Type Code" = col_character(),
-      "Admitted Trans From Code" = col_character(),
-      "Location Admitted Trans From Code" = col_character(),
-      "Discharge Type Code" = col_character(),
-      "Discharge Trans To Code" = col_character(),
-      "Location Discharged Trans To Code" = col_character(),
-      "Diagnosis 1 Code (6 char)" = col_character(),
-      "Diagnosis 2 Code (6 char)" = col_character(),
-      "Diagnosis 3 Code (6 char)" = col_character(),
-      "Diagnosis 4 Code (6 char)" = col_character(),
-      "Diagnosis 5 Code (6 char)" = col_character(),
-      "Diagnosis 6 Code (6 char)" = col_character(),
-      "Status on Admission Code" = col_integer(),
-      "Admission Diagnosis 1 Code (6 char)" = col_character(),
-      "Admission Diagnosis 2 Code (6 char)" = col_character(),
-      "Admission Diagnosis 3 Code (6 char)" = col_character(),
-      "Admission Diagnosis 4 Code (6 char)" = col_character(),
-      "Age at Midpoint of Financial Year (04)" = col_integer(),
-      "Continuous Inpatient Journey Marker (04)" = col_integer(),
-      "CIJ Planned Admission Code (04)" = col_integer(),
-      "CIJ Inpatient Day Case Identifier Code (04)" = col_character(),
-      "CIJ Type of Admission Code (04)" = col_character(),
-      "CIJ Admission Specialty Code (04)" = col_character(),
-      "CIJ Discharge Specialty Code (04)" = col_character(),
-      "CIJ Start Date (04)" = col_date(format = "%Y/%m/%d %T"),
-      "CIJ End Date (04)" = col_date(format = "%Y/%m/%d %T"),
-      "Total Net Costs (04)" = col_double(),
-      "Alcohol Related Admission (04)" = col_factor(levels = c("Y", "N")),
-      "Substance Misuse Related Admission (04)" = col_factor(levels = c("Y", "N")),
-      "Falls Related Admission (04)" = col_factor(levels = c("Y", "N")),
-      "Self Harm Related Admission (04)" = col_factor(levels = c("Y", "N")),
-      "Duplicate Record Flag (04)" = col_factor(levels = c("Y", "N")),
-      "NHS Hospital Flag (04)" = col_factor(levels = c("Y", "N")),
-      "Community Hospital Flag (04)" = col_factor(levels = c("Y", "N")),
-      "Unique Record Identifier" = col_character()
+    col_types = readr::cols_only(
+      "Costs Financial Year (04)" = readr::col_double(),
+      "Costs Financial Month Number (04)" = readr::col_double(),
+      "Date of Admission(04)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Date of Discharge(04)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Pat UPI" = readr::col_character(),
+      "Pat Gender Code" = readr::col_integer(),
+      "Pat Date Of Birth [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Practice Location Code" = readr::col_character(),
+      "Practice NHS Board Code - current" = readr::col_character(),
+      "Geo Postcode [C]" = readr::col_character(),
+      "NHS Board of Residence Code - current" = readr::col_character(),
+      "Geo Council Area Code" = readr::col_character(),
+      "Geo HSCP of Residence Code - current" = readr::col_character(),
+      "Geo Data Zone 2011" = readr::col_character(),
+      "Treatment Location Code" = readr::col_character(),
+      "Treatment NHS Board Code - current" = readr::col_character(),
+      "Occupied Bed Days (04)" = readr::col_double(),
+      "Specialty Classificat. 1/4/97 Code" = readr::col_character(),
+      "Significant Facility Code" = readr::col_character(),
+      "Lead Consultant/HCP Code" = readr::col_character(),
+      "Management of Patient Code" = readr::col_character(),
+      "Patient Category Code" = readr::col_character(),
+      "Admission Type Code" = readr::col_character(),
+      "Admitted Trans From Code" = readr::col_character(),
+      "Location Admitted Trans From Code" = readr::col_character(),
+      "Discharge Type Code" = readr::col_character(),
+      "Discharge Trans To Code" = readr::col_character(),
+      "Location Discharged Trans To Code" = readr::col_character(),
+      "Diagnosis 1 Code (6 char)" = readr::col_character(),
+      "Diagnosis 2 Code (6 char)" = readr::col_character(),
+      "Diagnosis 3 Code (6 char)" = readr::col_character(),
+      "Diagnosis 4 Code (6 char)" = readr::col_character(),
+      "Diagnosis 5 Code (6 char)" = readr::col_character(),
+      "Diagnosis 6 Code (6 char)" = readr::col_character(),
+      "Status on Admission Code" = readr::col_integer(),
+      "Admission Diagnosis 1 Code (6 char)" = readr::col_character(),
+      "Admission Diagnosis 2 Code (6 char)" = readr::col_character(),
+      "Admission Diagnosis 3 Code (6 char)" = readr::col_character(),
+      "Admission Diagnosis 4 Code (6 char)" = readr::col_character(),
+      "Age at Midpoint of Financial Year (04)" = readr::col_integer(),
+      "Continuous Inpatient Journey Marker (04)" = readr::col_integer(),
+      "CIJ Planned Admission Code (04)" = readr::col_integer(),
+      "CIJ Inpatient Day Case Identifier Code (04)" = readr::col_character(),
+      "CIJ Type of Admission Code (04)" = readr::col_character(),
+      "CIJ Admission Specialty Code (04)" = readr::col_character(),
+      "CIJ Discharge Specialty Code (04)" = readr::col_character(),
+      "CIJ Start Date (04)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "CIJ End Date (04)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Total Net Costs (04)" = readr::col_double(),
+      "Alcohol Related Admission (04)" = readr::col_factor(levels = c("Y", "N")),
+      "Substance Misuse Related Admission (04)" = readr::col_factor(levels = c("Y", "N")),
+      "Falls Related Admission (04)" = readr::col_factor(levels = c("Y", "N")),
+      "Self Harm Related Admission (04)" = readr::col_factor(levels = c("Y", "N")),
+      "Duplicate Record Flag (04)" = readr::col_factor(levels = c("Y", "N")),
+      "NHS Hospital Flag (04)" = readr::col_factor(levels = c("Y", "N")),
+      "Community Hospital Flag (04)" = readr::col_factor(levels = c("Y", "N")),
+      "Unique Record Identifier" = readr::col_character()
     )
   ) %>%
     # rename variables
diff --git a/R/read_extract_nrs_deaths.R b/R/read_extract_nrs_deaths.R
index 1734b23aa..c852748b9 100644
--- a/R/read_extract_nrs_deaths.R
+++ b/R/read_extract_nrs_deaths.R
@@ -5,35 +5,35 @@
 #' @export
 read_extract_nrs_deaths <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "Deaths")) {
+    file_path = get_boxi_extract_path(year = year, type = "deaths")) {
   extract_nrs_deaths <- read_file(file_path,
-    col_types = cols_only(
-      "Death Location Code" = col_character(),
-      "Geo Council Area Code" = col_character(),
-      "Geo Data Zone 2011" = col_character(),
-      "Geo Postcode [C]" = col_character(),
-      "Geo HSCP of Residence Code - current" = col_character(),
-      "NHS Board of Occurrence Code - current" = col_character(),
-      "NHS Board of Residence Code - current" = col_character(),
-      "Pat Date Of Birth [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Date of Death(99)" = col_date(format = "%Y/%m/%d %T"),
-      "Pat Gender Code" = col_double(),
-      "Pat UPI" = col_character(),
-      "Place Death Occurred Code" = col_character(),
-      "Post Mortem Code" = col_character(),
-      "Prim Cause of Death Code (6 char)" = col_character(),
-      "Sec Cause of Death 0 Code (6 char)" = col_character(),
-      "Sec Cause of Death 1 Code (6 char)" = col_character(),
-      "Sec Cause of Death 2 Code (6 char)" = col_character(),
-      "Sec Cause of Death 3 Code (6 char)" = col_character(),
-      "Sec Cause of Death 4 Code (6 char)" = col_character(),
-      "Sec Cause of Death 5 Code (6 char)" = col_character(),
-      "Sec Cause of Death 6 Code (6 char)" = col_character(),
-      "Sec Cause of Death 7 Code (6 char)" = col_character(),
-      "Sec Cause of Death 8 Code (6 char)" = col_character(),
-      "Sec Cause of Death 9 Code (6 char)" = col_character(),
-      "Unique Record Identifier" = col_character(),
-      "GP practice code(99)" = col_character()
+    col_types = readr::cols_only(
+      "Death Location Code" = readr::col_character(),
+      "Geo Council Area Code" = readr::col_character(),
+      "Geo Data Zone 2011" = readr::col_character(),
+      "Geo Postcode [C]" = readr::col_character(),
+      "Geo HSCP of Residence Code - current" = readr::col_character(),
+      "NHS Board of Occurrence Code - current" = readr::col_character(),
+      "NHS Board of Residence Code - current" = readr::col_character(),
+      "Pat Date Of Birth [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Date of Death(99)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Pat Gender Code" = readr::col_double(),
+      "Pat UPI" = readr::col_character(),
+      "Place Death Occurred Code" = readr::col_character(),
+      "Post Mortem Code" = readr::col_character(),
+      "Prim Cause of Death Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 0 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 1 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 2 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 3 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 4 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 5 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 6 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 7 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 8 Code (6 char)" = readr::col_character(),
+      "Sec Cause of Death 9 Code (6 char)" = readr::col_character(),
+      "Unique Record Identifier" = readr::col_character(),
+      "GP practice code(99)" = readr::col_character()
     )
   ) %>%
     dplyr::rename(
diff --git a/R/read_extract_ooh_consultations.R b/R/read_extract_ooh_consultations.R
index 4e16527a3..d6f19c127 100644
--- a/R/read_extract_ooh_consultations.R
+++ b/R/read_extract_ooh_consultations.R
@@ -5,7 +5,7 @@
 #' @return a [tibble][tibble::tibble-package] with OOH Consultations extract data
 read_extract_ooh_consultations <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "GP_OoH-c")) {
+    file_path = get_boxi_extract_path(year = year, type = "gp_ooh-c")) {
   # Read consultations data
   consultations_extract <- read_file(file_path,
     col_types = readr::cols(
diff --git a/R/read_extract_ooh_diagnosis.R b/R/read_extract_ooh_diagnosis.R
index 33ef7eb5c..c93d5aaa1 100644
--- a/R/read_extract_ooh_diagnosis.R
+++ b/R/read_extract_ooh_diagnosis.R
@@ -6,7 +6,7 @@
 #'
 read_extract_ooh_diagnosis <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "GP_OoH-d")) {
+    file_path = get_boxi_extract_path(year = year, type = "gp_ooh-d")) {
   # Load extract file
   diagnosis_extract <- read_file(file_path,
     # All columns are character type
diff --git a/R/read_extract_ooh_outcomes.R b/R/read_extract_ooh_outcomes.R
index 949e17133..acfd8ae50 100644
--- a/R/read_extract_ooh_outcomes.R
+++ b/R/read_extract_ooh_outcomes.R
@@ -5,7 +5,7 @@
 #' @return a [tibble][tibble::tibble-package] with OOH Outcomes extract data
 read_extract_ooh_outcomes <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "GP_OoH-o")) {
+    file_path = get_boxi_extract_path(year = year, type = "gp_ooh-o")) {
   ## Load extract file
   outcomes_extract <- read_file(file_path,
     # All columns are character type
diff --git a/R/read_extract_outpatients.R b/R/read_extract_outpatients.R
index 44e02ca97..9ff60a36f 100644
--- a/R/read_extract_outpatients.R
+++ b/R/read_extract_outpatients.R
@@ -5,45 +5,45 @@
 #' @export
 read_extract_outpatients <- function(
     year,
-    file_path = get_boxi_extract_path(year = year, type = "Outpatient")) {
+    file_path = get_boxi_extract_path(year = year, type = "outpatient")) {
   # Read BOXI extract
   extract_outpatients <- read_file(file_path,
-    col_type = cols(
-      "Clinic Date Fin Year" = col_double(),
-      "Clinic Date (00)" = col_date(format = "%Y/%m/%d %T"),
-      "Episode Record Key (SMR00) [C]" = col_character(),
-      "Pat UPI" = col_character(),
-      "Pat Gender Code" = col_double(),
-      "Pat Date Of Birth [C]" = col_date(format = "%Y/%m/%d %T"),
-      "Practice Location Code" = col_character(),
-      "Practice NHS Board Code - current" = col_character(),
-      "Geo Postcode [C]" = col_character(),
-      "NHS Board of Residence Code - current" = col_character(),
-      "Geo Council Area Code" = col_character(),
-      "Treatment Location Code" = col_character(),
-      "Treatment NHS Board Code - current" = col_character(),
-      "Operation 1A Code (4 char)" = col_character(),
-      "Operation 1B Code (4 char)" = col_character(),
-      "Date of Main Operation(00)" = col_date(format = "%Y/%m/%d %T"),
-      "Operation 2A Code (4 char)" = col_character(),
-      "Operation 2B Code (4 char)" = col_character(),
-      "Date of Operation 2 (00)" = col_date(format = "%Y/%m/%d %T"),
-      "Specialty Classificat. 1/4/97 Code" = col_character(),
-      "Significant Facility Code" = col_character(),
-      "Consultant/HCP Code" = col_character(),
-      "Patient Category Code" = col_character(),
-      "Referral Source Code" = col_character(),
-      "Referral Type Code" = col_double(),
-      "Clinic Type Code" = col_double(),
-      "Clinic Attendance (Status) Code" = col_double(),
-      "Age at Midpoint of Financial Year" = col_double(),
-      "Alcohol Related Admission" = col_character(),
-      "Substance Misuse Related Admission" = col_character(),
-      "Falls Related Admission" = col_character(),
-      "Self Harm Related Admission" = col_character(),
-      "NHS Hospital Flag" = col_character(),
-      "Community Hospital Flag" = col_character(),
-      "Total Net Costs" = col_double()
+    col_type = readr::cols(
+      "Clinic Date Fin Year" = readr::col_double(),
+      "Clinic Date (00)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Episode Record Key (SMR00) [C]" = readr::col_character(),
+      "Pat UPI" = readr::col_character(),
+      "Pat Gender Code" = readr::col_double(),
+      "Pat Date Of Birth [C]" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Practice Location Code" = readr::col_character(),
+      "Practice NHS Board Code - current" = readr::col_character(),
+      "Geo Postcode [C]" = readr::col_character(),
+      "NHS Board of Residence Code - current" = readr::col_character(),
+      "Geo Council Area Code" = readr::col_character(),
+      "Treatment Location Code" = readr::col_character(),
+      "Treatment NHS Board Code - current" = readr::col_character(),
+      "Operation 1A Code (4 char)" = readr::col_character(),
+      "Operation 1B Code (4 char)" = readr::col_character(),
+      "Date of Main Operation(00)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Operation 2A Code (4 char)" = readr::col_character(),
+      "Operation 2B Code (4 char)" = readr::col_character(),
+      "Date of Operation 2 (00)" = readr::col_date(format = "%Y/%m/%d %T"),
+      "Specialty Classificat. 1/4/97 Code" = readr::col_character(),
+      "Significant Facility Code" = readr::col_character(),
+      "Consultant/HCP Code" = readr::col_character(),
+      "Patient Category Code" = readr::col_character(),
+      "Referral Source Code" = readr::col_character(),
+      "Referral Type Code" = readr::col_double(),
+      "Clinic Type Code" = readr::col_double(),
+      "Clinic Attendance (Status) Code" = readr::col_double(),
+      "Age at Midpoint of Financial Year" = readr::col_double(),
+      "Alcohol Related Admission" = readr::col_character(),
+      "Substance Misuse Related Admission" = readr::col_character(),
+      "Falls Related Admission" = readr::col_character(),
+      "Self Harm Related Admission" = readr::col_character(),
+      "NHS Hospital Flag" = readr::col_character(),
+      "Community Hospital Flag" = readr::col_character(),
+      "Total Net Costs" = readr::col_double()
     )
   ) %>%
     # Rename variables
diff --git a/R/read_extract_prescribing.R b/R/read_extract_prescribing.R
index 4f834a44e..683484473 100644
--- a/R/read_extract_prescribing.R
+++ b/R/read_extract_prescribing.R
@@ -5,14 +5,14 @@
 #' @export
 read_extract_prescribing <- function(year, file_path = get_it_prescribing_path(year)) {
   pis_file <- read_file(file_path,
-    col_type = cols_only(
-      "Pat UPI [C]" = col_character(),
-      "Pat DoB [C]" = col_date(format = "%d-%m-%Y"),
-      "Pat Gender" = col_double(),
-      "Pat Postcode [C]" = col_character(),
-      "Practice Code" = col_character(),
-      "Number of Paid Items" = col_double(),
-      "PD Paid GIC excl. BB" = col_double()
+    col_type = readr::cols_only(
+      "Pat UPI [C]" = readr::col_character(),
+      "Pat DoB [C]" = readr::col_date(format = "%d-%m-%Y"),
+      "Pat Gender" = readr::col_double(),
+      "Pat Postcode [C]" = readr::col_character(),
+      "Practice Code" = readr::col_character(),
+      "Number of Paid Items" = readr::col_double(),
+      "PD Paid GIC excl. BB" = readr::col_double()
     )
   ) %>%
     # Rename variables
diff --git a/R/read_it_chi_deaths.R b/R/read_it_chi_deaths.R
index 35f502c60..aab56c86d 100644
--- a/R/read_it_chi_deaths.R
+++ b/R/read_it_chi_deaths.R
@@ -8,10 +8,10 @@
 #' @family process extracts
 read_it_chi_deaths <- function(file_path = get_it_deaths_path()) {
   it_chi_deaths <- read_file(file_path,
-    col_type = cols(
-      "PATIENT_UPI [C]" = col_character(),
-      "PATIENT DoD DATE (NRS)" = col_date(format = "%d-%m-%Y"),
-      "PATIENT DoD DATE (CHI)" = col_date(format = "%d-%m-%Y")
+    col_type = readr::cols(
+      "PATIENT_UPI [C]" = readr::col_character(),
+      "PATIENT DoD DATE (NRS)" = readr::col_date(format = "%d-%m-%Y"),
+      "PATIENT DoD DATE (CHI)" = readr::col_date(format = "%d-%m-%Y")
     )
   ) %>%
     dplyr::rename(
diff --git a/R/read_lookup_ltc.R b/R/read_lookup_ltc.R
index 0a1ce5957..7eb83a434 100644
--- a/R/read_lookup_ltc.R
+++ b/R/read_lookup_ltc.R
@@ -9,28 +9,28 @@ read_lookup_ltc <- function(file_path = get_it_ltc_path()) {
   # Read data------------------------------------------------
   ltc_file <- read_file(
     file_path,
-    col_type = cols(
-      "PATIENT_UPI [C]" = col_character(),
-      "PATIENT_POSTCODE [C]" = col_character(),
-      "ARTHRITIS_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "ASTHMA_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "ATRIAL_FIB_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "CANCER_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "CEREBROVASC_DIS_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "CHRON_LIVER_DIS_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "COPD_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "DEMENTIA_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "DIABETES_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "EPILEPSY_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "HEART_DISEASE_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "HEART_FAILURE_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "MULT_SCLEROSIS_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "PARKINSONS_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "RENAL_FAILURE_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "CONGENITAL_PROB_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "BLOOD_AND_BFO_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "OTH_DIS_END_MET_DIAG_DATE" = col_date(format = "%d-%m-%Y"),
-      "OTH_DIS_DIG_SYS_DIAG_DATE" = col_date(format = "%d-%m-%Y")
+    col_type = readr::cols(
+      "PATIENT_UPI [C]" = readr::col_character(),
+      "PATIENT_POSTCODE [C]" = readr::col_character(),
+      "ARTHRITIS_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "ASTHMA_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "ATRIAL_FIB_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "CANCER_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "CEREBROVASC_DIS_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "CHRON_LIVER_DIS_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "COPD_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "DEMENTIA_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "DIABETES_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "EPILEPSY_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "HEART_DISEASE_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "HEART_FAILURE_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "MULT_SCLEROSIS_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "PARKINSONS_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "RENAL_FAILURE_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "CONGENITAL_PROB_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "BLOOD_AND_BFO_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "OTH_DIS_END_MET_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y"),
+      "OTH_DIS_DIG_SYS_DIAG_DATE" = readr::col_date(format = "%d-%m-%Y")
     )
   ) %>%
     # Rename variables
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 41517d94a..dd144fe2a 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -18,6 +18,7 @@ reference:
     - is_missing
     - check_variables_exist
     - check_year_valid
+    - check_it_reference
 
 
   - title: Years & Dates
@@ -45,6 +46,8 @@ reference:
   - contents:
     - starts_with("clean_up")
     - fill_ch_names
+    - cascade_geographies
+    - correct_demographics
 
 
   - title: Create
@@ -80,13 +83,7 @@ reference:
     - contains("_hscp_to")
     - contains("_chi")
     - contains("lca")
-
-
-  - title: Duplicates
-    desc: Functions to fix duplicates
-  - contents:
-    - contains("_duplicates")
-
+    - la_code_lookup
 
   - title: Writing data
     desc: Functions which mask the typical data write functions to add some nice defaults and importantly fix file permissions.
@@ -101,6 +98,8 @@ reference:
     - ends_with("_period")
     - ends_with("_update")
     - starts_with("it_extract")
+    - gzip_files
+    - make_lowercase_ext
 
 
   - title: Files
@@ -141,10 +140,53 @@ reference:
   - title: Episode file
     desc: Building the episode file
   - contents:
-    - has_concept("episode file")
-  - subtitle: Cohorts
+    - has_concept("episode_file")
+    - fill_geographies
+  - subtitle: Lookups
   - contents:
     - has_concept("Demographic and Service Use Cohort functions")
+    - join_sparra_hhg
+    - join_deaths_data
+    - join_sc_client
+    - match_on_ltcs
+
+
+  - title: Individual file
+    desc: Building the individual file
+  - contents:
+    - has_concept("individual_file")
+  - subtitle: Lookups
+  - contents:
+    - has_concept("Demographic and Service Use Cohort functions")
+    - join_sparra_hhg
+    - join_cohort_lookups
+    - join_deaths_data
+    - join_slf_lookup_vars
+    - match_on_ltcs
+
+  - title: Demographics
+    desc: Things related to demographic lookups
+  - contents:
+    - fill_geographies
+    - get_gpprac_opendata
+    - make_gpprac_lookup
+    - make_postcode_lookup
+    - recode_health_boards
+    - recode_hscp
+    - la_code_lookup
+
+
+  - title: Miscellaneous functions
+    desc: Miscellaneous functions.
+  - subtitle: Homelessness
+  - contents:
+    - fix_east_ayrshire_duplicates
+    - fix_west_dun_duplicates
+    - produce_homelessness_completeness
+  - subtitle: Helper functions
+  - contents:
+    - vars_end_with
+    - has_concept("helper_funs")
 
 
   - title: Testing
diff --git a/_targets.R b/_targets.R
index 3473679de..88118eb01 100644
--- a/_targets.R
+++ b/_targets.R
@@ -34,6 +34,7 @@ list(
   ),
   ## Lookup data ##
   tar_target(gpprac_opendata, get_gpprac_opendata()),
+  tar_target(la_code_opendata, get_la_code_opendata_lookup()),
   tar_target(gpprac_ref_path, get_gpprac_ref_path(), format = "file"),
   tar_target(locality_path, get_locality_path(), format = "file"),
   tar_target(simd_path, get_simd_path(), format = "file"),
@@ -205,47 +206,47 @@ list(
     ### target data extracts ###
     tar_file_read(
       acute_data,
-      get_boxi_extract_path(year, type = "Acute"),
+      get_boxi_extract_path(year, type = "acute"),
       read_extract_acute(year, !!.x)
     ),
     tar_file_read(
       ae_data,
-      get_boxi_extract_path(year, type = "AE"),
+      get_boxi_extract_path(year, type = "ae"),
       read_extract_ae(year, !!.x)
     ),
     tar_file_read(
       cmh_data,
-      get_boxi_extract_path(year, type = "CMH"),
+      get_boxi_extract_path(year, type = "cmh"),
       read_extract_cmh(year, !!.x)
     ),
     tar_file_read(
       dn_data,
-      get_boxi_extract_path(year, type = "DN"),
+      get_boxi_extract_path(year, type = "dn"),
       read_extract_district_nursing(year, !!.x)
     ),
     tar_file_read(
       homelessness_data,
-      get_boxi_extract_path(year, type = "Homelessness"),
+      get_boxi_extract_path(year, type = "homelessness"),
       read_extract_homelessness(year, !!.x)
     ),
     tar_file_read(
       maternity_data,
-      get_boxi_extract_path(year, type = "Maternity"),
+      get_boxi_extract_path(year, type = "maternity"),
       read_extract_maternity(year, !!.x)
     ),
     tar_file_read(
       mental_health_data,
-      get_boxi_extract_path(year, type = "MH"),
+      get_boxi_extract_path(year, type = "mh"),
       read_extract_mental_health(year, !!.x)
     ),
     tar_file_read(
       nrs_deaths_data,
-      get_boxi_extract_path(year, type = "Deaths"),
+      get_boxi_extract_path(year, type = "deaths"),
       read_extract_nrs_deaths(year, !!.x)
     ),
     tar_file_read(
       outpatients_data,
-      get_boxi_extract_path(year, type = "Outpatient"),
+      get_boxi_extract_path(year, type = "outpatient"),
       read_extract_outpatients(year, !!.x)
     ),
     tar_file_read(
@@ -255,17 +256,17 @@ list(
     ),
     tar_target(
       diagnosis_data_path,
-      get_boxi_extract_path(year = year, type = "GP_OoH-d"),
+      get_boxi_extract_path(year = year, type = "gp_ooh-d"),
       format = "file"
     ),
     tar_target(
       outcomes_data_path,
-      get_boxi_extract_path(year = year, type = "GP_OoH-o"),
+      get_boxi_extract_path(year = year, type = "gp_ooh-o"),
       format = "file"
     ),
     tar_target(
       consultations_data_path,
-      get_boxi_extract_path(year = year, type = "GP_OoH-c"),
+      get_boxi_extract_path(year = year, type = "gp_ooh-c"),
       format = "file"
     ),
     tar_qs(
@@ -342,9 +343,10 @@ list(
     tar_target(
       source_homelessness_extract,
       process_extract_homelessness(
-        homelessness_data,
-        year,
-        write_to_disk = write_to_disk
+        data = homelessness_data,
+        year = year,
+        write_to_disk = write_to_disk,
+        la_code_lookup = la_code_opendata
       )
     ),
     tar_target(
diff --git a/man/add_acute_columns.Rd b/man/add_acute_columns.Rd
index c2659f821..b7be171cf 100644
--- a/man/add_acute_columns.Rd
+++ b/man/add_acute_columns.Rd
@@ -16,3 +16,33 @@ add_acute_columns(episode_file, prefix, condition)
 \description{
 Add Acute columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_ae_columns.Rd b/man/add_ae_columns.Rd
index fdc31b7ff..37d60f466 100644
--- a/man/add_ae_columns.Rd
+++ b/man/add_ae_columns.Rd
@@ -16,3 +16,33 @@ add_ae_columns(episode_file, prefix, condition)
 \description{
 Add AE columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_all_columns.Rd b/man/add_all_columns.Rd
index 1d2e587db..2aba7f5ad 100644
--- a/man/add_all_columns.Rd
+++ b/man/add_all_columns.Rd
@@ -13,3 +13,33 @@ add_all_columns(episode_file)
 Add new columns based on SMRType and recid which follow a pattern
 of prefixed column names created based on some condition.
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_at_columns.Rd b/man/add_at_columns.Rd
index af978530a..537a01f40 100644
--- a/man/add_at_columns.Rd
+++ b/man/add_at_columns.Rd
@@ -16,3 +16,33 @@ add_at_columns(episode_file, prefix, condition)
 \description{
 Add AT columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_ch_columns.Rd b/man/add_ch_columns.Rd
index a036a257e..360bb29db 100644
--- a/man/add_ch_columns.Rd
+++ b/man/add_ch_columns.Rd
@@ -16,3 +16,33 @@ add_ch_columns(episode_file, prefix, condition)
 \description{
 Add CH columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_cij_columns.Rd b/man/add_cij_columns.Rd
index c48c1a3ef..f8d2528f2 100644
--- a/man/add_cij_columns.Rd
+++ b/man/add_cij_columns.Rd
@@ -12,3 +12,33 @@ add_cij_columns(episode_file)
 \description{
 Add new columns related to CIJ
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_cmh_columns.Rd b/man/add_cmh_columns.Rd
index a1cb74abb..654e03f75 100644
--- a/man/add_cmh_columns.Rd
+++ b/man/add_cmh_columns.Rd
@@ -16,3 +16,33 @@ add_cmh_columns(episode_file, prefix, condition)
 \description{
 Add CMH columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_dd_columns.Rd b/man/add_dd_columns.Rd
index 11e85fdc7..a920a7979 100644
--- a/man/add_dd_columns.Rd
+++ b/man/add_dd_columns.Rd
@@ -16,3 +16,33 @@ add_dd_columns(episode_file, prefix, condition)
 \description{
 Add DD columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_dn_columns.Rd b/man/add_dn_columns.Rd
index ffdf59a82..6d6fa61cb 100644
--- a/man/add_dn_columns.Rd
+++ b/man/add_dn_columns.Rd
@@ -16,3 +16,33 @@ add_dn_columns(episode_file, prefix, condition)
 \description{
 Add DN columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_gls_columns.Rd b/man/add_gls_columns.Rd
index 6ab7e9645..84c49848a 100644
--- a/man/add_gls_columns.Rd
+++ b/man/add_gls_columns.Rd
@@ -16,3 +16,33 @@ add_gls_columns(episode_file, prefix, condition)
 \description{
 Add GLS columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_hc_columns.Rd b/man/add_hc_columns.Rd
index a58f226ec..d5154acfd 100644
--- a/man/add_hc_columns.Rd
+++ b/man/add_hc_columns.Rd
@@ -16,3 +16,33 @@ add_hc_columns(episode_file, prefix, condition)
 \description{
 Add HC columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_hl1_columns.Rd b/man/add_hl1_columns.Rd
index 24fc714e9..87df2969b 100644
--- a/man/add_hl1_columns.Rd
+++ b/man/add_hl1_columns.Rd
@@ -16,3 +16,33 @@ add_hl1_columns(episode_file, prefix, condition)
 \description{
 Add HL1 columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_ipdc_cols.Rd b/man/add_ipdc_cols.Rd
index bd630b9d3..f78ddd981 100644
--- a/man/add_ipdc_cols.Rd
+++ b/man/add_ipdc_cols.Rd
@@ -15,9 +15,40 @@ add_ipdc_cols(episode_file, prefix, condition, ipdc_d = TRUE, elective = TRUE)
 
 \item{ipdc_d}{Whether to create columns based on IPDC = "D" (lgl)}
 
-\item{elective}{Whether to create columns based on Elective/Non-Elective cij_pattype (lgl)}
+\item{elective}{Whether to create columns based on Elective/Non-Elective
+cij_pattype (lgl)}
 }
 \description{
 Add columns based on value in IPDC column, which can
 be further split by Elective/Non-Elective CIJ.
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_mat_columns.Rd b/man/add_mat_columns.Rd
index 5faab0dc1..8c4e26290 100644
--- a/man/add_mat_columns.Rd
+++ b/man/add_mat_columns.Rd
@@ -16,3 +16,33 @@ add_mat_columns(episode_file, prefix, condition)
 \description{
 Add Mat columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_mh_columns.Rd b/man/add_mh_columns.Rd
index c587c490a..64c1ded97 100644
--- a/man/add_mh_columns.Rd
+++ b/man/add_mh_columns.Rd
@@ -16,3 +16,33 @@ add_mh_columns(episode_file, prefix, condition)
 \description{
 Add MH columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_nrs_columns.Rd b/man/add_nrs_columns.Rd
index b41201a57..e793fefb0 100644
--- a/man/add_nrs_columns.Rd
+++ b/man/add_nrs_columns.Rd
@@ -16,3 +16,33 @@ add_nrs_columns(episode_file, prefix, condition)
 \description{
 Add NRS columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_nsu_cohort.Rd b/man/add_nsu_cohort.Rd
index 4ea9324e0..b9a988c57 100644
--- a/man/add_nsu_cohort.Rd
+++ b/man/add_nsu_cohort.Rd
@@ -22,8 +22,10 @@ Add NSU cohort to working file
 \seealso{
 \code{\link[=get_nsu_path]{get_nsu_path()}}
 
-Other episode file: 
+Other episode_file: 
 \code{\link{add_ppa_flag}()},
-\code{\link{link_delayed_discharge_eps}()}
+\code{\link{apply_cost_uplift}()},
+\code{\link{link_delayed_discharge_eps}()},
+\code{\link{lookup_uplift}()}
 }
-\concept{episode file}
+\concept{episode_file}
diff --git a/man/add_nsu_columns.Rd b/man/add_nsu_columns.Rd
index 5aed481f0..bb72fab58 100644
--- a/man/add_nsu_columns.Rd
+++ b/man/add_nsu_columns.Rd
@@ -16,3 +16,33 @@ add_nsu_columns(episode_file, prefix, condition)
 \description{
 Add NSU columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_ooh_columns.Rd b/man/add_ooh_columns.Rd
index f1e6b63f5..9caf53eac 100644
--- a/man/add_ooh_columns.Rd
+++ b/man/add_ooh_columns.Rd
@@ -16,3 +16,33 @@ add_ooh_columns(episode_file, prefix, condition)
 \description{
 Add OoH columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_op_columns.Rd b/man/add_op_columns.Rd
index 9fb8bc158..52ba219cf 100644
--- a/man/add_op_columns.Rd
+++ b/man/add_op_columns.Rd
@@ -16,3 +16,33 @@ add_op_columns(episode_file, prefix, condition)
 \description{
 Add OP columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_pis_columns.Rd b/man/add_pis_columns.Rd
index 836218da0..1b94ba8f7 100644
--- a/man/add_pis_columns.Rd
+++ b/man/add_pis_columns.Rd
@@ -16,3 +16,33 @@ add_pis_columns(episode_file, prefix, condition)
 \description{
 Add PIS columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_ppa_flag.Rd b/man/add_ppa_flag.Rd
index 8533a09f5..8493cff05 100644
--- a/man/add_ppa_flag.Rd
+++ b/man/add_ppa_flag.Rd
@@ -18,8 +18,10 @@ a combination of diagnostic codes and operation codes, whether an admission
 was preventable or not.
 }
 \seealso{
-Other episode file: 
+Other episode_file: 
 \code{\link{add_nsu_cohort}()},
-\code{\link{link_delayed_discharge_eps}()}
+\code{\link{apply_cost_uplift}()},
+\code{\link{link_delayed_discharge_eps}()},
+\code{\link{lookup_uplift}()}
 }
-\concept{episode file}
+\concept{episode_file}
diff --git a/man/add_sds_columns.Rd b/man/add_sds_columns.Rd
index c06b88527..167290d54 100644
--- a/man/add_sds_columns.Rd
+++ b/man/add_sds_columns.Rd
@@ -16,3 +16,33 @@ add_sds_columns(episode_file, prefix, condition)
 \description{
 Add SDS columns
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_standard_cols.Rd b/man/add_standard_cols.Rd
index 4392157d2..3d0e1e69e 100644
--- a/man/add_standard_cols.Rd
+++ b/man/add_standard_cols.Rd
@@ -24,5 +24,36 @@ add_standard_cols(
 \item{cost}{Whether to create prefix_cost col, e.g. "Acute_cost"}
 }
 \description{
-Add standard columns (DoB, postcode, gpprac, episodes, cost) to episode file.
+Add standard columns (DoB, postcode, gpprac, episodes, cost)
+to episode file.
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/apply_cost_uplift.Rd b/man/apply_cost_uplift.Rd
index 315e154f3..e88b36b76 100644
--- a/man/apply_cost_uplift.Rd
+++ b/man/apply_cost_uplift.Rd
@@ -15,3 +15,11 @@ episode data with uplifted costs
 \description{
 Uplift costs
 }
+\seealso{
+Other episode_file: 
+\code{\link{add_nsu_cohort}()},
+\code{\link{add_ppa_flag}()},
+\code{\link{link_delayed_discharge_eps}()},
+\code{\link{lookup_uplift}()}
+}
+\concept{episode_file}
diff --git a/man/clean_up_ch.Rd b/man/clean_up_ch.Rd
index c0c61966d..9dadbd808 100644
--- a/man/clean_up_ch.Rd
+++ b/man/clean_up_ch.Rd
@@ -14,3 +14,33 @@ clean_up_ch(episode_file, year)
 \description{
 Clean up CH-related columns.
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/condition_cols.Rd b/man/condition_cols.Rd
index ba037a609..8cbbda825 100644
--- a/man/condition_cols.Rd
+++ b/man/condition_cols.Rd
@@ -11,3 +11,33 @@ Returns chr vector of column names
 which follow format "condition" and "condition_date" e.g.
 "dementia" and "dementia_date"
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/convert_ca_to_lca.Rd b/man/convert_ca_to_lca.Rd
index 25a8de018..ffb67960b 100644
--- a/man/convert_ca_to_lca.Rd
+++ b/man/convert_ca_to_lca.Rd
@@ -21,11 +21,11 @@ convert_ca_to_lca(ca)
 
 }
 \seealso{
-convert_sending_location_to_lca
+convert_sc_sending_location_to_lca
 
 Other code functions: 
 \code{\link{convert_hb_to_hbnames}()},
 \code{\link{convert_hscp_to_hscpnames}()},
-\code{\link{convert_sending_location_to_lca}()}
+\code{\link{convert_sc_sending_location_to_lca}()}
 }
 \concept{code functions}
diff --git a/man/convert_hb_to_hbnames.Rd b/man/convert_hb_to_hbnames.Rd
index 0cd2932bd..e104a11ce 100644
--- a/man/convert_hb_to_hbnames.Rd
+++ b/man/convert_hb_to_hbnames.Rd
@@ -24,6 +24,6 @@ convert_hb_to_hbnames(hb)
 Other code functions: 
 \code{\link{convert_ca_to_lca}()},
 \code{\link{convert_hscp_to_hscpnames}()},
-\code{\link{convert_sending_location_to_lca}()}
+\code{\link{convert_sc_sending_location_to_lca}()}
 }
 \concept{code functions}
diff --git a/man/convert_hscp_to_hscpnames.Rd b/man/convert_hscp_to_hscpnames.Rd
index ac9bd023e..c423b8721 100644
--- a/man/convert_hscp_to_hscpnames.Rd
+++ b/man/convert_hscp_to_hscpnames.Rd
@@ -25,6 +25,6 @@ convert_hscp_to_hscpnames(hscp)
 Other code functions: 
 \code{\link{convert_ca_to_lca}()},
 \code{\link{convert_hb_to_hbnames}()},
-\code{\link{convert_sending_location_to_lca}()}
+\code{\link{convert_sc_sending_location_to_lca}()}
 }
 \concept{code functions}
diff --git a/man/convert_sending_location_to_lca.Rd b/man/convert_sc_sending_location_to_lca.Rd
similarity index 69%
rename from man/convert_sending_location_to_lca.Rd
rename to man/convert_sc_sending_location_to_lca.Rd
index 78bf475ba..10a0e952f 100644
--- a/man/convert_sending_location_to_lca.Rd
+++ b/man/convert_sc_sending_location_to_lca.Rd
@@ -1,10 +1,10 @@
 % Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/convert_sending_location_to_lca.R
-\name{convert_sending_location_to_lca}
-\alias{convert_sending_location_to_lca}
+% Please edit documentation in R/convert_sc_sending_location_to_lca.R
+\name{convert_sc_sending_location_to_lca}
+\alias{convert_sc_sending_location_to_lca}
 \title{Convert Social Care Sending Location Codes into LCA Codes}
 \usage{
-convert_sending_location_to_lca(sending_location)
+convert_sc_sending_location_to_lca(sending_location)
 }
 \arguments{
 \item{sending_location}{vector of sending location codes}
@@ -18,7 +18,7 @@ Local Council Authority Codes.
 }
 \examples{
 sending_location <- c(100, 120)
-convert_sending_location_to_lca(sending_location)
+convert_sc_sending_location_to_lca(sending_location)
 
 }
 \seealso{
diff --git a/man/create_individual_file.Rd b/man/create_individual_file.Rd
index 4c87b0731..128819711 100644
--- a/man/create_individual_file.Rd
+++ b/man/create_individual_file.Rd
@@ -33,3 +33,33 @@ The processed individual file
 \description{
 Creates the individual file from the episode file.
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/get_boxi_extract_path.Rd b/man/get_boxi_extract_path.Rd
index 9a97ac199..c012ac3ef 100644
--- a/man/get_boxi_extract_path.Rd
+++ b/man/get_boxi_extract_path.Rd
@@ -6,8 +6,8 @@
 \usage{
 get_boxi_extract_path(
   year,
-  type = c("AE", "AE_CUP", "Acute", "CMH", "Deaths", "DN", "GP_OoH-c", "GP_OoH-d",
-    "GP_OoH-o", "Homelessness", "Maternity", "MH", "Outpatients")
+  type = c("ae", "ae_cup", "acute", "cmh", "deaths", "dn", "gp_ooh-c", "gp_ooh-d",
+    "gp_ooh-o", "homelessness", "maternity", "mh", "outpatients")
 )
 }
 \arguments{
diff --git a/man/get_la_code_opendata_lookup.Rd b/man/get_la_code_opendata_lookup.Rd
new file mode 100644
index 000000000..dbf2fbb73
--- /dev/null
+++ b/man/get_la_code_opendata_lookup.Rd
@@ -0,0 +1,16 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/get_la_code_opendata_lookup.R
+\name{get_la_code_opendata_lookup}
+\alias{get_la_code_opendata_lookup}
+\title{Download the LA code lookup}
+\usage{
+get_la_code_opendata_lookup()
+}
+\value{
+a \link[tibble:tibble-package]{tibble} with the Local Authority names
+and codes.
+}
+\description{
+Download and process the Local Authority lookup from the Open
+Data platform
+}
diff --git a/man/get_slf_episode_path.Rd b/man/get_slf_episode_path.Rd
new file mode 100644
index 000000000..064e47fbb
--- /dev/null
+++ b/man/get_slf_episode_path.Rd
@@ -0,0 +1,19 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/get_final_file_paths.R
+\name{get_slf_episode_path}
+\alias{get_slf_episode_path}
+\title{Get the slf episode file path}
+\usage{
+get_slf_episode_path(year, ...)
+}
+\arguments{
+\item{year}{Financial year}
+
+\item{...}{additional arguments passed to \code{\link[=get_file_path]{get_file_path()}}}
+}
+\value{
+Path to the final episode file.
+}
+\description{
+Get the slf episode file path
+}
diff --git a/man/get_slf_individual_path.Rd b/man/get_slf_individual_path.Rd
new file mode 100644
index 000000000..9b72c6d89
--- /dev/null
+++ b/man/get_slf_individual_path.Rd
@@ -0,0 +1,19 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/get_final_file_paths.R
+\name{get_slf_individual_path}
+\alias{get_slf_individual_path}
+\title{Get the SLF individual file path}
+\usage{
+get_slf_individual_path(year, ...)
+}
+\arguments{
+\item{year}{Financial year}
+
+\item{...}{additional arguments passed to \code{\link[=get_file_path]{get_file_path()}}}
+}
+\value{
+Path to the final individual file
+}
+\description{
+Get the SLF individual file path
+}
diff --git a/man/get_source_extract_path.Rd b/man/get_source_extract_path.Rd
index fd9502b83..48c665a83 100644
--- a/man/get_source_extract_path.Rd
+++ b/man/get_source_extract_path.Rd
@@ -6,8 +6,8 @@
 \usage{
 get_source_extract_path(
   year,
-  type = c("Acute", "AE", "AT", "CH", "CMH", "DD", "Deaths", "DN", "GPOoH", "HC",
-    "Homelessness", "Maternity", "MH", "Outpatients", "PIS", "SDS"),
+  type = c("acute", "ae", "at", "ch", "client", "cmh", "dd", "deaths", "dn", "gp_ooh",
+    "hc", "homelessness", "maternity", "mh", "outpatients", "pis", "sds"),
   ...
 )
 }
diff --git a/man/la_code_lookup.Rd b/man/la_code_lookup.Rd
deleted file mode 100644
index 9dde038e0..000000000
--- a/man/la_code_lookup.Rd
+++ /dev/null
@@ -1,20 +0,0 @@
-% Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/la_code_lookup.R
-\name{la_code_lookup}
-\alias{la_code_lookup}
-\title{Download the LA code lookup}
-\usage{
-la_code_lookup(res_id = "967937c4-8d67-4f39-974f-fd58c4acfda5")
-}
-\arguments{
-\item{res_id}{The resource ID as found on
-\href{https://www.opendata.nhs.scot/}{NHS Open Data platform}}
-}
-\value{
-a \link[tibble:tibble-package]{tibble} with the Local Authority names
-and codes.
-}
-\description{
-Download and process the Local Authority lookup from the Open
-Data platform
-}
diff --git a/man/link_delayed_discharge_eps.Rd b/man/link_delayed_discharge_eps.Rd
index 49c3e2a75..173fc8706 100644
--- a/man/link_delayed_discharge_eps.Rd
+++ b/man/link_delayed_discharge_eps.Rd
@@ -7,7 +7,7 @@
 link_delayed_discharge_eps(
   episode_file,
   year,
-  dd_data = read_file(get_source_extract_path(year, "DD"))
+  dd_data = read_file(get_source_extract_path(year, "dd"))
 )
 }
 \arguments{
@@ -25,8 +25,10 @@ using the \code{cij_marker}
 Link  Delayed Discharge to WIP episode file
 }
 \seealso{
-Other episode file: 
+Other episode_file: 
 \code{\link{add_nsu_cohort}()},
-\code{\link{add_ppa_flag}()}
+\code{\link{add_ppa_flag}()},
+\code{\link{apply_cost_uplift}()},
+\code{\link{lookup_uplift}()}
 }
-\concept{episode file}
+\concept{episode_file}
diff --git a/man/lookup_uplift.Rd b/man/lookup_uplift.Rd
index f3fb4865c..d5ae92d24 100644
--- a/man/lookup_uplift.Rd
+++ b/man/lookup_uplift.Rd
@@ -15,3 +15,11 @@ episode data with a uplift scale
 \description{
 Set uplift scale
 }
+\seealso{
+Other episode_file: 
+\code{\link{add_nsu_cohort}()},
+\code{\link{add_ppa_flag}()},
+\code{\link{apply_cost_uplift}()},
+\code{\link{link_delayed_discharge_eps}()}
+}
+\concept{episode_file}
diff --git a/man/max_no_inf.Rd b/man/max_no_inf.Rd
index 79b9a1057..b6b4b0f0c 100644
--- a/man/max_no_inf.Rd
+++ b/man/max_no_inf.Rd
@@ -14,3 +14,8 @@ Custom maximum function which removes
 missing values but doesn't return Inf if all values
 are missing (instead returns NA)
 }
+\seealso{
+Other helper_funs: 
+\code{\link{min_no_inf}()}
+}
+\concept{helper_funs}
diff --git a/man/min_no_inf.Rd b/man/min_no_inf.Rd
index 38029214f..35c187649 100644
--- a/man/min_no_inf.Rd
+++ b/man/min_no_inf.Rd
@@ -14,3 +14,8 @@ Custom minimum function which removes
 missing values but doesn't return Inf if all values
 are missing (instead returns NA)
 }
+\seealso{
+Other helper_funs: 
+\code{\link{max_no_inf}()}
+}
+\concept{helper_funs}
diff --git a/man/process_extract_homelessness.Rd b/man/process_extract_homelessness.Rd
index 7b2254050..1d7d3d1a7 100644
--- a/man/process_extract_homelessness.Rd
+++ b/man/process_extract_homelessness.Rd
@@ -9,6 +9,7 @@ process_extract_homelessness(
   year,
   write_to_disk = TRUE,
   update = latest_update(),
+  la_code_lookup = get_la_code_opendata_lookup(),
   sg_pub_path = get_sg_homelessness_pub_path()
 )
 }
diff --git a/man/process_slf_deaths_lookup.Rd b/man/process_slf_deaths_lookup.Rd
index 2ecde97ce..8ad103a2a 100644
--- a/man/process_slf_deaths_lookup.Rd
+++ b/man/process_slf_deaths_lookup.Rd
@@ -6,7 +6,7 @@
 \usage{
 process_slf_deaths_lookup(
   year,
-  nrs_deaths_data = read_file(get_source_extract_path(year, "Deaths"), col_select =
+  nrs_deaths_data = read_file(get_source_extract_path(year, "deaths"), col_select =
     c("chi", "record_keydate1")),
   chi_deaths_data = read_file(get_slf_chi_deaths_path()),
   write_to_disk = TRUE
diff --git a/man/read_extract_acute.Rd b/man/read_extract_acute.Rd
index a924c2f80..1c63d7edf 100644
--- a/man/read_extract_acute.Rd
+++ b/man/read_extract_acute.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_acute(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "Acute")
+  file_path = get_boxi_extract_path(year = year, type = "acute")
 )
 }
 \arguments{
diff --git a/man/read_extract_ae.Rd b/man/read_extract_ae.Rd
index 803b281ac..1a15efbc1 100644
--- a/man/read_extract_ae.Rd
+++ b/man/read_extract_ae.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_ae(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "AE")
+  file_path = get_boxi_extract_path(year = year, type = "ae")
 )
 }
 \arguments{
diff --git a/man/read_extract_cmh.Rd b/man/read_extract_cmh.Rd
index 1f76e8292..f0701e41c 100644
--- a/man/read_extract_cmh.Rd
+++ b/man/read_extract_cmh.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_cmh(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "CMH")
+  file_path = get_boxi_extract_path(year = year, type = "cmh")
 )
 }
 \arguments{
diff --git a/man/read_extract_district_nursing.Rd b/man/read_extract_district_nursing.Rd
index 9f4188a5f..07065a3c5 100644
--- a/man/read_extract_district_nursing.Rd
+++ b/man/read_extract_district_nursing.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_district_nursing(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "DN")
+  file_path = get_boxi_extract_path(year = year, type = "dn")
 )
 }
 \arguments{
diff --git a/man/read_extract_gp_ooh.Rd b/man/read_extract_gp_ooh.Rd
index 233844074..ba908127b 100644
--- a/man/read_extract_gp_ooh.Rd
+++ b/man/read_extract_gp_ooh.Rd
@@ -6,9 +6,9 @@
 \usage{
 read_extract_gp_ooh(
   year,
-  diagnosis_path = get_boxi_extract_path(year = year, type = "GP_OoH-d"),
-  outcomes_path = get_boxi_extract_path(year = year, type = "GP_OoH-o"),
-  consultations_path = get_boxi_extract_path(year = year, type = "GP_OoH-c")
+  diagnosis_path = get_boxi_extract_path(year = year, type = "gp_ooh-d"),
+  outcomes_path = get_boxi_extract_path(year = year, type = "gp_ooh-o"),
+  consultations_path = get_boxi_extract_path(year = year, type = "gp_ooh-c")
 )
 }
 \arguments{
diff --git a/man/read_extract_homelessness.Rd b/man/read_extract_homelessness.Rd
index bb03535d5..7ec69d301 100644
--- a/man/read_extract_homelessness.Rd
+++ b/man/read_extract_homelessness.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_homelessness(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "Homelessness")
+  file_path = get_boxi_extract_path(year = year, type = "homelessness")
 )
 }
 \arguments{
diff --git a/man/read_extract_maternity.Rd b/man/read_extract_maternity.Rd
index 6fe10b491..9a04d34f1 100644
--- a/man/read_extract_maternity.Rd
+++ b/man/read_extract_maternity.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_maternity(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "Maternity")
+  file_path = get_boxi_extract_path(year = year, type = "maternity")
 )
 }
 \arguments{
diff --git a/man/read_extract_mental_health.Rd b/man/read_extract_mental_health.Rd
index 3b6e0b619..58115215c 100644
--- a/man/read_extract_mental_health.Rd
+++ b/man/read_extract_mental_health.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_mental_health(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "MH")
+  file_path = get_boxi_extract_path(year = year, type = "mh")
 )
 }
 \arguments{
diff --git a/man/read_extract_nrs_deaths.Rd b/man/read_extract_nrs_deaths.Rd
index d7b63b2db..8b810aebd 100644
--- a/man/read_extract_nrs_deaths.Rd
+++ b/man/read_extract_nrs_deaths.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_nrs_deaths(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "Deaths")
+  file_path = get_boxi_extract_path(year = year, type = "deaths")
 )
 }
 \arguments{
diff --git a/man/read_extract_ooh_consultations.Rd b/man/read_extract_ooh_consultations.Rd
index 05d0bda31..b4ecc62f6 100644
--- a/man/read_extract_ooh_consultations.Rd
+++ b/man/read_extract_ooh_consultations.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_ooh_consultations(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "GP_OoH-c")
+  file_path = get_boxi_extract_path(year = year, type = "gp_ooh-c")
 )
 }
 \arguments{
diff --git a/man/read_extract_ooh_diagnosis.Rd b/man/read_extract_ooh_diagnosis.Rd
index b0d015554..93a8196cf 100644
--- a/man/read_extract_ooh_diagnosis.Rd
+++ b/man/read_extract_ooh_diagnosis.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_ooh_diagnosis(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "GP_OoH-d")
+  file_path = get_boxi_extract_path(year = year, type = "gp_ooh-d")
 )
 }
 \arguments{
diff --git a/man/read_extract_ooh_outcomes.Rd b/man/read_extract_ooh_outcomes.Rd
index bd563cd12..4bf02fcb5 100644
--- a/man/read_extract_ooh_outcomes.Rd
+++ b/man/read_extract_ooh_outcomes.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_ooh_outcomes(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "GP_OoH-o")
+  file_path = get_boxi_extract_path(year = year, type = "gp_ooh-o")
 )
 }
 \arguments{
diff --git a/man/read_extract_outpatients.Rd b/man/read_extract_outpatients.Rd
index 8fb31475b..92a46376b 100644
--- a/man/read_extract_outpatients.Rd
+++ b/man/read_extract_outpatients.Rd
@@ -6,7 +6,7 @@
 \usage{
 read_extract_outpatients(
   year,
-  file_path = get_boxi_extract_path(year = year, type = "Outpatient")
+  file_path = get_boxi_extract_path(year = year, type = "outpatient")
 )
 }
 \arguments{
diff --git a/man/recode_gender.Rd b/man/recode_gender.Rd
index aaa28e6eb..4d1094b4d 100644
--- a/man/recode_gender.Rd
+++ b/man/recode_gender.Rd
@@ -12,3 +12,33 @@ recode_gender(episode_file)
 \description{
 Recode gender to 1.5 if 0 or 9.
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/remove_blank_chi.Rd b/man/remove_blank_chi.Rd
index b290dd1e7..8133d5313 100644
--- a/man/remove_blank_chi.Rd
+++ b/man/remove_blank_chi.Rd
@@ -12,3 +12,33 @@ remove_blank_chi(episode_file)
 \description{
 Convert blank strings to NA and remove NAs from CHI column
 }
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()}
+}
+\concept{individual_file}
diff --git a/tests/testthat/_snaps/convert_sending_location_to_lca.md b/tests/testthat/_snaps/convert_sending_location_to_lca.md
index 1fa02dc14..db223d6db 100644
--- a/tests/testthat/_snaps/convert_sending_location_to_lca.md
+++ b/tests/testthat/_snaps/convert_sending_location_to_lca.md
@@ -1,7 +1,7 @@
 # Can convert a SC sending location to lca code
 
     Code
-      convert_sending_location_to_lca(c(100L, 110L, 120L, 130L, 355L, 150L, 395L,
+      convert_sc_sending_location_to_lca(c(100L, 110L, 120L, 130L, 355L, 150L, 395L,
         170L, 180L, 190L, 200L, 210L, 220L, 230L, 240L, 250L, 260L, 270L, 280L, 290L,
         300L, 310L, 320L, 330L, 340L, 350L, 360L, 370L, 380L, 390L, 400L, 235L, 999L,
         0L, NA_integer_))
diff --git a/tests/testthat/_snaps/get_la_code_opendata_lookup.md b/tests/testthat/_snaps/get_la_code_opendata_lookup.md
new file mode 100644
index 000000000..40365d570
--- /dev/null
+++ b/tests/testthat/_snaps/get_la_code_opendata_lookup.md
@@ -0,0 +1,20 @@
+# LA Code lookup is correct
+
+    Code
+      get_la_code_opendata_lookup()
+    Output
+      # A tibble: 36 x 3
+         CA        CAName                sending_local_authority_name
+         <chr>     <chr>                 <chr>                       
+       1 S12000005 Clackmannanshire      Clackmannanshire            
+       2 S12000006 Dumfries and Galloway Dumfries & Galloway         
+       3 S12000008 East Ayrshire         East Ayrshire               
+       4 S12000010 East Lothian          East Lothian                
+       5 S12000011 East Renfrewshire     East Renfrewshire           
+       6 S12000013 Na h-Eileanan Siar    Eilean Siar                 
+       7 S12000014 Falkirk               Falkirk                     
+       8 S12000015 Fife                  Fife                        
+       9 S12000017 Highland              Highland                    
+      10 S12000018 Inverclyde            Inverclyde                  
+      # i 26 more rows
+
diff --git a/tests/testthat/test-convert_sending_location_to_lca.R b/tests/testthat/test-convert_sending_location_to_lca.R
index eb66802a6..5d286311a 100644
--- a/tests/testthat/test-convert_sending_location_to_lca.R
+++ b/tests/testthat/test-convert_sending_location_to_lca.R
@@ -1,6 +1,6 @@
 test_that("Can convert a SC sending location to lca code", {
   expect_snapshot(
-    convert_sending_location_to_lca(
+    convert_sc_sending_location_to_lca(
       c(
         100L,
         110L,
@@ -44,9 +44,9 @@ test_that("Can convert a SC sending location to lca code", {
 
 test_that("Errors on unexpected input", {
   expect_error(
-    convert_sending_location_to_lca("100")
+    convert_sc_sending_location_to_lca("100")
   )
   expect_error(
-    convert_sending_location_to_lca(c("100", 99L))
+    convert_sc_sending_location_to_lca(c("100", 99L))
   )
 })
diff --git a/tests/testthat/test-get_gpprac_opendata.R b/tests/testthat/test-get_gpprac_opendata.R
new file mode 100644
index 000000000..c70d753b4
--- /dev/null
+++ b/tests/testthat/test-get_gpprac_opendata.R
@@ -0,0 +1,18 @@
+skip_if_offline()
+
+test_that("GP prac cluster lookup is correct", {
+  gp_cluster_lookup <- expect_warning(get_gpprac_opendata())
+
+  expect_s3_class(gp_cluster_lookup, "tbl_df")
+  expect_named(
+    gp_cluster_lookup,
+    c(
+      "gpprac",
+      "practice_name",
+      "postcode",
+      "cluster",
+      "partnership",
+      "health_board"
+    )
+  )
+})
diff --git a/tests/testthat/test-get_la_code_opendata_lookup.R b/tests/testthat/test-get_la_code_opendata_lookup.R
new file mode 100644
index 000000000..f46c17c04
--- /dev/null
+++ b/tests/testthat/test-get_la_code_opendata_lookup.R
@@ -0,0 +1,13 @@
+skip_if_offline()
+
+test_that("LA Code lookup is correct", {
+  la_code_lookup <- get_la_code_opendata_lookup()
+
+  expect_s3_class(la_code_lookup, "tbl_df")
+  expect_named(
+    la_code_lookup,
+    c("CA", "CAName", "sending_local_authority_name")
+  )
+
+  expect_snapshot(get_la_code_opendata_lookup())
+})

From 570f395f8e3e6daec9e3c545d60a054607658c95 Mon Sep 17 00:00:00 2001
From: Jennit07 <67372904+Jennit07@users.noreply.github.com>
Date: Mon, 2 Oct 2023 10:37:27 +0100
Subject: [PATCH 02/17] Rename function `add_smrtype` (#840)

* rename to `add_smrtype`

* Rename script to `add_smrtype`

* update documentation

* Remove TODO comment

* Style code

* Update documentation

---------

Co-authored-by: Jennit07 <Jennit07@users.noreply.github.com>
Co-authored-by: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>
---
 R/{add_smr_type.R => add_smrtype.R}     | 17 +++++++----------
 R/process_extract_acute.R               |  2 +-
 R/process_extract_ae.R                  |  2 +-
 R/process_extract_care_home.R           |  2 +-
 R/process_extract_cmh.R                 |  2 +-
 R/process_extract_district_nursing.R    |  2 +-
 R/process_extract_gp_ooh.R              |  2 +-
 R/process_extract_homelessness.R        |  2 +-
 R/process_extract_maternity.R           |  2 +-
 R/process_extract_mental_health.R       |  2 +-
 R/process_extract_nrs_deaths.R          |  2 +-
 R/process_extract_outpatients.R         |  2 +-
 R/process_extract_prescribing.R         |  2 +-
 man/{add_smr_type.Rd => add_smrtype.Rd} |  8 ++++----
 14 files changed, 23 insertions(+), 26 deletions(-)
 rename R/{add_smr_type.R => add_smrtype.R} (93%)
 rename man/{add_smr_type.Rd => add_smrtype.Rd} (87%)

diff --git a/R/add_smr_type.R b/R/add_smrtype.R
similarity index 93%
rename from R/add_smr_type.R
rename to R/add_smrtype.R
index aa9e383bc..3d0959112 100644
--- a/R/add_smr_type.R
+++ b/R/add_smrtype.R
@@ -10,15 +10,12 @@
 #' @return A vector of `smrtype`
 #'
 #' @family Codes
-add_smr_type <- function(recid,
-                         mpat = NULL,
-                         ipdc = NULL,
-                         hc_service = NULL,
-                         main_applicant_flag = NULL,
-                         consultation_type = NULL) {
-  # TODO rename this function to `add_smrtype()` to match the name of the
-  # variable. Need to make sure to change all places where it is used as well.
-
+add_smrtype <- function(recid,
+                        mpat = NULL,
+                        ipdc = NULL,
+                        hc_service = NULL,
+                        main_applicant_flag = NULL,
+                        consultation_type = NULL) {
   # Situation where some recids are not in the accepted values
   if (!all(recid %in% c(
     "00B",
@@ -188,7 +185,7 @@ add_smr_type <- function(recid,
 
   if (anyNA(smrtype)) {
     cli::cli_warn(
-      "Some {.var smrtype}s were not properly set by {.fun add_smr_type}."
+      "Some {.var smrtype}s were not properly set by {.fun add_smrtype}."
     )
   }
 
diff --git a/R/process_extract_acute.R b/R/process_extract_acute.R
index c327f4b66..dcfdb47c0 100644
--- a/R/process_extract_acute.R
+++ b/R/process_extract_acute.R
@@ -45,7 +45,7 @@ process_extract_acute <- function(data, year, write_to_disk = TRUE) {
     dplyr::mutate(
       stay = calculate_stay(year, .data$record_keydate1, .data$record_keydate2),
       # create and populate SMRType
-      smrtype = add_smr_type(recid = .data$recid, ipdc = .data$ipdc)
+      smrtype = add_smrtype(recid = .data$recid, ipdc = .data$ipdc)
     ) %>%
     # Apply new costs for C3 specialty, these are taken from the 2017/18 file
     fix_c3_costs(year) %>%
diff --git a/R/process_extract_ae.R b/R/process_extract_ae.R
index 785797395..dd3823a36 100644
--- a/R/process_extract_ae.R
+++ b/R/process_extract_ae.R
@@ -62,7 +62,7 @@ process_extract_ae <- function(data, year, write_to_disk = TRUE) {
     # Create month variable
     dplyr::mutate(
       month = strftime(.data$record_keydate1, "%m"),
-      smrtype = add_smr_type(.data$recid)
+      smrtype = add_smrtype(.data$recid)
     ) %>%
     # Allocate the costs to the correct month
     create_day_episode_costs(.data$record_keydate1, .data$cost_total_net)
diff --git a/R/process_extract_care_home.R b/R/process_extract_care_home.R
index f6b3bca15..210dae531 100644
--- a/R/process_extract_care_home.R
+++ b/R/process_extract_care_home.R
@@ -58,7 +58,7 @@ process_extract_care_home <- function(
     dplyr::mutate(
       year = year,
       recid = "CH",
-      smrtype = add_smr_type(recid = "CH")
+      smrtype = add_smrtype(recid = "CH")
     ) %>%
     # compute lca variable from sending_location
     dplyr::mutate(
diff --git a/R/process_extract_cmh.R b/R/process_extract_cmh.R
index bbce59f0f..418b95b00 100644
--- a/R/process_extract_cmh.R
+++ b/R/process_extract_cmh.R
@@ -32,7 +32,7 @@ process_extract_cmh <- function(data,
     # create recid, year, SMRType variables
     dplyr::mutate(
       recid = "CMH",
-      smrtype = add_smr_type(recid = .data$recid),
+      smrtype = add_smrtype(recid = .data$recid),
       year = year
     ) %>%
     # contact end time
diff --git a/R/process_extract_district_nursing.R b/R/process_extract_district_nursing.R
index 02f23719f..6254926f0 100644
--- a/R/process_extract_district_nursing.R
+++ b/R/process_extract_district_nursing.R
@@ -37,7 +37,7 @@ process_extract_district_nursing <- function(
     dplyr::mutate(
       year = year,
       recid = "DN",
-      smrtype = add_smr_type(recid = "DN")
+      smrtype = add_smrtype(recid = "DN")
     ) %>%
     # deal with gpprac
     dplyr::mutate(gpprac = convert_eng_gpprac_to_dummy(.data$gpprac))
diff --git a/R/process_extract_gp_ooh.R b/R/process_extract_gp_ooh.R
index 3503888b6..37cfc8f3f 100644
--- a/R/process_extract_gp_ooh.R
+++ b/R/process_extract_gp_ooh.R
@@ -62,7 +62,7 @@ process_extract_gp_ooh <- function(year, data_list, write_to_disk = TRUE) {
       # Replace location unknown with NA
       location = dplyr::na_if(.data$location, "UNKNOWN"),
       recid = "OoH",
-      smrtype = add_smr_type(.data$recid, consultation_type = .data$consultation_type),
+      smrtype = add_smrtype(.data$recid, consultation_type = .data$consultation_type),
       kis_accessed = factor(
         dplyr::case_when(
           kis_accessed == "Y" ~ 1L,
diff --git a/R/process_extract_homelessness.R b/R/process_extract_homelessness.R
index c1afff837..ab674988b 100644
--- a/R/process_extract_homelessness.R
+++ b/R/process_extract_homelessness.R
@@ -37,7 +37,7 @@ process_extract_homelessness <- function(
     dplyr::mutate(
       year = as.character(year),
       recid = "HL1",
-      smrtype = add_smr_type(
+      smrtype = add_smrtype(
         recid = .data$recid,
         main_applicant_flag = .data$main_applicant_flag
       )
diff --git a/R/process_extract_maternity.R b/R/process_extract_maternity.R
index 7bb016243..eab3fb713 100644
--- a/R/process_extract_maternity.R
+++ b/R/process_extract_maternity.R
@@ -55,7 +55,7 @@ process_extract_maternity <- function(data, year, write_to_disk = TRUE) {
       discondition = factor(.data$discondition,
         levels = c(1L:5L, 8L)
       ),
-      smrtype = add_smr_type(.data$recid, .data$mpat),
+      smrtype = add_smrtype(.data$recid, .data$mpat),
       ipdc = dplyr::case_match(
         .data$smrtype,
         "Matern-IP" ~ "I",
diff --git a/R/process_extract_mental_health.R b/R/process_extract_mental_health.R
index b8d89377d..4326630fe 100644
--- a/R/process_extract_mental_health.R
+++ b/R/process_extract_mental_health.R
@@ -64,7 +64,7 @@ process_extract_mental_health <- function(data, year, write_to_disk = TRUE) {
         .data$record_keydate2
       ),
       # SMR type
-      smrtype = add_smr_type(.data$recid)
+      smrtype = add_smrtype(.data$recid)
     )
 
   mh_processed <- mh_clean %>%
diff --git a/R/process_extract_nrs_deaths.R b/R/process_extract_nrs_deaths.R
index 71e19d456..e707e74f6 100644
--- a/R/process_extract_nrs_deaths.R
+++ b/R/process_extract_nrs_deaths.R
@@ -22,7 +22,7 @@ process_extract_nrs_deaths <- function(data, year, write_to_disk = TRUE) {
       recid = "NRS",
       year = year,
       gpprac = convert_eng_gpprac_to_dummy(.data$gpprac),
-      smrtype = add_smr_type(.data$recid)
+      smrtype = add_smrtype(.data$recid)
     )
 
   if (write_to_disk) {
diff --git a/R/process_extract_outpatients.R b/R/process_extract_outpatients.R
index 86262e6b3..fdf4ee63d 100644
--- a/R/process_extract_outpatients.R
+++ b/R/process_extract_outpatients.R
@@ -28,7 +28,7 @@ process_extract_outpatients <- function(data, year, write_to_disk = TRUE) {
       # Set recid variable
       recid = "00B",
       # Set smrtype variable
-      smrtype = add_smr_type(.data$recid)
+      smrtype = add_smrtype(.data$recid)
     ) %>%
     dplyr::mutate(gpprac = convert_eng_gpprac_to_dummy(.data$gpprac)) %>%
     # compute record key date2
diff --git a/R/process_extract_prescribing.R b/R/process_extract_prescribing.R
index c54a55b65..c79e0a513 100644
--- a/R/process_extract_prescribing.R
+++ b/R/process_extract_prescribing.R
@@ -37,7 +37,7 @@ process_extract_prescribing <- function(data, year, write_to_disk = TRUE) {
       record_keydate1 = end_fy(year),
       record_keydate2 = .data$record_keydate1,
       # Add SMR type variable
-      smrtype = add_smr_type(.data$recid)
+      smrtype = add_smrtype(.data$recid)
     )
 
   # Issue a warning if rows were removed
diff --git a/man/add_smr_type.Rd b/man/add_smrtype.Rd
similarity index 87%
rename from man/add_smr_type.Rd
rename to man/add_smrtype.Rd
index 554e35575..1898ed05d 100644
--- a/man/add_smr_type.Rd
+++ b/man/add_smrtype.Rd
@@ -1,10 +1,10 @@
 % Generated by roxygen2: do not edit by hand
-% Please edit documentation in R/add_smr_type.R
-\name{add_smr_type}
-\alias{add_smr_type}
+% Please edit documentation in R/add_smrtype.R
+\name{add_smrtype}
+\alias{add_smrtype}
 \title{Add smrtype variable based on record ID}
 \usage{
-add_smr_type(
+add_smrtype(
   recid,
   mpat = NULL,
   ipdc = NULL,

From 9f2825bce02af7cc83d07b2bd73fc46b2d50432b Mon Sep 17 00:00:00 2001
From: Zihao Li <lizihao_anu@outlook.com>
Date: Wed, 18 Oct 2023 16:13:03 +0100
Subject: [PATCH 03/17] add_keep_population_flag for create_individual_file

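The NSU scaling step in `add_keep_population_flag()` can be sketched as
below. This is a minimal illustration in base R, not the package code:
`nsu_scaling_factor()` is a hypothetical helper and the figures are made up.

```r
# Hypothetical helper (not part of the package): the share of NSU records to
# keep in one Locality / AgeGroup / Gender group, clamped to [0, 1].
nsu_scaling_factor <- function(total_source_population,
                               population_estimate,
                               nsu_population) {
  # How far the SLF population overshoots the population estimate.
  difference <- total_source_population - population_estimate
  # The number of NSU records to keep so that the totals line up.
  new_nsu_figure <- nsu_population - difference
  scaling_factor <- new_nsu_figure / nsu_population
  # Clamp to a valid probability.
  pmin(pmax(scaling_factor, 0), 1)
}

# Made-up figures for three groups.
p <- nsu_scaling_factor(
  total_source_population = c(1200, 900, 2000),
  population_estimate = c(1000, 1000, 1000),
  nsu_population = c(400, 100, 400)
)
p # 0.5 1.0 0.0

# One Bernoulli draw per group, each using that group's own probability.
set.seed(100)
keep_nsu <- stats::rbinom(n = length(p), size = 1, prob = p)
```

Drawing with `n = length(p)` gives each group its own draw; with `n = 1`,
`rbinom()` returns a single value based on the first probability only.
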
---
 R/add_keep_population_flag.R | 151 +++++++++++++++++++++++++++++++++++
 R/create_individual_file.R   |   3 +-
 2 files changed, 153 insertions(+), 1 deletion(-)
 create mode 100644 R/add_keep_population_flag.R

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
new file mode 100644
index 000000000..93560df84
--- /dev/null
+++ b/R/add_keep_population_flag.R
@@ -0,0 +1,151 @@
+#' Add keep_population flag
+#'
+#' @description Add keep_population flag to individual files
+#' @param individual_file The individual file, as a data frame
+#' @param year Financial year
+#' @return A data frame with keep_population flags
+#' @family individual_file
+
+add_keep_population_flag <- function(individual_file, year) {
+  calendar_year <- paste0("20", substr(year, 1, 2)) %>% as.integer()
+
+  if (!check_year_valid(year, "NSU")) {
+    individual_file <- individual_file %>%
+      dplyr::mutate(keep_population = 1L)
+  } else {
+    ## Obtain the population estimates for Locality AgeGroup and Gender.
+    pop_estimates <-
+      readRDS(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds")) %>%
+      dplyr::as_tibble() %>%
+      dplyr::select(year, datazone2011, sex, age0:age90plus)
+
+    # Step 1: Obtain the population estimates for Locality, AgeGroup, and Gender.
+    # Select the estimates for the year of interest. If there are none for
+    # this year, fall back to the most recent year that is available.
+    year_available <- pop_estimates %>% dplyr::pull(year) %>% unique()
+    if (calendar_year %in% year_available) {
+      pop_estimates <- pop_estimates %>%
+        dplyr::filter(year == calendar_year)
+    } else {
+      previous_year <- sort(year_available, decreasing = TRUE)[1]
+      pop_estimates <- pop_estimates %>%
+        dplyr::filter(year == previous_year)
+    }
+
+    pop_estimates <- pop_estimates %>%
+      # Recode gender to make it match source.
+      dplyr::mutate(sex = dplyr::if_else(sex == "M", 1, 2)) %>%
+      dplyr::rename("age90" = "age90plus",
+                    "gender" = "sex") %>%
+      tidyr::pivot_longer(
+        names_to = "age",
+        names_prefix = "age",
+        values_to = "population_estimate",
+        cols = "age0":"age90"
+      ) %>%
+      dplyr::mutate(
+        age = as.integer(age),
+        age_group = dplyr::case_when(
+          age >= 0 & age <= 4 ~ "0-4",
+          age >= 5 & age <= 14 ~ "5-14",
+          age >= 15 & age <= 24 ~ "15-24",
+          age >= 25 & age <= 34 ~ "25-34",
+          age >= 35 & age <= 44 ~ "35-44",
+          age >= 45 & age <= 54 ~ "45-54",
+          age >= 55 & age <= 64 ~ "55-64",
+          age >= 65 & age <= 74 ~ "65-74",
+          age >= 75 & age <= 84 ~ "75-84",
+          age >= 85 ~ "85+"
+        )
+      ) %>%
+      dplyr::left_join(
+        get_locality_path() %>%
+          readRDS() %>%
+          dplyr::select("locality" = "hscp_locality", datazone2011),
+        by = "datazone2011"
+      ) %>%
+      dplyr::group_by(locality, age_group, gender) %>%
+      dplyr::summarize(population_estimate = sum(population_estimate)) %>%
+      dplyr::ungroup()
+
+    # Step 2: Work out the current population sizes in the SLF for Locality,
+    # AgeGroup, and Gender.
+    individual_file <- slfhelper::read_slf_individual(year,
+                                                      columns = c("chi",
+                                                                  "locality",
+                                                                  "age",
+                                                                  "gender",
+                                                                  # "nsu",
+                                                                  "death_date")) %>%
+      dplyr::mutate(nsu = 0L) # delete this before merge
+
+    individual_file_1 <- individual_file %>%
+      dplyr::mutate(
+        age = as.integer(age),
+        age_group = dplyr::case_when(
+          age >= 0 & age <= 4 ~ "0-4",
+          age >= 5 & age <= 14 ~ "5-14",
+          age >= 15 & age <= 24 ~ "15-24",
+          age >= 25 & age <= 34 ~ "25-34",
+          age >= 35 & age <= 44 ~ "35-44",
+          age >= 45 & age <= 54 ~ "45-54",
+          age >= 55 & age <= 64 ~ "55-64",
+          age >= 65 & age <= 74 ~ "65-74",
+          age >= 75 & age <= 84 ~ "75-84",
+          age >= 85 ~ "85+"
+        )
+      )
+
+    set.seed(100)
+    mid_year <- lubridate::dmy(stringr::str_glue("30-06-{calendar_year}"))
+    # There are issues with age being negative.
+    # Records without a locality can't be used, as there is no estimate to
+    # match them against. The same goes for age and gender.
+    nsu_keep_lookup <- individual_file_1 %>%
+      dplyr::filter(!is.na(locality), !is.na(age)) %>%
+      # Remove people who died before the mid-point of the calendar year, so
+      # that our numbers line up better with the mid-year population estimates.
+      # Anyone who died 5 years before the file shouldn't be in it anyway.
+      dplyr::filter(death_date > mid_year | nsu != 0) %>%
+      # Calculate the populations of the whole SLF and of the NSU.
+      dplyr::group_by(locality, age_group, gender) %>%
+      dplyr::summarise(nsu_population = sum(nsu),
+                       total_source_population = dplyr::n()) %>%
+      dplyr::left_join(pop_estimates,
+                       by = c("locality", "age_group", "gender")) %>%
+      dplyr::mutate(
+        difference = total_source_population - population_estimate,
+        new_nsu_figure = nsu_population - difference,
+        scaling_factor = new_nsu_figure / nsu_population,
+        scaling_factor = dplyr::case_when(scaling_factor < 0 ~ 0,
+                                          scaling_factor > 1 ~ 1,
+                                          .default = scaling_factor),
+        keep_nsu = rbinom(1, 1, scaling_factor)
+      ) %>%
+      dplyr::filter(keep_nsu == 1L) %>%
+      dplyr::ungroup()
+
+    individual_file = individual_file_1 %>%
+      dplyr::left_join(nsu_keep_lookup,
+                       by = c("locality", "age_group", "gender")) %>%
+      dplyr::rename("keep_population" = "keep_nsu") %>%
+      dplyr::mutate(
+        # Flag all non-NSUs as Keep.
+        keep_population = dplyr::if_else(nsu == 0, 1, keep_population),
+        # If the flag is missing they must be a non-keep NSU so set to 0.
+        keep_population = dplyr::if_else(is.na(keep_population), 0, keep_population),
+      ) %>%
+      dplyr::select(
+        -c(
+          "age_group",
+          "nsu_population",
+          "total_source_population",
+          "population_estimate",
+          "difference",
+          "new_nsu_figure",
+          "scaling_factor"
+        )
+      )
+  }
+  return(individual_file)
+}
diff --git a/R/create_individual_file.R b/R/create_individual_file.R
index cbf1777a3..6c19cee1d 100644
--- a/R/create_individual_file.R
+++ b/R/create_individual_file.R
@@ -96,7 +96,8 @@ create_individual_file <- function(
     join_sparra_hhg(year) %>%
     join_slf_lookup_vars() %>%
     dplyr::mutate(year = year) %>%
-    add_hri_variables(chi_variable = "chi")
+    add_hri_variables(chi_variable = "chi") %>%
+    add_keep_population_flag(year)
 
   if (!check_year_valid(year, type = c("CH", "HC", "AT", "SDS"))) {
     individual_file <- individual_file %>%

From 8e72800e8fa8482c35411e07dde16c92713deb3b Mon Sep 17 00:00:00 2001
From: Zihao Li <lizihao_anu@outlook.com>
Date: Wed, 18 Oct 2023 16:15:48 +0100
Subject: [PATCH 04/17] fix homelessness path
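
The file-name part of this fix can be sketched as below. This is an illustration only: it assumes `year` is a financial-year string such as "1920", and base `paste0()` stands in for the `stringr::str_glue()` call in the diff.

```r
# Assumed financial-year format, e.g. "1920" for 2019/20.
year <- "1920"
# One of the stems listed in get_source_extract_path().
stem <- "sds-for-source"

# Name built before this patch, and the name built after it.
old_name <- paste0(stem, "-", year, ".parquet")
new_name <- paste0(stem, "-20", year, ".parquet")

print(c(old = old_name, new = new_name))
```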

---
 R/get_source_extract_path.R     | 2 +-
 R/process_lookup_homelessness.R | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/R/get_source_extract_path.R b/R/get_source_extract_path.R
index 6be47d61a..b4ccf4920 100644
--- a/R/get_source_extract_path.R
+++ b/R/get_source_extract_path.R
@@ -64,7 +64,7 @@ get_source_extract_path <- function(year,
     "pis" ~ "prescribing_file_for_source",
     "sds" ~ "sds-for-source"
   ) %>%
-    stringr::str_glue("-{year}.parquet")
+    stringr::str_glue("-20{year}.parquet")
 
   source_extract_path <- get_file_path(
     directory = get_year_dir(year),
diff --git a/R/process_lookup_homelessness.R b/R/process_lookup_homelessness.R
index c0138d10a..42edb87a2 100644
--- a/R/process_lookup_homelessness.R
+++ b/R/process_lookup_homelessness.R
@@ -12,7 +12,7 @@
 #' @family process extracts
 create_homelessness_lookup <- function(
     year,
-    homelessness_data = read_file(get_source_extract_path(year, "Homelessness"))) {
+    homelessness_data = read_file(get_source_extract_path(year, "homelessness"))) {
   homelessness_lookup <- homelessness_data %>%
     dplyr::distinct(.data$chi, .data$record_keydate1, .data$record_keydate2) %>%
     tidyr::drop_na(.data$chi) %>%

From 280a404e09e647fa104deb8078139adfe16144d3 Mon Sep 17 00:00:00 2001
From: lizihao-anu <lizihao-anu@users.noreply.github.com>
Date: Wed, 18 Oct 2023 15:19:02 +0000
Subject: [PATCH 05/17] Update documentation

---
 man/add_acute_columns.Rd          |  1 +
 man/add_ae_columns.Rd             |  1 +
 man/add_all_columns.Rd            |  1 +
 man/add_at_columns.Rd             |  1 +
 man/add_ch_columns.Rd             |  1 +
 man/add_cij_columns.Rd            |  1 +
 man/add_cmh_columns.Rd            |  1 +
 man/add_dd_columns.Rd             |  1 +
 man/add_dn_columns.Rd             |  1 +
 man/add_gls_columns.Rd            |  1 +
 man/add_hc_columns.Rd             |  1 +
 man/add_hl1_columns.Rd            |  1 +
 man/add_ipdc_cols.Rd              |  1 +
 man/add_keep_population_flag.Rd   | 48 +++++++++++++++++++++++++++++++
 man/add_mat_columns.Rd            |  1 +
 man/add_mh_columns.Rd             |  1 +
 man/add_nrs_columns.Rd            |  1 +
 man/add_nsu_columns.Rd            |  1 +
 man/add_ooh_columns.Rd            |  1 +
 man/add_op_columns.Rd             |  1 +
 man/add_pis_columns.Rd            |  1 +
 man/add_sds_columns.Rd            |  1 +
 man/add_standard_cols.Rd          |  1 +
 man/clean_up_ch.Rd                |  1 +
 man/condition_cols.Rd             |  1 +
 man/create_homelessness_lookup.Rd |  2 +-
 man/create_individual_file.Rd     |  1 +
 man/recode_gender.Rd              |  1 +
 man/remove_blank_chi.Rd           |  1 +
 29 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 man/add_keep_population_flag.Rd

diff --git a/man/add_acute_columns.Rd b/man/add_acute_columns.Rd
index b7be171cf..104c0e87d 100644
--- a/man/add_acute_columns.Rd
+++ b/man/add_acute_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_ae_columns.Rd b/man/add_ae_columns.Rd
index 37d60f466..288b98e9f 100644
--- a/man/add_ae_columns.Rd
+++ b/man/add_ae_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_all_columns.Rd b/man/add_all_columns.Rd
index 2aba7f5ad..345a59e01 100644
--- a/man/add_all_columns.Rd
+++ b/man/add_all_columns.Rd
@@ -27,6 +27,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_at_columns.Rd b/man/add_at_columns.Rd
index 537a01f40..4ed268c28 100644
--- a/man/add_at_columns.Rd
+++ b/man/add_at_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_ch_columns.Rd b/man/add_ch_columns.Rd
index 360bb29db..15188c090 100644
--- a/man/add_ch_columns.Rd
+++ b/man/add_ch_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_cij_columns.Rd b/man/add_cij_columns.Rd
index f8d2528f2..3e0020a8c 100644
--- a/man/add_cij_columns.Rd
+++ b/man/add_cij_columns.Rd
@@ -26,6 +26,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_cmh_columns.Rd b/man/add_cmh_columns.Rd
index 654e03f75..1eb12056a 100644
--- a/man/add_cmh_columns.Rd
+++ b/man/add_cmh_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_dd_columns.Rd b/man/add_dd_columns.Rd
index a920a7979..420423c96 100644
--- a/man/add_dd_columns.Rd
+++ b/man/add_dd_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_dn_columns.Rd b/man/add_dn_columns.Rd
index 6d6fa61cb..5fef0cf68 100644
--- a/man/add_dn_columns.Rd
+++ b/man/add_dn_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_gls_columns.Rd b/man/add_gls_columns.Rd
index 84c49848a..ef17cbb12 100644
--- a/man/add_gls_columns.Rd
+++ b/man/add_gls_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_hc_columns.Rd b/man/add_hc_columns.Rd
index d5154acfd..d19301fd4 100644
--- a/man/add_hc_columns.Rd
+++ b/man/add_hc_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_gls_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_hl1_columns.Rd b/man/add_hl1_columns.Rd
index 87df2969b..13b41865d 100644
--- a/man/add_hl1_columns.Rd
+++ b/man/add_hl1_columns.Rd
@@ -30,6 +30,7 @@ Other individual_file:
 \code{\link{add_gls_columns}()},
 \code{\link{add_hc_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_ipdc_cols.Rd b/man/add_ipdc_cols.Rd
index f78ddd981..3ebf8c0ff 100644
--- a/man/add_ipdc_cols.Rd
+++ b/man/add_ipdc_cols.Rd
@@ -36,6 +36,7 @@ Other individual_file:
 \code{\link{add_gls_columns}()},
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_keep_population_flag.Rd b/man/add_keep_population_flag.Rd
new file mode 100644
index 000000000..546bf8e03
--- /dev/null
+++ b/man/add_keep_population_flag.Rd
@@ -0,0 +1,48 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/add_keep_population_flag.R
+\name{add_keep_population_flag}
+\alias{add_keep_population_flag}
+\title{Add keep_population flag}
+\usage{
+add_keep_population_flag(individual_file, year)
+}
+\arguments{
+\item{data}{A data frame}
+}
+\value{
+A data frame with keep_population flags
+}
+\description{
+Add keep_population flag to individual files
+}
+\seealso{
+Other individual_file: 
+\code{\link{add_acute_columns}()},
+\code{\link{add_ae_columns}()},
+\code{\link{add_all_columns}()},
+\code{\link{add_at_columns}()},
+\code{\link{add_ch_columns}()},
+\code{\link{add_cij_columns}()},
+\code{\link{add_cmh_columns}()},
+\code{\link{add_dd_columns}()},
+\code{\link{add_dn_columns}()},
+\code{\link{add_gls_columns}()},
+\code{\link{add_hc_columns}()},
+\code{\link{add_hl1_columns}()},
+\code{\link{add_ipdc_cols}()},
+\code{\link{add_mat_columns}()},
+\code{\link{add_mh_columns}()},
+\code{\link{add_nrs_columns}()},
+\code{\link{add_nsu_columns}()},
+\code{\link{add_ooh_columns}()},
+\code{\link{add_op_columns}()},
+\code{\link{add_pis_columns}()},
+\code{\link{add_sds_columns}()},
+\code{\link{add_standard_cols}()},
+\code{\link{clean_up_ch}()},
+\code{\link{condition_cols}()},
+\code{\link{create_individual_file}()},
+\code{\link{recode_gender}()},
+\code{\link{remove_blank_chi}()}
+}
+\concept{individual_file}
diff --git a/man/add_mat_columns.Rd b/man/add_mat_columns.Rd
index 8c4e26290..f78527051 100644
--- a/man/add_mat_columns.Rd
+++ b/man/add_mat_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
 \code{\link{add_nsu_columns}()},
diff --git a/man/add_mh_columns.Rd b/man/add_mh_columns.Rd
index 64c1ded97..221a39a73 100644
--- a/man/add_mh_columns.Rd
+++ b/man/add_mh_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_nrs_columns}()},
 \code{\link{add_nsu_columns}()},
diff --git a/man/add_nrs_columns.Rd b/man/add_nrs_columns.Rd
index e793fefb0..420fb0f89 100644
--- a/man/add_nrs_columns.Rd
+++ b/man/add_nrs_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nsu_columns}()},
diff --git a/man/add_nsu_columns.Rd b/man/add_nsu_columns.Rd
index bb72fab58..4b5b5e2aa 100644
--- a/man/add_nsu_columns.Rd
+++ b/man/add_nsu_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_ooh_columns.Rd b/man/add_ooh_columns.Rd
index 9caf53eac..36acea4af 100644
--- a/man/add_ooh_columns.Rd
+++ b/man/add_ooh_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_op_columns.Rd b/man/add_op_columns.Rd
index 52ba219cf..33fc5d7b2 100644
--- a/man/add_op_columns.Rd
+++ b/man/add_op_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_pis_columns.Rd b/man/add_pis_columns.Rd
index 1b94ba8f7..11417e814 100644
--- a/man/add_pis_columns.Rd
+++ b/man/add_pis_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_sds_columns.Rd b/man/add_sds_columns.Rd
index 167290d54..6f293696e 100644
--- a/man/add_sds_columns.Rd
+++ b/man/add_sds_columns.Rd
@@ -31,6 +31,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/add_standard_cols.Rd b/man/add_standard_cols.Rd
index 3d0e1e69e..5bb286522 100644
--- a/man/add_standard_cols.Rd
+++ b/man/add_standard_cols.Rd
@@ -42,6 +42,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/clean_up_ch.Rd b/man/clean_up_ch.Rd
index 9dadbd808..786e9581d 100644
--- a/man/clean_up_ch.Rd
+++ b/man/clean_up_ch.Rd
@@ -29,6 +29,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/condition_cols.Rd b/man/condition_cols.Rd
index 8cbbda825..e536847a7 100644
--- a/man/condition_cols.Rd
+++ b/man/condition_cols.Rd
@@ -26,6 +26,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/create_homelessness_lookup.Rd b/man/create_homelessness_lookup.Rd
index 4a0be24f9..610a96c26 100644
--- a/man/create_homelessness_lookup.Rd
+++ b/man/create_homelessness_lookup.Rd
@@ -6,7 +6,7 @@
 \usage{
 create_homelessness_lookup(
   year,
-  homelessness_data = read_file(get_source_extract_path(year, "Homelessness"))
+  homelessness_data = read_file(get_source_extract_path(year, "homelessness"))
 )
 }
 \arguments{
diff --git a/man/create_individual_file.Rd b/man/create_individual_file.Rd
index 128819711..e8c46ad47 100644
--- a/man/create_individual_file.Rd
+++ b/man/create_individual_file.Rd
@@ -48,6 +48,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/recode_gender.Rd b/man/recode_gender.Rd
index 4d1094b4d..71c9e9c43 100644
--- a/man/recode_gender.Rd
+++ b/man/recode_gender.Rd
@@ -27,6 +27,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},
diff --git a/man/remove_blank_chi.Rd b/man/remove_blank_chi.Rd
index 8133d5313..8ff86d0c2 100644
--- a/man/remove_blank_chi.Rd
+++ b/man/remove_blank_chi.Rd
@@ -27,6 +27,7 @@ Other individual_file:
 \code{\link{add_hc_columns}()},
 \code{\link{add_hl1_columns}()},
 \code{\link{add_ipdc_cols}()},
+\code{\link{add_keep_population_flag}()},
 \code{\link{add_mat_columns}()},
 \code{\link{add_mh_columns}()},
 \code{\link{add_nrs_columns}()},

From 0528d3160cdd995feae6164f065435d3f8e722e4 Mon Sep 17 00:00:00 2001
From: lizihao-anu <lizihao-anu@users.noreply.github.com>
Date: Wed, 18 Oct 2023 15:23:31 +0000
Subject: [PATCH 06/17] Style code

---
 R/add_keep_population_flag.R | 62 +++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index 93560df84..686328423 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -7,12 +7,12 @@
 #' @family individual_file
 
 add_keep_population_flag <- function(individual_file, year) {
-  calendar_year = paste0("20", substr(year, 1, 2)) %>% as.integer()
+  calendar_year <- paste0("20", substr(year, 1, 2)) %>% as.integer()
 
   if (!check_year_valid(year, "NSU")) {
     individual_file <- individual_file %>%
       dplyr::mutate(keep_population = 1L)
-  } else{
+  } else {
     ## Obtain the population estimates for Locality AgeGroup and Gender.
     pop_estimates <-
       readRDS(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds")) %>%
@@ -22,11 +22,13 @@ add_keep_population_flag <- function(individual_file, year) {
     # Step 1: Obtain the population estimates for Locality, AgeGroup, and Gender
     # Select out the estimates for the year of interest.
     # if we don't have estimates for this year (and so have to use previous year).
-    year_available = pop_estimates %>% dplyr::pull(year) %>% unique()
+    year_available <- pop_estimates %>%
+      dplyr::pull(year) %>%
+      unique()
     if (calendar_year %in% year_available) {
       pop_estimates <- pop_estimates %>%
         dplyr::filter(year == calendar_year)
-    } else{
+    } else {
       previous_year <- sort(year_available, decreasing = TRUE)[1]
       pop_estimates <- pop_estimates %>%
         dplyr::filter(year = previous_year)
@@ -35,8 +37,10 @@ add_keep_population_flag <- function(individual_file, year) {
     pop_estimates <- pop_estimates %>%
       # Recode gender to make it match source.
       dplyr::mutate(sex = dplyr::if_else(sex == "M", 1, 2)) %>%
-      dplyr::rename("age90" = "age90plus",
-                    "gender" = "sex") %>%
+      dplyr::rename(
+        "age90" = "age90plus",
+        "gender" = "sex"
+      ) %>%
       tidyr::pivot_longer(
         names_to = "age",
         names_prefix = "age",
@@ -48,7 +52,7 @@ add_keep_population_flag <- function(individual_file, year) {
         age_group = dplyr::case_when(
           age >= 0 & age <= 4 ~ "0-4",
           age >= 5 & age <= 14 ~ "5-14",
-          age >= 15 & age <= 24  ~ "15-24",
+          age >= 15 & age <= 24 ~ "15-24",
           age >= 25 & age <= 34 ~ "25-34",
           age >= 35 & age <= 44 ~ "35-44",
           age >= 45 & age <= 54 ~ "45-54",
@@ -70,22 +74,25 @@ add_keep_population_flag <- function(individual_file, year) {
 
     # Step 2: Work out the current population sizes in the SLF for Locality, AgeGroup, and Gender
     # Work out the current population sizes in the SLF for Locality AgeGroup and Gender.
-    individual_file = slfhelper::read_slf_individual(year,
-                                                     columns = c("chi",
-                                                                 "locality",
-                                                                 "age",
-                                                                 "gender",
-                                                                 # "nsu",
-                                                                 "death_date")) %>%
+    individual_file <- slfhelper::read_slf_individual(year,
+      columns = c(
+        "chi",
+        "locality",
+        "age",
+        "gender",
+        # "nsu",
+        "death_date"
+      )
+    ) %>%
       dplyr::mutate(nsu = 0L) # delete this before merge
 
-    individual_file_1 = individual_file %>%
+    individual_file_1 <- individual_file %>%
       dplyr::mutate(
         age = as.integer(age),
         age_group = dplyr::case_when(
           age >= 0 & age <= 4 ~ "0-4",
           age >= 5 & age <= 14 ~ "5-14",
-          age >= 15 & age <= 24  ~ "15-24",
+          age >= 15 & age <= 24 ~ "15-24",
           age >= 25 & age <= 34 ~ "25-34",
           age >= 35 & age <= 44 ~ "35-44",
           age >= 45 & age <= 54 ~ "45-54",
@@ -97,11 +104,11 @@ add_keep_population_flag <- function(individual_file, year) {
       )
 
     set.seed(100)
-    mid_year = lubridate::dmy(stringr::str_glue("30-06-{calendar_year}"))
+    mid_year <- lubridate::dmy(stringr::str_glue("30-06-{calendar_year}"))
     ## issues with age being negative
     # If they don't have a locality, they're no good as we won't have an estimate to match them against.
     # Same for age and gender.
-    nsu_keep_lookup = individual_file_1 %>%
+    nsu_keep_lookup <- individual_file_1 %>%
       dplyr::filter(!is.na(locality), !is.na(age)) %>%
       # Remove people who died before the mid-point of the calender year.
       # This will make our numbers line up better with the methodology used for the mid-year population estimates.
@@ -109,25 +116,30 @@ add_keep_population_flag <- function(individual_file, year) {
       dplyr::filter(death_date > mid_year | nsu != 0) %>%
       # Calculate the populations of the whole SLF and of the NSU.
       dplyr::group_by(locality, age_group, gender) %>%
-      dplyr::summarise(nsu_population = sum(nsu),
-                       total_source_population = dplyr::n()) %>%
+      dplyr::summarise(
+        nsu_population = sum(nsu),
+        total_source_population = dplyr::n()
+      ) %>%
       dplyr::left_join(pop_estimates,
-                       by = c("locality", "age_group", "gender")) %>%
+        by = c("locality", "age_group", "gender")
+      ) %>%
       dplyr::mutate(
         difference = total_source_population - population_estimate,
         new_nsu_figure = nsu_population - difference,
         scaling_factor = new_nsu_figure / nsu_population,
         scaling_factor = dplyr::case_when(scaling_factor < 0 ~ 0,
-                                          scaling_factor > 1 ~ 1,
-                                          .default = scaling_factor),
+          scaling_factor > 1 ~ 1,
+          .default = scaling_factor
+        ),
         keep_nsu = rbinom(1, 1, scaling_factor)
       ) %>%
       dplyr::filter(keep_nsu == 1L) %>%
       dplyr::ungroup()
 
-    individual_file = individual_file_1 %>%
+    individual_file <- individual_file_1 %>%
       dplyr::left_join(nsu_keep_lookup,
-                       by = c("locality", "age_group", "gender")) %>%
+        by = c("locality", "age_group", "gender")
+      ) %>%
       dplyr::rename("keep_population" = "keep_nsu") %>%
       dplyr::mutate(
         # Flag all non-NSUs as Keep.

From 7283953a9b00cb800900e6dbd9c486c9868a4758 Mon Sep 17 00:00:00 2001
From: Megan McNicol <43570769+SwiftySalmon@users.noreply.github.com>
Date: Fri, 20 Oct 2023 14:13:59 +0100
Subject: [PATCH 07/17] change boxi file names back to capitals (#845)

A previous pull request changed all capitals to lowercase. However, boxi file names contain capitals, so the files could no longer be read. This change restores the capitalised names.
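
A minimal sketch of the failure mode, assuming the extracts live on a case-sensitive file system. `stems_on_disk` is a hypothetical stand-in for a directory listing; the two stems are taken from the diff below.

```r
# Hypothetical directory listing: stems spelled as the extracts are named.
stems_on_disk <- c("Homelessness-extract", "Acute-episode-level-extract")

# Exact string comparison is case-sensitive, like a lookup on a
# case-sensitive file system.
lowercase_found <- "homelessness-extract" %in% stems_on_disk
capitalised_found <- "Homelessness-extract" %in% stems_on_disk

print(c(lowercase = lowercase_found, capitalised = capitalised_found))
```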

Co-authored-by: marjom02 <megan.mcnicol2@nhs.scot>
Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>
---
 R/get_boxi_extract_path.R | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/R/get_boxi_extract_path.R b/R/get_boxi_extract_path.R
index a4c2e4abc..3c2b4acdc 100644
--- a/R/get_boxi_extract_path.R
+++ b/R/get_boxi_extract_path.R
@@ -41,19 +41,19 @@ get_boxi_extract_path <- function(
 
   file_name <- dplyr::case_match(
     type,
-    "ae" ~ "a&e-episode-level-extract",
-    "ae_cup" ~ "a&e-ucd-cup-extract",
-    "acute" ~ "acute-episode-level-extract",
-    "cmh" ~ "community-mh-contact-level-extract",
-    "dn" ~ "district-nursing-contact-level-extract",
-    "gp_ooh-c" ~ "gp-ooh-consultations-extract",
-    "gp_ooh-d" ~ "gp-ooh-diagnosis-extract",
-    "gp_ooh-o" ~ "gp-ooh-outcomes-extract",
-    "homelessness" ~ "homelessness-extract",
-    "maternity" ~ "maternity-episode-level-extract",
-    "mh" ~ "mental-health-episode-level-extract",
-    "deaths" ~ "nrs-death-registrations-extract",
-    "outpatients" ~ "outpatients-episode-level-extract"
+    "ae" ~ "A&E-episode-level-extract",
+    "ae_cup" ~ "A&E-UCD-CUP-extract",
+    "acute" ~ "Acute-episode-level-extract",
+    "cmh" ~ "Community-MH-contact-level-extract",
+    "dn" ~ "District-Nursing-contact-level-extract",
+    "gp_ooh-c" ~ "GP-OoH-consultations-extract",
+    "gp_ooh-d" ~ "GP-OoH-diagnosis-extract",
+    "gp_ooh-o" ~ "GP-OoH-outcomes-extract",
+    "homelessness" ~ "Homelessness-extract",
+    "maternity" ~ "Maternity-episode-level-extract",
+    "mh" ~ "Mental-Health-episode-level-extract",
+    "deaths" ~ "NRS-death-registrations-extract",
+    "outpatients" ~ "Outpatients-episode-level-extract"
   )
 
   boxi_extract_path_csv_gz <- fs::path(

From 47f460940820bdc41211c3ded9d9e62d2be1901b Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 20 Oct 2023 14:15:00 +0100
Subject: [PATCH 08/17] Bump stefanzweifel/git-auto-commit-action from 4 to 5
 (#846)

Bumps [stefanzweifel/git-auto-commit-action](https://github.com/stefanzweifel/git-auto-commit-action) from 4 to 5.
- [Release notes](https://github.com/stefanzweifel/git-auto-commit-action/releases)
- [Changelog](https://github.com/stefanzweifel/git-auto-commit-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/stefanzweifel/git-auto-commit-action/compare/v4...v5)

---
updated-dependencies:
- dependency-name: stefanzweifel/git-auto-commit-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
 .github/workflows/document.yaml | 2 +-
 .github/workflows/style.yaml    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/document.yaml b/.github/workflows/document.yaml
index 0858355fe..73626b610 100644
--- a/.github/workflows/document.yaml
+++ b/.github/workflows/document.yaml
@@ -35,6 +35,6 @@ jobs:
         shell: Rscript {0}
 
       - name: Commit and push changes
-        uses: stefanzweifel/git-auto-commit-action@v4
+        uses: stefanzweifel/git-auto-commit-action@v5
         with:
           commit_message: "Update documentation"
diff --git a/.github/workflows/style.yaml b/.github/workflows/style.yaml
index b8a242270..2efe6e4b7 100644
--- a/.github/workflows/style.yaml
+++ b/.github/workflows/style.yaml
@@ -69,6 +69,6 @@ jobs:
         shell: Rscript {0}
 
       - name: Commit and push changes
-        uses: stefanzweifel/git-auto-commit-action@v4
+        uses: stefanzweifel/git-auto-commit-action@v5
         with:
           commit_message: "Style code"

From 7151cc4491630fc9121ef31a36d21798ddb3fe6c Mon Sep 17 00:00:00 2001
From: Zihao Li <zihao.li@phs.scot>
Date: Tue, 24 Oct 2023 10:13:43 +0100
Subject: [PATCH 09/17] Apply suggestions from code review

Co-authored-by: Jennit07 <67372904+Jennit07@users.noreply.github.com>
---
 R/add_keep_population_flag.R | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index 686328423..3b52560b3 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -15,8 +15,7 @@ add_keep_population_flag <- function(individual_file, year) {
   } else {
     ## Obtain the population estimates for Locality AgeGroup and Gender.
     pop_estimates <-
-      readRDS(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds")) %>%
-      dplyr::as_tibble() %>%
+      readr::read_rds(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds")) 
       dplyr::select(year, datazone2011, sex, age0:age90plus)
 
     # Step 1: Obtain the population estimates for Locality, AgeGroup, and Gender
@@ -31,10 +30,10 @@ add_keep_population_flag <- function(individual_file, year) {
     } else {
       previous_year <- sort(year_available, decreasing = TRUE)[1]
       pop_estimates <- pop_estimates %>%
-        dplyr::filter(year = previous_year)
+        dplyr::filter(year == previous_year)
     }
 
-    pop_estimates <- pop_estimates %>%
+    pop_estimates_filtered <- pop_estimates %>%
       # Recode gender to make it match source.
       dplyr::mutate(sex = dplyr::if_else(sex == "M", 1, 2)) %>%
       dplyr::rename(

From d80cdaab32aab909111b09898991391f0df47025 Mon Sep 17 00:00:00 2001
From: lizihao-anu <lizihao-anu@users.noreply.github.com>
Date: Tue, 24 Oct 2023 09:15:39 +0000
Subject: [PATCH 10/17] Style code

---
 R/add_keep_population_flag.R | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index 3b52560b3..d612dd237 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -15,8 +15,8 @@ add_keep_population_flag <- function(individual_file, year) {
   } else {
     ## Obtain the population estimates for Locality AgeGroup and Gender.
     pop_estimates <-
-      readr::read_rds(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds")) 
-      dplyr::select(year, datazone2011, sex, age0:age90plus)
+      readr::read_rds(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds"))
+    dplyr::select(year, datazone2011, sex, age0:age90plus)
 
     # Step 1: Obtain the population estimates for Locality, AgeGroup, and Gender
     # Select out the estimates for the year of interest.

From 405d4d9ca507d0dd2a482f5d3f9cf2f43d598e1a Mon Sep 17 00:00:00 2001
From: Zihao Li <lizihao_anu@outlook.com>
Date: Tue, 24 Oct 2023 13:21:26 +0100
Subject: [PATCH 11/17] Extract `add_age_group()` and join the keep flag by CHI

---
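
Note on `add_age_group("age")`: the helper added in this patch is called with the column name as a string, but its body refers to it with `{{ age_var_name }}`. Embracing a string injects the literal value `"age"` rather than the column, so the comparisons would not look at the data. A minimal sketch of one alternative, using the `.data` pronoun. `add_age_group_by_name()` and the example tibble are hypothetical and not part of this patch.

```r
library(dplyr)

# Hypothetical variant of add_age_group(); not part of this patch.
# `age_var_name` is a string naming the age column.
add_age_group_by_name <- function(data, age_var_name) {
  data %>%
    dplyr::mutate(
      age_group = dplyr::case_when(
        dplyr::between(.data[[age_var_name]], 0, 4) ~ "0-4",
        dplyr::between(.data[[age_var_name]], 5, 14) ~ "5-14",
        dplyr::between(.data[[age_var_name]], 15, 24) ~ "15-24",
        dplyr::between(.data[[age_var_name]], 25, 34) ~ "25-34",
        dplyr::between(.data[[age_var_name]], 35, 44) ~ "35-44",
        dplyr::between(.data[[age_var_name]], 45, 54) ~ "45-54",
        dplyr::between(.data[[age_var_name]], 55, 64) ~ "55-64",
        dplyr::between(.data[[age_var_name]], 65, 74) ~ "65-74",
        dplyr::between(.data[[age_var_name]], 75, 84) ~ "75-84",
        .data[[age_var_name]] >= 85 ~ "85+"
      )
    )
}

# Made-up data for illustration only; a missing age stays NA.
example <- dplyr::tibble(age = c(3L, 40L, 90L, NA))
add_age_group_by_name(example, "age")
```

The same ten bands are kept; only the way the column is looked up changes, so existing call sites that pass `"age"` would not need to change.
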
 R/add_keep_population_flag.R | 107 +++++++++++++++--------------------
 1 file changed, 47 insertions(+), 60 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index d612dd237..ab332e44c 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -6,7 +6,7 @@
 #' @return A data frame with keep_population flags
 #' @family individual_file
 
-add_keep_population_flag <- function(individual_file, year) {
+add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi") {
   calendar_year <- paste0("20", substr(year, 1, 2)) %>% as.integer()
 
   if (!check_year_valid(year, "NSU")) {
@@ -33,37 +33,21 @@ add_keep_population_flag <- function(individual_file, year) {
         dplyr::filter(year == previous_year)
     }
 
-    pop_estimates_filtered <- pop_estimates %>%
+    pop_estimates <- pop_estimates %>%
       # Recode gender to make it match source.
       dplyr::mutate(sex = dplyr::if_else(sex == "M", 1, 2)) %>%
-      dplyr::rename(
-        "age90" = "age90plus",
-        "gender" = "sex"
-      ) %>%
+      dplyr::rename("age90" = "age90plus",
+                    "gender" = "sex") %>%
       tidyr::pivot_longer(
         names_to = "age",
         names_prefix = "age",
         values_to = "population_estimate",
         cols = "age0":"age90"
       ) %>%
-      dplyr::mutate(
-        age = as.integer(age),
-        age_group = dplyr::case_when(
-          age >= 0 & age <= 4 ~ "0-4",
-          age >= 5 & age <= 14 ~ "5-14",
-          age >= 15 & age <= 24 ~ "15-24",
-          age >= 25 & age <= 34 ~ "25-34",
-          age >= 35 & age <= 44 ~ "35-44",
-          age >= 45 & age <= 54 ~ "45-54",
-          age >= 55 & age <= 64 ~ "55-64",
-          age >= 65 & age <= 74 ~ "65-74",
-          age >= 75 & age <= 84 ~ "75-84",
-          age >= 85 ~ "85+"
-        )
-      ) %>%
+      dplyr::mutate(age = as.integer(age)) %>%
+      add_age_group("age") %>%
       dplyr::left_join(
-        get_locality_path() %>%
-          readRDS() %>%
+        readr::read_rds(get_locality_path()) %>%
           dplyr::select("locality" = "hscp_locality", datazone2011),
         by = "datazone2011"
       ) %>%
@@ -74,33 +58,19 @@ add_keep_population_flag <- function(individual_file, year) {
     # Step 2: Work out the current population sizes in the SLF for Locality, AgeGroup, and Gender
     # Work out the current population sizes in the SLF for Locality AgeGroup and Gender.
     individual_file <- slfhelper::read_slf_individual(year,
-      columns = c(
-        "chi",
-        "locality",
-        "age",
-        "gender",
-        # "nsu",
-        "death_date"
-      )
-    ) %>%
-      dplyr::mutate(nsu = 0L) # delete this before merge
+                                                      columns = c(
+                                                        chi_var_name,
+                                                        "locality",
+                                                        "age",
+                                                        "gender",
+                                                        "nsu",
+                                                        "death_date"
+                                                      ))
 
     individual_file_1 <- individual_file %>%
-      dplyr::mutate(
-        age = as.integer(age),
-        age_group = dplyr::case_when(
-          age >= 0 & age <= 4 ~ "0-4",
-          age >= 5 & age <= 14 ~ "5-14",
-          age >= 15 & age <= 24 ~ "15-24",
-          age >= 25 & age <= 34 ~ "25-34",
-          age >= 35 & age <= 44 ~ "35-44",
-          age >= 45 & age <= 54 ~ "45-54",
-          age >= 55 & age <= 64 ~ "55-64",
-          age >= 65 & age <= 74 ~ "65-74",
-          age >= 75 & age <= 84 ~ "75-84",
-          age >= 85 ~ "85+"
-        )
-      )
+      dplyr::mutate(age = as.integer(age)) %>%
+      add_age_group("age")
+
 
     set.seed(100)
     mid_year <- lubridate::dmy(stringr::str_glue("30-06-{calendar_year}"))
@@ -108,37 +78,35 @@ add_keep_population_flag <- function(individual_file, year) {
     # If they don't have a locality, they're no good as we won't have an estimate to match them against.
     # Same for age and gender.
     nsu_keep_lookup <- individual_file_1 %>%
-      dplyr::filter(!is.na(locality), !is.na(age)) %>%
+      dplyr::filter(!is.na(locality),!is.na(age)) %>%
       # Remove people who died before the mid-point of the calender year.
       # This will make our numbers line up better with the methodology used for the mid-year population estimates.
       # anyone who died 5 years before the file shouldn't be in it anyway...
       dplyr::filter(death_date > mid_year | nsu != 0) %>%
       # Calculate the populations of the whole SLF and of the NSU.
       dplyr::group_by(locality, age_group, gender) %>%
-      dplyr::summarise(
-        nsu_population = sum(nsu),
-        total_source_population = dplyr::n()
-      ) %>%
+      dplyr::mutate(nsu_population = sum(nsu),
+                    total_source_population = dplyr::n()) %>%
       dplyr::left_join(pop_estimates,
-        by = c("locality", "age_group", "gender")
-      ) %>%
+                       by = c("locality", "age_group", "gender")) %>%
       dplyr::mutate(
         difference = total_source_population - population_estimate,
         new_nsu_figure = nsu_population - difference,
         scaling_factor = new_nsu_figure / nsu_population,
         scaling_factor = dplyr::case_when(scaling_factor < 0 ~ 0,
-          scaling_factor > 1 ~ 1,
-          .default = scaling_factor
-        ),
+                                          scaling_factor > 1 ~ 1,
+                                          .default = scaling_factor),
         keep_nsu = rbinom(1, 1, scaling_factor)
       ) %>%
       dplyr::filter(keep_nsu == 1L) %>%
       dplyr::ungroup()
 
+    # step 3: match the flag back onto the slf
     individual_file <- individual_file_1 %>%
       dplyr::left_join(nsu_keep_lookup,
-        by = c("locality", "age_group", "gender")
-      ) %>%
+                       by = chi_var_name,
+                       suffix = c("", ".y")) %>%
+      dplyr::select(-contains(".y")) %>%
       dplyr::rename("keep_population" = "keep_nsu") %>%
       dplyr::mutate(
         # Flag all non-NSUs as Keep.
@@ -158,5 +126,24 @@ add_keep_population_flag <- function(individual_file, year) {
         )
       )
   }
+}
+
+
+add_age_group = function(individual_file, age_var_name) {
+  individual_file <- individual_file %>%
+    dplyr::mutate(
+      age_group = dplyr::case_when(
+        {{ age_var_name }} >= 0 & {{ age_var_name }} <= 4 ~ "0-4",
+        {{ age_var_name }} >= 5 & {{ age_var_name }} <= 14 ~ "5-14",
+        {{ age_var_name }} >= 15 & {{ age_var_name }} <= 24 ~ "15-24",
+        {{ age_var_name }} >= 25 & {{ age_var_name }} <= 34 ~ "25-34",
+        {{ age_var_name }} >= 35 & {{ age_var_name }} <= 44 ~ "35-44",
+        {{ age_var_name }} >= 45 & {{ age_var_name }} <= 54 ~ "45-54",
+        {{ age_var_name }} >= 55 & {{ age_var_name }} <= 64 ~ "55-64",
+        {{ age_var_name }} >= 65 & {{ age_var_name }} <= 74 ~ "65-74",
+        {{ age_var_name }} >= 75 & {{ age_var_name }} <= 84 ~ "75-84",
+        {{ age_var_name }} >= 85 ~ "85+"
+      )
+    )
   return(individual_file)
 }

From 5bc97ffd4ae330271da6c86de6a006ffc016c1a0 Mon Sep 17 00:00:00 2001
From: lizihao-anu <lizihao-anu@users.noreply.github.com>
Date: Tue, 24 Oct 2023 12:24:04 +0000
Subject: [PATCH 12/17] Style code

---
 R/add_keep_population_flag.R | 46 +++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index ab332e44c..193fde28e 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -36,8 +36,10 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     pop_estimates <- pop_estimates %>%
       # Recode gender to make it match source.
       dplyr::mutate(sex = dplyr::if_else(sex == "M", 1, 2)) %>%
-      dplyr::rename("age90" = "age90plus",
-                    "gender" = "sex") %>%
+      dplyr::rename(
+        "age90" = "age90plus",
+        "gender" = "sex"
+      ) %>%
       tidyr::pivot_longer(
         names_to = "age",
         names_prefix = "age",
@@ -58,14 +60,15 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     # Step 2: Work out the current population sizes in the SLF for Locality, AgeGroup, and Gender
     # Work out the current population sizes in the SLF for Locality AgeGroup and Gender.
     individual_file <- slfhelper::read_slf_individual(year,
-                                                      columns = c(
-                                                        chi_var_name,
-                                                        "locality",
-                                                        "age",
-                                                        "gender",
-                                                        "nsu",
-                                                        "death_date"
-                                                      ))
+      columns = c(
+        chi_var_name,
+        "locality",
+        "age",
+        "gender",
+        "nsu",
+        "death_date"
+      )
+    )
 
     individual_file_1 <- individual_file %>%
       dplyr::mutate(age = as.integer(age)) %>%
@@ -78,24 +81,28 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     # If they don't have a locality, they're no good as we won't have an estimate to match them against.
     # Same for age and gender.
     nsu_keep_lookup <- individual_file_1 %>%
-      dplyr::filter(!is.na(locality),!is.na(age)) %>%
+      dplyr::filter(!is.na(locality), !is.na(age)) %>%
       # Remove people who died before the mid-point of the calender year.
       # This will make our numbers line up better with the methodology used for the mid-year population estimates.
       # anyone who died 5 years before the file shouldn't be in it anyway...
       dplyr::filter(death_date > mid_year | nsu != 0) %>%
       # Calculate the populations of the whole SLF and of the NSU.
       dplyr::group_by(locality, age_group, gender) %>%
-      dplyr::mutate(nsu_population = sum(nsu),
-                    total_source_population = dplyr::n()) %>%
+      dplyr::mutate(
+        nsu_population = sum(nsu),
+        total_source_population = dplyr::n()
+      ) %>%
       dplyr::left_join(pop_estimates,
-                       by = c("locality", "age_group", "gender")) %>%
+        by = c("locality", "age_group", "gender")
+      ) %>%
       dplyr::mutate(
         difference = total_source_population - population_estimate,
         new_nsu_figure = nsu_population - difference,
         scaling_factor = new_nsu_figure / nsu_population,
         scaling_factor = dplyr::case_when(scaling_factor < 0 ~ 0,
-                                          scaling_factor > 1 ~ 1,
-                                          .default = scaling_factor),
+          scaling_factor > 1 ~ 1,
+          .default = scaling_factor
+        ),
         keep_nsu = rbinom(1, 1, scaling_factor)
       ) %>%
       dplyr::filter(keep_nsu == 1L) %>%
@@ -104,8 +111,9 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     # step 3: match the flag back onto the slf
     individual_file <- individual_file_1 %>%
       dplyr::left_join(nsu_keep_lookup,
-                       by = chi_var_name,
-                       suffix = c("", ".y")) %>%
+        by = chi_var_name,
+        suffix = c("", ".y")
+      ) %>%
       dplyr::select(-contains(".y")) %>%
       dplyr::rename("keep_population" = "keep_nsu") %>%
       dplyr::mutate(
@@ -129,7 +137,7 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
 }
 
 
-add_age_group = function(individual_file, age_var_name) {
+add_age_group <- function(individual_file, age_var_name) {
   individual_file <- individual_file %>%
     dplyr::mutate(
       age_group = dplyr::case_when(

From ab669d2377349fed9ae5de8501173758f694ca76 Mon Sep 17 00:00:00 2001
From: lizihao-anu <lizihao-anu@users.noreply.github.com>
Date: Tue, 24 Oct 2023 12:24:40 +0000
Subject: [PATCH 13/17] Update documentation

---
 man/add_keep_population_flag.Rd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/add_keep_population_flag.Rd b/man/add_keep_population_flag.Rd
index 546bf8e03..59d2d4ba9 100644
--- a/man/add_keep_population_flag.Rd
+++ b/man/add_keep_population_flag.Rd
@@ -4,7 +4,7 @@
 \alias{add_keep_population_flag}
 \title{Add keep_popluation flag}
 \usage{
-add_keep_population_flag(individual_file, year)
+add_keep_population_flag(individual_file, year, chi_var_name = "chi")
 }
 \arguments{
 \item{data}{A data frame}

From d8d4d02147775f6c38ba980418290858780cd86a Mon Sep 17 00:00:00 2001
From: Zihao Li <lizihao_anu@outlook.com>
Date: Tue, 24 Oct 2023 16:13:58 +0100
Subject: [PATCH 14/17] Use the passed-in `individual_file` and restore the missing pipe

---
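
Note on `keep_nsu = rbinom(1, 1, scaling_factor)`: since PATCH 11 replaced `summarise()` with `mutate()`, this line runs inside a grouped `mutate()` over person-level rows. `rbinom(1, ...)` returns a single draw, which dplyr recycles across the group, so every row in a locality / age group / gender cell shares one keep-or-drop outcome. If one draw per person is intended (an assumption, not something the patch states), `rbinom(dplyr::n(), ...)` is one way to get it. A small sketch on made-up data; `cells`, `per_group` and `per_row` are hypothetical names.

```r
library(dplyr)

set.seed(100)

# Made-up data: two cells of 1000 people, each with a 50% keep probability.
cells <- dplyr::tibble(
  cell = rep(c("a", "b"), each = 1000),
  scaling_factor = 0.5
)

draws <- cells %>%
  dplyr::group_by(cell) %>%
  dplyr::mutate(
    # One draw per group, recycled to every row in the group.
    per_group = rbinom(1, 1, scaling_factor),
    # One independent draw per row.
    per_row = rbinom(dplyr::n(), 1, scaling_factor)
  ) %>%
  dplyr::summarise(
    distinct_per_group = dplyr::n_distinct(per_group),
    mean_per_row = mean(per_row)
  )

draws
```

In the summary, `distinct_per_group` is 1 for each cell, while `mean_per_row` sits close to the 0.5 probability, which is the behaviour a scaling factor would normally be expected to produce.
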
 R/add_keep_population_flag.R | 44 +++++++++++-------------------------
 1 file changed, 13 insertions(+), 31 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index 193fde28e..37f855c3c 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -15,8 +15,8 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
   } else {
     ## Obtain the population estimates for Locality AgeGroup and Gender.
     pop_estimates <-
-      readr::read_rds(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds"))
-    dplyr::select(year, datazone2011, sex, age0:age90plus)
+      readr::read_rds(get_datazone_pop_path("DataZone2011_pop_est_2011_2021.rds")) %>%
+      dplyr::select(year, datazone2011, sex, age0:age90plus)
 
     # Step 1: Obtain the population estimates for Locality, AgeGroup, and Gender
     # Select out the estimates for the year of interest.
@@ -36,10 +36,8 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     pop_estimates <- pop_estimates %>%
       # Recode gender to make it match source.
       dplyr::mutate(sex = dplyr::if_else(sex == "M", 1, 2)) %>%
-      dplyr::rename(
-        "age90" = "age90plus",
-        "gender" = "sex"
-      ) %>%
+      dplyr::rename("age90" = "age90plus",
+                    "gender" = "sex") %>%
       tidyr::pivot_longer(
         names_to = "age",
         names_prefix = "age",
@@ -59,17 +57,6 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
 
     # Step 2: Work out the current population sizes in the SLF for Locality, AgeGroup, and Gender
     # Work out the current population sizes in the SLF for Locality AgeGroup and Gender.
-    individual_file <- slfhelper::read_slf_individual(year,
-      columns = c(
-        chi_var_name,
-        "locality",
-        "age",
-        "gender",
-        "nsu",
-        "death_date"
-      )
-    )
-
     individual_file_1 <- individual_file %>%
       dplyr::mutate(age = as.integer(age)) %>%
       add_age_group("age")
@@ -81,28 +68,24 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     # If they don't have a locality, they're no good as we won't have an estimate to match them against.
     # Same for age and gender.
     nsu_keep_lookup <- individual_file_1 %>%
-      dplyr::filter(!is.na(locality), !is.na(age)) %>%
+      dplyr::filter(!is.na(locality),!is.na(age)) %>%
       # Remove people who died before the mid-point of the calender year.
       # This will make our numbers line up better with the methodology used for the mid-year population estimates.
       # anyone who died 5 years before the file shouldn't be in it anyway...
       dplyr::filter(death_date > mid_year | nsu != 0) %>%
       # Calculate the populations of the whole SLF and of the NSU.
       dplyr::group_by(locality, age_group, gender) %>%
-      dplyr::mutate(
-        nsu_population = sum(nsu),
-        total_source_population = dplyr::n()
-      ) %>%
+      dplyr::mutate(nsu_population = sum(nsu),
+                    total_source_population = dplyr::n()) %>%
       dplyr::left_join(pop_estimates,
-        by = c("locality", "age_group", "gender")
-      ) %>%
+                       by = c("locality", "age_group", "gender")) %>%
       dplyr::mutate(
         difference = total_source_population - population_estimate,
         new_nsu_figure = nsu_population - difference,
         scaling_factor = new_nsu_figure / nsu_population,
         scaling_factor = dplyr::case_when(scaling_factor < 0 ~ 0,
-          scaling_factor > 1 ~ 1,
-          .default = scaling_factor
-        ),
+                                          scaling_factor > 1 ~ 1,
+                                          .default = scaling_factor),
         keep_nsu = rbinom(1, 1, scaling_factor)
       ) %>%
       dplyr::filter(keep_nsu == 1L) %>%
@@ -111,9 +94,8 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     # step 3: match the flag back onto the slf
     individual_file <- individual_file_1 %>%
       dplyr::left_join(nsu_keep_lookup,
-        by = chi_var_name,
-        suffix = c("", ".y")
-      ) %>%
+                       by = chi_var_name,
+                       suffix = c("", ".y")) %>%
       dplyr::select(-contains(".y")) %>%
       dplyr::rename("keep_population" = "keep_nsu") %>%
       dplyr::mutate(
@@ -137,7 +119,7 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
 }
 
 
-add_age_group <- function(individual_file, age_var_name) {
+add_age_group = function(individual_file, age_var_name) {
   individual_file <- individual_file %>%
     dplyr::mutate(
       age_group = dplyr::case_when(

From ec8728b84f421b4e2539723095a76c05116ca413 Mon Sep 17 00:00:00 2001
From: lizihao-anu <lizihao-anu@users.noreply.github.com>
Date: Tue, 24 Oct 2023 15:16:37 +0000
Subject: [PATCH 15/17] Style code

---
 R/add_keep_population_flag.R | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index 37f855c3c..d9386cdac 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -36,8 +36,10 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     pop_estimates <- pop_estimates %>%
       # Recode gender to make it match source.
       dplyr::mutate(sex = dplyr::if_else(sex == "M", 1, 2)) %>%
-      dplyr::rename("age90" = "age90plus",
-                    "gender" = "sex") %>%
+      dplyr::rename(
+        "age90" = "age90plus",
+        "gender" = "sex"
+      ) %>%
       tidyr::pivot_longer(
         names_to = "age",
         names_prefix = "age",
@@ -68,24 +70,28 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     # If they don't have a locality, they're no good as we won't have an estimate to match them against.
     # Same for age and gender.
     nsu_keep_lookup <- individual_file_1 %>%
-      dplyr::filter(!is.na(locality),!is.na(age)) %>%
+      dplyr::filter(!is.na(locality), !is.na(age)) %>%
       # Remove people who died before the mid-point of the calender year.
       # This will make our numbers line up better with the methodology used for the mid-year population estimates.
       # anyone who died 5 years before the file shouldn't be in it anyway...
       dplyr::filter(death_date > mid_year | nsu != 0) %>%
       # Calculate the populations of the whole SLF and of the NSU.
       dplyr::group_by(locality, age_group, gender) %>%
-      dplyr::mutate(nsu_population = sum(nsu),
-                    total_source_population = dplyr::n()) %>%
+      dplyr::mutate(
+        nsu_population = sum(nsu),
+        total_source_population = dplyr::n()
+      ) %>%
       dplyr::left_join(pop_estimates,
-                       by = c("locality", "age_group", "gender")) %>%
+        by = c("locality", "age_group", "gender")
+      ) %>%
       dplyr::mutate(
         difference = total_source_population - population_estimate,
         new_nsu_figure = nsu_population - difference,
         scaling_factor = new_nsu_figure / nsu_population,
         scaling_factor = dplyr::case_when(scaling_factor < 0 ~ 0,
-                                          scaling_factor > 1 ~ 1,
-                                          .default = scaling_factor),
+          scaling_factor > 1 ~ 1,
+          .default = scaling_factor
+        ),
         keep_nsu = rbinom(1, 1, scaling_factor)
       ) %>%
       dplyr::filter(keep_nsu == 1L) %>%
@@ -94,8 +100,9 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     # step 3: match the flag back onto the slf
     individual_file <- individual_file_1 %>%
       dplyr::left_join(nsu_keep_lookup,
-                       by = chi_var_name,
-                       suffix = c("", ".y")) %>%
+        by = chi_var_name,
+        suffix = c("", ".y")
+      ) %>%
       dplyr::select(-contains(".y")) %>%
       dplyr::rename("keep_population" = "keep_nsu") %>%
       dplyr::mutate(
@@ -119,7 +126,7 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
 }
 
 
-add_age_group = function(individual_file, age_var_name) {
+add_age_group <- function(individual_file, age_var_name) {
   individual_file <- individual_file %>%
     dplyr::mutate(
       age_group = dplyr::case_when(

From f5b8c8a615619fc8d18bfd252e2b24ec68f06d0e Mon Sep 17 00:00:00 2001
From: Zihao Li <lizihao_anu@outlook.com>
Date: Tue, 31 Oct 2023 11:17:34 +0000
Subject: [PATCH 16/17] Drop `chi_var_name` argument and document `add_age_group()`

---
 R/add_keep_population_flag.R | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/R/add_keep_population_flag.R b/R/add_keep_population_flag.R
index d9386cdac..307245391 100644
--- a/R/add_keep_population_flag.R
+++ b/R/add_keep_population_flag.R
@@ -1,12 +1,12 @@
 #' Add keep_popluation flag
 #'
 #' @description Add keep_population flag to individual files
-#' @param data A data frame
+#' @param individual_file The individual file being processed
+#' @param year The year of the individual file being processed
 #'
 #' @return A data frame with keep_population flags
 #' @family individual_file
-
-add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi") {
+add_keep_population_flag <- function(individual_file, year) {
   calendar_year <- paste0("20", substr(year, 1, 2)) %>% as.integer()
 
   if (!check_year_valid(year, "NSU")) {
@@ -59,7 +59,7 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
 
     # Step 2: Work out the current population sizes in the SLF for Locality, AgeGroup, and Gender
     # Work out the current population sizes in the SLF for Locality AgeGroup and Gender.
-    individual_file_1 <- individual_file %>%
+    individual_file <- individual_file %>%
       dplyr::mutate(age = as.integer(age)) %>%
       add_age_group("age")
 
@@ -69,7 +69,7 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
     ## issues with age being negative
     # If they don't have a locality, they're no good as we won't have an estimate to match them against.
     # Same for age and gender.
-    nsu_keep_lookup <- individual_file_1 %>%
+    nsu_keep_lookup <- individual_file %>%
       dplyr::filter(!is.na(locality), !is.na(age)) %>%
       # Remove people who died before the mid-point of the calender year.
       # This will make our numbers line up better with the methodology used for the mid-year population estimates.
@@ -98,9 +98,9 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
       dplyr::ungroup()
 
     # step 3: match the flag back onto the slf
-    individual_file <- individual_file_1 %>%
+    individual_file <- individual_file %>%
       dplyr::left_join(nsu_keep_lookup,
-        by = chi_var_name,
+        by = "chi",
         suffix = c("", ".y")
       ) %>%
       dplyr::select(-contains(".y")) %>%
@@ -126,6 +126,13 @@ add_keep_population_flag <- function(individual_file, year, chi_var_name = "chi"
 }
 
 
+#' add_age_group
+#'
+#' @description Add age group columns based on age
+#' @param individual_file The individual file being processed
+#' @param age_var_name The name of the age column, e.g. "age"
+#'
+#' @return An individual file with age groups added
 add_age_group <- function(individual_file, age_var_name) {
   individual_file <- individual_file %>%
     dplyr::mutate(

From 66807e144de6110e73d0af082e874a5cd62f7fff Mon Sep 17 00:00:00 2001
From: lizihao-anu <lizihao-anu@users.noreply.github.com>
Date: Tue, 31 Oct 2023 11:31:04 +0000
Subject: [PATCH 17/17] Update documentation

---
 man/add_age_group.Rd            | 19 +++++++++++++++++++
 man/add_keep_population_flag.Rd |  6 ++++--
 2 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 man/add_age_group.Rd

diff --git a/man/add_age_group.Rd b/man/add_age_group.Rd
new file mode 100644
index 000000000..00d32d63e
--- /dev/null
+++ b/man/add_age_group.Rd
@@ -0,0 +1,19 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/add_keep_population_flag.R
+\name{add_age_group}
+\alias{add_age_group}
+\title{add_age_group}
+\usage{
+add_age_group(individual_file, age_var_name)
+}
+\arguments{
+\item{individual_file}{The individual file being processed}
+
+\item{age_var_name}{The name of the age column, e.g. "age"}
+}
+\value{
+An individual file with age groups added
+}
+\description{
+Add age group columns based on age
+}
diff --git a/man/add_keep_population_flag.Rd b/man/add_keep_population_flag.Rd
index 59d2d4ba9..23073aea0 100644
--- a/man/add_keep_population_flag.Rd
+++ b/man/add_keep_population_flag.Rd
@@ -4,10 +4,12 @@
 \alias{add_keep_population_flag}
 \title{Add keep_popluation flag}
 \usage{
-add_keep_population_flag(individual_file, year, chi_var_name = "chi")
+add_keep_population_flag(individual_file, year)
 }
 \arguments{
-\item{data}{A data frame}
+\item{individual_file}{The individual file being processed}
+
+\item{year}{The year of the individual file being processed}
 }
 \value{
 A data frame with keep_population flags