Scottish postcode converter (function) #3

Open
Nic-Chr opened this issue Jul 6, 2021 · 3 comments

@Nic-Chr
Contributor

Nic-Chr commented Jul 6, 2021

Proposing a function to convert postcodes to variables found in the Scottish Postcode Directory (SPD).

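postcode_match <- function(x, group, factor = FALSE, ...) {
  if (length(group) != 1) stop("Please supply a group of length 1")
  # Normalise the input (strip spaces, upper-case) before matching
  y <- toupper(gsub(" ", "", x, fixed = TRUE))
  # Load the bundled directory into this function's environment
  utils::data("spd_2021_1", envir = environment(), package = "phsmethods")
  if (!group %in% names(spd_2021_1)) stop("`group` must be a column of spd_2021_1")
  postcodes <- spd_2021_1[["postcode"]]
  group_names <- spd_2021_1[[group]]
  # Unmatched postcodes come back as NA
  y <- group_names[match(y, postcodes)]
  if (factor) {
    x_levels <- sort(unique(group_names))
    # Note: factor() drops an NA level unless exclude = NULL is passed via `...`
    if (anyNA(y)) x_levels <- c(x_levels, NA)
    y <- factor(y, levels = x_levels, ...)
  }
  y
}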

It could work as follows:

> postcode_match("G2 1AL", group = "ca2019name")
[1] "Glasgow City"
> postcode_match("G2 1AL", group = "hb2019name")
[1] "Greater Glasgow and Clyde"
> postcode_match("G2 1AL", group = "date_of_introduction")
[1] "2011-05-03"
> postcode_match("G2 1AL", group = "hb2019name", factor = TRUE)
[1] Greater Glasgow and Clyde
14 Levels: Ayrshire and Arran Borders Dumfries and Galloway Fife Forth Valley Grampian Greater Glasgow and Clyde Highland Lanarkshire Lothian ... Western Isles
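
For anyone who wants to try the matching logic without the SPD, here is a minimal sketch against a toy lookup table. The names `toy_spd` and `toy_match` are hypothetical, the single row reuses the example above, and "XX0 0XX" is a made-up postcode included only to show how an unmatched value and its NA level behave.

```r
# Toy stand-in for the directory (hypothetical; one row taken from the example above)
toy_spd <- data.frame(
  postcode = "G21AL",
  hb2019name = "Greater Glasgow and Clyde",
  stringsAsFactors = FALSE
)

# Same normalise-then-match approach as postcode_match(), on the toy table
toy_match <- function(x, group, lookup = toy_spd) {
  key <- toupper(gsub(" ", "", x, fixed = TRUE))
  lookup[[group]][match(key, lookup[["postcode"]])]
}

res <- toy_match(c("g2 1al", "XX0 0XX"), group = "hb2019name")
print(res)

# With the default exclude = NA, factor() drops an NA level even if it is supplied
f_default <- factor(res, levels = c("Greater Glasgow and Clyde", NA))
# exclude = NULL keeps NA as an explicit level
f_keep_na <- factor(res, levels = c("Greater Glasgow and Clyde", NA), exclude = NULL)
print(nlevels(f_default))
print(nlevels(f_keep_na))
```

So if NA is meant to appear as a level in the factor output, `exclude = NULL` would need to reach `factor()`.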
@Moohan
Member

Moohan commented May 2, 2023

I think this type of function would be super useful and I would love to see it added to the package. However, I think it has some serious issues that I don't have an immediate solution to.

  1. The data would need to be added to phsmethods, making the package very large. This isn't desirable and, in particular, is something that CRAN would likely object to.
  2. The data would go out of date. This is the same issue as with match_area (see "match_area / area_lookup get out of date", phsmethods#71). For that function, there is some code in the package to update the data, but that relies on 1) package maintainers remembering to update the data regularly and 2) users updating to the latest version to ensure they have the latest data.

Some ideas I had to work around this would be:

  • Make the package fetch the data as needed through some type of API request (harder for a postcode lookup, as I don't think there is an NHS / SG host for it, so we'd have to use a third-party one like postcodes.io) and then use caching to ensure each postcode is only requested once per session. Issues with this are the complexity it would add to the package (API requests + caching) and the need for an internet connection to reach the API.
  • Include the data and automate updating it in some way with GitHub Actions. This means bundling the data with the package, with all the negatives of that. It would also be non-trivial to set up and would still rely on maintainers including it in new package versions and users then updating.
  • Some combination of the above, where we have a separate package for the data, which phsmethods conditionally depends on, and a check to ensure you have the latest version of it whenever you use a function that needs it. I've seen this done before in the gender package (note: I'm pretty sure it's no longer maintained).
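
To give a feel for the caching part of the first idea, here is a minimal sketch of a once-per-postcode-per-session cache. Everything in it is an assumption for illustration: `cached_lookup`, `.pc_cache` and `stub_fetch` are hypothetical names, and the stub stands in for whatever would actually make the API request.

```r
# Session-level cache: an environment lasts for the life of the R session
.pc_cache <- new.env(parent = emptyenv())

# `fetch` is a placeholder for the real lookup (e.g. an API call).
# It takes one normalised postcode and returns the looked-up value.
cached_lookup <- function(postcode, fetch) {
  key <- toupper(gsub(" ", "", postcode, fixed = TRUE))
  if (!exists(key, envir = .pc_cache, inherits = FALSE)) {
    assign(key, fetch(key), envir = .pc_cache)
  }
  get(key, envir = .pc_cache, inherits = FALSE)
}

# Stub fetcher that counts its calls, standing in for a network request
n_calls <- 0
stub_fetch <- function(key) {
  n_calls <<- n_calls + 1
  paste("result for", key)
}

first <- cached_lookup("G2 1AL", stub_fetch)
second <- cached_lookup("g21al", stub_fetch)  # same key, served from the cache
print(n_calls)
```

Passing the fetcher in as an argument keeps the caching logic testable without an internet connection, which is one of the concerns raised above.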

@Nic-Chr
Contributor Author

Nic-Chr commented Mar 12, 2024

Thanks @Moohan,

  1. I agree. I have an internal package that uses a cut-down, compressed .rda version of the SPD and it's 3MB, which is still quite large for a standalone dataset.

I agree that a good solution to this would be to create a separate package containing the postcode directory that could sit within phsverse.

Alternatively, we could create a function within phsmethods that lets the user download the SPD on demand and loads it into their R environment for the rest of the session. This would require a dependency on an API and some code to make sure it works as expected.

I would lean towards creating a package for the directory (and maybe other common Scottish lookup files, though I'm not sure whether this already exists).
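
As a rough sketch of how phsmethods could use such a package without hard-depending on it, something like the following might work. The package name "examplespddata", the dataset name "spd" and the helper `get_spd` are all made up for illustration; no such package is assumed to exist.

```r
# "examplespddata" and "spd" are hypothetical names used only for illustration
get_spd <- function(pkg = "examplespddata", dataset = "spd") {
  # Only proceed if the optional data package is installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is needed for postcode lookups. ",
      "Please install it and try again.",
      call. = FALSE
    )
  }
  # Load the dataset from the data package into a private environment
  env <- new.env()
  utils::data(list = dataset, package = pkg, envir = env)
  env[[dataset]]
}

# Without the data package installed, the caller gets a clear error message
msg <- tryCatch(get_spd(), error = function(e) conditionMessage(e))
print(msg)
```

With this pattern the data package would sit under Suggests rather than Imports in DESCRIPTION, so phsmethods itself stays small.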

@Nic-Chr
Contributor Author

Nic-Chr commented Mar 12, 2024

  1. If we do decide to go with a separate package, updating it every 6 months or so should be fairly manageable, I would think.

@Moohan Moohan transferred this issue from Public-Health-Scotland/phsmethods Jul 2, 2024