Get latest resource #36

Merged
merged 26 commits into from
Aug 1, 2024
Changes from 12 commits
26 commits
223527d
Update documentation
Moohan Jul 4, 2024
05bd504
Split `get_latest_resource` to its own branch
Moohan Jul 8, 2024
b018382
Update R/get_latest_resource.R
ross-hull Jul 16, 2024
5c25ada
Update documentation R/get_latest_resource.R
ross-hull Jul 16, 2024
7e8aa12
Update R/get_latest_resource_id.R
ross-hull Jul 16, 2024
bfb7121
Update R/get_latest_resource_id.R
ross-hull Jul 16, 2024
5d07f46
Update R/get_latest_resource_id.R
ross-hull Jul 16, 2024
18172c2
Update R/get_latest_resource_id.R
ross-hull Jul 16, 2024
010310d
Style code (GHA)
ross-hull Jul 16, 2024
8721386
move dataset name checks to get_latest_resource
ross-hull Jul 16, 2024
a22acf3
change tests to accomodate moving dataset checks to get_latest_resource
ross-hull Jul 16, 2024
4e2da8e
Style code (GHA)
ross-hull Jul 16, 2024
1ea8c6e
remove unnecisary dataset name checks get_latest_resource
ross-hull Jul 31, 2024
8eaaa78
Update documentation
ross-hull Jul 31, 2024
2068ad1
Update R/get_latest_resource.R
ross-hull Jul 31, 2024
dd4a3ae
Update R/get_latest_resource.R
ross-hull Jul 31, 2024
fea33fa
Add `{rlang}` to imports
Moohan Aug 1, 2024
663335e
Use `@inheritParams` to simplify documentation
Moohan Aug 1, 2024
912eca4
update applicable datasets
ross-hull Aug 1, 2024
6e2aee9
Update documentation
ross-hull Aug 1, 2024
900c509
Merge branch 'master' into get_latest_resource
Moohan Aug 1, 2024
5748bfe
Present datasets as 'values'
Moohan Aug 1, 2024
40d7d3f
Sort the list of 'applicable datasets' for easier maintenance
Moohan Aug 1, 2024
6786084
Fix typo in comment
Moohan Aug 1, 2024
2518355
Update get_latest_resource.R
Moohan Aug 1, 2024
a82b253
Update documentation
Moohan Aug 1, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -32,4 +32,4 @@ Config/testthat/parallel: true
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
1 change: 1 addition & 0 deletions NAMESPACE
@@ -2,6 +2,7 @@

export("%>%")
export(get_dataset)
export(get_latest_resource)
export(get_resource)
export(get_resource_sql)
importFrom(magrittr,"%>%")
105 changes: 105 additions & 0 deletions R/get_latest_resource.R
@@ -0,0 +1,105 @@
#' Get the latest resource from a dataset
#'
#' Returns the latest resource available in a dataset.
#'
#' Some datasets on the open data platform keep historic
#' resources instead of updating existing ones. For these,
#' it is useful to be able to retrieve the latest resource.
#' As of 5.7.2024 these datasets include:
#' * gp-practice-populations
#' * gp-practice-contact-details-and-list-sizes
#' * nhsscotland-payments-to-general-practice
#' * dental-practices-and-patient-registrations
#' * general-practitioner-contact-details
#' * prescribed-dispensed
#' * prescriptions-in-the-community
#' * community-pharmacy-contractor-activity
#'
#' @param dataset_name name of the dataset as found on
#' \href{https://www.opendata.nhs.scot/}{NHS Open Data platform}
#' @param rows (optional) specify the max number of rows to return.
#' @param row_filters (optional) a named list or vector that specifies values of
#' columns/fields to keep.
#' e.g. list(Date = 20220216, Sex = "Female").
#' @param col_select (optional) a character vector containing the names of
#' desired columns/fields.
#' e.g. c("Date", "Sex").
#' @param include_context (optional) If `TRUE` additional information about the
#' resource will be added as columns to the data, including the resource ID, the
#' resource name, the creation date and the last modified/updated date.
#'
#' @return a [tibble][tibble::tibble-package] with the data
#' @export
#'
#' @examples
#' dataset_name <- "gp-practice-contact-details-and-list-sizes"
#'
#' data <- get_latest_resource(dataset_name)
#'
#' filters <- list("Postcode" = "DD11 1ES")
#' wanted_cols <- c("PracticeCode", "Postcode", "Dispensing")
#'
#' filtered_data <- get_latest_resource(
#'   dataset_name = dataset_name,
#'   row_filters = filters,
#'   col_select = wanted_cols
#' )
#'
get_latest_resource <- function(dataset_name,
                                rows = NULL,
                                row_filters = NULL,
                                col_select = NULL,
                                include_context = FALSE) {
  applicable_datasets <- c(
    "gp-practice-populations", "gp-practice-contact-details-and-list-sizes",
    "nhsscotland-payments-to-general-practice", "dental-practices-and-patient-registrations",
    "general-practitioner-contact-details", "prescribed-dispensed",
    "prescriptions-in-the-community", "community-pharmacy-contractor-activity"
  )

  # throw error if name type/format is invalid
  check_dataset_name(dataset_name)

  # define query and try API call
  query <- list("id" = dataset_name)
  content <- try(
    phs_GET("package_show", query),
    silent = TRUE
  )

  # if content contains a 'Not Found Error'
  # throw error with suggested dataset name
  if (grepl("Not Found Error", content[1])) {
    suggest_dataset_name(dataset_name)
  }

  # check if the dataset is within the applicable datasets
  # throw error if not
  if (!dataset_name %in% applicable_datasets) {
    cli::cli_abort(
      c(
        "The dataset name supplied {.var {dataset_name}} is not within the applicable datasets.
      These are:\n
      {.var {applicable_datasets}}",
        "x" = "Please see get_latest_resource documentation.",
        "i" = "You can find dataset names in the URL
        of a dataset's page on {.url www.opendata.nhs.scot}."
      ),
      call = rlang::caller_env()
    )
  }

  # get the latest resource id
  id <- get_latest_resource_id(dataset_name)

  data <- get_resource(
    id,
    rows,
    row_filters,
    col_select,
    include_context
  )

  return(data)
}
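
The allow-list check in `get_latest_resource()` can be tried without calling the API. The following is a minimal base-R sketch, not the package's code: the helper name `check_applicable_dataset()` is hypothetical, the allow-list is shortened to two entries for illustration, and `stop()` stands in for `cli::cli_abort()`.

```r
# Hypothetical helper (not part of this PR): validates a dataset name against
# an allow-list, mirroring the check in get_latest_resource() with base R only.
check_applicable_dataset <- function(dataset_name, applicable_datasets) {
  if (!dataset_name %in% applicable_datasets) {
    stop(
      sprintf(
        "The dataset name supplied '%s' is not within the applicable datasets: %s",
        dataset_name,
        paste(applicable_datasets, collapse = ", ")
      ),
      call. = FALSE
    )
  }
  # Return the name invisibly so the check can be used in a pipeline
  invisible(dataset_name)
}

# Shortened allow-list, for illustration only
applicable <- c("gp-practice-populations", "prescribed-dispensed")

# A listed dataset passes silently
check_applicable_dataset("gp-practice-populations", applicable)

# An unlisted dataset raises an error; tryCatch() captures its message
result <- tryCatch(
  check_applicable_dataset("hospital-codes", applicable),
  error = function(e) conditionMessage(e)
)
print(result)
```

Keeping the check in a small function like this would let it be unit tested offline, whereas the tests in this PR need a live connection to the open data platform.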
55 changes: 55 additions & 0 deletions R/get_latest_resource_id.R
@@ -0,0 +1,55 @@
#' get_latest_resource_id
#'
#' To be confident that the resource returned is the one intended,
#' two conditions have to be met. It has to appear at the top
#' of the resource list as shown on the open data platform
#' (the API returns resources in the same order as they appear
#' on the platform), and it has to have the most recent
#' creation date.
#'
#' This functionality is only relevant to some datasets:
#' those that keep historic resources instead of overwriting
#' them. These are listed within the applicable datasets.
#'
#' @inheritParams get_dataset
#'
#' @return a string with the resource id
get_latest_resource_id <- function(dataset_name) {
  # send the api request
  query <- list("id" = dataset_name)
  content <- phs_GET("package_show", query)

  # retrieve the resource ids from the returned content
  all_ids <- purrr::map_chr(content$result$resources, ~ .x$id)

  # add the id, created date and last_modified to a dataframe
  id <- c()
  created_date <- c()
  modified_date <- c()

  for (res in content$result$resources) {
    id <- append(id, res$id)
    created_date <- append(created_date, res$created)
    modified_date <- append(modified_date, res$last_modified)
  }
  all_id_data <- tibble::tibble(
    id = id,
    created_date = strptime(created_date, format = "%FT%X", tz = "UTC"),
    modified_date = strptime(modified_date, format = "%FT%X", tz = "UTC")
  ) %>%
    dplyr::mutate(most_recent_date_created = max(created_date))

  # get the first row of the resources, this will be the same one that appears
  # at the top on the open data platform
  all_id_data_first_row <- all_id_data %>%
    dplyr::slice(1)

  # If the resource at the top of the list on the open data platform also has
  # the most recent date created, return it. Otherwise, error
  if (all_id_data_first_row$created_date == all_id_data_first_row$most_recent_date_created) {
    return(all_id_data_first_row$id)
  }
  cli::cli_abort("The most recent id could not be identified")
}
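
The selection rule in `get_latest_resource_id()` (the first listed resource must also have the most recent creation date) can be tried offline on a mocked resource list. This is a sketch under stated assumptions, not the package's implementation: the helper name `pick_latest_resource_id()`, the resource IDs and the timestamps are all made up, and the timestamps are parsed with an explicit `%Y-%m-%dT%H:%M:%S` format rather than `%FT%X`.

```r
# Hypothetical helper (not part of this PR): applies the "first listed and
# most recently created" rule to a list of resources, using base R only.
pick_latest_resource_id <- function(resources) {
  ids <- vapply(resources, function(res) res$id, character(1))
  created <- as.POSIXct(
    vapply(resources, function(res) res$created, character(1)),
    format = "%Y-%m-%dT%H:%M:%S",
    tz = "UTC"
  )

  # The first resource in the list must also be the most recently created one
  if (created[1] == max(created)) {
    return(ids[1])
  }
  stop("The most recent id could not be identified", call. = FALSE)
}

# Mocked resources in the order the API is assumed to return them
mock_resources <- list(
  list(id = "res-2024-07", created = "2024-07-01T09:30:00"),
  list(id = "res-2024-04", created = "2024-04-01T09:30:00")
)

latest_id <- pick_latest_resource_id(mock_resources)
print(latest_id)

# Reversing the order breaks the first condition, so the helper errors
out_of_order <- try(pick_latest_resource_id(rev(mock_resources)), silent = TRUE)
print(inherits(out_of_order, "try-error"))
```

Separating the rule from the API call in this way would make it possible to test both the success path and the error path with fixed inputs.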
70 changes: 70 additions & 0 deletions man/get_latest_resource.Rd
29 changes: 29 additions & 0 deletions man/get_latest_resource_id.Rd
7 changes: 7 additions & 0 deletions tests/testthat/test-get_latest_resource.R
@@ -0,0 +1,7 @@
test_that("returns data for a dataset that is listed", {
  expect_no_error(get_latest_resource("gp-practice-populations"))
})

test_that("returns error for a dataset that is not listed", {
  expect_error(get_latest_resource("hospital-codes"))
})
7 changes: 7 additions & 0 deletions tests/testthat/test-get_latest_resource_id.R
@@ -0,0 +1,7 @@
test_that("returns data for a dataset that is listed", {
  expect_no_error(get_latest_resource("gp-practice-populations"))
})

test_that("returns error for a dataset that is not listed", {
  expect_error(get_latest_resource("hospital-codes"))
})