-
Notifications
You must be signed in to change notification settings - Fork 0
/
distance_to_nearest_hospital.Rmd
217 lines (154 loc) · 10.1 KB
/
distance_to_nearest_hospital.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
---
title: "Distance/time to travel calculations for hospitals in Scotland"
author: "Jan Savinc"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
html_document:
code_folding: hide
toc: true
toc_float: true
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Load packages
```{r, warning=FALSE, message=FALSE}
## install packages from github if not yet available!
# remotes::install_github("datasciencescotland/opendatascot", force = TRUE)
# remotes::install_github("Health-SocialCare-Scotland/phsmethods", force = TRUE)
library(tidyverse) # for tidy workflow
library(readxl) # for reading excel files
library(tictoc) # for measuring time taken for calculations
# TODO: work out if OSRM would work...
# library(osrm) # for OSM routing data
```
# Introduction
For investigating increased numbers of deaths at home & other non-institutional settings in Scotland, we'll take into account the distance/travel time to the nearest hospital. For this purpose, we need to compute the distance to nearest hospital.
There are two basic approaches: straight-line distance, and road distance / road travel time:
## Straight-line distance
This can be computed using OS National Grid Reference values (Easting & Northing), using the locations of hospitals, and either:
* data zones
* postcodes
Postcodes are more precise, but also more privacy-invasive when working with administrative records. For this reason, we would ideally provide the computed distances at data linkage rather than after data delivery.
Straight-line distances don't necessarily represent a realistic measure of distance or travel time to hospital, however! The main benefit is its ease of calculation.
## Road distance / travel time
This is more realistic & representative than straight-line distance, but more difficult to obtain, since road distances and road spee (used in calculating travel time) aren't openly available like the OS grid reference data are.
# Download files & load
The following data needs downloading:
* Hospitals with geographic coordinates
* Datazone centroids with geographic coordinates (provided by Dr Laurie Berrie!)
* Postcodes with geographic coordinates
The coordinates are given in Easting & Northing coordinates in the OS National Grid basis. We can compute the straight-line distance between them using the Pythagorean identity function:
```{r}
pythagorean_distance <- function(x1, y1, x2, y2) {
distance <- sqrt((x1 - x2)^2 + (y1 - y2)^2)
}
```
## DataZones
```{r}
# url_datazones <- "https://maps.gov.scot/ATOM/shapefiles/SG_DataZoneCent_2011.zip"
#
# if (!file.exists("./distance_to_hospital/SG_DataZoneCent_2011.zip")) download.file(url = url_datazones, destfile = "./distance_to_hospital/SG_DataZoneCent_2011.zip")
datazones <- read_excel("./distance_to_hospital/datazone2011 centroids.xlsx", sheet = 1)
```
## Postcode directory
~~I'm using the August 2020 postcode directory [available here](https://geoportal.statistics.gov.uk/datasets/ons::ons-postcode-directory-august-2020/about)
I've opted for the `.csv` version and limited the columns to read because it is over 1GB in size!~~
The 2021-1 version of the Scottish postcode directory is [available here](https://www.nrscotland.gov.uk/statistics-and-data/geography/our-products/scottish-postcode-directory/2021-1)
There are a few postcodes with duplicate entries for the same postcode where the geographical location is slightly different between them. I've opted to keep the last entry in a series of duplicates, as it probably reflects the most recent information available.
```{r}
# postcodes <- read_csv(file = "./distance_to_hospital/ONSPD_AUG_2020_UK.csv", col_names = c("postcode_PCDS","easting","northing"), col_types = "__c________ii_____________________________________", skip = 1) # this format ensures we only read the postcode and the easting & northing data
# paste0(rep("_", times = 50), collapse="") # used for precisely specifying number of skipped columns!
postcodes <- read_csv(file = "./distance_to_hospital/SPD_PostcodeIndex_Cut_21_1_CSV/SmallUser.csv", col_names = c("postcode","easting","northing"), col_types = "c____ii__________________________________________________", skip = 1) # this format ensures we only read the postcode and the easting & northing data
postcodes <- # keep last entry where there is more than one record per postcode
postcodes %>%
group_by(postcode) %>%
slice(n()) %>%
ungroup
```
## Hospitals
```{r}
url_hospitals <- "https://www.opendata.nhs.scot/dataset/cbd1802e-0e04-4282-88eb-d7bdcfb120f0/resource/c698f450-eeed-41a0-88f7-c1e40a568acc/download/current-hospital_flagged20210506.csv"
if (!file.exists("./distance_to_hospital/current-hospital_flagged20210506.csv")) download.file(url = url_hospitals, destfile = "./distance_to_hospital/current-hospital_flagged20210506.csv")
hospitals <- read_csv(file = "./distance_to_hospital/current-hospital_flagged20210506.csv")
## check that the XCoordinate and YCoordinate and Easting and Northing are the same thing, by checking the DataZone centroid location against hospital location
hospitals %>%
select(LocationName, DataZone, XCoordinate, YCoordinate, BasedOnPostcode) %>%
left_join(datazones, by = c("DataZone")) %>%
mutate(distance = pythagorean_distance(XCoordinate, YCoordinate, Easting, Northing)) %>%
arrange(desc(distance))
```
### Defining acute hospitals
From the [Hospital care website of ISD Scotland](https://www.isdscotland.org/health-topics/hospital-care/):
> This section of our website brings together information on different aspects of acute hospital care, sourced from hospital administrative systems across Scotland. Note that 'Acute' hospital care excludes obstetric, psychiatric and long stay care services that are covered elsewhere on the ISD website. For reference, a listing of all Scottish hospitals (acute and non-acute) can be found by following the Hospitals link on the menu to the left of this page.
Therefore, we can use an ISD(S)1 publication, in this case on Bed Occupancy (https://www.isdscotland.org/Health-Topics/Hospital-Care/Publications/2019-11-26/Acute-Hospital-Publication/data-files/docs/20191126_Beds_by_Health_Board_of_Treatment_and_Specialty.csv), to define Acute hospitals - all hospitals included in this publication are considered acute!
Note: when I spoke to Kirsty Anderson (PHS analyst) on 19 August 2021, she suggestd I extract the list of acute hospitals from the Inpatient activity file of the report, so I will check that also. [The report is available here](https://publichealthscotland.scot/media/7504/inpatient-and-daycase-stays-by-nhs-board-of-residence-to-december-2019.csv)
However, these data aren't provided on hospital-level, but on health-board level instead, so a list of hospitals canot be obtained this way!
**Important: Kirsty Anderson of PHS said that the definition of 'acute' is not a singular definition used across the NHS. 'Acute' was defined for the purposes of their reporting hospital activity, and may not be appropriate in other contexts!**
```{r}
isds1_beds <- read_csv("./distance_to_hospital/beds_by_nhs_board_of_treatment_and_specialty.csv", guess=1e5)
acute_hospitals <- hospitals %>% semi_join(isds1_beds, by = "Location") # keep only hospitals present in the isd(s)1 dataset
# isds1_inpatients <- read_csv("./distance_to_hospital/inpatient-and-daycase-stays-by-nhs-board-of-residence-to-december-2019.csv", guess=1e5) # note: this doesn't include per-hospital data
```
This reduces the number of hospitals from `r n_distinct(hospitals$LocationName)` to `r n_distinct(acute_hospitals$LocationName)`!
# DataZone straight-line distance to nearest hospital
For this purpose, I've removed some duplicate entries in the list of hospitals - I think they are specialist units that operate within a larger hospital and are listed at the same geographical grid reference! Otherwise we get two units that are equidistant for some data zones.
```{r}
tic()
smallest_distance_straight_line_datazones <-
crossing(
datazones %>% select(DataZone),
hospitals %>% select(LocationName) %>% filter(!LocationName %in% c("Orchard View","William Fraser Centre", "Gartnavel General Hospital (Admin Purposes)")) # remove units located in same place as another hospital
) %>%
left_join(datazones %>% select(DataZone, Easting, Northing), by = "DataZone") %>%
left_join(hospitals %>% select(LocationName, XCoordinate, YCoordinate), by = "LocationName") %>%
mutate(distance = pythagorean_distance(x1 = Easting, x2 = XCoordinate, y1 = Northing, y2 = YCoordinate)) %>%
group_by(DataZone) %>%
slice_min(order_by = distance, n = 1) %>%
ungroup %>%
select(DataZone, LocationName, distance)
toc(log = TRUE)
```
## Acute hospital distances
```{r}
tic()
smallest_distance_straight_line_datazones_acute_hospitals <-
crossing(
datazones %>% select(DataZone),
acute_hospitals %>% select(LocationName)
) %>%
left_join(datazones %>% select(DataZone, Easting, Northing), by = "DataZone") %>%
left_join(hospitals %>% select(LocationName, XCoordinate, YCoordinate), by = "LocationName") %>%
mutate(distance = pythagorean_distance(x1 = Easting, x2 = XCoordinate, y1 = Northing, y2 = YCoordinate)) %>%
group_by(DataZone) %>%
slice_min(order_by = distance, n = 1) %>%
ungroup %>%
select(DataZone, LocationName, distance)
toc(log = TRUE)
```
# Postcode straight-line distance to acute hospitals
```{r}
tic()
smallest_distance_straight_line_postcodes_acute_hospitals <-
crossing(
postcodes %>% select(postcode),
acute_hospitals %>% select(LocationName)
) %>%
left_join(postcodes %>% select(postcode, easting, northing), by = "postcode") %>%
left_join(hospitals %>% select(LocationName, XCoordinate, YCoordinate), by = "LocationName") %>%
mutate(distance = pythagorean_distance(x1 = easting, x2 = XCoordinate, y1 = northing, y2 = YCoordinate)) %>%
group_by(postcode) %>%
slice_min(order_by = distance, n = 1) %>%
ungroup %>%
select(postcode, LocationName, distance) %>%
distinct
toc(log = TRUE)
smallest_distance_straight_line_postcodes_acute_hospitals %>%
write_csv(file = "./distance_to_hospital/postcode_distance_to_nearest_acute_hospital_straightline.csv")
```
# Print session info
```{r}
sessionInfo()
```