Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Middle task tutorials #58

Merged
merged 82 commits into from
Feb 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
82 commits
Select commit Hold shift + click to select a range
7192961
Add first lesson on Epinow2
amanda-minter Oct 16, 2023
d1dbbdd
Add summary to `quantify-transmissibility`
amanda-minter Oct 17, 2023
b49c0bb
Add citation for generation time
amanda-minter Oct 17, 2023
90d317e
Update lesson times
amanda-minter Oct 17, 2023
8d33625
Add second lesson on `EpiNow2`
amanda-minter Oct 17, 2023
00df2a5
Add section on incomplete observations
amanda-minter Oct 17, 2023
193a55c
Fix typo in starting Rt dist values
amanda-minter Oct 17, 2023
d7e2e54
Minor edits to text
amanda-minter Oct 17, 2023
39a54f3
Add section on regional estimates
amanda-minter Oct 17, 2023
25d3156
Minor edits to text
amanda-minter Oct 19, 2023
e4c44d8
Edits to text
amanda-minter Oct 19, 2023
b325cbf
Minor edits to text
amanda-minter Oct 19, 2023
ce369e1
Update config.yaml
amanda-minter Oct 24, 2023
ffb7c2a
Update renv.lock
amanda-minter Oct 24, 2023
195705b
Reorder sections across two tutorials
amanda-minter Oct 24, 2023
11efb3e
Rename second tutorial to `create-forecast.Rmd`
amanda-minter Oct 24, 2023
83e3495
Update output definitions
amanda-minter Oct 24, 2023
1600c24
Edits to lesson content
amanda-minter Oct 26, 2023
6427eb1
Tidy up example code
amanda-minter Oct 26, 2023
1f8de62
Fix typos
amanda-minter Oct 26, 2023
7fd7631
Fix copy and paste error
amanda-minter Oct 30, 2023
472005d
Add callout on Stan options
amanda-minter Oct 31, 2023
4080fdd
Add number of cores
amanda-minter Oct 31, 2023
3fa3389
Style code update
amanda-minter Oct 31, 2023
89f36a2
Update episodes/quantify-transmissibility.Rmd
amanda-minter Nov 14, 2023
7d8cbe4
Update episodes/quantify-transmissibility.Rmd
amanda-minter Nov 14, 2023
925ea09
Update episodes/quantify-transmissibility.Rmd
amanda-minter Nov 14, 2023
3e90edc
Update episodes/quantify-transmissibility.Rmd
amanda-minter Nov 14, 2023
d227a3b
Update episodes/quantify-transmissibility.Rmd
amanda-minter Nov 14, 2023
eb0788e
Update episodes/quantify-transmissibility.Rmd
amanda-minter Nov 14, 2023
bcdeeae
Apply suggestions from code review
amanda-minter Nov 14, 2023
69ca0ca
Add note on EpiNow2 version
amanda-minter Nov 14, 2023
50e29f4
Remove note on relationship between R0 and Rt
amanda-minter Nov 14, 2023
3410aa7
Remove `nowcast` from heading
amanda-minter Nov 14, 2023
5b0cdd4
Add additional keypoint
amanda-minter Nov 14, 2023
6108c75
Change of phrasing
amanda-minter Nov 14, 2023
c119fce
Replace 'latent' with 'incubation'
amanda-minter Nov 14, 2023
fbde0e1
Add glossary entries
amanda-minter Nov 14, 2023
09e8da5
Clarify the levels of uncertainty
amanda-minter Nov 16, 2023
0fc3bc8
Add callout on how delays can be specified for different data sources
amanda-minter Nov 16, 2023
3278151
Add probability thresholds to expected change callout
amanda-minter Nov 16, 2023
377bfdc
Rearrange results order and explain uncertainty pattern
amanda-minter Nov 17, 2023
a51bd39
Defining transmission speed/strength
amanda-minter Nov 17, 2023
96eac64
Update quantify-transmissibility.Rmd
amanda-minter Nov 17, 2023
6dde0bb
Add callout on options for Rt for forecasting
amanda-minter Nov 17, 2023
06a1f31
Apply suggestions from code review
amanda-minter Nov 23, 2023
eca2652
update language used in prerequisites
amanda-minter Nov 23, 2023
ef22127
add data file for ebola case study challenge
amanda-minter Nov 27, 2023
986e98e
add challenge for middle tasks
amanda-minter Nov 27, 2023
14703ac
update exercise time
amanda-minter Nov 27, 2023
3777691
improve figure sizes
amanda-minter Nov 28, 2023
5f28c78
Fix typo
amanda-minter Dec 4, 2023
a7c5eb1
add project option to callout
amanda-minter Dec 4, 2023
9c096ef
update data formatting and callout
amanda-minter Dec 5, 2023
0261b93
add growth rate to glossary
amanda-minter Dec 5, 2023
080b9a7
improve summaries
amanda-minter Dec 5, 2023
0ee02d9
replace 'lesson' with 'tutorial'
amanda-minter Dec 5, 2023
3bc799d
add forecasting secondary observations section
amanda-minter Dec 7, 2023
d4c5938
fix typo
amanda-minter Dec 7, 2023
1b47bb5
fix typo
amanda-minter Dec 7, 2023
ef17a31
style code
amanda-minter Dec 7, 2023
30767a0
close renv lock bracket
avallecam Jan 29, 2024
7368ae6
install rcppparallel from rspm
avallecam Jan 29, 2024
05fe433
add install rcppparallel from rspm
avallecam Jan 29, 2024
1a3ccc7
replace links to download and read ebola_cases
avallecam Jan 29, 2024
52121a8
activate chunk output type console + stan params
avallecam Jan 29, 2024
501489c
remove the eval false tag to challenge chunks
avallecam Jan 30, 2024
0d1407a
replace paragraph with code within given gh-action
avallecam Jan 30, 2024
15a7c57
move read data for challenge within its solution
avallecam Jan 31, 2024
4a1d7ee
add new renv additions
avallecam Jan 31, 2024
efff440
use lintr addin to resolve local markers
avallecam Feb 2, 2024
3597966
use lintr addin to remove local warnings
avallecam Feb 2, 2024
fc7558f
add explicit data to forecast
avallecam Feb 2, 2024
1162bc3
fix typo
avallecam Feb 2, 2024
db09c58
add details to rt_opts() description
avallecam Feb 2, 2024
8ceddf7
fix typo for rt_opts() function
avallecam Feb 2, 2024
ff59407
clarify the usage of a model different to default
avallecam Feb 2, 2024
0d4be55
clarify the assumed scenario
avallecam Feb 2, 2024
82142a3
fix local options configuration with withr
avallecam Feb 2, 2024
8cf8d31
reactivate chunks to allow running the solution
avallecam Feb 2, 2024
c0aa6d2
use chunk_output_type inline explicit to read data
avallecam Feb 2, 2024
18e56d9
simplify how to extract result from data.table object
avallecam Feb 2, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ contact: '[email protected]' #FIXME

# Order of episodes in your lesson
episodes:
- quantify-transmissibility.Rmd
- create-forecast.Rmd
- simulating-transmission.Rmd
- model-choices.Rmd
- modelling-interventions.Rmd
Expand Down
348 changes: 348 additions & 0 deletions episodes/create-forecast.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,348 @@
---
title: 'Create a short-term forecast'
teaching: 30
exercises: 30
editor_options:
chunk_output_type: inline
---

```{r setup, echo = FALSE, warning = FALSE, message = FALSE}
library(EpiNow2)
withr::local_options(list(mc.cores = 4))
```


:::::::::::::::::::::::::::::::::::::: questions

- How do I create short-term forecasts from case data?
- How do I account for incomplete reporting in forecasts?


::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Learn how to make forecasts of cases using R package `EpiNow2`
- Learn how to include an observation process in estimation

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: prereq

## Prerequisites

+ Complete tutorial [Quantifying transmission](../episodes/quantify-transmissibility.md)

Learners should familiarise themselves with following concept dependencies before working through this tutorial:

**Statistics** : probability distributions, principle of Bayesian analysis.

**Epidemic theory** : Effective reproduction number.

:::::::::::::::::::::::::::::::::

## Introduction

Given case data, we can create estimates of the current and future number of cases by accounting for both delays in reporting and under reporting. To make statements about the future, we need to make an assumption of how observations up to today are related to what we expect to happen in the future. The simplest way of doing so is to assume "no change", i.e. the reproduction number remains the same in the future as last observed. In this tutorial we will create short-term forecasts by assuming the reproduction number will remain the same as it was estimated to be on the final date for which data was available.

## Create a short-term forecast

The function `epinow()` described in the previous tutorial is a wrapper for the function `estimate_infections()` used to estimate cases by date of infection. Using the same code in the previous tutorial we can extract the short-term forecast using:

```{r, echo = FALSE}
cases <- aggregate(
cases_new ~ date,
data = incidence2::covidregionaldataUK[, c("date", "cases_new")],
FUN = sum
)
colnames(cases) <- c("date", "confirm")
rt_log_mean <- convert_to_logmean(2, 1)
rt_log_sd <- convert_to_logsd(2, 1)

incubation_period_fixed <- dist_spec(
mean = 4, sd = 2,
max = 20, distribution = "gamma"
)

log_mean <- convert_to_logmean(2, 1)
log_sd <- convert_to_logsd(2, 1)
reporting_delay_fixed <- dist_spec(
mean = log_mean, sd = log_sd,
max = 10, distribution = "lognormal"
)

generation_time_fixed <- dist_spec(
mean = 3.6, sd = 3.1,
max = 20, distribution = "lognormal"
)
```


```{r, message = FALSE, eval = TRUE}
reported_cases <- cases[1:90, ]
estimates <- epinow(
reported_cases = reported_cases,
generation_time = generation_time_opts(generation_time_fixed),
delays = delay_opts(incubation_period_fixed + reporting_delay_fixed),
rt = rt_opts(prior = list(mean = rt_log_mean, sd = rt_log_sd))
)
```


We can visualise the estimates of the effective reproduction number and the estimated number of cases using `plot()`. The estimates are split into three categories:

+ **Estimate** (green): utilises all data,

+ **Estimate based on partial data** (orange): estimates that are based less data are therefore more uncertain,

+ **Forecast** (purple): forecasts into the future.


```{r}
plot(estimates)
```


::::::::::::::::::::::::::::::::::::: callout
### Forecasting with estimates of $R_t$

By default, the short-term forecasts are created using the latest estimate of the reproduction number $R_t$. As this estimate is based on partial data, it has considerable uncertainty.

The reproduction number that is projected into the future can be changed to a less recent estimate based on more data using `rt_opts()`:

```{r, eval = FALSE}
rt_opts(future = "estimate")
```

The result will be less uncertain forecasts (as they are based on $R_t$ with a narrower uncertainty interval) but the forecasts will be based on less recent estimates of $R_t$ and assume no change since then.

Additionally, there is the option to project the value of $R_t$ into the future using a generic model by setting `future = "project"`. As this option uses a model to forecast the value of $R_t$, the result will be forecasts that are more uncertain than `estimate`, for an example [see here](https://epiforecasts.io/EpiNow2/dev/articles/estimate_infections_options.html#projecting-the-reproduction-number-with-the-gaussian-process).

::::::::::::::::::::::::::::::::::::::::::::::::


### Incomplete observations

In the previous tutorial we accounted for delays in reporting. In `EpiNow2` we also can account for incomplete observations as in reality, 100% of cases are not reported.

We will pass another input into `epinow()` called `obs` to define an observation model. The format of `obs` must be the `obs_opt()` function (see `?EpiNow2::obs_opts` for more detail).

Let's say we believe the COVID-19 outbreak data from the previous tutorial do not include all reported cases. We believe that we only observe 40% of cases. To specify this in the observation model, we must pass a scaling factor with a mean and standard deviation. If we assume that 40% of cases are in the case data (with standard deviation 1%), then we specify the `scale` input to `obs_opts()` as follows:

```{r}
obs_scale <- list(mean = 0.4, sd = 0.01)
```

To run the inference framework with this observation process, we add `obs = obs_opts(scale = obs_scale)` to the input arguments of `epinow()`:

```{r, message = FALSE, eval = TRUE}
obs_scale <- list(mean = 0.4, sd = 0.01)
reported_cases <- cases[1:90, ]
estimates <- epinow(
reported_cases = reported_cases,
generation_time = generation_time_opts(generation_time_fixed),
delays = delay_opts(incubation_period_fixed + reporting_delay_fixed),
rt = rt_opts(prior = list(mean = rt_log_mean, sd = rt_log_sd)),
obs = obs_opts(scale = obs_scale)
)
summary(estimates)
```


The estimates of transmission measures such as the effective reproduction number and rate of growth are similar (or the same in value) compared to when we didn't account for incomplete observations (see previous tutorial). However the number of new confirmed cases by infection date has changed substantially in magnitude to reflect the assumption that only 40% of cases are in the data set.

We can also change the default distribution from Negative Binomial to Poisson, remove the default week effect and more. See `?EpiNow2::obs_opts` for more detail.


## Forecast secondary observations

`EpiNow2` also has the ability to estimate and forecast secondary observations e.g. deaths, hospitalisations from a primary observation e.g. cases. Here we will illustrate how to forecast the number of deaths arising from observed cases of COVID-19 in the early stages of the UK outbreak.

First, we must format our data to have the following columns:

+ `date` : the date (as a date object see `?is.Date()`),
+ `primary` : number of primary observations on that date, in this example **cases**,
+ `secondary` : number of secondary observations date, in this example **deaths**.

```{r}
reported_cases_deaths <- aggregate(
cbind(cases_new, deaths_new) ~ date,
data =
incidence2::covidregionaldataUK[, c("date", "cases_new", "deaths_new")],
FUN = sum
)
colnames(reported_cases_deaths) <- c("date", "primary", "secondary")
```


Using the first 30 days of data on cases and deaths, we will estimate the relationship between the primary and secondary observations using `estimate_secondary()`, then forecast future deaths using `forecast_secondary()`. For detail on the model see the [model documentation](https://epiforecasts.io/EpiNow2/dev/articles/estimate_secondary.html).

We must specify the type of observation using `type` in `secondary_opts()`, options include:

+ "incidence" : secondary observations arise from previous primary observations, i.e. deaths arising from recorded cases.
+ "prevalence" : secondary observations arise from a combination current primary observations and past secondary observations, i.e. hospital bed usage arising from current hospital admissions and past hospital bed usage.

In this example we specify `secondary_opts(type = "incidence")`. See `?EpiNow2::secondary_opts` for more detail).

The final key input is the delay distribution between the primary and secondary observations. Here this is the delay between case report and death, we assume this follows a gamma distribution with mean of 14 days and standard deviation of 5 days. Using `dist_spec()` we specify a fixed gamma distribution.

There are further function inputs to `estimate_secondary()` which can be specified, including adding an observation process, see `?EpiNow2::estimate_secondary` for detail on the options.

To find the model fit between cases and deaths :
```{r}
estimate_cases_to_deaths <- estimate_secondary(
reports = reported_cases_deaths[1:30, ],
secondary = secondary_opts(type = "incidence"),
delays = delay_opts(dist_spec(
mean = 14, sd = 5,
max = 30, distribution = "gamma"
))
)
```


::::::::::::::::::::::::::::::::::::: callout
### Be cautious of timescale

In the early stages of an outbreak there can be substantial changes in testing and reporting. If there are testing changes from one month to another, then there will be a bias in the model fit. Therefore, you should be cautious of the time-scale of data used in the model fit and forecast.

::::::::::::::::::::::::::::::::::::::::::::::::

We plot the model fit (shaded ribbons) with the secondary observations (bar plot) and primary observations (dotted line) as follows:

```{r}
plot(estimate_cases_to_deaths, primary = TRUE)
```

To use this model fit to forecast deaths, we pass a data frame consisting of the primary observation (cases) for dates not used in the model fit.

*Note : in this tutorial we are using data where we know the deaths and cases, so we create a data frame by extracting the cases. But in practice, this would be a different data set consisting of cases only.*
```{r}
cases_to_forecast <- reported_cases_deaths[31:60, c("date", "primary")]
colnames(cases_to_forecast) <- c("date", "value")
```

To forecast, we use the model fit `estimate_cases_to_deaths`:

```{r}
deaths_forecast <- forecast_secondary(
estimate = estimate_cases_to_deaths,
primary = cases_to_forecast
)
plot(deaths_forecast)
```

The plot shows the forecast secondary observations (deaths) over the dates which we have recorded cases for.
It is also possible to forecast deaths using forecast cases, here you would specify `primary` as the `estimates` output from `estimate_infections()`.


## Challenge : Ebola outbreak analysis

::::::::::::::::::::::::::::::::::::: challenge

Download the file [ebola_cases.csv](data/ebola_cases.csv) and read it into R. The simulated data consists of the date of symptom onset and number of confirmed cases of the early stages of the Ebola outbreak in Sierra Leone in 2014.

Using the first 3 months (120 days) of data:

1. Estimate of cases increasing or decreasing on day 120 of the outbreak (Hint: Find the effective reproduction number and growth rate on day 120)
2. Create a two week forecast of number of cases

You can use the following parameter values for the delay distribution(s) and generation time distribution.

+ Incubation period : Log normal$(2.487,0.330)$ ([Eichner et al. 2011](https://doi.org/10.1016/j.phrp.2011.04.001) via `{epiparameter}`)
+ Generation time : Gamma$(15.3, 10.1)$ ([WHO Ebola Response Team 2014](https://www.nejm.org/doi/full/10.1056/NEJMoa1411100))

You may include some uncertainty around the mean and standard deviation of these distributions.

::::::::::::::::: hint

### HINT : data format

Ensure the data is in the correct format :

+ `date` : the date (as a date object see `?is.Date()`),
+ `confirm` : number of confirmed cases on that date.


::::::::::::::::::::::


::::::::::::::::: solution

### SOLUTION

To estimate the effective reproduction number and growth rate, we will use the function `epinow()`.

As the data consists of date of symptom onset, we only need to specify a delay distribution for the incubation period and the generation time.

We specify the distributions with some uncertainty around the mean and standard deviation of the log normal distribution for the incubation period and the Gamma distribution for the generation time.

```{r,eval=TRUE,echo=TRUE}
ebola_incubation_period <- dist_spec(
mean = 2.487, sd = 0.330,
mean_sd = 0.5, sd_sd = 0.5,
max = 20, distribution = "lognormal"
)

ebola_generation_time <- dist_spec(
mean = 15.3, sd = 10.1,
mean_sd = 0.5, sd_sd = 0.5,
max = 30, distribution = "gamma"
)
```

As we want to also create a two week forecast, we specify `horizon = 14` to forecast 14 days instead of the default 7 days.

```{r, eval=TRUE,echo=FALSE}
# read data from the tutorial repository R project
ebola_cases <-
read.csv(file.path("data", "ebola_cases.csv"))
```

```{r,eval=FALSE,echo=TRUE}
# read data
# e.g.: if path to file is data/raw-data/ebola_cases.csv then:
ebola_cases <-
read.csv(here::here("data", "raw-data", "ebola_cases.csv"))
```

```{r,eval=TRUE,echo=TRUE, message = FALSE}
# format date column
ebola_cases$date <- as.Date(ebola_cases$date)

ebola_estimates <- epinow(
reported_cases = ebola_cases[1:120, ], # first 3 months of data only
generation_time = generation_time_opts(ebola_generation_time),
delays = delay_opts(ebola_incubation_period),
# horizon needs to be 14 days to create two week forecast (default is 7 days)
horizon = 14
)

summary(ebola_estimates)
```

The effective reproduction number $R_t$ estimate (on the last date of the data) is `r summary(ebola_estimates)[measure=="Effective reproduction no."]$estimate`. The exponential growth rate of case numbers is `r summary(ebola_estimates)[measure=="Rate of growth"]$estimate`.

Visualize the estimates:

```{r,eval=FALSE,echo=TRUE}
plot(ebola_estimates)
```

:::::::::::::::::::::::::::


::::::::::::::::::::::::::::::::::::::::::::::::

## Summary

`EpiNow2` can be used to create short term forecasts and to estimate the relationship between different outcomes. There are a range of model options that can be implemented for different analysis, including adding an observational process to account for incomplete reporting. See the [vignette](https://epiforecasts.io/EpiNow2/dev/articles/estimate_infections_options.html) for more details on different model options in `EpiNow2` that aren't covered in these tutorials.


::::::::::::::::::::::::::::::::::::: keypoints

- We can create short-term forecasts by making assumptions about the future behaviour of the reproduction number
- Incomplete case reporting can be accounted for in estimates
amanda-minter marked this conversation as resolved.
Show resolved Hide resolved


::::::::::::::::::::::::::::::::::::::::::::::::
Loading
Loading