diff --git a/config.yaml b/config.yaml index df081d25..4c2427d2 100644 --- a/config.yaml +++ b/config.yaml @@ -63,9 +63,6 @@ episodes: - clean-data.Rmd - describe-cases.Rmd - simple-analysis.Rmd -- delays-access.Rmd -# - quantify-transmissibility.Rmd -- delays-functions.Rmd # Information for Learners learners: diff --git a/episodes/delays-access.Rmd b/episodes/delays-access.Rmd deleted file mode 100644 index 3ad60711..00000000 --- a/episodes/delays-access.Rmd +++ /dev/null @@ -1,663 +0,0 @@ ---- -title: 'Access epidemiological delay distributions' -teaching: 20 -exercises: 10 -editor_options: - chunk_output_type: inline ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- How to get access to disease delay distributions from a pre-established database for use in analysis? - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Get delays from a literature search database with `{epiparameter}`. -- Get distribution parameters and summary statistics of delay distributions. - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: prereq - -## Prerequisites - -This episode requires you to be familiar with: - -**Data science** : Basic programming with R. - -**Epidemic theory** : epidemiological parameters, disease time periods, such as the incubation period, generation time, and serial interval. - -::::::::::::::::::::::::::::::::: - -## Introduction - -Infectious diseases follow an infection cycle, which usually includes the following phases: presymptomatic period, symptomatic period and recovery period, as described by their [natural history](../learners/reference.md#naturalhistory). These time periods can be used to understand transmission dynamics and inform disease prevention and control interventions. - -![Definition of key time periods. 
From [Xiang et al, 2021](https://www.sciencedirect.com/science/article/pii/S2468042721000038)](fig/time-periods.jpg) - - -::::::::::::::::: callout - -### Definitions - -Look at the [glossary](../learners/reference.md) for the definitions of all the time periods of the figure above! - -::::::::::::::::::::::::: - -However, early in an epidemic, modelling efforts can be delayed by the lack of a centralised resource that summarises input parameters for the disease of interest ([Nash et al., 2023](https://mrc-ide.github.io/epireview/)). Projects like `{epiparameter}` and `{epireview}` are building online catalogues following literature synthesis protocols. These catalogues help parametrise models by giving easy access to a comprehensive library of previously estimated epidemiological parameters from past outbreaks. - - - -To exemplify how to use the `{epiparameter}` R package in your analysis pipeline, our goal in this episode will be to access one specific set of epidemiological parameters from the literature, instead of copying-and-pasting them by hand, to plug them into an `{EpiNow2}` analysis workflow. - - - -Let's start by loading the `{epiparameter}` package. We'll use the pipe `%>%` to connect some of its functions, along with some `{tibble}` and `{dplyr}` functions, so let's also load the `{tidyverse}` package: - -```{r,warning=FALSE,message=FALSE} -library(epiparameter) -library(tidyverse) -``` - -::::::::::::::::::: checklist - -### The double-colon - -The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment. - -For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package. - -This helps us remember package functions and avoid namespace conflicts. - -::::::::::::::::::: - - -## The problem - -If we want to estimate the transmissibility of an infection, it's common to use a package such as `{EpiEstim}` or `{EpiNow2}`. 
However, both require some epidemiological information as an input. For example, in `{EpiNow2}` we use `EpiNow2::Gamma()` to specify a [generation time](../learners/reference.md#generationtime) as a probability distribution adding its `mean`, standard deviation (`sd`), and maximum value (`max`). - -To specify a `generation_time` that follows a _Gamma_ distribution with mean $\mu = 4$, standard deviation $\sigma = 2$, and a maximum value of 20, we write: - -```r -generation_time <- - EpiNow2::Gamma( - mean = 4, - sd = 2, - max = 20 - ) -``` - -It is a common practice for analysts to manually search the available literature and copy and paste the **summary statistics** or the **distribution parameters** from scientific publications. A challenge that is often faced is that the reporting of different statistical distributions is not consistent across the literature. `{epiparameter}`’s objective is to facilitate access to reliable estimates of distribution parameters for a range of infectious diseases, so that they can easily be implemented in outbreak analytic pipelines. - -In this episode, we will *access* the summary statistics of generation time for COVID-19 from the library of epidemiological parameters provided by `{epiparameter}`. These metrics can be used to estimate the transmissibility of this disease using `{EpiNow2}` in subsequent episodes. - -Let's start by looking at how many entries are available in the **epidemiological distributions database** in `{epiparameter}` using `epidist_db()` for the epidemiological distribution `epi_dist` called generation time with the string `"generation"`: - -```{r} -epiparameter::epidist_db( - epi_dist = "generation" -) -``` - -Currently, in the library of epidemiological parameters, we have one `"generation"` time entry for Influenza. Instead, we can look at the `serial` intervals for `COVID`-19. Let's find what we need to consider for this! 
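-
-As a worked check of the Gamma example above (a sketch using the standard Gamma moment relations, with shape $\alpha$ and rate $\beta$ as our own notation), the summary statistics $\mu = 4$ and $\sigma = 2$ correspond to the following distribution parameters:
-
-$$
-\alpha = \frac{\mu^2}{\sigma^2} = \frac{4^2}{2^2} = 4, \qquad \beta = \frac{\mu}{\sigma^2} = \frac{4}{2^2} = 1
-$$
-
-Going the other way, $\mu = \alpha / \beta = 4$ and $\sigma = \sqrt{\alpha} / \beta = 2$, so either pair of numbers describes the same distribution.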
- -## Generation time vs serial interval - -The generation time, jointly with the reproduction number ($R$), provide valuable insights on the strength of transmission and inform the implementation of control measures. Given a $R>1$, the shorter the generation time, the earlier the incidence of disease cases will grow. - -![Video from the MRC Centre for Global Infectious Disease Analysis, Ep 76. Science In Context - Epi Parameter Review Group with Dr Anne Cori (27-07-2023) at ](fig/reproduction-generation-time.png) - -In calculating the effective reproduction number ($R_{t}$), the *generation time* distribution is often approximated by the [serial interval](../learners/reference.md#serialinterval) distribution. -This frequent approximation is because it is easier to observe and measure the onset of symptoms than the onset of infectiousness. - -![A schematic of the relationship of different time periods of transmission between an infector and an infectee in a transmission pair. Exposure window is defined as the time interval having viral exposure, and transmission window is defined as the time interval for onward transmission with respect to the infection time ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)).](fig/serial-interval-observed.jpeg) - -However, using the *serial interval* as an approximation of the *generation time* is primarily valid for diseases in which infectiousness starts after symptom onset ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)). In cases where infectiousness starts before symptom onset, the serial intervals can have negative values, which is the case for diseases with pre-symptomatic transmission ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext#gr2)). - - - -::::::::::::::::: callout - -### From time periods to probability distributions. 
- -When we calculate the *serial interval*, we see that not all case pairs have the same time length. We will observe this variability for any case pair and individual time period, including the [incubation period](../learners/reference.md#incubation) and [infectious period](../learners/reference.md#infectiousness). - -![Serial intervals of possible case pairs in (a) COVID-19 and (b) MERS-CoV. Pairs represent a presumed infector and their presumed infectee plotted by date of symptom onset ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig6)).](fig/serial-interval-pairs.jpg) - -To summarise these data from individual and pair time periods, we can find the **statistical distributions** that best fit the data ([McFarland et al., 2023](https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.27.2200806)). - - - -![Fitted serial interval distribution for (a) COVID-19 and (b) MERS-CoV based on reported transmission pairs in Saudi Arabia. We fitted three commonly used distributions, Log normal, Gamma, and Weibull distributions, respectively ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig5)).](fig/seria-interval-fitted-distributions.jpg) - -Statistical distributions are summarised in terms of their **summary statistics** like the *location* (mean and percentiles) and *spread* (variance or standard deviation) of the distribution, or with their **distribution parameters** that inform about the *form* (shape and rate/scale) of the distribution. These estimated values can be reported with their **uncertainty** (95% confidence intervals). 
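-
-As a reminder of how the two descriptions relate (standard moment formulas, written in our own notation rather than taken from the cited papers), the mean of a distribution can be recovered from its distribution parameters:
-
-$$
-\mu_{\text{Gamma}} = \frac{\text{shape}}{\text{rate}}, \qquad \mu_{\text{Log normal}} = \exp\left(\text{mean-log} + \frac{\text{sd-log}^2}{2}\right)
-$$
-
-For instance, a log normal distribution with mean-log $1.45$ and sd-log $0.63$ has mean $\exp(1.45 + 0.63^2 / 2) \approx 5.2$.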
- -| Gamma | mean | shape | rate/scale | -|:--------------|:--------------|:--------------|:--------------| -| MERS-CoV | 14.13(13.9–14.7) | 6.31(4.88–8.52) | 0.43(0.33–0.60) | -| COVID-19 | 5.1(5.0–5.5) | 2.77(2.09–3.88) | 0.53(0.38–0.76) | - -| Weibull | mean | shape | rate/scale | -|:--------------|:--------------|:--------------|:--------------| -| MERS-CoV | 14.2(13.3–15.2) | 3.07(2.64–3.63) | 16.1(15.0–17.1) | -| COVID-19 | 5.2(4.6–5.9) | 1.74(1.46–2.11) | 5.83(5.08–6.67) | - -| Log normal | mean | mean-log | sd-log | -|:--------------|:--------------|:--------------|:--------------| -| MERS-CoV | 14.08(13.1–15.2) | 2.58(2.50–2.68) | 0.44(0.39–0.5) | -| COVID-19 | 5.2(4.2–6.5) | 1.45(1.31–1.61) | 0.63(0.54–0.74) | - -Table: Serial interval estimates using Gamma, Weibull, and Log Normal distributions. 95% confidence intervals for the shape and scale (logmean and sd for Log Normal) parameters are shown in brackets ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#tbl3)). - -::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::: challenge - -### Serial interval - -Assume that COVID-19 and SARS have similar reproduction number values and that the serial interval approximates the generation time. - -Given the serial interval of both infections in the figure below: - -- Which one would be harder to control? -- Why do you conclude that? - -![Serial interval of novel coronavirus (COVID-19) infections overlaid with a published distribution of SARS. ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext))](fig/serial-interval-covid-sars.jpg) - -::::::::::::::::: hint - -The peak of each curve can inform you about the location of the mean of each distribution. The larger the mean, the larger the serial interval. 
- -:::::::::::::::::::::: - -::::::::::::::::: solution - -**Which one would be harder to control?** - -COVID-19 - -**Why do you conclude that?** - -COVID-19 has the lower mean serial interval. The approximate mean value for the serial interval of COVID-19 is around four days, and for SARS it is about seven days. Thus, COVID-19 will likely produce new generations of cases in less time than SARS, assuming similar reproduction numbers. - -:::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::: instructor - -The objective of the assessment above is to assess the interpretation of a longer or shorter generation time. - -:::::::::::::::::::::: - -## Choosing epidemiological parameters - -In this section, we will use `{epiparameter}` to obtain the serial interval for COVID-19, as an alternative to the generation time. - -Let's now ask how many parameters we have in the epidemiological distributions database (`epidist_db()`) with the `disease` named `covid`-19. Run this locally! - -```{r,eval=FALSE} -epiparameter::epidist_db( - disease = "covid" -) -``` - -From the `{epiparameter}` package, we can use the `epidist_db()` function to ask for any `disease` and also for a specific epidemiological distribution (`epi_dist`). Run this in your console: - -```{r,eval=FALSE} -epiparameter::epidist_db( - disease = "COVID", - epi_dist = "serial" -) -``` - -With this query combination, we get more than one delay distribution. This output is an `<epidist>` class object. - -::::::::::::::::: callout - -### CASE-INSENSITIVE - -`epidist_db()` is [case-insensitive](https://dillionmegida.com/p/case-sensitivity-vs-case-insensitivity/#case-insensitivity). This means that you can use strings with letters in upper or lower case interchangeably. Strings like `"serial"`, `"serial interval"` or `"serial_interval"` are also valid. 
- -::::::::::::::::::::::::: - -As suggested in the outputs, to summarise an `<epidist>` object and get the column names from the underlying parameter database, we can add the `epiparameter::parameter_tbl()` function to the previous code using the pipe `%>%`: - -```{r} -epiparameter::epidist_db( - disease = "covid", - epi_dist = "serial" -) %>% - epiparameter::parameter_tbl() -``` - -In the `epiparameter::parameter_tbl()` output, we can also find different types of probability distributions (e.g., Log-normal, Weibull, Normal). - -`{epiparameter}` uses the `base` R naming convention for distributions. This is why **Log normal** is called `lnorm`. - -::::::::::::::::: spoiler - -### Why do we have an 'NA' entry? - -Entries with a missing value (`<NA>`) in the `prob_distribution` column are *non-parameterised* entries. They have summary statistics but no probability distribution. Compare these two outputs: - -```{r,eval=FALSE} -# get an object -distribution <- - epiparameter::epidist_db( - disease = "covid", - epi_dist = "serial" - ) - -distribution %>% - # pluck the first entry in the object class - pluck(1) %>% - # check if the object has distribution parameters - is_parameterised() - -# check if the second object -# has distribution parameters -distribution %>% - pluck(2) %>% - is_parameterised() -``` - -### Parameterised entries have an Inference method - -As detailed in `?is_parameterised`, a parameterised distribution is an entry that has a probability distribution associated with it, provided by an `inference_method` as shown in `metadata`: - -```{r,eval=FALSE} -distribution[[1]]$metadata$inference_method -distribution[[2]]$metadata$inference_method -distribution[[4]]$metadata$inference_method -``` - -::::::::::::::::::::::::: - - -::::::::::::::::::::::::::::::::: challenge - -### Find your delay distributions! - -Take 2 minutes to explore the `{epiparameter}` library. - -**Choose** a disease of interest (e.g., Influenza, Measles, etc.) 
and a delay distribution (e.g., the incubation period, onset to death, etc.). - -Find: - -- How many delay distributions are for that disease? - -- How many types of probability distribution (e.g., gamma, log normal) are for a given delay in that disease? - -Ask: - -- Do you recognise the papers? - -- Should `{epiparameter}` literature review consider any other paper? - -::::::::::::::::: hint - -The `epidist_db()` function with `disease` alone counts the number of entries like: - -- studies, and -- delay distributions. - -The `epidist_db()` function with `disease` and `epi_dist` gets a list of all entries with: - -- the complete citation, -- the **type** of a probability distribution, and -- distribution parameter values. - -The combo of `epidist_db()` plus `parameter_tbl()` gets a data frame of all entries with columns like: - -- the **type** of the probability distribution per delay, and -- author and year of the study. - -:::::::::::::::::::::: - -::::::::::::::::: solution - -We choose to explore Ebola's delay distributions: - -```{r} -# we expect 16 delays distributions for ebola -epiparameter::epidist_db( - disease = "ebola" -) -``` - -Now, from the output of `epiparameter::epidist_db()`, What is an [offspring distribution](../learners/reference.md#offspringdist)? - -We choose to find Ebola's incubation periods. This output list all the papers and parameters found. Run this locally if needed: - -```{r, eval=FALSE} -epiparameter::epidist_db( - disease = "ebola", - epi_dist = "incubation" -) -``` - -We use `parameter_tbl()` to get a summary display of all: - -```{r,eval=TRUE} -# we expect 2 different types of delay distributions -# for ebola incubation period -epiparameter::epidist_db( - disease = "ebola", - epi_dist = "incubation" -) %>% - parameter_tbl() -``` - -We find two types of probability distributions for this query: _log normal_ and _gamma_. - -How does `{epiparameter}` do the collection and review of peer-reviewed literature? 
We invite you to read the vignette on ["Data Collation and Synthesis Protocol"](https://epiverse-trace.github.io/epiparameter/articles/data_protocol.html)! - -:::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::::::::: - - -## Select a single distribution - -The `epiparameter::epidist_db()` function works as a filtering or subset function. Let's use the `author` argument to filter `Hiroshi Nishiura` parameters: - -```{r} -epiparameter::epidist_db( - disease = "covid", - epi_dist = "serial", - author = "Hiroshi" -) %>% - epiparameter::parameter_tbl() -``` - -We still get more than one epidemiological parameter. We can set the `single_epidist` argument to `TRUE` to return only one: - -```{r} -epiparameter::epidist_db( - disease = "covid", - epi_dist = "serial", - author = "Hiroshi", - single_epidist = TRUE -) -``` - -::::::::::::::::: callout - -### How does 'single_epidist' work? - -Looking at the help documentation for `?epiparameter::epidist_db()`: - -- If multiple entries match the arguments supplied and `single_epidist = TRUE`, then the parameterised -`<epidist>` with the *largest sample size* will be returned. -- If multiple entries are equal after this sorting, the *first entry* will be returned. - -What is a *parameterised* `<epidist>`? Look at `?is_parameterised`. - -::::::::::::::::::::::::: - -Let's assign this `<epidist>` class object to the `covid_serialint` object. - -```{r,message=FALSE} -covid_serialint <- - epiparameter::epidist_db( - disease = "covid", - epi_dist = "serial", - author = "Nishiura", - single_epidist = TRUE - ) -``` - - - -You can use `plot()` on `<epidist>` objects to visualise: - -- the *Probability Density Function (PDF)* and -- the *Cumulative Distribution Function (CDF)*. - -```{r} -# plot object -plot(covid_serialint) -``` - -With the `day_range` argument, you can change the length or number of days in the `x` axis. 
Explore what this looks like: - -```{r,eval=FALSE} -# plot object -plot(covid_serialint, day_range = 0:20) -``` - - -## Extract the summary statistics - -We can get the `mean` and standard deviation (`sd`) from this `` diving into the `summary_stats` object: - -```{r} -# get the mean -covid_serialint$summary_stats$mean -``` - -Now, we have an epidemiological parameter we can reuse! Given that the `covid_serialint` is a `lnorm` or log normal distribution, we can replace the **summary statistics** numbers we plug into the `EpiNow2::LogNormal()` function: - -```r -generation_time <- - EpiNow2::LogNormal( - mean = covid_serialint$summary_stats$mean, # replaced! - sd = covid_serialint$summary_stats$sd, # replaced! - max = 20 - ) -``` - -In the next episode we'll learn how to use `{EpiNow2}` to correctly specify distributions, estimate transmissibility. Then, how to use **distribution functions** to get a maximum value (`max`) for `EpiNow2::LogNormal()` and use `{epiparameter}` in your analysis. - -:::::::::::::::::::::::::::::: callout - -### Log normal distributions - -If you need the log normal **distribution parameters** instead of the summary statistics, we can use `epiparameter::get_parameters()`: - -```{r} -covid_serialint_parameters <- - epiparameter::get_parameters(covid_serialint) - -covid_serialint_parameters -``` - -This gets a vector of class `` ready to use as input for any other package! - -:::::::::::::::::::::::::::::: - -## Challenges - -:::::::::::::::::::::::::::::: challenge - -### Ebola's serial interval - -Take 1 minute to: - -Get access to the Ebola serial interval with the highest sample size. - -Answer: - -- What is the `sd` of the epidemiological distribution? - -- What is the `sample_size` used in that study? - -::::::::: hint - -Use the `$` operator plus the tab or keyboard button to explore them as an expandable list: - -```r -covid_serialint$ -``` - -Use the `str()` to display the structure of the `` R object. 
- -:::::::::::::::::: - -:::::::::: solution - -```{r,eval=TRUE} -# ebola serial interval -ebola_serial <- - epiparameter::epidist_db( - disease = "ebola", - epi_dist = "serial", - single_epidist = TRUE - ) - -ebola_serial -``` - -```{r,eval=TRUE} -# get the sd -ebola_serial$summary_stats$sd - -# get the sample_size -ebola_serial$metadata$sample_size -``` - -Try to visualise this distribution using `plot()`. - -Also, explore all the other nested elements within the `<epidist>` object. - -Share about: - -- What elements do you find useful for your analysis? -- What other elements would you like to see in this object? How? - -:::::::::::::::::::: - -:::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::: instructor - -An interesting element is the `method_assess` nested entry, which refers to the methods used by the study authors to assess bias while estimating the serial interval distribution. - -```{r} -covid_serialint$method_assess -``` - -We will explore these concepts in the following episodes! - -:::::::::::::::::::::::::::::: - - -::::::::::::::::::::::::::::::::: challenge - -### Ebola's severity parameter - -A severity parameter like the duration of hospitalisation could add to the information needed about the bed capacity in response to an outbreak ([Cori et al., 2017](https://royalsocietypublishing.org/doi/10.1098/rstb.2016.0371)). - - - -For Ebola: - -- What is the reported *point estimate* of the mean duration of health care and case isolation? - -::::::::::::::::: hint - -An informative delay should measure the time from symptom onset to recovery or death. - -Find a way to access the whole `{epiparameter}` database and find how that delay may be stored. The `parameter_tbl()` output is a data frame. 
- -:::::::::::::::::::::: - -::::::::::::::::: solution - -```{r,eval=TRUE} -# one way to get the list of all the available parameters -epidist_db(disease = "all") %>% - parameter_tbl() %>% - as_tibble() %>% - distinct(epi_distribution) - -ebola_severity <- epidist_db( - disease = "ebola", - epi_dist = "onset to discharge" -) - -# point estimate -ebola_severity$summary_stats$mean -``` - -Check that for some `{epiparameter}` entries you will also have the *uncertainty* around the *point estimate* of each summary statistic: - -```{r} -# 95% confidence intervals -ebola_severity$summary_stats$mean_ci -# limits of the confidence intervals -ebola_severity$summary_stats$mean_ci_limits -``` - -:::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::: discussion - -### The distribution zoo - -Explore this shinyapp called **The Distribution Zoo**! - -Follow these steps to reproduce the form of the COVID serial interval distribution from `{epiparameter}` (`covid_serialint` object): - -1. Access the shiny app website, -2. Go to the left panel, -3. Keep the *Category of distribution*: `Continuous Univariate`, -4. Select a new *Type of distribution*: `Log-Normal`, -5. Move the **sliders**, i.e. the graphical control element that allows you to adjust a value by moving a handle along a horizontal track or bar to the `covid_serialint` parameters. - -Replicate these with the `distribution` object and all its list elements: `[[2]]`, `[[3]]`, and `[[4]]`. Explore how the shape of a distribution changes when its parameters change. - -Share about: - -- What other features of the website do you find helpful? 
- -::::::::::::::::::::::::: - -::::::::::::::::::::::::: instructor - -In the context of user interfaces and graphical user interfaces (GUIs), like the [Distribution Zoo](https://ben18785.shinyapps.io/distribution-zoo/) shiny app, a **slider** is a graphical control element that allows users to adjust a value by moving a handle along a track or bar. Conceptually, it provides a way to select a numeric value within a specified range by visually sliding or dragging a pointer (the handle) along a continuous axis. - -::::::::::::::::::::::::: - - - -::::::::::::::::::::::::::::::::::::: keypoints - -- Use `{epiparameter}` to access the literature catalogue of epidemiological delay distributions. -- Use `epidist_db()` to select single delay distributions. -- Use `parameter_tbl()` for an overview of multiple delay distributions. -- Reuse known estimates for unknown disease in the early stage of an outbreak when no contact tracing data is available. - -:::::::::::::::::::::::::::::::::::::::::::::::: - diff --git a/episodes/delays-challenges.Rmd b/episodes/delays-challenges.Rmd deleted file mode 100644 index 053d9af8..00000000 --- a/episodes/delays-challenges.Rmd +++ /dev/null @@ -1,252 +0,0 @@ ---- -title: 'Add more delays' -teaching: 10 -exercises: 2 -editor_options: - chunk_output_type: inline ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- How to get summary statistics from `` objects with only distribution parameters? -- When should delays be reused from a systematic review? - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Reuse reporting delays from `{epiparameter}` as `{EpiNow2}` inputs. -- Convert distribution parameters to summary statistics with `{epiparameter}` - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: prereq - -## Prerequisites - -This episode requires you to be familiar with: - -**Data science** : Basic programming with R. 
- -**Epidemic theory** : Epidemiological parameters. Time periods. - -::::::::::::::::::::::::::::::::: - -## Introduction - -We have practised how to get epidemiological parameters from the literature and used them as input for other packages. - -You will find complementary challenges and resources to continue your learning path here! - -```{r,warning=FALSE,message=FALSE} -library(epiparameter) -library(tidyverse) -``` - -## Challenge ideas - -- ... - -## Challenge 2 - -::::::::::::::::::::::::::::::::::::::::::: challenge - -use any of the case studies in - - - -::::::::::::::::: hint - -How to get the mean and standard deviation from a generation time with only distribution parameters but no summary statistics like `mean` or `sd` for `EpiNow2::dist_spec()`? - -- Look at how to extract parameters from `{epiparameter}` vignette on [parameter extraction and conversion](https://epiverse-trace.github.io/epiparameter/articles/extract_convert.html) - -:::::::::::::::::::::: - -:::::::::::::::::::::: solution - -```{r} -influenza_generation_discrete <- - discretise(influenza_generation) - -# we have a problem -# the summary statistics do not have mean and sd -influenza_generation$summary_stats - -# one solution is to -# get parameters and convert to summary statistics - -# first, -# get parameters -influenza_generation_params <- - get_parameters(influenza_generation) - -# then, -# convert distribution parameters to summary statistics -influenza_converted <- - convert_params_to_summary_stats( - distribution = "weibull", - shape = influenza_generation_params["shape"], - scale = influenza_generation_params["scale"] - ) - -influenza_converted -``` - -:::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::::::::: - - - - - - - -::::::::::::::::::::::::::::::::::::: keypoints - -- Reuse known estimates for unknown disease in the early stage of an outbreak when no contact tracing data is available. 
- -:::::::::::::::::::::::::::::::::::::::::::::::: - diff --git a/episodes/quantify-transmissibility.Rmd b/episodes/quantify-transmissibility.Rmd deleted file mode 100644 index 5a39057e..00000000 --- a/episodes/quantify-transmissibility.Rmd +++ /dev/null @@ -1,555 +0,0 @@ ---- -title: 'Quantifying transmission' -teaching: 30 -exercises: 0 ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- How can I estimate the time-varying reproduction number ($R_t$) and growth rate from a time series of case data? -- How can I quantify geographical heterogeneity from these transmission metrics? - - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Learn how to estimate transmission metrics from a time series of case data using the R package `EpiNow2` - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: prereq - -## Prerequisites - -Learners should familiarise themselves with the following concepts before working through this tutorial: - -**Statistics**: probability distributions, principles of Bayesian analysis. - -**Epidemic theory**: Effective reproduction number. - -**Data science**: Data transformation and visualization. You can review the episode on [Aggregate and visualize](https://epiverse-trace.github.io/tutorials-early/describe-cases.html) incidence data. - -::::::::::::::::::::::::::::::::: - - - -::::::::::::::::::::::::::::::::::::: callout -### Reminder: the Effective Reproduction Number, $R_t$ - -The [basic reproduction number](../learners/reference.md#basic), $R_0$, is the average number of cases caused by one infectious individual in an entirely susceptible population. - -But in an ongoing outbreak, the population does not remain entirely susceptible as those who recover from infection are typically immune. Moreover, there can be changes in behaviour or other factors that affect transmission. 
When we are interested in monitoring changes in transmission we are therefore more interested in the value of the **effective reproduction number**, $R_t$, the average number of cases caused by one infectious individual in the population at time $t$. - -:::::::::::::::::::::::::::::::::::::::::::::::: - - -## Introduction - -The transmission intensity of an outbreak is quantified using two key metrics: the reproduction number, which informs on the strength of the transmission by indicating how many new cases are expected from each existing case; and the [growth rate](../learners/reference.md#growth), which informs on the speed of the transmission by indicating how rapidly the outbreak is spreading or declining (doubling/halving time) within a population. For more details on the distinction between speed and strength of transmission and implications for control, review [Dushoff & Park, 2021](https://royalsocietypublishing.org/doi/full/10.1098/rspb.2020.1556). - -To estimate these key metrics using case data we must account for delays between the date of infections and date of reported cases. In an outbreak situation, data are usually available on reported dates only, therefore we must use estimation methods to account for these delays when trying to understand changes in transmission over time. - -In the next tutorials we will focus on how to use the functions in `{EpiNow2}` to estimate transmission metrics of case data. We will not cover the theoretical background of the models or inference framework, for details on these concepts see the [vignette](https://epiforecasts.io/EpiNow2/dev/articles/estimate_infections.html). - -In this tutorial we are going to learn how to use the `{EpiNow2}` package to estimate the time-varying reproduction number. We'll get input data from `{incidence2}`. 
We’ll use the `{tidyr}` and `{dplyr}` packages to arrange some of its outputs, `{ggplot2}` to visualize case distribution, and the pipe `%>%` to connect some of their functions, so let’s also load the `{tidyverse}` package: - -```r -library(EpiNow2) -library(incidence2) -library(tidyverse) -``` - -```{r,echo=FALSE,eval=TRUE,message=FALSE,warning=FALSE} -library(tidyverse) -``` - - -::::::::::::::::::: checklist - -### The double-colon - -The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment. - -For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package. - -This helps us remember package functions and avoid namespace conflicts. - -::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor - -This tutorial illustrates the usage of `epinow()` to estimate the time-varying reproduction number and infection times. Learners should understand the necessary inputs to the model and the limitations of the model output. - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: - - -::::::::::::::::::::::::::::::::::::: callout -### Bayesian inference - -The R package `EpiNow2` uses a [Bayesian inference](../learners/reference.md#bayesian) framework to estimate reproduction numbers and infection times based on reporting dates. - -In Bayesian inference, we use prior knowledge (prior distributions) with data (in a likelihood function) to find the posterior probability. - -

Posterior probability $\propto$ likelihood $\times$ prior probability -
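The same relationship can be written in symbols. As a sketch, assume $\theta$ stands for the unknown quantities (for example $R_t$ and the infection times) and $y$ for the reported case counts; this notation is introduced here for illustration and is not taken from the `{EpiNow2}` documentation:

```latex
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta)$$
```

Here $p(\theta)$ is the prior, $p(y \mid \theta)$ is the likelihood, and $p(\theta \mid y)$ is the posterior. The denominator $p(y)$ does not depend on $\theta$, which is why the posterior is stated only up to proportionality.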

- -:::::::::::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::::::::::: instructor - -Refer to the prior probability distribution and the [posterior probability](https://en.wikipedia.org/wiki/Posterior_probability) distribution. - -In the ["`Expected change in daily cases`" callout](#expected-change-in-daily-cases), by "the posterior probability that $R_t < 1$", we refer specifically to the [area under the posterior probability distribution curve](https://www.nature.com/articles/nmeth.3368/figures/1). - -:::::::::::::::::::::::::::::::::::::::::::::::: - - -## Delay distributions and case data -### Case data - -To illustrate the functions of `EpiNow2` we will use outbreak data of the start of the COVID-19 pandemic from the United Kingdom. The data are available in the R package `{incidence2}`. - -```{r} -dplyr::as_tibble(incidence2::covidregionaldataUK) -``` - -To use the data, we must format the data to have two columns: - -+ `date`: the date (as a date object see `?is.Date()`), -+ `confirm`: number of confirmed cases on that date. - -Let's use `{tidyr}` and `{incidence2}` for this: - -```{r, warning = FALSE, message = FALSE} -cases <- incidence2::covidregionaldataUK %>% - # use {tidyr} to preprocess missing values - tidyr::replace_na(base::list(cases_new = 0)) %>% - # use {incidence2} to compute the daily incidence - incidence2::incidence( - date_index = "date", - counts = "cases_new", - count_values_to = "confirm", - date_names_to = "date", - complete_dates = TRUE - ) %>% - dplyr::select(-count_variable) -``` - -With `incidence2::incidence()` we aggregate cases in different time *intervals* (i.e., days, weeks or months) or per *group* categories. 
Also we can have complete dates for all the range of dates per group category using `complete_dates = TRUE` -Explore later the [`incidence2::incidence()` reference manual](https://www.reconverse.org/incidence2/reference/incidence.html) - -::::::::::::::::::::::::: spoiler - -### Can we replicate {incidence2} with {dplyr}? - -We can get an object similar to `cases` from the `incidence2::covidregionaldataUK` data frame using the `{dplyr}` package. - -```{r, warning = FALSE, message = FALSE, eval=FALSE} -incidence2::covidregionaldataUK %>% - dplyr::select(date, cases_new) %>% - dplyr::group_by(date) %>% - dplyr::summarise(confirm = sum(cases_new, na.rm = TRUE)) %>% - dplyr::ungroup() -``` - -However, the `incidence2::incidence()` function contains convenient arguments like `complete_dates` that facilitate getting an incidence object with the same range of dates for each grouping without the need of extra code lines or a time-series package. - -::::::::::::::::::::::::: - -There are case data available for `r dim(cases)[1]` days, but in an outbreak situation it is likely we would only have access to the beginning of this data set. Therefore we assume we only have the first 90 days of this data. - -```{r echo = FALSE} -# keep the first 90 dates and visualize epicurve -cases %>% - dplyr::slice_head(n = 90) %>% - # use ggplot2 - ggplot(aes(x = date, y = confirm)) + - geom_col() + - theme_grey( - base_size = 15 - ) -``` - -### Delay distributions - -We assume there are delays from the time of infection until the time a case is reported. We specify these delays as distributions to account for the uncertainty in individual level differences. The delay can consist of multiple types of delays/processes. A typical delay from time of infection to case reporting may consist of: - -

**time from infection to symptom onset** (the [incubation period](../learners/reference.md#incubation)) + **time from symptom onset to case notification** (the reporting time) -.
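To make the sum of delays concrete, the base R sketch below simulates a total infection-to-report delay by adding draws from two distributions. The parameter values (a gamma incubation period with mean 4 and standard deviation 2 days, a log normal reporting delay with mean 2 and standard deviation 1 day) and all variable names are illustrative assumptions. This is a conceptual sketch only, not how `{EpiNow2}` represents delays internally.

```r
set.seed(1)
n <- 100000

# Gamma incubation period: convert mean/sd (days) to shape/scale
inc_mean <- 4
inc_sd <- 2
inc_shape <- (inc_mean / inc_sd)^2 # shape = 4
inc_scale <- inc_sd^2 / inc_mean   # scale = 1
incubation <- rgamma(n, shape = inc_shape, scale = inc_scale)

# Log normal reporting delay: convert mean/sd (days) to meanlog/sdlog
rep_mean <- 2
rep_sd <- 1
rep_sdlog <- sqrt(log(1 + (rep_sd / rep_mean)^2))
rep_meanlog <- log(rep_mean) - rep_sdlog^2 / 2
reporting <- rlnorm(n, meanlog = rep_meanlog, sdlog = rep_sdlog)

# Total delay from infection to case report
total_delay <- incubation + reporting

mean(total_delay)
quantile(total_delay, probs = c(0.05, 0.5, 0.95))
```

Because both components take positive values only, every simulated total delay is positive, and its mean is close to the sum of the two component means.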

- -The delay distribution for each of these processes can either be estimated from data or obtained from the literature. We can express how uncertain we are about the parameters of these distributions by choosing whether the distributions have **fixed** parameters or **variable** parameters. To understand the difference between **fixed** and **variable** distributions, let's consider the incubation period. - -::::::::::::::::::::::::::::::::::::: callout - -### Delays and data -The number and type of delays are a flexible input that depends on the data. The examples below highlight how the delays can be specified for different data sources: - -
-
-| Data source | Delay(s) |
-| ------------- |-------------|
-|Time of symptom onset |Incubation period |
-|Time of case report |Incubation period + time from symptom onset to case notification |
-|Time of hospitalisation |Incubation period + time from symptom onset to hospitalisation |
-
- - -:::::::::::::::::::::::::::::::::::::::::::::::: - - - -#### Incubation period distribution - -The incubation period distribution for many diseases can usually be obtained from the literature. The package `{epiparameter}` contains a library of epidemiological parameters for different diseases obtained from the literature. - -We specify a (fixed) gamma distribution with mean $\mu = 4$ and standard deviation $\sigma= 2$ (shape = $4$, scale = $1$) using the function `Gamma()` as follows: - -```{r} -incubation_period_fixed <- EpiNow2::Gamma( - mean = 4, - sd = 2, - max = 20 -) - -incubation_period_fixed -``` - -The argument `max` is the maximum value the distribution can take, in this example 20 days. - -::::::::::::::::::::::::::::::::::::: callout - -### Why a gamma distribution? - -The incubation period has to be positive in value. Therefore we must specify a distribution in `{EpiNow2}` which is for positive values only. - -`Gamma()` supports Gamma distributions and `LogNormal()` Log-normal distributions, which are distributions for positive values only. - -For all types of delay, we will need to use distributions for positive values only - we don't want to include delays of negative days in our analysis! - -:::::::::::::::::::::::::::::::::::::::::::::::: - - - -#### Including distribution uncertainty - -To specify a **variable** distribution, we include uncertainty around the mean $\mu$ and standard deviation $\sigma$ of our gamma distribution.
If our incubation period distribution has a mean $\mu$ and standard deviation $\sigma$, then we assume the mean ($\mu$) follows a Normal distribution with standard deviation $\sigma_{\mu}$: - -$$\mbox{Normal}(\mu,\sigma_{\mu}^2)$$ - -and a standard deviation ($\sigma$) follows a Normal distribution with standard deviation $\sigma_{\sigma}$: - -$$\mbox{Normal}(\sigma,\sigma_{\sigma}^2).$$ - -We specify this using `Normal()` for each argument: the mean ($\mu=4$ with $\sigma_{\mu}=0.5$) and standard deviation ($\sigma=2$ with $\sigma_{\sigma}=0.5$). - -```{r,warning=FALSE,message=FALSE} -incubation_period_variable <- EpiNow2::Gamma( - mean = EpiNow2::Normal(mean = 4, sd = 0.5), - sd = EpiNow2::Normal(mean = 2, sd = 0.5), - max = 20 -) - -incubation_period_variable -``` - - -#### Reporting delays - -After the incubation period, there will be an additional delay of time from symptom onset to case notification: the reporting delay. We can specify this as a fixed or variable distribution, or estimate a distribution from data. - -When specifying a distribution, it is useful to visualise the probability density to see the peak and spread of the distribution, in this case we will use a *log normal* distribution. We can use the functions `convert_to_logmean()` and `convert_to_logsd()` to convert the mean and standard deviation of a normal distribution to that of a log normal distribution. - -If we want to assume that the mean reporting delay is 2 days (with a standard deviation of 1 day), we write: - -```{r} -# convert mean to logmean -log_mean <- EpiNow2::convert_to_logmean(mean = 2, sd = 1) - -# convert sd to logsd -log_sd <- EpiNow2::convert_to_logsd(mean = 2, sd = 1) -``` - -:::::::::::::::::::::: spoiler - -### Visualize a log Normal distribution using {epiparameter} - -Using `epiparameter::epidist()` we can create a custom distribution. 
The log normal distribution will look like: - -```r -library(epiparameter) -``` - -```{r,message=FALSE,warning=FALSE} -epiparameter::epidist( - disease = "covid", - epi_dist = "reporting delay", - prob_distribution = "lnorm", - prob_distribution_params = c( - meanlog = log_mean, - sdlog = log_sd - ) -) %>% - plot() -``` - -:::::::::::::::::::::: - -Using the mean and standard deviation for the log normal distribution, we can specify a fixed or variable distribution using `LogNormal()` as before: - -```{r,warning=FALSE,message=FALSE} -reporting_delay_variable <- EpiNow2::LogNormal( - meanlog = EpiNow2::Normal(mean = log_mean, sd = 0.5), - sdlog = EpiNow2::Normal(mean = log_sd, sd = 0.5), - max = 10 -) -``` - -We can plot single and combined distributions generated by `{EpiNow2}` using `plot()`. Let's combine in one plot the delay from infection to report which includes the incubation period and reporting delay: - -```{r} -plot(incubation_period_variable + reporting_delay_variable) -``` - - -:::::::::::::::::: callout - -If data is available on the time between symptom onset and reporting, we can use the function `estimate_delay()` to estimate a log normal distribution from a vector of delays. The code below illustrates how to use `estimate_delay()` with synthetic delay data. - -```{r, eval = FALSE } -delay_data <- rlnorm(500, log(5), 1) # synthetic delay data - -reporting_delay <- EpiNow2::estimate_delay( - delay_data, - samples = 1000, - bootstraps = 10 -) -``` - -:::::::::::::::::: - -#### Generation time - -We also must specify a distribution for the generation time. Here we will use a log normal distribution with mean 3.6 and standard deviation 3.1 ([Ganyani et al. 2020](https://doi.org/10.2807/1560-7917.ES.2020.25.17.2000257)). 
- - -```{r,warning=FALSE,message=FALSE} -generation_time_variable <- EpiNow2::LogNormal( - mean = EpiNow2::Normal(mean = 3.6, sd = 0.5), - sd = EpiNow2::Normal(mean = 3.1, sd = 0.5), - max = 20 -) -``` - - -## Finding estimates - -The function `epinow()` is a wrapper for the function `estimate_infections()` used to estimate cases by date of infection. The generation time distribution and delay distributions must be passed using the functions `generation_time_opts()` and `delay_opts()` respectively. - -There are numerous other inputs that can be passed to `epinow()`, see `?EpiNow2::epinow()` for more detail. -One optional input is to specify a *log normal* prior for the effective reproduction number $R_t$ at the start of the outbreak. Since $R_t$ is dimensionless, we specify a mean of 2 and a standard deviation of 2 as arguments of `prior` within `rt_opts()`: - -```{r, eval = TRUE} -# define Rt prior distribution -rt_prior <- EpiNow2::rt_opts(prior = base::list(mean = 2, sd = 2)) -``` - -::::::::::::::::::::::::::::::::::::: callout - -### Bayesian inference using Stan - -The Bayesian inference is performed using MCMC methods with the program [Stan](https://mc-stan.org/). There are a number of default inputs to the Stan functions including the number of chains and number of samples per chain (see `?EpiNow2::stan_opts()`). - -To reduce computation time, we can run chains in parallel. To do this, we must set the number of cores to be used. By default, 4 MCMC chains are run (see `stan_opts()$chains`), so we can set an equal number of cores to be used in parallel as follows: - -```{r,warning=FALSE,message=FALSE} -withr::local_options(base::list(mc.cores = 4)) -``` - -To find the maximum number of available cores on your machine, use `parallel::detectCores()`. - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::: checklist - -**Note:** In the code below `_fixed` distributions are used instead of `_variable` (delay distributions with uncertainty).
This is to speed up computation time. It is generally recommended to use variable distributions that account for additional uncertainty. - -```{r, echo = TRUE} -# fixed alternatives -generation_time_fixed <- EpiNow2::LogNormal( - mean = 3.6, - sd = 3.1, - max = 20 -) - -# log_mean and log_sd are on the log scale, so pass them as meanlog and sdlog -reporting_delay_fixed <- EpiNow2::LogNormal( - meanlog = log_mean, - sdlog = log_sd, - max = 10 -) -``` - -::::::::::::::::::::::::: - -Now you are ready to run `EpiNow2::epinow()` to estimate the time-varying reproduction number: - -```{r, message = FALSE, eval = TRUE} -reported_cases <- cases %>% - dplyr::slice_head(n = 90) - -estimates <- EpiNow2::epinow( - # cases - data = reported_cases, - # delays - generation_time = EpiNow2::generation_time_opts(generation_time_fixed), - delays = EpiNow2::delay_opts(incubation_period_fixed + reporting_delay_fixed), - # prior - rt = rt_prior -) -``` - -::::::::::::::::::::::::::::::::: callout - -### Do not wait for this to continue - -We can optionally use the `stan = stan_opts()` argument and function. To reduce computation time in this tutorial, we can specify a fixed number of `samples = 1000` and `chains = 3` to the `stan` argument using the `stan_opts()` function. We expect this to take approximately 3 minutes. - -```r -# you can add the `stan` argument -EpiNow2::epinow( - ..., - stan = EpiNow2::stan_opts(samples = 1000, chains = 3) -) -``` - -**Remember:** Using an appropriate number of *samples* and *chains* is crucial for ensuring convergence and obtaining reliable estimates in Bayesian computations using Stan. Inadequate sampling or insufficient chains may lead to issues such as divergent transitions, impacting the accuracy and stability of the inference process. - -::::::::::::::::::::::::::::::::: - -### Results - -We can extract and visualise estimates of the effective reproduction number through time: - -```{r} -estimates$plots$R -``` - -The uncertainty in the estimates increases through time.
This is because estimates are informed by data in the past - within the delay periods. The plot separates the estimates into three categories according to this difference in uncertainty. The **Estimate** (green) uses all the data. The **Estimate based on partial data** (orange) is based on less data (because infections that happened at that time are more likely to not have been observed yet) and therefore has increasingly wider intervals towards the date of the last data point. Finally, the **Forecast** (purple) is a projection ahead of time. - -We can also visualise the growth rate estimate through time: -```{r} -estimates$plots$growth_rate -``` - -To extract a summary of the key transmission metrics at the *latest date* in the data: - -```{r} -summary(estimates) -``` - -As these estimates are based on partial data, they have a wide uncertainty interval. - -+ From the summary of our analysis we see that the expected change in daily cases is `r summary(estimates)$estimate[summary(estimates)$measure=="Expected change in daily cases"]` with the estimated new confirmed cases `r summary(estimates)$estimate[summary(estimates)$measure=="New confirmed cases by infection date"]`. - -+ The effective reproduction number $R_t$ estimate (on the last date of the data) is `r summary(estimates)$estimate[summary(estimates)$measure=="Effective reproduction no."]`. - -+ The exponential growth rate of case numbers is `r summary(estimates)$estimate[summary(estimates)$measure=="Rate of growth"]`. - -+ The doubling time (the time taken for case numbers to double) is `r summary(estimates)$estimate[summary(estimates)$measure=="Doubling/halving time (days)"]`. - -::::::::::::::::::::::::::::::::::::: callout -### `Expected change in daily cases` - -A factor describing expected change in daily cases based on the posterior probability that $R_t < 1$. - -
-| Probability ($p$) | Expected change |
-| ------------- |-------------|
-|$p < 0.05$ |Increasing |
-|$0.05 \leq p< 0.4$ |Likely increasing |
-|$0.4 \leq p< 0.6$ |Stable |
-|$0.6 \leq p < 0.95$ |Likely decreasing |
-|$0.95 \leq p \leq 1$ |Decreasing |
-
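The thresholds in the table above can be expressed as a small lookup. The helper below, `classify_expected_change()`, is a hypothetical function written for illustration in base R; it is not part of `{EpiNow2}` and may not match how the package computes this label internally.

```r
# Hypothetical helper: map the posterior probability that R_t < 1
# to the "Expected change" labels from the table
classify_expected_change <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  labels <- c(
    "Increasing", "Likely increasing", "Stable",
    "Likely decreasing", "Decreasing"
  )
  as.character(cut(
    p,
    breaks = c(0, 0.05, 0.4, 0.6, 0.95, 1),
    labels = labels,
    right = FALSE,        # intervals are closed on the left: [a, b)
    include.lowest = TRUE # with right = FALSE, closes the last interval at 1
  ))
}

classify_expected_change(c(0.01, 0.05, 0.5, 0.6, 0.95, 1))
```

Using left-closed intervals keeps boundary values such as $p = 0.05$ and $p = 0.6$ in the same rows as in the table.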
- -:::::::::::::::::::::::::::::::::::::::::::::::: - - - - -## Quantify geographical heterogeneity - -The outbreak data of the start of the COVID-19 pandemic from the United Kingdom from the R package `{incidence2}` includes the region in which the cases were recorded. To find regional estimates of the effective reproduction number and cases, we must format the data to have three columns: - -+ `date`: the date, -+ `region`: the region, -+ `confirm`: number of confirmed cases for a region on a given date. - -```{r,warning=FALSE,message=FALSE} -regional_cases <- incidence2::covidregionaldataUK %>% - # use {tidyr} to preprocess missing values - tidyr::replace_na(base::list(cases_new = 0)) %>% - # use {incidence2} to convert aggregated data to incidence data - incidence2::incidence( - date_index = "date", - groups = "region", - counts = "cases_new", - count_values_to = "confirm", - date_names_to = "date", - complete_dates = TRUE - ) %>% - dplyr::select(-count_variable) - -# keep the first 90 dates for all regions - -# get vector of first 90 dates -date_range <- regional_cases %>% - dplyr::distinct(date) %>% - # from incidence2, dates are already arranged in ascendant order - dplyr::slice_head(n = 90) %>% - dplyr::pull(date) - -# filter dates in date_range -regional_cases <- regional_cases %>% - dplyr::filter(magrittr::is_in(x = date, table = date_range)) - -dplyr::as_tibble(regional_cases) -``` - -To find regional estimates, we use the same inputs as `epinow()` to the function `regional_epinow()`: - -```{r, message = FALSE, eval = TRUE} -estimates_regional <- EpiNow2::regional_epinow( - # cases - data = regional_cases, - # delays - generation_time = EpiNow2::generation_time_opts(generation_time_fixed), - delays = EpiNow2::delay_opts(incubation_period_fixed + reporting_delay_fixed), - # prior - rt = rt_prior -) - -estimates_regional$summary$summarised_results$table - -estimates_regional$summary$plots$R -``` - - - - - - - - - -## Summary - -`EpiNow2` can be used to 
estimate transmission metrics from case data at any time in the course of an outbreak. The reliability of these estimates depends on the quality of the data and appropriate choice of delay distributions. In the next tutorial we will learn how to make forecasts and investigate some of the additional inference options available in `EpiNow2`. - -::::::::::::::::::::::::::::::::::::: keypoints - -- Transmission metrics can be estimated from case data after accounting for delays -- Uncertainty can be accounted for in delay distributions - -:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/learners/setup.md b/learners/setup.md index efc3f8c0..0e285e1d 100644 --- a/learners/setup.md +++ b/learners/setup.md @@ -39,14 +39,14 @@ Our strategy is to gradually incorporate specialised **R packages** into a tradi :::::::::::::::::::::::::::: prereq -This course assumes intermediate R knowledge. This workshop is for you if: +This content assumes intermediate R knowledge. This tutorials are for you if: - You can read data into R, transform and reshape data, and make a wide variety of graphs - You are familiar with functions from `{dplyr}`, `{tidyr}`, and `{ggplot2}` - You can use the magrittr pipe `%>%` and/or native pipe `|>`. -We expect participants to have some exposure to basic Statistical, Mathematical and Epidemic theory concepts, but NOT intermediate or expert familiarity with modeling. +We expect learners to have some exposure to basic Statistical, Mathematical and Epidemic theory concepts, but NOT intermediate or expert familiarity with modeling. :::::::::::::::::::::::::::: @@ -104,10 +104,12 @@ During the tutorial, we will need a number of R packages. 
Packages contain usefu Open RStudio and **copy and paste** the following code chunk into the [console window](https://docs.posit.co/ide/user/ide/guide/code/console.html), then press the Enter (Windows and Linux) or Return (MacOS) to execute the command: ```r -if(!require("pak")) install.packages("pak") # for episodes on read, clean, validate and visualize linelist + +if(!require("pak")) install.packages("pak") + new_packages <- c( - "epiverse-trace/cleanepi", + "cleanepi", "rio", "here", "DBI", @@ -120,29 +122,18 @@ new_packages <- c( "tidyverse" ) -pak::pkg_install(new_packages) - -# for episodes on access delays and quantify transmission -new_packages <- c( - "EpiNow2", - "epiverse-trace/epiparameter", - "incidence2", - "tidyverse" -) - pak::pkg_install(new_packages) ``` These installation steps could ask you `? Do you want to continue (Y/n)` write `Y` and press Enter. + - 1. **Verify `Rtools` installation**. You can do so by using Windows search across your system. Optionally, you can use `{devtools}` running: ```r @@ -162,7 +153,7 @@ devtools::find_rtools() ``` ::::::::::::::::::::::::::::: - +--> ::::::::::::::::::::::::::::: spoiler @@ -171,8 +162,11 @@ devtools::find_rtools() If you get an error message when installing {epiparameter}, try this alternative code: ```r -# for epiparameter -install.packages("epiparameter", repos = c("https://epiverse-trace.r-universe.dev")) +# for simulist +install.packages("simulist", repos = c("https://epiverse-trace.r-universe.dev")) + +# for tracetheme +install.packages("tracetheme", repos = c("https://epiverse-trace.r-universe.dev")) ``` ::::::::::::::::::::::::::::: @@ -224,9 +218,18 @@ You should update **all of the packages** required for the tutorial, even if you When the installation has finished, you can try to load the packages by pasting the following code into the console: ```r -library(EpiNow2) -library(epiparameter) +# for episodes on read, clean, validate and visualize linelist + +library(cleanepi) +library(rio) 
+library(here) +library(DBI) +library(RSQLite) +library(dbplyr) +library(linelist) +library(simulist) library(incidence2) +library(tracetheme) library(tidyverse) ``` diff --git a/renv/profiles/lesson-requirements/renv.lock b/renv/profiles/lesson-requirements/renv.lock index f391ea5c..6230b82b 100644 --- a/renv/profiles/lesson-requirements/renv.lock +++ b/renv/profiles/lesson-requirements/renv.lock @@ -472,16 +472,11 @@ }, "cleanepi": { "Package": "cleanepi", - "Version": "0.0.2", - "Source": "GitHub", - "RemoteType": "github", - "RemoteHost": "api.github.com", - "RemoteRepo": "cleanepi", - "RemoteUsername": "epiverse-trace", - "RemotePkgRef": "epiverse-trace/cleanepi", - "RemoteRef": "HEAD", - "RemoteSha": "4f1e888e6ec92eaa2c234a698508233a0e35a65a", + "Version": "1.0.2", + "Source": "Repository", + "Repository": "CRAN", "Requirements": [ + "R", "arsenal", "checkmate", "dplyr", @@ -497,7 +492,7 @@ "utils", "withr" ], - "Hash": "a764d610b3572d869fdc56e39f5f9033" + "Hash": "2b9d9c7abb275271aab4f8a55ebde050" }, "cli": { "Package": "cli",