diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd
index 39d5b5d1..0a54fe13 100644
--- a/episodes/delays-refresher.Rmd
+++ b/episodes/delays-refresher.Rmd
@@ -4,7 +4,7 @@ teaching: 10
exercises: 2
---
-:::::::::::::::::::::::::::::::::::::: questions
+:::::::::::::::::::::::::::::::::::::: questions
- How to calculate delays from line list data?
- How to fit a probability distribution to delay data?
@@ -57,10 +57,8 @@ training/
|__ training.Rproj
```
-**RStudio Projects** allows you to use _relative file_ paths with respect to the `R` Project,
-making your code more portable and less error-prone.
-Avoids using `setwd()` with _absolute paths_
-like `"C:/Users/MyName/WeirdPath/training/data/file.csv"`.
+**RStudio Projects** allow you to use *relative file paths* with respect to the `R` Project, making your code more portable and less error-prone.
+This avoids using `setwd()` with *absolute paths* like `"C:/Users/MyName/WeirdPath/training/data/file.csv"`.
:::::::::::::::
@@ -81,9 +79,12 @@ Let's starts by creating `New Quarto Document`!
## Introduction
-A new Ebola Virus Disease (EVD) outbreak has been notified in a country in West Africa. The Ministry of Health is coordinating the outbreak response and has contracted you as a consultant in epidemic analysis to inform the response in real-time. The available report of cases is coming from hospital admissions.
+A new Ebola Virus Disease (EVD) outbreak has been notified in a country in West Africa.
+The Ministry of Health is coordinating the outbreak response and has contracted you as a consultant in epidemic analysis to inform the response in real-time.
+The available report of cases is coming from hospital admissions.
-Let's start by loading the package `{readr}` to read `.csv` data, `{dplyr}` to manipulate data, `{tidyr}` to rearrange it, and `{here}` to write file paths within your RStudio project. We'll use the pipe `%>%` to connect some of their functions, including others from the package `{ggplot2}`, so let's call to the package `{tidyverse}` that loads them all:
+Let's start by loading the package `{readr}` to read `.csv` data, `{dplyr}` to manipulate data, `{tidyr}` to rearrange it, and `{here}` to write file paths within your RStudio project.
+We'll use the pipe `%>%` to connect some of their functions, including others from the package `{ggplot2}`, so let's call to the package `{tidyverse}` that loads them all:
```{r}
# Load packages
@@ -94,7 +95,7 @@ library(tidyverse) # loads readr, dplyr, tidyr and ggplot2
**The double-colon**
-The double-colon `::` in R let you call a specific function from a package without loading the entire package into the current environment.
+The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment.
For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.
@@ -104,9 +105,10 @@ This helps us remember package functions and avoid namespace conflicts.
## Explore data
-For the purpose of this episode, we will read a pre-cleaned line list data. Following episodes will tackle how to solve cleaning tasks.
+For the purpose of this episode, we will read pre-cleaned line list data.
+Following episodes will tackle how to solve cleaning tasks.
-```{r,eval=FALSE,echo=TRUE,message=FALSE}
+```{r, eval=FALSE, echo=TRUE, message=FALSE}
# Read data
# e.g.: if path to file is data/linelist.csv then:
cases <- readr::read_csv(
@@ -114,7 +116,7 @@ cases <- readr::read_csv(
)
```
-```{r,eval=TRUE,echo=FALSE,message=FALSE}
+```{r, eval=TRUE, echo=FALSE, message=FALSE}
# Read data
cases <- readr::read_csv(
file.path("data", "linelist.csv")
@@ -125,12 +127,16 @@ cases <- readr::read_csv(
**Why should we use the {here} package?**
-The package `{here}` simplifies file referencing in R projects. It allows them to work across different operating systems (Windows, Mac, Linux). This feature, called **cross-environment compatibility**, eliminates the need to adjust file paths. For example:
+The package `{here}` simplifies file referencing in R projects.
+It allows them to work across different operating systems (Windows, Mac, Linux).
+This feature, called **cross-environment compatibility**, eliminates the need to adjust file paths.
+For example:
-- On Windows, paths are written using backslashes ( `\` ) as the separator between folder names: `"data\raw-data\file.csv"`
+- On Windows, paths are written using backslashes ( `\` ) as the separator between folder names: `"data\raw-data\file.csv"`
- On Unix based operating system such as macOS or Linux the forward slash ( `/` ) is used as the path separator: `"data/raw-data/file.csv"`
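As a minimal sketch of the idea, base R's `file.path()` builds a path from its components with a separator that R accepts on every platform, and `here::here()` additionally resolves the path from the project root. The folder and file names are the illustrative ones from the list above, and the `requireNamespace()` guard is an addition so the chunk also runs where `{here}` is not installed.

```r
# Build a path from its components instead of typing separators by hand
relative_path <- file.path("data", "raw-data", "file.csv")
relative_path

# With {here}, the same components are resolved from the project root
# (guarded so the example still runs if {here} is not installed)
if (requireNamespace("here", quietly = TRUE)) {
  here::here("data", "raw-data", "file.csv")
}
```

Either way, no separator is hard-coded, so the same line of code can be shared between Windows, macOS, and Linux users.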
-The `{here}` package adds one more layer of reproducibility to your work. For more, read this tutorial about [open, sustainable, and reproducible epidemic analysis with R](https://epiverse-trace.github.io/research-compendium/)
+The `{here}` package adds one more layer of reproducibility to your work.
+For more, read this tutorial about [open, sustainable, and reproducible epidemic analysis with R](https://epiverse-trace.github.io/research-compendium/).
::::::::::::::::::::
@@ -143,7 +149,7 @@ cases
Take a moment to review the data and its structure..
-- Do the data and format resemble line lists you’ve encountered before?
+- Do the data and format resemble line lists you've encountered before?
- If you were part of the outbreak investigation team, what additional information might you want to collect?
::::::::::::::
@@ -199,10 +205,9 @@ Why do we have more missings on date of infection or date of outcome?
:::::::::::::
-
## Calculate severity
-A frequent indicator for measuring severity is the case fatality risk (CFR).
+A frequent indicator for measuring severity is the case fatality risk (CFR).
CFR is defined as the conditional probability of death given confirmed diagnosis, calculated as the cumulative number of deaths from an infectious disease over the number of confirmed diagnosed cases.
@@ -231,7 +236,7 @@ However, when assessing severity, CFR estimation is sensitive to:
- **Right-censoring bias**. If we include observations with unknown final status we can underestimate the true CFR.
-- **Selection bias**. At the beginning of an outbreak, given that health systems collect most clinically severe cases, an early estimate of the CFR can overestimate the true CFR.
+- **Selection bias**. At the beginning of an outbreak, given that health systems collect most clinically severe cases, an early estimate of the CFR can overestimate the true CFR.
::::::::::::
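To make the effect of right-censoring concrete, here is a small worked example with hypothetical counts. The numbers are illustrative only and do not come from the `cases` data; the object names are also assumptions made for this sketch.

```r
# Hypothetical outcome counts (not taken from the line list in this episode)
outcomes <- c(Death = 40, Recover = 60, Unknown = 100)

# CFR among cases with a known outcome only
cfr_known <- outcomes[["Death"]] / (outcomes[["Death"]] + outcomes[["Recover"]])

# CFR if cases with unknown outcome are added to the denominator
cfr_all <- outcomes[["Death"]] / sum(outcomes)

cfr_known # 0.4
cfr_all   # 0.2
```

With the same number of deaths, adding the 100 unresolved cases to the denominator halves the estimate, which is the underestimation described above.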
@@ -257,7 +262,8 @@ This way of writing almost look like writing a recipe!
:::::::::::: challenge
-Calculate the CFR as the division of the number of **deaths** among **known outcomes**. Do this by adding one more pipe `%>%` in the last code chunk.
+Calculate the CFR as the division of the number of **deaths** among **known outcomes**.
+Do this by adding one more pipe `%>%` in the last code chunk.
Report:
@@ -267,7 +273,7 @@ Report:
You can use the column names of variables to create one more column:
-```{r,eval=FALSE,echo=TRUE}
+```{r, eval=FALSE, echo=TRUE}
# calculate the naive CFR
cases %>%
count(outcome) %>%
@@ -291,7 +297,7 @@ cases %>%
dplyr::mutate(cfr = death / cases_known_outcome)
```
-This calculation is _naive_ because it tends to yield a biased and mostly underestimated CFR due to the time-delay from onset to death, only stabilising at the later stages of the outbreak.
+This calculation is *naive* because it tends to yield a biased and mostly underestimated CFR due to the time-delay from onset to death, only stabilising at the later stages of the outbreak.
Now, as a comparison, how much a CFR estimate changes if we include unknown outcomes in the denominator?
@@ -314,15 +320,18 @@ Due to **right-censoring bias**, if we include observations with unknown final s
::::::::::::
-Data of today will not include outcomes from patients that are still hospitalised. Then, one relevant question to ask is: In average, how much time it would take to know the outcomes of hospitalised cases? For this we can calculate **delays**!
+Today's data will not include outcomes from patients who are still hospitalised.
+Then, one relevant question to ask is: on average, how long would it take to know the outcomes of hospitalised cases?
+For this we can calculate **delays**!
## Calculate delays
-The time between sequence of dated events can vary between subjects. For example, we would expect the date of infection to always be before the date of symptom onset, and the later always before the date of hospitalization.
+The time between a sequence of dated events can vary between subjects.
+For example, we would expect the date of infection to always be before the date of symptom onset, and the latter always before the date of hospitalization.
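In R, a delay is simply the difference between two `Date` values. The sketch below uses a hypothetical three-row data frame: `toy_cases` and its dates are made up for illustration, and only the column names mirror the line list.

```r
# Hypothetical dates for three cases (not taken from the episode's line list)
toy_cases <- data.frame(
  case_id = c("a", "b", "c"),
  date_of_onset = as.Date(c("2014-05-01", "2014-05-03", "2014-05-10")),
  date_of_hospitalisation = as.Date(c("2014-05-04", "2014-05-03", "2014-05-08"))
)

# Subtracting two Date columns gives a difftime measured in days
toy_cases$delay <- toy_cases$date_of_hospitalisation - toy_cases$date_of_onset

# Convert to plain numbers and flag rows that break the expected order
toy_cases$delay_num <- as.numeric(toy_cases$delay)
toy_cases$inconsistent <- toy_cases$delay_num < 0

toy_cases
```

The third case has a negative delay because its hospitalisation date precedes its onset date, which is the kind of inconsistency worth checking before any delay is summarised.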
-In a random sample of 30 observations from the `cases` data frame we observe variability between the date of hospitalization and date of outcome:
+In a random sample of 30 observations from the `cases` data frame we observe variability between the date of hospitalization and date of outcome:
-```{r,echo=FALSE,eval=TRUE}
+```{r, echo=FALSE, eval=TRUE}
# demo code not to run by learner
set.seed(99)
@@ -383,7 +392,8 @@ cases %>%
**Inconsistencies among sequence of dated-events?**
-Wait! Is it consistent to have negative time delays from primary to secondary observations, i.e., from hospitalisation to death?
+Wait!
+Is it consistent to have negative time delays from primary to secondary observations, i.e., from hospitalisation to death?
In the next episode called **Clean data** we will learn how to check sequence of dated-events and other frequent and challenging inconsistencies!
@@ -391,7 +401,7 @@ In the next episode called **Clean data** we will learn how to check sequence of
::::::::::::::::: challenge
-To calculate a _delay-adjusted_ CFR, we need to assume a known delay from onset to death.
+To calculate a *delay-adjusted* CFR, we need to assume a known delay from onset to death.
Using the `cases` object:
@@ -401,7 +411,7 @@ Using the `cases` object:
Keep the rows that match a condition like `outcome == "Death"`:
-```{r,eval=FALSE,echo=TRUE}
+```{r, eval=FALSE, echo=TRUE}
# delay from onset to death
cases %>%
dplyr::filter(outcome == "Death") %>%
@@ -414,7 +424,7 @@ Is it consistent to have negative delays from onset of symptoms to death?
::::::::::::: solution
-```{r,warning=FALSE,message=FALSE}
+```{r, warning=FALSE, message=FALSE}
# delay from onset to death
cases %>%
dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>%
@@ -424,7 +434,9 @@ cases %>%
skimr::skim(delay_onset_death)
```
-Where is the source of the inconsistency? Let's say you want to keep the rows with negative delay values to investigate them. How would you do it?
+Where is the source of the inconsistency?
+Let's say you want to keep the rows with negative delay values to investigate them.
+How would you do it?
:::::::::::::
@@ -441,16 +453,18 @@ cases %>%
dplyr::filter(delay_onset_death < 0)
```
-More on estimating a _delay-adjusted_ CFR on the episode about **Estimating outbreak severity**!
+More on estimating a *delay-adjusted* CFR on the episode about **Estimating outbreak severity**!
::::::::::::
:::::::::::::::::
-
## Epidemic curve
-The first question we want to know is simply: how bad is it? The first step of the analysis is descriptive. We want to draw an epidemic curve or epicurve. This visualises the incidence over time by date of symptom onset.
+The first question we want to answer is simply: how bad is it?
+The first step of the analysis is descriptive.
+We want to draw an epidemic curve or epicurve.
+This visualises the incidence over time by date of symptom onset.
From the `cases` object we will use:
@@ -496,7 +510,7 @@ cases %>%
You may want to examine how long after onset of symptoms cases are hospitalised; this may inform the **reporting delay** from this line list data:
-```{r,warning=FALSE,message=FALSE}
+```{r, warning=FALSE, message=FALSE}
# reporting delay
cases %>%
dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>%
@@ -505,11 +519,12 @@ cases %>%
geom_histogram(binwidth = 1)
```
-The distribution of the reporting delay in day units is heavily skewed. Symptomatic cases may take up to **two weeks** to be reported.
+The distribution of the reporting delay in day units is heavily skewed.
+Symptomatic cases may take up to **two weeks** to be reported.
From reports (hospitalisations) in the most recent two weeks, we completed the exponential growth trend of incidence cases within the last four weeks:
-```{r,eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE}
+```{r, eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
cases %>%
dplyr::mutate(
delayed = dplyr::case_when(
@@ -526,7 +541,8 @@ cases %>%
labs(fill = "Reported cases")
```
-Given to reporting delays during this outbreak, it seemed that two weeks ago we had a decay of cases during the last three weeks. We needed to wait a couple of weeks to complete the incidence of cases on each week.
+Due to reporting delays during this outbreak, it seemed two weeks ago that cases had been declining over the previous three weeks.
+We needed to wait a couple of weeks to complete the incidence of cases for each week.
:::::::::::::: challenge
@@ -541,7 +557,7 @@ Report:
More on this topic on episodes about **Aggregate and visualize** and **Quantifying transmission**.
-```{r,eval=TRUE,echo=FALSE}
+```{r, eval=TRUE, echo=FALSE}
dat <- cases %>%
incidence2::incidence(
date_index = "date_of_onset",
@@ -593,20 +609,21 @@ plot(dat) +
)
```
-```{r,eval=TRUE,echo=FALSE}
+```{r, eval=TRUE, echo=FALSE}
fitted %>%
mutate(fit_tidy = map(.x = model, .f = broom::tidy)) %>%
unnest(fit_tidy) %>%
select(-data, -model)
```
-Note: Due to the diagnosed reporting delay, We conveniently truncated the epidemic curve one week before to fit the model! This improves the fitted model to data when quantifying the growth rate during the exponential phase.
+Note: Due to the diagnosed reporting delay, we conveniently truncated the epidemic curve one week before to fit the model!
+This improves the fit of the model to the data when quantifying the growth rate during the exponential phase.
:::::::::
::::::::::::::
-Lastly, in order to account for these _epidemiological delays_ when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**!
+Lastly, in order to account for these *epidemiological delays* when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**!
## Fit a probability distribution to delays
@@ -616,7 +633,9 @@ Assess learners based on video refreshers on distributions, likelihood, and maxi
:::::::::::::::::::
-We fit a probability distribution to data (like delays) to make inferences about it. These inferences can be useful for Public health interventions and decision making. For example:
+We fit a probability distribution to data (like delays) to make inferences about it.
+These inferences can be useful for public health interventions and decision making.
+For example:
- From the [incubation period](reference.md#incubation) distribution we can inform the length of active monitoring or quarantine. We can infer the time by which 99% of infected individuals are expected to show symptoms ([Lauer et al., 2020](https://pubmed.ncbi.nlm.nih.gov/32150748/)).
@@ -628,32 +647,35 @@ We fit a probability distribution to data (like delays) to make inferences about
**From time periods to probability distributions**
-When we calculate the *serial interval*, we see that not all case pairs have the same time length. We will observe this variability for any case pair and individual time period.
+When we calculate the *serial interval*, we see that not all case pairs have the same time length.
+We will observe this variability for any case pair and individual time period.
![Serial intervals of possible case pairs in (a) COVID-19 and (b) MERS-CoV. Pairs represent a presumed infector and their presumed infectee plotted by date of symptom onset ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig6)).](fig/serial-interval-pairs.jpg)
-To summarise these data from individual and pair time periods, we can find the **statistical distributions** that best fit the data ([McFarland et al., 2023](https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.27.2200806)).
+To summarise these data from individual and pair time periods, we can find the **statistical distributions** that best fit the data ([McFarland et al., 2023](https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.27.2200806)).
![Fitted serial interval distribution for (a) COVID-19 and (b) MERS-CoV based on reported transmission pairs in Saudi Arabia. We fitted three commonly used distributions, Log normal, Gamma, and Weibull distributions, respectively ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig5)).](fig/seria-interval-fitted-distributions.jpg)
-Statistical distributions are summarised in terms of their **summary statistics** like the *location* (mean and percentiles) and *spread* (variance or standard deviation) of the distribution, or with their **distribution parameters** that inform about the *form* (shape and rate/scale) of the distribution. These estimated values can be reported with their **uncertainty** (95% confidence intervals).
+Statistical distributions are summarised in terms of their **summary statistics** like the *location* (mean and percentiles) and *spread* (variance or standard deviation) of the distribution, or with their **distribution parameters** that inform about the *form* (shape and rate/scale) of the distribution.
+These estimated values can be reported with their **uncertainty** (95% confidence intervals).
-| Gamma | mean | shape | rate/scale |
-|:--------------|:--------------|:--------------|:--------------|
-| MERS-CoV | 14.13(13.9–14.7) | 6.31(4.88–8.52) | 0.43(0.33–0.60) |
-| COVID-19 | 5.1(5.0–5.5) | 2.77(2.09–3.88) | 0.53(0.38–0.76) |
+| Gamma      | mean             | shape           | rate/scale      |
+| :--------- | :--------------- | :-------------- | :-------------- |
+| MERS-CoV   | 14.13(13.9–14.7) | 6.31(4.88–8.52) | 0.43(0.33–0.60) |
+| COVID-19   | 5.1(5.0–5.5)     | 2.77(2.09–3.88) | 0.53(0.38–0.76) |
-| Weibull | mean | shape | rate/scale |
-|:--------------|:--------------|:--------------|:--------------|
-| MERS-CoV | 14.2(13.3–15.2) | 3.07(2.64–3.63) | 16.1(15.0–17.1) |
-| COVID-19 | 5.2(4.6–5.9) | 1.74(1.46–2.11) | 5.83(5.08–6.67) |
+| Weibull    | mean             | shape           | rate/scale      |
+| :--------- | :--------------- | :-------------- | :-------------- |
+| MERS-CoV   | 14.2(13.3–15.2)  | 3.07(2.64–3.63) | 16.1(15.0–17.1) |
+| COVID-19   | 5.2(4.6–5.9)     | 1.74(1.46–2.11) | 5.83(5.08–6.67) |
-| Log normal | mean | mean-log | sd-log |
-|:--------------|:--------------|:--------------|:--------------|
-| MERS-CoV | 14.08(13.1–15.2) | 2.58(2.50–2.68) | 0.44(0.39–0.5) |
-| COVID-19 | 5.2(4.2–6.5) | 1.45(1.31–1.61) | 0.63(0.54–0.74) |
+| Log normal | mean             | mean-log        | sd-log          |
+| :--------- | :--------------- | :-------------- | :-------------- |
+| MERS-CoV   | 14.08(13.1–15.2) | 2.58(2.50–2.68) | 0.44(0.39–0.5)  |
+| COVID-19   | 5.2(4.2–6.5)     | 1.45(1.31–1.61) | 0.63(0.54–0.74) |
Table: Serial interval estimates using Gamma, Weibull, and Log Normal distributions. 95% confidence intervals for the shape and scale (logmean and sd for Log Normal) parameters are shown in brackets ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#tbl3)).
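To illustrate how **distribution parameters** relate to **summary statistics**, the sketch below takes the COVID-19 point estimates from the Log normal table (mean-log 1.45, sd-log 0.63) and recovers the mean, median, and 95th percentile in base R. The object names are assumptions made for this example; because the inputs are rounded, the result should land close to, but not necessarily exactly on, the reported mean of 5.2 days.

```r
# Point estimates reported for COVID-19 in the Log normal table
meanlog <- 1.45
sdlog <- 0.63

# Mean of a Log Normal distribution: exp(meanlog + sdlog^2 / 2)
si_mean <- exp(meanlog + sdlog^2 / 2)

# Median and 95th percentile from the quantile function
si_median <- qlnorm(p = 0.5, meanlog = meanlog, sdlog = sdlog)
si_q95 <- qlnorm(p = 0.95, meanlog = meanlog, sdlog = sdlog)

round(c(mean = si_mean, median = si_median, q95 = si_q95), 1)
```

The mean sits above the median because the Log Normal distribution is right-skewed, so a long tail of large serial intervals pulls the average up.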
@@ -676,9 +698,12 @@ cases %>%
fitdistrplus::fitdist(distr = "lnorm")
```
-Use `summary()` to find goodness-of-fit statistics from the Maximum likelihood. Use `plot()` to visualize the fitted density function and other quality control plots.
+Use `summary()` to find goodness-of-fit statistics from the maximum likelihood fit.
+Use `plot()` to visualize the fitted density function and other quality control plots.
-Now we can do inferences from the probability distribution fitted to the epidemiological delay! Want to learn how? Read the "Show details" :)
+Now we can make inferences from the probability distribution fitted to the epidemiological delay!
+Want to learn how?
+Read the "Show details" :)
:::::::::::::::: spoiler
@@ -686,7 +711,8 @@ Now we can do inferences from the probability distribution fitted to the epidemi
If you need it, read in detail about the [R probability functions for the normal distribution](https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm#probfunc), each of its definitions and identify in which part of a distribution they are located!
-Each probability distribution has a unique set of **parameters** and **probability functions**. Read the [Distributions in the stats package](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html) or `?stats::Distributions` to find the ones available in R.
+Each probability distribution has a unique set of **parameters** and **probability functions**.
+Read the [Distributions in the stats package](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html) or `?stats::Distributions` to find the ones available in R.
For example, assuming that the reporting delay follows a **Log Normal** distribution, we can use `plnorm()` to calculate the probability of observing a reporting delay of 14 days or less:
@@ -711,16 +737,19 @@ Let's review some operators used until now:
- Assignment `<-` assigns a value to a variable from right to left.
- Double colon `::` to call a function from a specific package.
- Pipe `%>%` to structure sequences of data operations left-to-right
+
-We need to add two more to the list:
+We need to add two more to the list:
- Dollar sign `$`
- Square brackets `[]`
:::::::::::::
-Last step is to access to this parameters. Most modeling outputs from R functions will be stored as `list` class objects. In R, the dollar sign operator `$` is used to access elements (like columns) within a data frame or list by name, allowing for easy retrieval of specific components.
+The last step is to access these parameters.
+Most modeling outputs from R functions will be stored as `list` class objects.
+In R, the dollar sign operator `$` is used to access elements (like columns) within a data frame or list by name, allowing for easy retrieval of specific components.
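As a minimal sketch of these two operators, the chunk below builds a hypothetical list, `toy_fit`, that only imitates the shape of a model output. Its element names and values are made up for illustration and are not the result of any fit.

```r
# A hypothetical list that imitates the structure of a model output
toy_fit <- list(
  estimate = c(meanlog = 1.5, sdlog = 0.5),
  distname = "lnorm"
)

# `$` retrieves a list element by name
toy_fit$estimate

# `[]` keeps the name of the selected value
toy_fit$estimate["meanlog"]

# `[[]]` returns the bare value without its name
toy_fit$estimate[["sdlog"]]
```

The difference between `[]` and `[[]]` matters when the value is passed on to another function: the first keeps a named vector, the second drops the name.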
::::::::::::::: tab
@@ -738,9 +767,10 @@ reporting_delay_fit <- cases %>%
fitdistrplus::fitdist(distr = "lnorm")
```
-Usually, statistical outputs in R are stored as `List` class objects. Run the chunk below to explore it:
+Usually, statistical outputs in R are stored as `list` class objects.
+Run the chunk below to explore it:
-```{r,eval=FALSE,echo=TRUE}
+```{r, eval=FALSE, echo=TRUE}
reporting_delay_fit %>%
str()
```
@@ -752,9 +782,10 @@ reporting_delay_fit %>%
purrr::pluck("estimate")
```
-The code below provides an equivalent result. Try this yourself:
+The code below provides an equivalent result.
+Try this yourself:
-```{r,eval=FALSE,echo=TRUE}
+```{r, eval=FALSE, echo=TRUE}
reporting_delay_fit$estimate
```
@@ -781,9 +812,10 @@ cases_delay %>%
dplyr::pull(reporting_delay_num)
```
-The code below provides an equivalent result. Try this yourself:
+The code below provides an equivalent result.
+Try this yourself:
-```{r,eval=FALSE,echo=TRUE}
+```{r, eval=FALSE, echo=TRUE}
cases_delay$reporting_delay_num
```
@@ -793,13 +825,12 @@ cases_delay$reporting_delay_num
**A code completion tip**
-If we write the **square brackets** `[]` next to the object `reporting_delay_fit$estimate[]`, within `[]` we can use the
-Tab key ↹
-for [code completion feature](https://support.posit.co/hc/en-us/articles/205273297-Code-Completion-in-the-RStudio-IDE)
+If we write the **square brackets** `[]` next to the object, as in `reporting_delay_fit$estimate[]`, we can use the Tab key ↹ within `[]` for the [code completion feature](https://support.posit.co/hc/en-us/articles/205273297-Code-Completion-in-the-RStudio-IDE).
-This gives quick access to `"meanlog"` and `"sdlog"`. We invite you to try this out in code chunks and the R console!
+This gives quick access to `"meanlog"` and `"sdlog"`.
+We invite you to try this out in code chunks and the R console!
-```{r,eval=FALSE,echo=TRUE}
+```{r, eval=FALSE, echo=TRUE}
# 1. Place the cursor within the square brackets
# 2. Use the Tab key
# 3. Explore the completion list
@@ -809,23 +840,22 @@ reporting_delay_fit$estimate[]
::::::::::::::::::::::::::::::
-
::::::::::::::: callout
**Estimating epidemiological delays is CHALLENGING!**
-Epidemiological delays need to account for biases like censoring, right truncation, or epidemic phase ([Charniga et al., 2024](https://doi.org/10.1371/journal.pcbi.1012520)).
-
-Additionally, at the beginning of an outbreak, limited data or resources exist to perform this during a real-time analysis. Until we have more appropriate data for the specific disease and region of the ongoing outbreak, we can **reuse delays from past outbreaks** from the same pathogens or close in its phylogeny, independent of the area of origin.
-
-In the following tutorial episodes, we will:
+Epidemiological delays need to account for biases like censoring, right truncation, or epidemic phase ([Charniga et al., 2024](https://doi.org/10.1371/journal.pcbi.1012520)).
-- Efficiently clean and produce epidemic curves to explore patterns of disease spread by difference group and time aggregates. Find more in [Tutorials Early](https://epiverse-trace.github.io/tutorials-early/)!
-- Extract and apply epidemiological parameter distributions to estimate key transmission and severity metrics (e.g. reproduction number and case fatality risk) adjusted by their corresponding delays. Find more in [Tutorials Middle](https://epiverse-trace.github.io/tutorials-middle/)!
-- Use parameters like the basic reproduction number, the latent period and infectious period to simulate transmission trajectories and intervention scenarios. Find more in [Tutorials Late](https://epiverse-trace.github.io/tutorials-late/)!
+Additionally, at the beginning of an outbreak, limited data or resources exist to perform this during a real-time analysis.
+Until we have more appropriate data for the specific disease and region of the ongoing outbreak, we can **reuse delays from past outbreaks** of the same pathogen or of one close to it in phylogeny, independent of the area of origin.
:::::::::::::::
+In the following tutorial episodes, we will:
+
+- Efficiently clean and produce **epidemic curves** to explore patterns of disease spread by different groups and time aggregates. Find more in [Tutorials Early](https://epiverse-trace.github.io/tutorials-early/)!
+- Access epidemiological **delay distributions** to estimate delay-adjusted transmission and severity metrics (e.g. reproduction number and case fatality risk). Find more in [Tutorials Middle](https://epiverse-trace.github.io/tutorials-middle/)!
+- Use parameter values like the basic reproduction number, and **delays** like the [latent period](reference.md#latent) and [infectious period](reference.md#infectiousness) to simulate transmission trajectories and intervention scenarios. Find more in [Tutorials Late](https://epiverse-trace.github.io/tutorials-late/)!
## Challenges
@@ -858,7 +888,7 @@ cases %>%
Visualize the distribution:
-```{r,warning=FALSE,message=FALSE}
+```{r, warning=FALSE, message=FALSE}
cases %>%
dplyr::select(case_id, date_of_infection, date_of_onset) %>%
dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>%
@@ -890,7 +920,7 @@ qlnorm(
)
```
-With the distribution parameters of the incubation period we can infer the length of active monitoring or quarantine.
+With the distribution parameters of the incubation period we can infer the length of active monitoring or quarantine.
[Lauer et al., 2020](https://pubmed.ncbi.nlm.nih.gov/32150748/) estimated the incubation period of Coronavirus Disease 2019 (COVID-19) from publicly reported confirmed cases.
:::::::::::::
@@ -899,13 +929,15 @@ With the distribution parameters of the incubation period we can infer the lengt
::::::::::::::: challenge
-Let's create **reproducible examples (`reprex`)**. A reprex help us to communicate our coding problems with software developers. Explore this Applied Epi entry:
+Let's create **reproducible examples (`reprex`)**.
+A reprex helps us to communicate our coding problems with software developers.
+Explore this Applied Epi entry:
Create a `reprex` with your answer:
- What is the value of the CFR from the data set in the chuck below?
-```{r,eval=FALSE,echo=TRUE}
+```{r, eval=FALSE, echo=TRUE}
outbreaks::ebola_sim_clean %>%
pluck("linelist") %>%
as_tibble() %>%
@@ -914,16 +946,18 @@ outbreaks::ebola_sim_clean %>%
:::::::::::::::
-::::::::::::::::::::::::::::::::::::: keypoints
+::::::::::::::::::::::::::::::::::::: keypoints
- Use packages from the `tidyverse` like `{dplyr}`, `{tidyr}`, and `{ggplot2}` for exploratory data analysis.
-- Epidemiological delays condition the estimation of indicators for severity or transmission.
+- Epidemiological delays condition the estimation of indicators for severity or transmission.
- Fit probability distribution to delays to make inferences from them for decision-making.
::::::::::::::::::::::::::::::::::::::::::::::::
### References
-- Cori, A. et al. (2019) Real-time outbreak analysis: Ebola as a case study - part 1 · Recon Learn, RECON learn. Available at: https://www.reconlearn.org/post/real-time-response-1 (Accessed: 06 November 2024).
+- Cori, A. et al. (2019) Real-time outbreak analysis: Ebola as a case study - part 1 · Recon Learn, RECON learn. Available at: https://www.reconlearn.org/post/real-time-response-1 (Accessed: 06 November 2024).
+
+- Cori, A. et al. (2019) Real-time outbreak analysis: Ebola as a case study - part 2 · Recon Learn, RECON learn. Available at: https://www.reconlearn.org/post/real-time-response-2 (Accessed: 07 November 2024).
+
-- Cori, A. et al. (2019) Real-time outbreak analysis: Ebola as a case study - part 2 · Recon Learn, RECON learn. Available at: https://www.reconlearn.org/post/real-time-response-2 (Accessed: 07 November 2024).