complete visualization section
avallecam committed Nov 7, 2024
1 parent fc733f2 commit 6e0857b
Showing 1 changed file with 66 additions and 50 deletions.
116 changes: 66 additions & 50 deletions episodes/delays-refresher.Rmd
@@ -338,7 +338,7 @@ set.seed(99)
cases_select <- cases %>%
dplyr::slice_sample(n = 30) %>%
dplyr::arrange(date_of_onset) %>%
dplyr::mutate(case_id = fct_inorder(case_id)) %>%
dplyr::mutate(case_id = fct_inorder(case_id)) %>%
dplyr::mutate(outcome_delay = date_of_outcome - date_of_hospitalisation) %>%
dplyr::filter(outcome_delay > 0) %>%
dplyr::select(
@@ -389,7 +389,7 @@ cases %>%

::::::::::::::::: callout

**Consistency among sequence of dated-events**
**Inconsistencies among sequence of dated-events?**

Wait! Is it consistent to have negative time delays from primary to secondary observations, i.e., from hospitalisation to death?

@@ -454,7 +454,7 @@ More on estimating a _delay-adjusted_ CFR on the episode about Estimating outbre

## Visualize transmission

The first question we want to know is simply: how bad is it?. The first step of the analysis is descriptive - we want to draw an epidemic curve or epicurve. This visualises the incidence over time by date of symptom onset.
The first question we want to answer is simply: how bad is it? The first step of the analysis is descriptive: we want to draw an epidemic curve, or epicurve, which visualises incidence over time by date of symptom onset.
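Here is a minimal base R sketch of what an epicurve aggregates: it counts cases per week of symptom onset. The `onset` dates are made-up toy data standing in for the `date_of_onset` column of `cases`. The sketch assumes weeks start on Monday. The tutorial itself draws the epicurve from the full line list with ggplot2.

```r
# Toy onset dates (hypothetical data, not taken from the `cases` line list)
onset <- as.Date(c("2014-05-05", "2014-05-06", "2014-05-13",
                   "2014-05-14", "2014-05-15", "2014-05-22"))

# Map each onset date to the Monday that starts its week (%u: Monday = 1)
week_start <- onset - (as.integer(format(onset, "%u")) - 1)

# Weekly incidence: number of cases per week of symptom onset
weekly_incidence <- table(week_start)
print(weekly_incidence)

# A bare-bones epicurve of the weekly counts
barplot(weekly_incidence, xlab = "Week of symptom onset", ylab = "Cases")
```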

From the `cases` object we will use:

@@ -496,21 +496,7 @@ cases %>%

:::::::::::

You may want to examine how long after onset of symptoms cases are hospitalised; this may inform the threshold date you choose, as follows:

```{r}
cases %>%
dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>%
dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>%
dplyr::mutate(reporting_delay_num = as.numeric(reporting_delay)) %>%
skimr::skim(reporting_delay_num)
```





Given that the date of hospitalization means the date of report, we can calculate the **reporting delay** from this line list data.
You may want to examine how long after symptom onset cases are hospitalised. Since the date of hospitalisation is the date of report in this line list, this gives the **reporting delay**:

From the `cases` object we will use:

@@ -528,37 +514,27 @@ cases %>%
geom_histogram(binwidth = 1)
```





<!--
```{r}
cases %>%
dplyr::select(case_id, date_of_infection, date_of_onset) %>%
dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>%
dplyr::mutate(incubation_period_num = as.numeric(incubation_period)) %>%
skimr::skim(incubation_period_num)
```


```{r,warning=FALSE,message=FALSE}
cases %>%
dplyr::select(case_id, date_of_infection, date_of_onset) %>%
dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>%
ggplot(aes(x = incubation_period)) +
geom_histogram(binwidth = 1)
dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>%
dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>%
dplyr::mutate(reporting_delay_num = as.numeric(reporting_delay)) %>%
skimr::skim(reporting_delay_num)
```
-->

The distribution of the reporting delay, in days, is heavily skewed: some symptomatic cases take almost **two weeks** to be reported.
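This base R sketch shows what "heavily skewed" means in numbers. It uses hypothetical dates, not values taken from `cases`. The reporting delay is the hospitalisation date minus the onset date, converted to numeric days. A long right tail shows up as a mean above the median.

```r
# Hypothetical onset and hospitalisation dates (not from the `cases` object)
date_of_onset <- as.Date(c("2014-06-01", "2014-06-02", "2014-06-03",
                           "2014-06-04", "2014-06-05", "2014-06-06"))
date_of_hospitalisation <- as.Date(c("2014-06-02", "2014-06-03", "2014-06-05",
                                     "2014-06-06", "2014-06-08", "2014-06-19"))

# Reporting delay: difftime converted to plain numeric days
reporting_delay <- as.numeric(date_of_hospitalisation - date_of_onset,
                              units = "days")

# One long delay pulls the mean above the median
delay_mean <- mean(reporting_delay)
delay_median <- median(reporting_delay)
delay_max <- max(reporting_delay)
c(mean = delay_mean, median = delay_median, max = delay_max)
```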


Splitting cases by how recently they were reported shows that cases reported within the last two weeks are what completes the exponential growth trend in incidence by date of onset:

```{r,eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE}
cases %>%
dplyr::mutate(
delayed = dplyr::case_when(
# date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 5) ~
# "5 weeks before",
# date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 4) ~
# date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 3) ~
# "4 weeks before",
date_of_hospitalisation < max(date_of_hospitalisation) - (7 * 2) ~
"2 weeks before",
@@ -570,21 +546,21 @@ cases %>%
) %>%
ggplot(aes(date_of_onset, fill = delayed)) +
geom_histogram(binwidth = 7) +
labs(fill = "Observed cases")
labs(fill = "Reported cases")
```
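The plotting chunk groups cases by how long before the latest hospitalisation date they were reported. Below is a simplified two-group version in base R. The dates and group labels are hypothetical. The sketch mirrors the `max(date_of_hospitalisation) - (7 * 2)` cut-off used above.

```r
# Hypothetical hospitalisation (report) dates
date_of_hospitalisation <- as.Date(c("2014-06-01", "2014-06-10", "2014-06-20",
                                     "2014-06-25", "2014-06-30"))

# Cut-off two weeks before the most recent report
latest <- max(date_of_hospitalisation)

# Label each case by whether it was reported before or after the cut-off
delayed <- ifelse(date_of_hospitalisation < latest - (7 * 2),
                  "more than 2 weeks before",
                  "within the last 2 weeks")
table(delayed)
```

Cases in the second group mostly have recent onset dates. Until they were reported, the most recent weeks of the epicurve were incomplete.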

:::::::::::::: challenge

Report:

- What transmission indicator can we estimate from the incidence curve?
- What indicator can we use to estimate transmission from the incidence curve?

::::::::: solution

- The growth rate, by fitting a linear model.
- The reproduction number
- The reproduction number accounting for delays from secondary observations to infection.

More on that on episodes about quantifying transmission.
More on this topic in the episodes on **Aggregate and visualize** and **Quantifying transmission**.

```{r,eval=TRUE,echo=FALSE}
dat <- cases %>%
@@ -596,7 +572,7 @@ dat <- cases %>%
fitted <- dat %>%
# truncate curve to fit without delays
filter(date_index<grates::as_isoweek(ymd(20140625))) %>%
filter(date_index < grates::as_isoweek(ymd(20140625))) %>%
nest() %>%
mutate(
model = lapply(
@@ -622,7 +598,7 @@ intervals <-
)) %>%
unnest(result)
plot(dat, angle = 45) +
plot(dat) +
ggplot2::geom_line(
ggplot2::aes(date_index, y = pred),
data = intervals,
@@ -637,18 +613,19 @@ plot(dat, angle = 45) +
)
```

```{r,eval=TRUE,echo=FALSE}
fitted %>%
mutate(fit_tidy = map(.x = model, .f = broom::tidy)) %>%
unnest(fit_tidy) %>%
select(-data, -model)
```
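The body of the model-fitting chunk is not shown in this diff, so the sketch below is not necessarily the tutorial's model. It illustrates one common approach, a log-linear fit: regress the log of weekly incidence on time, after dropping the most recent weeks that are incomplete because of reporting delays. This is the same idea as the `filter(date_index < ...)` step. The incidence values are synthetic, with the last two weeks deliberately under-reported.

```r
# Synthetic weekly incidence growing at r = 0.3 per week (hypothetical numbers)
week <- 0:9
incidence <- 10 * exp(0.3 * week)

# Mimic reporting delays: the last two weeks are only half reported
incidence[week >= 8] <- incidence[week >= 8] * 0.5

# Truncate the curve: keep only weeks not affected by reporting delays
keep <- week <= max(week) - 2

# Log-linear model: log(incidence) = a + r * week
fit <- lm(log(incidence) ~ week, subset = keep)
growth_rate <- unname(coef(fit)["week"])

# For comparison: fitting the untruncated curve underestimates r
fit_all <- lm(log(incidence) ~ week)
growth_rate_all <- unname(coef(fit_all)["week"])

# Doubling time implied by the growth rate, in weeks
doubling_time <- log(2) / growth_rate
c(growth_rate = growth_rate, growth_rate_all = growth_rate_all,
  doubling_time = doubling_time)
```

A Poisson GLM on the raw counts is an alternative that avoids taking logs of small counts.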


:::::::::

::::::::::::::

In order to account for these time delays when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**!

::::::::::: challenge

- What is the name of the delay from infection to symptom onset?

:::::::::::
Lastly, to account for these time delays when estimating indicators of severity or transmission, our analysis needs to take the delays as input in the form of **probability distributions**!
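As a minimal illustration of representing a delay as a probability distribution, the base R sketch below fits a gamma distribution to hypothetical delays by the method of moments. Method of moments is a simple, swapped-in technique, not necessarily what the fitting section of this episode uses. It also treats whole-day delays as continuous.

```r
# Hypothetical observed delays, in days
delays <- c(0, 2, 4, 6, 8)

# Gamma moments: mean = shape / rate, variance = shape / rate^2
delay_mean <- mean(delays)
delay_var <- var(delays)
shape <- delay_mean^2 / delay_var
rate <- delay_mean / delay_var

# Use the fitted distribution: probability that a delay exceeds 7 days
p_longer_than_week <- pgamma(7, shape = shape, rate = rate, lower.tail = FALSE)
c(shape = shape, rate = rate, p_longer_than_week = p_longer_than_week)
```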

## Fit a probability distribution to delays

@@ -814,6 +791,45 @@ In the next tutorial episodes, we will:
#' expand the number of pre-days to include more backward contacts
```

:::::::::::::::::::::::: challenge

<!-- summative assessment -->

**Relevant delays when estimating transmission**

- Review the definition of the [incubation period](reference.md#incubation) in our glossary page.

- Calculate the summary statistics of the incubation period distribution observed in the line list data.

- Visualize the distribution of the incubation period observed in the line list data.

::::::::::::: solution

Calculate the summary statistics:

```{r}
cases %>%
dplyr::select(case_id, date_of_infection, date_of_onset) %>%
dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>%
skimr::skim(incubation_period)
```

To get the interquartile range (IQR), convert the time difference to numeric by adding one step to the pipeline, `dplyr::mutate(incubation_period_num = as.numeric(incubation_period))`, and then skim `incubation_period_num`.
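The same conversion also works in base R, outside a pipeline. This sketch uses hypothetical infection and onset dates. `as.numeric()` with `units = "days"` turns the `difftime` into plain numbers that `IQR()` and `quantile()` accept.

```r
# Hypothetical infection and onset dates (not from the `cases` object)
date_of_infection <- as.Date(c("2014-05-01", "2014-05-03", "2014-05-04", "2014-05-10"))
date_of_onset <- as.Date(c("2014-05-06", "2014-05-12", "2014-05-09", "2014-05-22"))

# Incubation period as a difftime, then as numeric days
incubation_period <- date_of_onset - date_of_infection
incubation_period_num <- as.numeric(incubation_period, units = "days")

# Interquartile range and the quartiles it is computed from
iqr <- IQR(incubation_period_num)
iqr
quantile(incubation_period_num, probs = c(0.25, 0.5, 0.75))
```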

Visualize the distribution:

```{r,warning=FALSE,message=FALSE}
cases %>%
dplyr::select(case_id, date_of_infection, date_of_onset) %>%
dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>%
ggplot(aes(x = incubation_period)) +
geom_histogram(binwidth = 1)
```

:::::::::::::

::::::::::::::::::::::::::

::::::::::::::: challenge

Let's create **reproducible examples (`reprex`)**. A reprex helps us communicate our coding problems to software developers. Explore this Applied Epi entry: <https://community.appliedepi.org/t/how-to-make-a-reproducible-r-code-example/167>