From 4aa0865a555ae7416319d81e0a5dc9c6d5dad738 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Mon, 4 Nov 2024 16:01:21 +0000 Subject: [PATCH 01/52] add a delays refresher wit hR episode --- config.yaml | 14 +++++++------- episodes/delays-refresher.Rmd | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 7 deletions(-) create mode 100644 episodes/delays-refresher.Rmd diff --git a/config.yaml b/config.yaml index 930b95e7..51cc038a 100644 --- a/config.yaml +++ b/config.yaml @@ -14,7 +14,7 @@ carpentry: 'epiverse-trace' title: 'Outbreak analytics with R' # Date the lesson was created (YYYY-MM-DD, this is empty by default) -created: +created: # Comma-separated list of keywords for the lesson keywords: 'forecasts, epidemic models, interventions' @@ -58,24 +58,24 @@ contact: 'andree.valle-campos@lshtm.ac.uk' # - another-learner.md # Order of episodes in your lesson -episodes: -# - template.Rmd +episodes: - introduction.Rmd +- delays-refresher.Rmd # Information for Learners -learners: +learners: # Information for Instructors -instructors: +instructors: # Learner Profiles -profiles: +profiles: # Customisation --------------------------------------------- # # This space below is where custom yaml items (e.g. pinning # sandpaper and varnish versions) should live + varnish: epiverse-trace/varnish@epiversetheme -# this is carpentries/sandpaper#533 in our fork so we can keep it up to date with main sandpaper: epiverse-trace/sandpaper@patch-renv-github-bug diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd new file mode 100644 index 00000000..edecb363 --- /dev/null +++ b/episodes/delays-refresher.Rmd @@ -0,0 +1,34 @@ +--- +title: 'delays-refresher' +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How to start to analyse outbreak data? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Calculate the naive CFR. 
+- Interpret 95% Confidence Intervals. +- Visualize the growth rate. +- Describe the notification (reporting) delay + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## Introduction + +... + +::::::::::::::::::::::::::::::::::::: keypoints + +- Use `.md` files for episodes when you want static content +- Use `.Rmd` files for episodes when you need to generate output +- Run `sandpaper::check_lesson()` to identify any issues with your lesson +- Run `sandpaper::build_lesson()` to preview your lesson locally + +:::::::::::::::::::::::::::::::::::::::::::::::: + From 321c34f39160243a59076c2ad93acef3095301da Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Mon, 4 Nov 2024 21:06:30 +0000 Subject: [PATCH 02/52] add linelist data --- episodes/data/linelist.rds | Bin 0 -> 13881 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 episodes/data/linelist.rds diff --git a/episodes/data/linelist.rds b/episodes/data/linelist.rds new file mode 100644 index 0000000000000000000000000000000000000000..f14575ae6697cd24001314c27d24da4981a9c04b GIT binary patch literal 13881 zcmb7L33wD$woV8M`zE_=%_h()R|-d-tGjPH?O|0^53)GbI!f@)J^z! 
zw0=rTO0AUIwed3z!?fo3(Ex8#yh~DQr_=*xXpyiMk8&y=_50-@r@X$9><@Fw6I10_ zoKx|TtVKecifKM6pm8dy`aCh=GaOW9pOA%u5pPI%kpjMuU(6Q^MB@=bd3}+H?h#STagWaChctq53zgLpP zUXmmJU|dAz_XLB9b@zE=el^Bth)Et_RM>enpB57{_%y$i$jBoFJW;VDy}_7Q6|?)K zXgDNfLCqsaMCD@AibBsnNnGm2t+MFoA{s8<$odf?M1Wbpv}2%l2a zAMl9k1~uAuJilN}_5?*_n(U9qL}YSIjfX`nK`AapMefnKSBneHpfBJ}oUuqGD62vi zRV6L4Dxsh(`9wXVeorW|?w(lO7ZrQSFNLB3k&#!EJV|R52>Ub-B?6IJ#1%;;S z@g(iKP{<$mh-@Qj(3kigdE)4HB8P}46w$=&S~%p3iMj+mm|x5n4`WLvRz>!NW1?ng zUy@f;Deen}JRlNPs8I35ZGlbM|T+HC}h2@a&8Pzl`(RcB2K(FOK z;}JAWvC=+YFcKFzcu~*9w;!J|H7sI5*`tXsNmNEXMI}5Q`j+q+{P;XYgpl$1vH-6$ocA+QGUWAel2 zA)3bIgKX&DN4_W*@eR0Bft)d4T%M)qVwNOC@=Jjf5H z$rsttOR-bEcrA%0J(|abm-yuWdb}i~xCzN8`5{?4UeYIjge0e0kS(8s@}an>mh=*z zko0MOss;I@d5A~xlAPCruLZ?MFX@tu>PfmZruvYKd{b>`Ofm5MGcYC@#lc^ii*lko zxF6Cde>^T4yYP}v(j~vpO(WYSin&V%rYPpG4qELQwpG9ee>(xNQn%5fauI#RL94z_ zgXb&hXPrfgm1@w4(;OA^9i8}!)x@yTJfIVDIY-56WAhc$BJox7gkC46cg{hp1*1zC z-PFc}enZButEAvj%;N;Wtu`j~>KP7yhs{^a6FLp9mhZOtiTWIMXp4>{`8}#Tc$ffr zw^3{~eytoz+Q!I^t^Upjk>J=v3WCv%*HM&rvbiI+<+!%p;DzMXitQ(;V77 zYt!7WCMxF7wp_8&ZGA3R%>B&X$s7CCq51k))VgS%E{2006hdKd)zyhs%xwg~O5H}Q zzCtMU%=h%_^e!Tyzn7y<1Fp|eF@Li8iunaY9;dSoT#ut-?zHtd%~7$MRpX2EV>LBW z@E~piAossPcjyWN5tm|~vuVXFugE5s1BN5BTVvh}BwU7HXg`Cz|YjkmQs(w>mmBN3)2@`5~lUPWq*~jaDPW!3K(hX%1~()E%0> zpQOLYL5q4k(X2`H&1;_4ZNAmku+jX10D8{mQn;M!Tita>evtr&rKMt{h^rZ+%ZLp* z+jLeV#{XHj`Q{FW=ZS=#bBGjkk3(~R<`O1v!1#w4^7<4rc~{0S*J&V{xz&#G4>IIU z+G*m6L%~$NhR071;NVm~k!SEErEyRX?8KZf8{BRv(eBRs+F`Cs+tbZ@Y zXK`7nN(vsuTuFfW(-a%|VX?Xq3BJ>$6thZq=;}=p9}9siLx6$JJo;TNh#D<*3YqfM}t1jd7b@`IfY@N+BIx!UUd&WP=Xucm! 
z#%Fn2*BQi6tQHKfClWN@{}*(d7WLmuB>c1UW}b9t@liX)XjV_Fwn2=<<%pltbcb*9 zJoo61UN^-?^Rh#;%TnT8RgxIFRWU+Ew|b^ zDAxZVu`$nK-9fewtj2~-Th}u_i=TgN`8@1$fXfZjJ4R-BsL7_%CT@zCH`{CU)q@bz z{H*@GpIRLihhL;Sc!mIZHq>q8&E0D3&{hM6e1D0)syO;f3241_8~J6zWb0`DLTt#d zS8QaLJf&t|PxMv8(ff@6cC8c}P1Y~1=0qZ|GTlbr&qZG?BR1r_81nq-<7(JqJMCCH%D%DaOAvi3O#-)V&7pd7k!nj zACK3{VDs?4#@2=RJ98hKX9Gjt7cz_#Jm{;sgU1O#^E_Fdt)7a_x6&DQH5}SS0Db5D zB=)&;DHE4FtEq15Sv?4#zr;ay$xD16_v?eqUvJ>>Tp%v&sAFujC%{pLF!D?yH6^7j>dGLi+riW-xjm!@Uk#-E!(Vlq7X4>G;~%h*$0@$ERu9G2=j+1ys?f8V8;-n^ z0P(OsYc*DEny;_P`zJfUqVMuP#QHLeU-%d2)j2nU&-Rs|O_rzCMR(>cVR)D!`wp;Q z61g46tM1hG5uZ+ioTjb=E&1Ir(!!{uZD_4SI<}YXMfz`G)Ki``@yQm?5lLg4_~hh zL}H$!x{X#{!$$r+wrVjxaVg(Ax`Q1YY|AK z?wsTQqOVdTT!Ob2y*hs2A*bTiiIb4odPy|bBffK3s_WWv9YQ=c$F>dUCn4u^Bs&^& zIvKCx>rli~O-Hv`_WYb*Ggfqs3hCW5>z$2>BSpeXFCf>y7M;vX_Ft24`>x@$roL^%=C|ie8gpIlmtnO( zYGq_=TKrF4cW7#1CL`)Rs73R}Yx%#k^E2XVv@j1+`)^=;?OwhP3xNUPl4#y1iI?nyaw49cqmBU{8gqMFrV|%> zx(%Wp!j5F@O^${0cud@$bgK0{9Sl6AM>e*c#yWoR7sKQn(ut3U=fY*&uR!S3*6{-m zLBRF!+j8@>%>XKW9xw?8Gp$?A752pvc*gMi-O9Rqstu=%U) z@pNd?1@5{5dAYgjxTi7;U3cddjL9lgvzesX{aM){^C#34q~{f8qU$^MHx0?p%8hCT z1+D>^`B?>pS!%B9e*Du%er{G_!qU}q$bkFr?!|1HJec>)xG`$N@3sHbSM390|BLz( z!~N{QZ3{}b8SdQ0LuQn%GThJK+4u7|XB+OkNqL_x8)~>`%xo(USPwc_x7BUm8SXK| z+CCItXSk=`llR8?U52}$cU7+sH1JpUK7Pv}=nY%F?$J*`*DIO*)OgtEwy(eSxZ%#< zb8A@?{-=#F-;b>|+*2;}8T;x0%v<}Nk(aKHFV$I6Y*!S9@9CFh%Ap4ksK%|C%WM(#RMb;@v0e<)rR z1m^!Gchzf;L+)+${<{6J@ASanVa0}fW?tLtOrpIPFOOBznhZUtSmbmE>ahWk%xOE<~Le{9Brze+mn#*`LLK4`e74$u7J82tb7Le9lbd%*AA zW99=p4fpt!kL9ETUpl1~%|P5UyM5VxZa2g;u;SBARq+3bFLOQYvVUE2=Eg0+=^I~p zvIO$*;jAwG4EMO3?z-!q&W8KxovDXoIfy@}Xk6dDnD6K#eLjJGZr4`5UwYnfPn@%3 zWZxjKr-NdtxSVPsxZmZ4LLVPp3>RDg~W)ZqMQsnCH~+-&VNcfAmM=N45lh zanHH=br64`j(0xQ^^98CVG`y&yJgaj_rPB`?7>;*;D74ot#{rGdQRzzCH;|4(cEX^ zn?TSb-L~k-R6IWeHpHN^V42f z$3GoPyIjw3Pw3s~o5kZ$-^Vxi>GcETvnEz}Q&5N27jxG9h;A_EYFhE&H<9 zBJ2;>plL&HM4lT4t&g6?exEyV$YAW#yrpH=nI-W1y7t33#5du_BU8pr!Tx&kV8fT; zf9mpn-}&!=f5Vzzyc7Cu-&ys^JoxWezujW2SN7EWdpF^n=nt;V+cE^}*=CdP7MzE% 
zjW+iG5$p1jZ{e;Mhk)&hf9r&Lyg2)(!VTNuH|MQR-5P@bbiLBQUjVkRnv#Zd@|-?z z{iqw@f5P73yK#=6`R2Ye&r#hb_k4Kx%djuH_@<1!vL6|E`#!AClquU64!#UJFudgG zMdVSsG(3DU>fLPUt@Br7-;9`ZsMcZVE%XH-EwR{^w++=wsjWh zQ#&^N73(#lIM{dX95ndGc$X-yGl5o*Jv38sl_>}IzrI;z4^1>ID-N*M-t$YchL6zF!%1_YI1xWS;SO$McHo)bd5owZEXaif>-k`L(Tz>ui^* zb1y5PGo%S;{-(H2yxPC<4bv4@W!1kMWj0n^rEPOM&xt9n^T9TgQ;sOEi+f+6Uh5S6 z{Jf~cumOnY>+j2Mh25Dx&%XV12=o1|#iEuU!rzCxA8p+Zm|b2t=t<09@80io`XJ8Y z4#j(3#k?QfacM$R&?ASo{`>*tvu@t@y7Mqk%BgqvABCNF@a_LZd}UoIqT^iS$D8~#qO4Ak3p67`n1?7TDx_QhA`jYUN@DkD?pL!Ee4Ycm~$ve9YDQ`t9I9 z^~S8V(~w7D@$}0*P>-X7hmFEMD5?9Gg2=5{x8to|p4J}meg9~O7Kgz9aqPBp@Kb*J z%LYYX!vAn9{iR2N)93D5gLNstyUO+c=kWj0(Se(mBaY_xz0hqi>b=FEGO;yqP*2K+_OU77p|*6rnvn;zHmh@ueRJ zj75CgyXglD;P0*Z%R6FUmX!~CEB+YbxSTcO;eW%=_Sx>dFM!J6l-uHnYv!csjs6M! z*<(@%_J!W(r!wEV0d?tms@a%#p*QfRLw#SyKC82G>hjT8#~}@u&6Yra-!bDL@M0tL z%?A%+o*P_k24G*6_HH_H754YZRo=k^DZ!tS|q;V=K$Ne4dzKdWuN4=iw#8VgIj>CftO27B_uj)8zo@k%fVwKCENQ=H0fRh2M-`AMbt${8iH@O{<4E z|2SsfT`9km`jb~-&!rv>8-gWG0%y;HcRrq1ht)9AX z6Y5u<5*XcWBlyc++bG=*{*jHZ%h>N1-e{luQa1b?S<`PU_Q9E{m5n~bJ~)5#de@7; zA)eb-59xxuDz`ncx)av7ti`2?y>Sjs^eipeO6xWHrnWboKwUp6Yt?uG?EgIB`~(XETw^Bg8DJFQ0*Ph zS93JSG==|rb-nz&XM5*@Pk&0GKZ4l5lw7%TWke0dbgo3dfi_@%>qON8ZB!Qj9{}~A z(sH$YwJ<9$mnk&E#M-F5_^7PhxE4(kG~$B1+ybqTX}9LgOm@}JDoA$Lgp2dW6-M)r gigU%UQ=QB}5HWH8ughIA<+N-OTUxeSh+HoJFB0eRoB#j- literal 0 HcmV?d00001 From ee28c6d2a0561b353e5aa0690d1c9b0465f3ae09 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Mon, 4 Nov 2024 21:06:50 +0000 Subject: [PATCH 03/52] remove introduction episode --- config.yaml | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/config.yaml b/config.yaml index 51cc038a..19762acd 100644 --- a/config.yaml +++ b/config.yaml @@ -14,7 +14,7 @@ carpentry: 'epiverse-trace' title: 'Outbreak analytics with R' # Date the lesson was created (YYYY-MM-DD, this is empty by default) -created: +created: # Comma-separated list of keywords for the lesson keywords: 'forecasts, epidemic models, interventions' @@ 
-58,18 +58,18 @@ contact: 'andree.valle-campos@lshtm.ac.uk' # - another-learner.md # Order of episodes in your lesson -episodes: -- introduction.Rmd +episodes: +#- introduction.Rmd - delays-refresher.Rmd # Information for Learners -learners: +learners: # Information for Instructors -instructors: +instructors: # Learner Profiles -profiles: +profiles: # Customisation --------------------------------------------- # From a2e0526d34d637aac49a96f1084213c34b4e93fb Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Mon, 4 Nov 2024 21:07:08 +0000 Subject: [PATCH 04/52] add summative to episode --- episodes/delays-refresher.Rmd | 216 +++++++++++++++++++++++++++++++++- 1 file changed, 215 insertions(+), 1 deletion(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index edecb363..7af2900f 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -19,9 +19,223 @@ exercises: 2 :::::::::::::::::::::::::::::::::::::::::::::::: + +#### Basic concepts to be developed + +The following concepts will be developed in this practice: + + +- Person-to-person transmission of infectious diseases + +- Basic reproduction number + +- Instantaneous reproduction number + +- Probability of death (IFR, CFR) + +- Serial interval + +- Growth rate + +- Incidence + +:::::::::: prereq + +### Setup a project and folder + +- Create an RStudio project. If needed, follow this [how-to guide on "Hello RStudio Projects"](https://docs.posit.co/ide/user/ide/get-started/#hello-rstudio-projects) to create one. +- Inside the RStudio project, create the `data/` folder. +- Inside the `data/` folder, save the [linelist.rds](https://epiverse-trace.github.io/tutorials/data/linelist.rds) file. + +:::::::::: + ## Introduction -... +A new Ebola virus (EVD) outbreak in a fictitious West African country + +```{r} +# new Ebola virus outbreak ------------------------------------------------ + +#' what we know? 
+#' - disease: Ebola
+#' - location: western Africa
+#' - data collection: cases are registered in the hospital
+
+```
+
+
+```{r,eval=TRUE,message=FALSE,warning=FALSE}
+# Load packages
+library(tidyverse) # for {dplyr} functions and the pipe %>%
+```
+
+## Data structure
+
+```{r,eval=FALSE,echo=TRUE,message=FALSE}
+# Read data
+# e.g.: if path to file is data/simulated_ebola_2.csv then:
+cases <- read_rds(
+  here::here("data", "linelist.rds")
+)
+```
+
+```{r,eval=TRUE,echo=FALSE,message=FALSE}
+# Read data
+cases <- read_rds(
+  file.path("data", "linelist.rds")
+)
+```
+
+```{r, message=FALSE}
+# Print data frame
+cases
+```
+
+```{r}
+# quality evaluation ------------------------------------------------------
+
+cases
+
+cases %>%
+  glimpse()
+
+cases %>%
+  cleanepi::scan_data()
+
+# why do we have missing on infection date or outcome? -------------------
+
+#' date of infection: unknown, contact tracing research, recall bias
+#' outcome: reporting delay
+
+cases %>%
+  visdat::vis_miss()
+
+# severity ----------------------------------------------------------------
+
+# case fatality ratio -----------------------------------------------------
+
+cases %>%
+  count(outcome)
+
+# should I consider missing outcomes for CFR? -----------------------------
+
+cases %>%
+  count(outcome) %>%
+  pivot_wider(names_from = outcome,values_from = n) %>%
+  cleanepi::standardize_column_names() %>%
+  mutate(cases_known_outcome = death + recover) %>%
+  mutate(cfr = death / cases_known_outcome)
+
+
+# how much time it take for us register those outcomes? ------------------
+
+#' date of hospitalization means the date of report
+
+cases %>%
+  select(case_id, date_of_onset, date_of_hospitalisation) %>%
+  mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>%
+  ggplot(aes(x = reporting_delay)) +
+  geom_histogram(binwidth = 1)
+
+# demo code not to run by learner
+cases %>%
+  arrange(date_of_onset) %>%
+  mutate(case_id = fct_inorder(case_id)) %>%
+  slice(10:40) %>%
+  select(x = case_id, value1 = date_of_onset, value2 = date_of_hospitalisation) %>%
+  ggplot() +
+  geom_segment( aes(x=x, xend=x, y=value1, yend=value2), color="grey") +
+  geom_point( aes(x=x, y=value1), color=rgb(0.2,0.7,0.1,0.5), size=3 ) +
+  geom_point( aes(x=x, y=value2), color=rgb(0.7,0.2,0.1,0.5), size=3 ) +
+  coord_flip()
+
+# transmission ------------------------------------------------------------
+
+# incidence curve ---------------------------------------------------------
+
+cases %>%
+  ggplot(aes(x = date_of_onset)) +
+  geom_histogram()
+
+# what transmission indicator can we estimate from the incidence curve? ---
+
+#' the growth rate! by fitting a linear model
+#' more on that on DAY 3-4
+
+# what is the name of the delay from infection to symptom onset? ----------
+
+cases %>%
+  select(case_id, date_of_infection, date_of_onset) %>%
+  mutate(incubation_period = date_of_onset - date_of_infection) %>%
+  ggplot(aes(x = incubation_period)) +
+  geom_histogram(binwidth = 1)
+
+cases %>%
+  select(case_id, date_of_infection, date_of_onset) %>%
+  mutate(incubation_period = date_of_onset - date_of_infection) %>%
+  mutate(incubation_period_num = as.numeric(incubation_period)) %>%
+  filter(!is.na(incubation_period_num)) %>%
+  pull(incubation_period_num) %>%
+  fitdistrplus::fitdist(distr = "lnorm") %>%
+  # summary()
+  # plot()
+  identity()
+
+#' explore the
+#' https://ben18785.shinyapps.io/distribution-zoo/
+#' for gamma, weibull, lnorm parameters
+
+# why to fit a distribution to observed data? -----------------------------
+
+#' the time difference from date infection and date symptom onset
+#' give us the incubation period
+#'
+#' steps
+#' - calculate the time difference
+#' - plot
+#' - fit a distribution (obtain distribution parameters)
+#' - do inferences
+#'
+#' from the incubation period distribution
+#' we can infer the length of active monitoring or quarantine
+#' example
+#' within what time frame do 99% of individuals
+#' exhibiting Ebola symptoms exhibit them after infection?
+
+#' review probability functions
+#' https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm#probfunc
+qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226)
+
+#' reference: https://pubmed.ncbi.nlm.nih.gov/32150748/
+#' Lauer, 2020
+#' The Incubation Period of Coronavirus Disease 2019 (COVID-19)
+#' From Publicly Reported Confirmed Cases: Estimation and Application
+
+
+# but this fitting step requires account for biases! ----------------------
+
+# what to do when we do not have all these dates or info on biases? -------
+
+#' we are going to clean and standardize data on
+#' DAY 2 afternoon
+#' validate, rearrange, visualize epicurve data on
+#' DAY 3 morning
+#' we can reuse data from past outbreaks!
+#' DAY 3 afternoon +#' and took them to account for delays for transmission and severity +#' DAY 4 morning and afternoon + + +# challenge ---------------------------------------------------------------- + +# use epidemiological times figure TRACE LAC ------------------------------ + +#' PAHO figure +#' if we have the serial interval +#' we can make inferences about the window for contact tracing +#' expand the number of pre-days to include more backward contacts +``` + ::::::::::::::::::::::::::::::::::::: keypoints From 729971a59db53449da32553bb13162ea6dac0c7f Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Mon, 4 Nov 2024 21:07:34 +0000 Subject: [PATCH 05/52] update renv lock file --- renv/profiles/lesson-requirements/renv.lock | 191 ++++++++++++++++++++ 1 file changed, 191 insertions(+) diff --git a/renv/profiles/lesson-requirements/renv.lock b/renv/profiles/lesson-requirements/renv.lock index 4b343fb2..0e62e445 100644 --- a/renv/profiles/lesson-requirements/renv.lock +++ b/renv/profiles/lesson-requirements/renv.lock @@ -218,6 +218,19 @@ ], "Hash": "2288423bb0f20a457800d7fc47f6aa54" }, + "arsenal": { + "Package": "arsenal", + "Version": "3.6.3", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "knitr", + "stats", + "utils" + ], + "Hash": "e16d280d498f4d8f2316a01dc3eed9b6" + }, "askpass": { "Package": "askpass", "Version": "1.2.0", @@ -374,6 +387,30 @@ ], "Hash": "0e14e01ce07e7c88fd25de6d4260d26b" }, + "cleanepi": { + "Package": "cleanepi", + "Version": "1.0.2", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "arsenal", + "checkmate", + "dplyr", + "janitor", + "linelist", + "lubridate", + "magrittr", + "matchmaker", + "numberize", + "readr", + "rlang", + "snakecase", + "utils", + "withr" + ], + "Hash": "2b9d9c7abb275271aab4f8a55ebde050" + }, "cli": { "Package": "cli", "Version": "3.6.3", @@ -680,6 +717,21 @@ ], "Hash": "66fa5a16464666772f4929f8f5b2fc71" }, + "fitdistrplus": { + "Package": 
"fitdistrplus", + "Version": "1.1-11", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "MASS", + "R", + "grDevices", + "methods", + "stats", + "survival" + ], + "Hash": "f40ef9686e85681a1ccbf33d9236aeb9" + }, "fontawesome": { "Package": "fontawesome", "Version": "0.5.2", @@ -924,6 +976,16 @@ ], "Hash": "9171f898db9d9c4c1b2c745adc2c1ef1" }, + "here": { + "Package": "here", + "Version": "1.0.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "rprojroot" + ], + "Hash": "24b224366f9c2e7534d2344d10d59211" + }, "highr": { "Package": "highr", "Version": "0.11", @@ -1012,6 +1074,28 @@ ], "Hash": "0080607b4a1a7b28979aecef976d8bc2" }, + "janitor": { + "Package": "janitor", + "Version": "2.2.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "dplyr", + "hms", + "lifecycle", + "lubridate", + "magrittr", + "purrr", + "rlang", + "snakecase", + "stringi", + "stringr", + "tidyr", + "tidyselect" + ], + "Hash": "5baae149f1082f466df9d1442ba7aa65" + }, "jquerylib": { "Package": "jquerylib", "Version": "0.1.4", @@ -1098,6 +1182,21 @@ ], "Hash": "b8552d117e1b808b09a832f589b79035" }, + "linelist": { + "Package": "linelist", + "Version": "1.1.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "checkmate", + "dplyr", + "lifecycle", + "rlang", + "tidyselect" + ], + "Hash": "742c211230f8ebc3a9c543263097dddf" + }, "loo": { "Package": "loo", "Version": "2.8.0", @@ -1148,6 +1247,18 @@ ], "Hash": "5f7886e53a3b39d4a110c7bd7fce9164" }, + "matchmaker": { + "Package": "matchmaker", + "Version": "0.1.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "cli", + "forcats", + "rlang" + ], + "Hash": "faae0c4f0c37c5e91cdfffd238212422" + }, "matrixStats": { "Package": "matrixStats", "Version": "1.4.1", @@ -1249,6 +1360,16 @@ ], "Hash": "df58958f293b166e4ab885ebcad90e02" }, + "numberize": { + "Package": "numberize", + "Version": "1.0.0", + "Source": "Repository", + 
"Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "36f35267920445fd52af1f328320f202" + }, "oai": { "Package": "oai", "Version": "0.4.0", @@ -1273,6 +1394,16 @@ ], "Hash": "2bcca3848e4734eb3b16103bc9aa4b8e" }, + "outbreaks": { + "Package": "outbreaks", + "Version": "1.9.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "a62a28f56f51694490827b57ee78f970" + }, "pak": { "Package": "pak", "Version": "0.7.2", @@ -1584,6 +1715,16 @@ ], "Hash": "062470668513dcda416927085ee9bdc7" }, + "rprojroot": { + "Package": "rprojroot", + "Version": "2.0.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "4c8415e0ec1e29f3f4f6fc108bef0144" + }, "rstan": { "Package": "rstan", "Version": "2.32.6", @@ -1707,6 +1848,18 @@ ], "Hash": "3838071b66e0c566d55cc26bd6e27bf4" }, + "snakecase": { + "Package": "snakecase", + "Version": "0.11.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "stringi", + "stringr" + ], + "Hash": "58767e44739b76965332e8a4fe3f91f1" + }, "socialmixr": { "Package": "socialmixr", "Version": "0.3.2", @@ -1773,6 +1926,22 @@ ], "Hash": "960e2ae9e09656611e0b8214ad543207" }, + "survival": { + "Package": "survival", + "Version": "3.5-8", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "Matrix", + "R", + "graphics", + "methods", + "splines", + "stats", + "utils" + ], + "Hash": "184d7799bca4ba8c3be72ea396f4b9a3" + }, "sys": { "Package": "sys", "Version": "3.4.2", @@ -2000,6 +2169,28 @@ ], "Hash": "c826c7c4241b6fc89ff55aaea3fa7491" }, + "visdat": { + "Package": "visdat", + "Version": "0.6.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "dplyr", + "forcats", + "ggplot2", + "glue", + "magrittr", + "purrr", + "readr", + "scales", + "stats", + "tibble", + "tidyr" + ], + "Hash": "aa0c558f21cab0196eea9b51433f2c0e" + }, "vroom": { "Package": "vroom", "Version": "1.6.5", From 
1e9ffd50159dd084e132f7336e1d0006db6de5d0 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Tue, 5 Nov 2024 00:12:33 +0000 Subject: [PATCH 06/52] update linelist dataset --- episodes/data/linelist.rds | Bin 13881 -> 13881 bytes 1 file changed, 0 insertions(+), 0 deletions(-) diff --git a/episodes/data/linelist.rds b/episodes/data/linelist.rds index f14575ae6697cd24001314c27d24da4981a9c04b..f30a8715d4c7a3fb650f1effd57b7eb86c0117fc 100644 GIT binary patch literal 13881 zcmbtb37idA)^A!ju>}(nLGs9g7m@c~y`{>Q5`s*a$tRLA85yanR}~$to9;%);#tyy z2{L4{MQe16T`XxKJS1buBu`{9LL!9NVvPu5`Tn=+o?EY<@tOI3_50mEb+-RG_uO-j z(vqH4)6&u^rd6thpXqo_{~dl*!>WZ>W^ z(t^YVm5AgQem*NC8$uU00&-X*OIh~oQK5^5GLb3izr&u7beaKN`A#hT+pYPA%S9rf|01OjLMo{5?O1KA?qT75((;tpHPf|V(KB{ z!kXgqi9PrfQ<4Q{B%mukVX4bjz^$ho4uxcaDTif6j}T_aGDC{kkrE9i?JpdVLx!vK z`FsI^;`arlutX?+za^WZ){(HTLTuif$^3Ax^~z zni2P;{UN{Ka8Y~#D=f|->I(%Vv4^N;%7)mjW`wP%=!s|m?JE3&5iMj1OvUsmiijRT zG6CTy`@&&KR3e1?pol7Ih7}Bmz79lCco97mj!LejFJeljh~Sqb(-f9|EvO}(JNn) zEGeu6g90-oOA*7Zi)@7>g41-L6co8gVc8cI-D=9Ah%9PtNs6S4F7QjTVhF#8sf41U z=M^gym6G;uD6-glM3y!89tCtc5)ttNN<`L#CFW8@7COa3c}4Vq->(NnW*9?0LsSV@ zIuH^4DfxB95|ywFU&x&rA)gkFx;-HoA)kmI(dB?9&fN_A4PBswaNi>$9-FhesAn{6 z`4y3w6w$(#I5D5!GTgaq`Yhc&cL_s77tu{gk^-X7WYg~piXDY;1HvK#8q0KNNHAjR zE=tg_aOK6+Fhjn8K+!c6P@rf*8D~b85j|ps#r_n{3`9jPmf;VXVz*k@2>ZpUNTwx4 zMU~`$-zSOwRIH%i?W};QX^Q9`Kw6T>GoV>UP((KrT~`FB>z1a7p78nPfLnP(iA3G# z0aLf!bBO98&D{~oAiGmvv(O+S7g;tfOT;szu<7=^9`y$;aYjlY;PVU1h)>ttsTh^Q zn%lcZG!S+7uKUnuqAs`&K~r#{U?A#qYaQ~LNpr%)2obe5gSzR?B}t2f-Fp;8Tf1El zk`+@CCng2`rX_YO8CKYx3{fkngv2dI51Q`u)(q5MoRMbYj}lRdh!xU&!Vg!)inv{% z1bwFHGc#nG0dano95zf*K_%cb+^?A^9#h;+Lop-nH-o7~{P?;^ORHgj8{r3@3Vc6kuQ#^j7AN-@ZWQHIZGs*JEmN3oMJ$c9nMlVp?=qgA+`bd0h^HpEjt zsXDe-vL%{o!*~{l>>=GDZs6!{F2$Jh(mUyrJZBHK<) z%;0nub3?o$PK$}g*o&H&#p!Iuf+ziKE+@X2c$-puHR1TVJjHYTNlsKv%;7kcXGt1+ zTs`?a@mzT+_j4%8iK>Zt<#1fN6UXICIF4hlCf>FE)Wm!)pI;7#>Ph}Y5s%w1;QkBB z@vfbNNBBu{ZclN`?+f-RxC=R7YE!}?swNh3J)LKHJlAevj2RM7R83I+OKluizQpF$ z#Nrrx0ps?<-~R6w4W7rJXgM&$4@!i<)r! 
zF;*)Ar~;-do1+;fdr&Cktbk5UIr6D))M)q+ zb($SRfycidUCQ^DO2yFN`fGTOOPB6?l!oOEDaWH>b3;n@*P+f3z~xM1PZH(!E4kbm zTJEr8CGiIx|HL+&uJOY?;1+_|B})WyoPd|)iJiu z`H(%yiIP3p(flNy=3D93#ek4O6VxZm1Tf8sf%_=G3sDA%vx zGvkqC(Gc$WF{bj!XGa+zs4b*roE^kM>LclwT?(T%sRxybn`wNWYfL z$!Auq(BRrZTGl1^? zf0}=k56vU$ztlPC{5j#`JNMA#X|537QoA1R zFBxcVQC(;b(43_?ME8;KXzoycsc~sv0pxt{7saW|5jQ@ZEW z2Xvnacav>Lc@TB~vQBk(@^STakISy(Px;b*Xs(Gm@cf)}ba6MPu?OE_w%(O*wRtt+ z%nMged5}I1Z{|<&NxzB9sUDOU)$=3L$JmSe{g|5HC)|#3WRe zb3DqI<|6qMKIKp6LH$qq&+R;1{M0%*yqci?-p1qZvgv;?S8Kq~9#(cB{!96!_X^On zRWWH*V0yOj*zcefV~lg`gpBi~vvFb&vW4dwJ0_zQ!wa5rzKq8bIGiUNo5mo-L)eu2 zH9pLSFTx}r!n;PFiW`Ff582rtOilW#I!@;mvFjPgk3376#LN7VL<+2P9NBK&wB zsXX(u7xyKeaH$@GXZGBdbargW1<&*;Ho`wy2Q%_X&Wmh$F6Dlym6>f0GPCpHSqrNo z{i>^@sTK6$Oy%yTkvquDyH4k4TAERi3$6V-FuV6z9yVHeN_nKxYih*(lQEKgs6NS* z-pMBwlk~~*8_AMmrs9%svi#rKlZ@h$UouZJvL#A3#50;KBftNPrrHxvlrV`BRw_^O zWItv@cJ!XgC)*Q7vJL5pl8$VNCd-p;nVfV)$%iP}fU@4Ncoja$-ClwIO}awA|Ek9c zKOK1KG5o+ov=ZJku;@OrcV;KyX3qXCR@yw`H;QvjX{WqeC^oaZ9S96@B67H_o!OBOvY#!7ogJHDu!W~w+!zFG;jwGTDf6{+iQxt1hfef(Mw;`F7j-)1=HzGt zpY2oN>5`k@KdV5?=9=2wva-SE4M|mGZ=2 z{L@EXPF8{I>1p;@hi+Y(bDx{L=k^`gUvqKo)UNqeU%5X((rRJ{C2?==^ymLr(jo56 zsdDp!PmPXy3pRYxw4h7e`^rZb)*L(&_dcH)@7?;=xVP`r;=H>K$GwA|Sl4XKcKH1+ zP*nVO+?&7e=fHs_aqqC23yXd{68HA6TljufB8PXJMJBr8U1I=>4@{&nZSlW!S2I@h1-$O zz~-+-&)p3FTUr*6J0JIs>=ph)odVi6`v=(~-O zcbA~&(T1o?@!^MGAAr1nzy7u>{eYXXW0BMf^~V)@msLZp0T&peB^h!WaSnW`~Ife zr_&}uzv4OlFwUWHQufjW`e<0!le6Ew3;X)0^*4Dlp#Q4%KYy4E`>$T=CjS8b%-D<< zn!@g#U;K@Fp>HO>y>2x6>J_!c?S9mM@Rg0eeZU`ZMu}&lA4V?ww)!*wz&>u3c24^n z_*2E7X0D5Shfe(R={8}=)7rkY3H_Q=V{Z4%%!qrR zTfXb7L7yOAtCEv%4MV=W2eMeFv=PC+j(>VJ3j z@lRYtU6-k2Uh0Ov(TAi5Y9h}X11EpY`l4Q=xlfv~U)e!dzXi@0ch_x*zUbGuf9`8veyS&-B=f zeA8A2dTc|!hd=(XQ60=odv2A303*{5rJvgq4q~p5uofxUbCv zs^{e4va)KgtDZx{+WpjeJmkN6&V2o(>Ny|#vD3q+;a_#%yd6^^KN{bA`UBN-ZhN17 z#rLb8V`HjRX$ZVSi@zUnYMJUeb$M%zJ~d(g-P)2pufhM=y^UGCpLaKu5=T|fPbbff z+qVPyZkK;Mlc#zvSh=^gn4@|wws_;^j)kh{boP`Xml~>`Uus_3-C+Ro+FrNGhWAy^ 
z;W>9rY;+IsW_{JJ5%N0JxpZEw`iR#^zVhaMz^z-iPR*67=f`Cg22Fnz@y_o^)Mn1ArQ(ZE0QMT2vw*Re_78hfe&uTAeW1B#J%&A;d$y$$*4+bSKNhka$t?A@S} z>Nz{_^2CWU{07!~;?ljad-cK0kqwY<>HaHs{fPY(hnJ4phCG(f{eDHM>Nz-Pe%{px z;!T@Vl-Eo39G%o}RoAB2Z_igN?F2p7sn2m=7`O@IyZo4xfNJ>~GSeW##pkLpcd&_?8^VX)-Gj3BoN9y*Q-D54}3obRQbsX{$2f8la zi#+bTG9>p1@E_{guO8}kVd6V_{9nj>-ygd#Ks`@4*|W75>U?-qji1$#*k6A8{X3(u zANJY&%3jo~$<<@i#{qZVeJk>|!>)PW{-JDtts)POz8!h(uX*(MIR8V&Q-y;+#(uY7 zx@B$&>Tq)Xz^cgeLbH}FUTKZ`r@L(TVZ5IwAgvox38V z(3dBIV~%`K547g^YDWiSANL%ZQ_9X|-{Y?@Lcd(N_Ki3)szm_jo-!TcerJXKRMn4|UyLck|Q^<#QEByE<_MQIH*9GXGqf2Jjn6(i5 z&t1{DO%C#Ux%z zU)BEw&f{F!n7fbT{LcRV$7omFi?hXzw_Lf5Ja>%T{m;K3FTM7~$;jt|GHlU%@ISLn zo`3lwtAF3>d!B^+!xgu`e--&o88oz3AH)kh{CzV6I8X2VtU2q8rzX@5;=ImumtSiI zyq_0)>%Tezw5-875Gyj>@TdbmM(yNg!4QQgC(8Dzx%23n2dRyMSJvv(H# zckfl7HFL~7tspBmha1#JV6%6w)jKQ4GL0lf4N~Oh|-d-tGjPH?O|0^53)GbI!f@)J^z! zw0=rTO0AUIwed3z!?fo3(Ex8#yh~DQr_=*xXpyiMk8&y=_50-@r@X$9><@Fw6I10_ zoKx|TtVKecifKM6pm8dy`aCh=GaOW9pOA%u5pPI%kpjMuU(6Q^MB@=bd3}+H?h#STagWaChctq53zgLpP zUXmmJU|dAz_XLB9b@zE=el^Bth)Et_RM>enpB57{_%y$i$jBoFJW;VDy}_7Q6|?)K zXgDNfLCqsaMCD@AibBsnNnGm2t+MFoA{s8<$odf?M1Wbpv}2%l2a zAMl9k1~uAuJilN}_5?*_n(U9qL}YSIjfX`nK`AapMefnKSBneHpfBJ}oUuqGD62vi zRV6L4Dxsh(`9wXVeorW|?w(lO7ZrQSFNLB3k&#!EJV|R52>Ub-B?6IJ#1%;;S z@g(iKP{<$mh-@Qj(3kigdE)4HB8P}46w$=&S~%p3iMj+mm|x5n4`WLvRz>!NW1?ng zUy@f;Deen}JRlNPs8I35ZGlbM|T+HC}h2@a&8Pzl`(RcB2K(FOK z;}JAWvC=+YFcKFzcu~*9w;!J|H7sI5*`tXsNmNEXMI}5Q`j+q+{P;XYgpl$1vH-6$ocA+QGUWAel2 zA)3bIgKX&DN4_W*@eR0Bft)d4T%M)qVwNOC@=Jjf5H z$rsttOR-bEcrA%0J(|abm-yuWdb}i~xCzN8`5{?4UeYIjge0e0kS(8s@}an>mh=*z zko0MOss;I@d5A~xlAPCruLZ?MFX@tu>PfmZruvYKd{b>`Ofm5MGcYC@#lc^ii*lko zxF6Cde>^T4yYP}v(j~vpO(WYSin&V%rYPpG4qELQwpG9ee>(xNQn%5fauI#RL94z_ zgXb&hXPrfgm1@w4(;OA^9i8}!)x@yTJfIVDIY-56WAhc$BJox7gkC46cg{hp1*1zC z-PFc}enZButEAvj%;N;Wtu`j~>KP7yhs{^a6FLp9mhZOtiTWIMXp4>{`8}#Tc$ffr zw^3{~eytoz+Q!I^t^Upjk>J=v3WCv%*HM&rvbiI+<+!%p;DzMXitQ(;V77 zYt!7WCMxF7wp_8&ZGA3R%>B&X$s7CCq51k))VgS%E{2006hdKd)zyhs%xwg~O5H}Q 
zzCtMU%=h%_^e!Tyzn7y<1Fp|eF@Li8iunaY9;dSoT#ut-?zHtd%~7$MRpX2EV>LBW z@E~piAossPcjyWN5tm|~vuVXFugE5s1BN5BTVvh}BwU7HXg`Cz|YjkmQs(w>mmBN3)2@`5~lUPWq*~jaDPW!3K(hX%1~()E%0> zpQOLYL5q4k(X2`H&1;_4ZNAmku+jX10D8{mQn;M!Tita>evtr&rKMt{h^rZ+%ZLp* z+jLeV#{XHj`Q{FW=ZS=#bBGjkk3(~R<`O1v!1#w4^7<4rc~{0S*J&V{xz&#G4>IIU z+G*m6L%~$NhR071;NVm~k!SEErEyRX?8KZf8{BRv(eBRs+F`Cs+tbZ@Y zXK`7nN(vsuTuFfW(-a%|VX?Xq3BJ>$6thZq=;}=p9}9siLx6$JJo;TNh#D<*3YqfM}t1jd7b@`IfY@N+BIx!UUd&WP=Xucm! z#%Fn2*BQi6tQHKfClWN@{}*(d7WLmuB>c1UW}b9t@liX)XjV_Fwn2=<<%pltbcb*9 zJoo61UN^-?^Rh#;%TnT8RgxIFRWU+Ew|b^ zDAxZVu`$nK-9fewtj2~-Th}u_i=TgN`8@1$fXfZjJ4R-BsL7_%CT@zCH`{CU)q@bz z{H*@GpIRLihhL;Sc!mIZHq>q8&E0D3&{hM6e1D0)syO;f3241_8~J6zWb0`DLTt#d zS8QaLJf&t|PxMv8(ff@6cC8c}P1Y~1=0qZ|GTlbr&qZG?BR1r_81nq-<7(JqJMCCH%D%DaOAvi3O#-)V&7pd7k!nj zACK3{VDs?4#@2=RJ98hKX9Gjt7cz_#Jm{;sgU1O#^E_Fdt)7a_x6&DQH5}SS0Db5D zB=)&;DHE4FtEq15Sv?4#zr;ay$xD16_v?eqUvJ>>Tp%v&sAFujC%{pLF!D?yH6^7j>dGLi+riW-xjm!@Uk#-E!(Vlq7X4>G;~%h*$0@$ERu9G2=j+1ys?f8V8;-n^ z0P(OsYc*DEny;_P`zJfUqVMuP#QHLeU-%d2)j2nU&-Rs|O_rzCMR(>cVR)D!`wp;Q z61g46tM1hG5uZ+ioTjb=E&1Ir(!!{uZD_4SI<}YXMfz`G)Ki``@yQm?5lLg4_~hh zL}H$!x{X#{!$$r+wrVjxaVg(Ax`Q1YY|AK z?wsTQqOVdTT!Ob2y*hs2A*bTiiIb4odPy|bBffK3s_WWv9YQ=c$F>dUCn4u^Bs&^& zIvKCx>rli~O-Hv`_WYb*Ggfqs3hCW5>z$2>BSpeXFCf>y7M;vX_Ft24`>x@$roL^%=C|ie8gpIlmtnO( zYGq_=TKrF4cW7#1CL`)Rs73R}Yx%#k^E2XVv@j1+`)^=;?OwhP3xNUPl4#y1iI?nyaw49cqmBU{8gqMFrV|%> zx(%Wp!j5F@O^${0cud@$bgK0{9Sl6AM>e*c#yWoR7sKQn(ut3U=fY*&uR!S3*6{-m zLBRF!+j8@>%>XKW9xw?8Gp$?A752pvc*gMi-O9Rqstu=%U) z@pNd?1@5{5dAYgjxTi7;U3cddjL9lgvzesX{aM){^C#34q~{f8qU$^MHx0?p%8hCT z1+D>^`B?>pS!%B9e*Du%er{G_!qU}q$bkFr?!|1HJec>)xG`$N@3sHbSM390|BLz( z!~N{QZ3{}b8SdQ0LuQn%GThJK+4u7|XB+OkNqL_x8)~>`%xo(USPwc_x7BUm8SXK| z+CCItXSk=`llR8?U52}$cU7+sH1JpUK7Pv}=nY%F?$J*`*DIO*)OgtEwy(eSxZ%#< zb8A@?{-=#F-;b>|+*2;}8T;x0%v<}Nk(aKHFV$I6Y*!S9@9CFh%Ap4ksK%|C%WM(#RMb;@v0e<)rR z1m^!Gchzf;L+)+${<{6J@ASanVa0}fW?tLtOrpIPFOOBznhZUtSmbmE>ahWk%xOE<~Le{9Brze+mn#*`LLK4`e74$u7J82tb7Le9lbd%*AA 
zW99=p4fpt!kL9ETUpl1~%|P5UyM5VxZa2g;u;SBARq+3bFLOQYvVUE2=Eg0+=^I~p zvIO$*;jAwG4EMO3?z-!q&W8KxovDXoIfy@}Xk6dDnD6K#eLjJGZr4`5UwYnfPn@%3 zWZxjKr-NdtxSVPsxZmZ4LLVPp3>RDg~W)ZqMQsnCH~+-&VNcfAmM=N45lh zanHH=br64`j(0xQ^^98CVG`y&yJgaj_rPB`?7>;*;D74ot#{rGdQRzzCH;|4(cEX^ zn?TSb-L~k-R6IWeHpHN^V42f z$3GoPyIjw3Pw3s~o5kZ$-^Vxi>GcETvnEz}Q&5N27jxG9h;A_EYFhE&H<9 zBJ2;>plL&HM4lT4t&g6?exEyV$YAW#yrpH=nI-W1y7t33#5du_BU8pr!Tx&kV8fT; zf9mpn-}&!=f5Vzzyc7Cu-&ys^JoxWezujW2SN7EWdpF^n=nt;V+cE^}*=CdP7MzE% zjW+iG5$p1jZ{e;Mhk)&hf9r&Lyg2)(!VTNuH|MQR-5P@bbiLBQUjVkRnv#Zd@|-?z z{iqw@f5P73yK#=6`R2Ye&r#hb_k4Kx%djuH_@<1!vL6|E`#!AClquU64!#UJFudgG zMdVSsG(3DU>fLPUt@Br7-;9`ZsMcZVE%XH-EwR{^w++=wsjWh zQ#&^N73(#lIM{dX95ndGc$X-yGl5o*Jv38sl_>}IzrI;z4^1>ID-N*M-t$YchL6zF!%1_YI1xWS;SO$McHo)bd5owZEXaif>-k`L(Tz>ui^* zb1y5PGo%S;{-(H2yxPC<4bv4@W!1kMWj0n^rEPOM&xt9n^T9TgQ;sOEi+f+6Uh5S6 z{Jf~cumOnY>+j2Mh25Dx&%XV12=o1|#iEuU!rzCxA8p+Zm|b2t=t<09@80io`XJ8Y z4#j(3#k?QfacM$R&?ASo{`>*tvu@t@y7Mqk%BgqvABCNF@a_LZd}UoIqT^iS$D8~#qO4Ak3p67`n1?7TDx_QhA`jYUN@DkD?pL!Ee4Ycm~$ve9YDQ`t9I9 z^~S8V(~w7D@$}0*P>-X7hmFEMD5?9Gg2=5{x8to|p4J}meg9~O7Kgz9aqPBp@Kb*J z%LYYX!vAn9{iR2N)93D5gLNstyUO+c=kWj0(Se(mBaY_xz0hqi>b=FEGO;yqP*2K+_OU77p|*6rnvn;zHmh@ueRJ zj75CgyXglD;P0*Z%R6FUmX!~CEB+YbxSTcO;eW%=_Sx>dFM!J6l-uHnYv!csjs6M! 
z*<(@%_J!W(r!wEV0d?tms@a%#p*QfRLw#SyKC82G>hjT8#~}@u&6Yra-!bDL@M0tL z%?A%+o*P_k24G*6_HH_H754YZRo=k^DZ!tS|q;V=K$Ne4dzKdWuN4=iw#8VgIj>CftO27B_uj)8zo@k%fVwKCENQ=H0fRh2M-`AMbt${8iH@O{<4E z|2SsfT`9km`jb~-&!rv>8-gWG0%y;HcRrq1ht)9AX z6Y5u<5*XcWBlyc++bG=*{*jHZ%h>N1-e{luQa1b?S<`PU_Q9E{m5n~bJ~)5#de@7; zA)eb-59xxuDz`ncx)av7ti`2?y>Sjs^eipeO6xWHrnWboKwUp6Yt?uG?EgIB`~(XETw^Bg8DJFQ0*Ph zS93JSG==|rb-nz&XM5*@Pk&0GKZ4l5lw7%TWke0dbgo3dfi_@%>qON8ZB!Qj9{}~A z(sH$YwJ<9$mnk&E#M-F5_^7PhxE4(kG~$B1+ybqTX}9LgOm@}JDoA$Lgp2dW6-M)r gigU%UQ=QB}5HWH8ughIA<+N-OTUxeSh+HoJFB0eRoB#j- From 948ea8aee84306521cc80329bb3f2e0fa5e584b6 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Tue, 5 Nov 2024 00:13:02 +0000 Subject: [PATCH 07/52] arrange episode content for planned narrative --- episodes/delays-refresher.Rmd | 230 ++++++++++++++++++++++++++++------ 1 file changed, 189 insertions(+), 41 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 7af2900f..d89b6027 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -6,16 +6,23 @@ exercises: 2 :::::::::::::::::::::::::::::::::::::: questions -- How to start to analyse outbreak data? +- How to calculate the naive case fatality risk (CFR)? +- How to calculate delays using line list data? +- How to visualize transmission patterns using line list data? +- How to fit a statistical distribution to delay data? :::::::::::::::::::::::::::::::::::::::::::::::: ::::::::::::::::::::::::::::::::::::: objectives -- Calculate the naive CFR. -- Interpret 95% Confidence Intervals. -- Visualize the growth rate. -- Describe the notification (reporting) delay +- Use the pipe operator `%>%` to structure sequences of data operations left-to-right. +- Count observations in each group using `count()`. +- Pivot data from wide-to-long or long-to-wide using `pivot_*`. +- Create new columns that are functions of existing variables using `mutate()`. +- Keep or drop columns by their names using `select()`. 
+- Keep rows that match a condition using `filter()`. +- Extract a single column using `pull()`. +- Create graphics declaratively using `{ggplot2}`. :::::::::::::::::::::::::::::::::::::::::::::::: @@ -27,17 +34,9 @@ The following concepts will be developed in this practice: - Person-to-person transmission of infectious diseases -- Basic reproduction number -- Instantaneous reproduction number -- Probability of death (IFR, CFR) - -- Serial interval - -- Growth rate - -- Incidence +- Incidence curve :::::::::: prereq @@ -69,7 +68,7 @@ A new Ebola virus (EVD) outbreak in a fictitious West African country library(tidyverse) # for {dplyr} functions and the pipe %>% ``` -## Data structure +## Explore data ```{r,eval=FALSE,echo=TRUE,message=FALSE} # Read data @@ -87,7 +86,6 @@ cases <- read_rds( ``` ```{r, message=FALSE} -# Print data frame cases ``` @@ -109,10 +107,14 @@ cases %>% cases %>% visdat::vis_miss() +``` + +## Calculate severity +```{r} # severity ---------------------------------------------------------------- -# case fatality ratio ----------------------------------------------------- +# case fatality risk ----------------------------------------------------- cases %>% count(outcome) @@ -121,14 +123,84 @@ cases %>% cases %>% count(outcome) %>% - pivot_wider(names_from = outcome,values_from = n) %>% + pivot_wider(names_from = outcome, values_from = n) %>% + cleanepi::standardize_column_names() %>% + mutate(cases_known_outcome = death + recover) +``` + +:::::::::::: challenge + +Calculate the CFR as the division of known deaths among known outcomes. Do this by adding one more pipe `%>%` in the last code chunk. Report: + +- What is the value of the naive CFR? + +:::::::::::: hint + +You can use the column names of the reminder data set to create a new column. 
+ +```{r,eval=FALSE,echo=TRUE} +cases %>% + count(outcome) %>% + pivot_wider(names_from = outcome, values_from = n) %>% + cleanepi::standardize_column_names() %>% + mutate(cases_known_outcome = death + recover) %>% + mutate(cfr = ... / ... ) # replace the ... spaces +``` + +:::::::::::: + +:::::::::::: solution + +```{r} +cases %>% + count(outcome) %>% + pivot_wider(names_from = outcome, values_from = n) %>% cleanepi::standardize_column_names() %>% mutate(cases_known_outcome = death + recover) %>% mutate(cfr = death / cases_known_outcome) +``` + +:::::::::::: +:::::::::::: + +However, how much time it would take for us register those outcomes? + +```{r,echo=FALSE,eval=TRUE} +# demo code not to run by learner +cases %>% + slice_sample(n = 30) %>% + arrange(date_of_onset) %>% + mutate(case_id = fct_inorder(case_id)) %>% + # slice(10:40) %>% + select( + x = case_id, + value1 = date_of_onset, + value2 = date_of_hospitalisation + ) %>% + ggplot() + + geom_segment( + aes(x = x, xend = x, y = value1, yend = value2), + color = "grey" + ) + + geom_point( + aes(x = x, y = value1), + color = "darkgreen", + size = 3, + alpha = 0.5 + ) + + geom_point( + aes(x = x, y = value2), + color = "red", + size = 3, + alpha = 0.5 + ) + + coord_flip() +``` -# how much time it take for us register those outcomes? 
------------------ +## Calculate delays +```{r} #' date of hospitalization means the date of report cases %>% @@ -136,18 +208,15 @@ cases %>% mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% ggplot(aes(x = reporting_delay)) + geom_histogram(binwidth = 1) +``` -# demo code not to run by learner -cases %>% - arrange(date_of_onset) %>% - mutate(case_id = fct_inorder(case_id)) %>% - slice(10:40) %>% - select(x = case_id, value1 = date_of_onset, value2 = date_of_hospitalisation) %>% - ggplot() + - geom_segment( aes(x=x, xend=x, y=value1, yend=value2), color="grey") + - geom_point( aes(x=x, y=value1), color=rgb(0.2,0.7,0.1,0.5), size=3 ) + - geom_point( aes(x=x, y=value2), color=rgb(0.7,0.2,0.1,0.5), size=3 ) + - coord_flip() + + +## Visualize transmission + +aggregate the date by intervals of 7 days + +```{r} # transmission ------------------------------------------------------------ @@ -155,7 +224,7 @@ cases %>% cases %>% ggplot(aes(x = date_of_onset)) + - geom_histogram() + geom_histogram(binwidth = 7) # what transmission indicator can we estimate from the incidence curve? --- @@ -163,28 +232,105 @@ cases %>% #' more on that on DAY 3-4 # what is the name of the delay from infection to symptom onset? 
---------- +``` + +```{r,eval=TRUE,echo=FALSE} +outbreaks::ebola_sim_clean$linelist %>% + as_tibble() %>% + arrange(date_of_infection) %>% + rownames_to_column() %>% + mutate(rowname = as.numeric(rowname)) %>% + # slice_sample(n = 166) %>% + # slice_head(n = 166) %>% + mutate( + looking = case_when( + rowname <= 166 ~ "now", + TRUE ~ "then" + ) + ) %>% + # count(looking) + slice_head(n = 100) %>% + select(case_id, date_of_infection, date_of_onset) %>% + pivot_longer(cols = -case_id, names_to = "date_type", values_to = "date") %>% + count(date, date_type) %>% + group_by(date_type) %>% + mutate(cumulative = cumsum(n)) %>% + ungroup() %>% + ggplot(aes(x = date, y = cumulative, color = date_type)) + + geom_line() +``` + +## Fit a statistical distribution to delays + +```{r} cases %>% select(case_id, date_of_infection, date_of_onset) %>% mutate(incubation_period = date_of_onset - date_of_infection) %>% ggplot(aes(x = incubation_period)) + geom_histogram(binwidth = 1) +``` + +```{r} cases %>% select(case_id, date_of_infection, date_of_onset) %>% mutate(incubation_period = date_of_onset - date_of_infection) %>% mutate(incubation_period_num = as.numeric(incubation_period)) %>% filter(!is.na(incubation_period_num)) %>% pull(incubation_period_num) %>% - fitdistrplus::fitdist(distr = "lnorm") %>% - # summary() - # plot() - identity() + fitdistrplus::fitdist(distr = "lnorm") # try: summary and plot #' explore the #' https://ben18785.shinyapps.io/distribution-zoo/ #' for gamma, weibull, lnorm parameters +``` + +::::::::::::::: callout + +### the dollar sign `$` can `pull()` + +```{r} +cases_delay <- cases %>% + select(case_id, date_of_infection, date_of_onset) %>% + mutate(incubation_period = date_of_onset - date_of_infection) %>% + mutate(incubation_period_num = as.numeric(incubation_period)) %>% + filter(!is.na(incubation_period_num)) + +cases_delay %>% pull(incubation_period_num) +``` + +```{r} +cases_delay$incubation_period_num +``` +::::::::::::::: + +::::::::::::::: 
callout + +### the dollar sign `$` can `pluck()` + +```{r} +incubation_period_fit <- cases %>% + select(case_id, date_of_infection, date_of_onset) %>% + mutate(incubation_period = date_of_onset - date_of_infection) %>% + mutate(incubation_period_num = as.numeric(incubation_period)) %>% + filter(!is.na(incubation_period_num)) %>% + pull(incubation_period_num) %>% + fitdistrplus::fitdist(distr = "lnorm") + +incubation_period_fit %>% pluck("estimate") +``` + +```{r} +incubation_period_fit$estimate +``` + +::::::::::::::: + +## Why to fit a distribution to data? + +```{r} # why to fit a distribution to observed data? ----------------------------- #' the time difference from date infection and date symptom onset @@ -203,15 +349,17 @@ cases %>% #' exhibiting Ebola symptoms exhibit them after infection? #' review probability functions -#' https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm#probfunc qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) #' reference: https://pubmed.ncbi.nlm.nih.gov/32150748/ #' Lauer, 2020 #' The Incubation Period of Coronavirus Disease 2019 (COVID-19) #' From Publicly Reported Confirmed Cases: Estimation and Application +``` +## Why if we do not have enough data? +```{r} # but this fitting step requires account for biases! ---------------------- # what to do when we do not have all these dates or info on biases? 
------- @@ -224,8 +372,9 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) #' DAY 3 afternoon #' and took them to account for delays for transmission and severity #' DAY 4 morning and afternoon +``` - +```{r} # challenge ---------------------------------------------------------------- # use epidemiological times figure TRACE LAC ------------------------------ @@ -239,10 +388,9 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) ::::::::::::::::::::::::::::::::::::: keypoints -- Use `.md` files for episodes when you want static content -- Use `.Rmd` files for episodes when you need to generate output -- Run `sandpaper::check_lesson()` to identify any issues with your lesson -- Run `sandpaper::build_lesson()` to preview your lesson locally +- Use packages from the `tidyverse` like `{dplyr}`, `{tidyr}`, and `{ggplot}` for exploratory data analysis. +- Epidemiological delays conditions the estimation of indicators for severity or transmission. +- Fit statistical distribution to delays to make inferences from them for decision making. :::::::::::::::::::::::::::::::::::::::::::::::: From f067b3fc05aa249d17ca1cca2b5c2aa763ce04c8 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Tue, 5 Nov 2024 00:26:38 +0000 Subject: [PATCH 08/52] add challenge and isolate ending in testimonial --- episodes/delays-refresher.Rmd | 38 ++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index d89b6027..3dcf6564 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -211,6 +211,38 @@ cases %>% ``` +::::::::::::::::: challenge + +Calculate the delay from onset to death. + +::::::::::::: hint + +We can keep the rows that match the following logical statement: `outcome == "Death"`. 
+ +```{r,eval=FALSE,echo=TRUE} +cases %>% + filter(outcome == "Death") +``` + +::::::::::::: + +::::::::::::: solution + +```{r} +cases %>% + select(case_id, date_of_onset, date_of_outcome, outcome) %>% + filter(outcome == "Death") %>% + mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% + ggplot(aes(x = delay_onset_death)) + + geom_histogram(binwidth = 1) +``` + +Is is consistent to have negative delays from secondary observations? + +::::::::::::: + +::::::::::::::::: + ## Visualize transmission @@ -357,7 +389,9 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) #' From Publicly Reported Confirmed Cases: Estimation and Application ``` -## Why if we do not have enough data? +::::::::::::::: testimonial + +### What to do if we do not have enough data? ```{r} # but this fitting step requires account for biases! ---------------------- @@ -374,6 +408,8 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) #' DAY 4 morning and afternoon ``` +::::::::::::::: + ```{r} # challenge ---------------------------------------------------------------- From 83241ec4747a59042d7264337657316c916a0f87 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Tue, 5 Nov 2024 00:29:27 +0000 Subject: [PATCH 09/52] fix lintr checks --- episodes/delays-refresher.Rmd | 39 +++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 3dcf6564..176f5231 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -144,7 +144,7 @@ cases %>% pivot_wider(names_from = outcome, values_from = n) %>% cleanepi::standardize_column_names() %>% mutate(cases_known_outcome = death + recover) %>% - mutate(cfr = ... / ... ) # replace the ... spaces + mutate(cfr = ... / ...) # replace the ... spaces ``` :::::::::::: @@ -169,7 +169,7 @@ However, how much time it would take for us register those outcomes? 
```{r,echo=FALSE,eval=TRUE} # demo code not to run by learner cases %>% - slice_sample(n = 30) %>% + slice_sample(n = 30) %>% arrange(date_of_onset) %>% mutate(case_id = fct_inorder(case_id)) %>% # slice(10:40) %>% @@ -231,7 +231,7 @@ cases %>% ```{r} cases %>% select(case_id, date_of_onset, date_of_outcome, outcome) %>% - filter(outcome == "Death") %>% + filter(outcome == "Death") %>% mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% ggplot(aes(x = delay_onset_death)) + geom_histogram(binwidth = 1) @@ -270,8 +270,8 @@ cases %>% outbreaks::ebola_sim_clean$linelist %>% as_tibble() %>% arrange(date_of_infection) %>% - rownames_to_column() %>% - mutate(rowname = as.numeric(rowname)) %>% + rownames_to_column() %>% + mutate(rowname = as.numeric(rowname)) %>% # slice_sample(n = 166) %>% # slice_head(n = 166) %>% mutate( @@ -279,16 +279,15 @@ outbreaks::ebola_sim_clean$linelist %>% rowname <= 166 ~ "now", TRUE ~ "then" ) - ) %>% - # count(looking) + ) %>% slice_head(n = 100) %>% - select(case_id, date_of_infection, date_of_onset) %>% - pivot_longer(cols = -case_id, names_to = "date_type", values_to = "date") %>% - count(date, date_type) %>% - group_by(date_type) %>% - mutate(cumulative = cumsum(n)) %>% - ungroup() %>% - ggplot(aes(x = date, y = cumulative, color = date_type)) + + select(case_id, date_of_infection, date_of_onset) %>% + pivot_longer(cols = -case_id, names_to = "date_type", values_to = "date") %>% + count(date, date_type) %>% + group_by(date_type) %>% + mutate(cumulative = cumsum(n)) %>% + ungroup() %>% + ggplot(aes(x = date, y = cumulative, color = date_type)) + geom_line() ``` @@ -328,11 +327,15 @@ cases_delay <- cases %>% mutate(incubation_period = date_of_onset - date_of_infection) %>% mutate(incubation_period_num = as.numeric(incubation_period)) %>% filter(!is.na(incubation_period_num)) +``` +```{r} cases_delay %>% pull(incubation_period_num) ``` -```{r} +Try this yourself: + +```{r,eval=FALSE,echo=TRUE} cases_delay$incubation_period_num 
``` @@ -350,11 +353,15 @@ incubation_period_fit <- cases %>% filter(!is.na(incubation_period_num)) %>% pull(incubation_period_num) %>% fitdistrplus::fitdist(distr = "lnorm") +``` +```{r} incubation_period_fit %>% pluck("estimate") ``` -```{r} +Try this yourself: + +```{r,eval=FALSE,echo=TRUE} incubation_period_fit$estimate ``` From da6d28b704152534127b75b67c45c9763b227a1a Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Tue, 5 Nov 2024 23:40:29 +0000 Subject: [PATCH 10/52] add checklist for project, quarto, reprex --- episodes/delays-refresher.Rmd | 34 +++++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 176f5231..f93d85fd 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -48,6 +48,14 @@ The following concepts will be developed in this practice: :::::::::: +::::::::::::::: checklist + + + + + +::::::::::::::: + ## Introduction A new Ebola virus (EVD) outbreak in a fictitious West African country @@ -89,11 +97,24 @@ cases <- read_rds( cases ``` +:::::::::::::::::::: checklist + +### Why should we use the {here} package? + +The `{here}` package is designed to simplify file referencing in R projects by providing a reliable way to construct file paths relative to the project root. The main reason to use it is **Cross-Environment Compatibility**. + +It works across different operating systems (Windows, Mac, Linux) without needing to adjust file paths. + +- On Windows, paths are written using backslashes ( `\` ) as the separator between folder names: `"data\raw-data\file.csv"` +- On Unix based operating system such as macOS or Linux the forward slash ( `/` ) is used as the path separator: `"data/raw-data/file.csv"` + +The `{here}` package is ideal for adding one more layer of reproducibility to your work. 
If you are interested in reproducibility, we invite you to [read this tutorial to increase the openess, sustainability, and reproducibility of your epidemic analysis with R](https://epiverse-trace.github.io/research-compendium/) + +:::::::::::::::::::: + ```{r} # quality evaluation ------------------------------------------------------ -cases - cases %>% glimpse() @@ -292,7 +313,7 @@ outbreaks::ebola_sim_clean$linelist %>% ``` -## Fit a statistical distribution to delays +## Try to fit a statistical distribution to delays ```{r} cases %>% @@ -428,6 +449,13 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) #' expand the number of pre-days to include more backward contacts ``` +::::::::::::::: checklist + + + + + +::::::::::::::: ::::::::::::::::::::::::::::::::::::: keypoints From a194fe8364f910c0bf0d6b29cea8ee225f04e06f Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Wed, 6 Nov 2024 17:48:59 +0000 Subject: [PATCH 11/52] fix plot outputs and writing statements --- episodes/delays-refresher.Rmd | 186 +++++++++++++++++++++------------- 1 file changed, 118 insertions(+), 68 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index f93d85fd..ed89f1e9 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -9,7 +9,7 @@ exercises: 2 - How to calculate the naive case fatality risk (CFR)? - How to calculate delays using line list data? - How to visualize transmission patterns using line list data? -- How to fit a statistical distribution to delay data? +- How to fit a probability distribution to delay data? 
:::::::::::::::::::::::::::::::::::::::::::::::: @@ -26,18 +26,6 @@ exercises: 2 :::::::::::::::::::::::::::::::::::::::::::::::: - -#### Basic concepts to be developed - -The following concepts will be developed in this practice: - - -- Person-to-person transmission of infectious diseases - - - -- Incidence curve - :::::::::: prereq ### Setup a project and folder @@ -50,7 +38,9 @@ The following concepts will be developed in this practice: ::::::::::::::: checklist - +let's create a Rmd or quarto file! + + @@ -60,6 +50,8 @@ The following concepts will be developed in this practice: A new Ebola virus (EVD) outbreak in a fictitious West African country +- Person-to-person transmission of infectious diseases + ```{r} # new Ebola virus outbreak ------------------------------------------------ @@ -153,7 +145,7 @@ cases %>% Calculate the CFR as the division of known deaths among known outcomes. Do this by adding one more pipe `%>%` in the last code chunk. Report: -- What is the value of the naive CFR? +- What is the value of the _naive_ CFR? :::::::::::: hint @@ -181,47 +173,53 @@ cases %>% mutate(cfr = death / cases_known_outcome) ``` +This calculation is _naive_ because it tends to yield a biased and mostly underestimated CFR due to the time-delay from onset to death, only stabilising at the later stages of the outbreak. + :::::::::::: :::::::::::: -However, how much time it would take for us register those outcomes? +However, how much time it would take for us register those outcomes? For this we need to calculate delays! + +## Calculate delays + +The time between sequence of dated events can vary between subjects. 
```{r,echo=FALSE,eval=TRUE} # demo code not to run by learner -cases %>% +set.seed(99) + +cases_select <- cases %>% slice_sample(n = 30) %>% arrange(date_of_onset) %>% mutate(case_id = fct_inorder(case_id)) %>% # slice(10:40) %>% select( - x = case_id, - value1 = date_of_onset, - value2 = date_of_hospitalisation - ) %>% - ggplot() + + case_id, + date_of_onset, + date_of_hospitalisation + ) + +cases_long <- cases_select %>% + pivot_longer(cols = -case_id,names_to = "date_type",values_to = "date") %>% + mutate(date_type = fct_relevel(date_type, "date_of_onset")) + +ggplot() + geom_segment( - aes(x = x, xend = x, y = value1, yend = value2), + data = cases_select, + aes(x = date_of_onset, y = case_id, + xend = date_of_hospitalisation, yend = case_id), color = "grey" ) + geom_point( - aes(x = x, y = value1), - color = "darkgreen", - size = 3, - alpha = 0.5 - ) + - geom_point( - aes(x = x, y = value2), - color = "red", - size = 3, - alpha = 0.5 + data = cases_long, + aes(x = date,y = case_id, color = date_type), + alpha = 0.5, size = 3 ) + - coord_flip() + colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") ``` -## Calculate delays - -```{r} +```{r,warning=FALSE,message=FALSE} #' date of hospitalization means the date of report cases %>% @@ -238,27 +236,40 @@ Calculate the delay from onset to death. ::::::::::::: hint -We can keep the rows that match the following logical statement: `outcome == "Death"`. 
+We can keep the rows that match a given logical statement, like `outcome == "Death"`, using the function `dplyr::filter()`: ```{r,eval=FALSE,echo=TRUE} cases %>% - filter(outcome == "Death") + dplyr::filter(outcome == "Death") ``` ::::::::::::: ::::::::::::: solution -```{r} +```{r,warning=FALSE,message=FALSE} cases %>% - select(case_id, date_of_onset, date_of_outcome, outcome) %>% - filter(outcome == "Death") %>% - mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% + dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>% + dplyr::filter(outcome == "Death") %>% + dplyr::mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% ggplot(aes(x = delay_onset_death)) + geom_histogram(binwidth = 1) ``` -Is is consistent to have negative delays from secondary observations? +Wait! Is is consistent to have negative time delays from primary to secondary observations, like from date of onset to date of death? + +In the next episode we will learn how to check sequence of dated-events and more inconsistencies! + +We can use `dplyr::filter()` again to identify the inconsistent observations: + +```{r} +cases %>% + dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>% + dplyr::filter(outcome == "Death") %>% + dplyr::mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% + dplyr::filter(delay_onset_death<1) +``` + ::::::::::::: @@ -267,6 +278,8 @@ Is is consistent to have negative delays from secondary observations? ## Visualize transmission +- Incidence curve + aggregate the date by intervals of 7 days ```{r} @@ -287,35 +300,56 @@ cases %>% # what is the name of the delay from infection to symptom onset? ---------- ``` -```{r,eval=TRUE,echo=FALSE} +However, as seen before, the date of onset of symptoms is a delayed measurement with respect to the date of infection. + +There is an average time delay between the date of infection, date of onset, date of hospitalisation, and date of outcome. 
+ +```{r,eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE} +library(tidyverse) outbreaks::ebola_sim_clean$linelist %>% - as_tibble() %>% - arrange(date_of_infection) %>% - rownames_to_column() %>% - mutate(rowname = as.numeric(rowname)) %>% # slice_sample(n = 166) %>% - # slice_head(n = 166) %>% - mutate( - looking = case_when( - rowname <= 166 ~ "now", - TRUE ~ "then" + dplyr::as_tibble() %>% + incidence2::incidence( + date_index = c( + "date_of_infection", + "date_of_onset", + "date_of_outcome" + ), + interval = "week", + complete_dates = TRUE + ) %>% + dplyr::mutate( + count_variable = fct_relevel( + count_variable, + "date_of_infection", + "date_of_onset", + "date_of_outcome" ) ) %>% - slice_head(n = 100) %>% - select(case_id, date_of_infection, date_of_onset) %>% - pivot_longer(cols = -case_id, names_to = "date_type", values_to = "date") %>% - count(date, date_type) %>% - group_by(date_type) %>% - mutate(cumulative = cumsum(n)) %>% - ungroup() %>% - ggplot(aes(x = date, y = cumulative, color = date_type)) + - geom_line() + plot( + # show_cases = TRUE, + angle = 45, + n_breaks = 15 + ) + + geom_vline(xintercept = grates::as_isoweek(ymd(20140825)), linetype = 3) ``` +In order to account for these time delays when estimating indicators of severity or transmission, we need to input delays as **Probability Distributions**! -## Try to fit a statistical distribution to delays +::::::::::::::::::: prereq -```{r} +**Watch** one 5-minute video refresher on probability distributions: + +- StatQuest with Josh Starmer (2017) +**The Main Ideas behind Probability Distributions**, YouTube. +Available at: + + +::::::::::::::::::: + +## Try to fit a probability distribution to delays + +```{r,warning=FALSE,message=FALSE} cases %>% select(case_id, date_of_infection, date_of_onset) %>% mutate(incubation_period = date_of_onset - date_of_infection) %>% @@ -386,9 +420,25 @@ Try this yourself: incubation_period_fit$estimate ``` +But, how do you access to the specific parameter? 
+ ::::::::::::::: -## Why to fit a distribution to data? +:::::::::::::::::::::::::::::: testimonial + +### A code completion tip + +If we write the **square brackets** `[]` next to the object `incubation_period_fit$estimate[]`, within `[]` we can use the +Tab key ↹ +for [code completion feature](https://support.posit.co/hc/en-us/articles/205273297-Code-Completion-in-the-RStudio-IDE) + +This gives quick access to `incubation_period_fit$estimate["meanlog"]` and `incubation_period_fit$estimate["sdlog"]`. + +We invite you to try this out in code chunks and the R console! + +:::::::::::::::::::::::::::::: + +## Why to fit a probability distribution to data? ```{r} # why to fit a distribution to observed data? ----------------------------- @@ -399,7 +449,7 @@ incubation_period_fit$estimate #' steps #' - calculate the time difference #' - plot -#' - fit a distribution (obtain distribution parameters) +#' - fit a probability distribution (obtain distribution parameters) #' - do inferences #' #' from the incubation period distribution @@ -461,7 +511,7 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) - Use packages from the `tidyverse` like `{dplyr}`, `{tidyr}`, and `{ggplot}` for exploratory data analysis. - Epidemiological delays conditions the estimation of indicators for severity or transmission. -- Fit statistical distribution to delays to make inferences from them for decision making. +- Fit probability distribution to delays to make inferences from them for decision making. 
:::::::::::::::::::::::::::::::::::::::::::::::: From 93fabeee4ffeb10e739c2dda8bd0215a90984f55 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Wed, 6 Nov 2024 21:00:28 +0000 Subject: [PATCH 12/52] write intro and add reference --- episodes/delays-refresher.Rmd | 198 +++++++++++++++++++--------------- 1 file changed, 113 insertions(+), 85 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index ed89f1e9..8e799e64 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -6,10 +6,10 @@ exercises: 2 :::::::::::::::::::::::::::::::::::::: questions -- How to calculate the naive case fatality risk (CFR)? -- How to calculate delays using line list data? -- How to visualize transmission patterns using line list data? -- How to fit a probability distribution to delay data? +- How to calculate the _naive_ case fatality risk (CFR)? +- How to visualize transmission from line list data? +- How to calculate delays from line list data? +- How to fit a probability distribution to delays? :::::::::::::::::::::::::::::::::::::::::::::::: @@ -38,34 +38,48 @@ exercises: 2 ::::::::::::::: checklist -let's create a Rmd or quarto file! +### RStudio projects - +The directory of an RStudio Project named, for example `training`, should look like this: - +``` +training/ +|__ data/ +|__ training.Rproj +``` + +**RStudio Projects** allows you to use _relative file_ paths with respect to the `R` Project, +making your code more portable and less error-prone. +Avoids using `setwd()` with _absolute paths_ +like `"C:/Users/MyName/WeirdPath/training/data/file.csv"`. ::::::::::::::: -## Introduction +::::::::::::::: challenge -A new Ebola virus (EVD) outbreak in a fictitious West African country +Let's starts by creating `New Quarto Document`! -- Person-to-person transmission of infectious diseases +1. In the RStudio IDE, go to: File > New File > Quarto Document +2. Accept the default options +3. 
Save the file with the name `01-report.qmd` +4. Use the `Render` button to render the file and preview the output. -```{r} -# new Ebola virus outbreak ------------------------------------------------ +To learn more about Quarto, follow their tutorial: -#' what we know? -#' - disease: Ebola -#' - location: western Africa -#' - data collection: cases are registered in the hospital + -``` +::::::::::::::: +## Introduction + +A new Ebola Virus Disease (EVD) outbreak has been notified in a fictional country in West Africa. The Ministry of Health is in charge of coordinating the outbreak response, and have contracted you as a consultant in epidemic analysis to inform the response in real time. + +Let's start by loading the package `{dplyr}` to manipulate data, `{tidyr}` to rearrange it, and `{here}` to write file paths within your RStudio project. We'll use the pipe `%>%` to connect some of their functions, including others from the package `{ggplot2}`, so let's also call to the package `{tidyverse}` that loads them all: ```{r,eval=TRUE,message=FALSE,warning=FALSE} # Load packages -library(tidyverse) # for {dplyr} functions and the pipe %>% +library(tidyverse) # loads dplyr, tidyr and ggplot2 +library(here) ``` ## Explore data @@ -108,7 +122,7 @@ The `{here}` package is ideal for adding one more layer of reproducibility to yo # quality evaluation ------------------------------------------------------ cases %>% - glimpse() + dplyr::glimpse() cases %>% cleanepi::scan_data() @@ -125,25 +139,28 @@ cases %>% ## Calculate severity ```{r} -# severity ---------------------------------------------------------------- - # case fatality risk ----------------------------------------------------- cases %>% - count(outcome) + dplyr::count(outcome) +``` +```{r} # should I consider missing outcomes for CFR? 
----------------------------- cases %>% - count(outcome) %>% - pivot_wider(names_from = outcome, values_from = n) %>% + dplyr::count(outcome) %>% + tidyr::pivot_wider(names_from = outcome, values_from = n) %>% cleanepi::standardize_column_names() %>% - mutate(cases_known_outcome = death + recover) + dplyr::mutate(cases_known_outcome = death + recover) ``` + :::::::::::: challenge -Calculate the CFR as the division of known deaths among known outcomes. Do this by adding one more pipe `%>%` in the last code chunk. Report: +Calculate the CFR as the division of known deaths among known outcomes. Do this by adding one more pipe `%>%` in the last code chunk. + +Report: - What is the value of the _naive_ CFR? @@ -166,11 +183,11 @@ cases %>% ```{r} cases %>% - count(outcome) %>% - pivot_wider(names_from = outcome, values_from = n) %>% + dplyr::count(outcome) %>% + tidyr::pivot_wider(names_from = outcome, values_from = n) %>% cleanepi::standardize_column_names() %>% - mutate(cases_known_outcome = death + recover) %>% - mutate(cfr = death / cases_known_outcome) + dplyr::mutate(cases_known_outcome = death + recover) %>% + dplyr::mutate(cfr = death / cases_known_outcome) ``` This calculation is _naive_ because it tends to yield a biased and mostly underestimated CFR due to the time-delay from onset to death, only stabilising at the later stages of the outbreak. @@ -190,30 +207,34 @@ The time between sequence of dated events can vary between subjects. 
set.seed(99) cases_select <- cases %>% - slice_sample(n = 30) %>% - arrange(date_of_onset) %>% - mutate(case_id = fct_inorder(case_id)) %>% + dplyr::slice_sample(n = 30) %>% + dplyr::arrange(date_of_onset) %>% + dplyr::mutate(case_id = fct_inorder(case_id)) %>% # slice(10:40) %>% - select( + dplyr::select( case_id, date_of_onset, date_of_hospitalisation ) cases_long <- cases_select %>% - pivot_longer(cols = -case_id,names_to = "date_type",values_to = "date") %>% - mutate(date_type = fct_relevel(date_type, "date_of_onset")) + tidyr::pivot_longer( + cols = -case_id, + names_to = "date_type", + values_to = "date" + ) %>% + dplyr::mutate(date_type = fct_relevel(date_type, "date_of_onset")) ggplot() + geom_segment( data = cases_select, - aes(x = date_of_onset, y = case_id, + aes(x = date_of_onset, y = case_id, xend = date_of_hospitalisation, yend = case_id), color = "grey" ) + geom_point( data = cases_long, - aes(x = date,y = case_id, color = date_type), + aes(x = date, y = case_id, color = date_type), alpha = 0.5, size = 3 ) + colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") @@ -223,8 +244,8 @@ ggplot() + #' date of hospitalization means the date of report cases %>% - select(case_id, date_of_onset, date_of_hospitalisation) %>% - mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% + dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>% + dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% ggplot(aes(x = reporting_delay)) + geom_histogram(binwidth = 1) ``` @@ -266,8 +287,8 @@ We can use `dplyr::filter()` again to identify the inconsistent observations: cases %>% dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>% dplyr::filter(outcome == "Death") %>% - dplyr::mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% - dplyr::filter(delay_onset_death<1) + dplyr::mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% + dplyr::filter(delay_onset_death < 1) ``` @@ -302,40 
+323,44 @@ cases %>% However, as seen before, the date of onset of symptoms is a delayed measurement with respect to the date of infection. +On the last date of hospitalisation, which is the date when the case is registered in the data collection system, the last date of onset of symptoms happened some days ago: + +```{r} +cases %>% + dplyr::summarise( + max(date_of_onset), + max(date_of_hospitalisation) + ) +``` + + There is an average time delay between the date of infection, date of onset, date of hospitalisation, and date of outcome. ```{r,eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE} -library(tidyverse) -outbreaks::ebola_sim_clean$linelist %>% - # slice_sample(n = 166) %>% - dplyr::as_tibble() %>% - incidence2::incidence( - date_index = c( - "date_of_infection", - "date_of_onset", - "date_of_outcome" - ), - interval = "week", - complete_dates = TRUE - ) %>% +cases %>% dplyr::mutate( - count_variable = fct_relevel( - count_variable, - "date_of_infection", - "date_of_onset", - "date_of_outcome" + delayed = dplyr::case_when( + # date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 5) ~ + # "5 weeks before", + # date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 4) ~ + # "4 weeks before", + date_of_hospitalisation < max(date_of_hospitalisation) - (7 * 2) ~ + "2 week before", + TRUE ~ "Today" ) ) %>% - plot( - # show_cases = TRUE, - angle = 45, - n_breaks = 15 - ) + - geom_vline(xintercept = grates::as_isoweek(ymd(20140825)), linetype = 3) + mutate( + delayed = forcats::fct_relevel(delayed, "Today") + ) %>% + ggplot(aes(date_of_onset, fill = delayed)) + + geom_histogram(binwidth = 7) + + labs(fill = "Observed cases") ``` In order to account for these time delays when estimating indicators of severity or transmission, we need to input delays as **Probability Distributions**! 
+## Fit a probability distribution to delays + ::::::::::::::::::: prereq **Watch** one 5-minute video refresher on probability distributions: @@ -347,12 +372,12 @@ Available at: ::::::::::::::::::: -## Try to fit a probability distribution to delays + ```{r,warning=FALSE,message=FALSE} cases %>% - select(case_id, date_of_infection, date_of_onset) %>% - mutate(incubation_period = date_of_onset - date_of_infection) %>% + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% ggplot(aes(x = incubation_period)) + geom_histogram(binwidth = 1) ``` @@ -360,11 +385,11 @@ cases %>% ```{r} cases %>% - select(case_id, date_of_infection, date_of_onset) %>% - mutate(incubation_period = date_of_onset - date_of_infection) %>% - mutate(incubation_period_num = as.numeric(incubation_period)) %>% - filter(!is.na(incubation_period_num)) %>% - pull(incubation_period_num) %>% + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% + dplyr::mutate(incubation_period_num = as.numeric(incubation_period)) %>% + dplyr::filter(!is.na(incubation_period_num)) %>% + dplyr::pull(incubation_period_num) %>% fitdistrplus::fitdist(distr = "lnorm") # try: summary and plot #' explore the @@ -378,14 +403,14 @@ cases %>% ```{r} cases_delay <- cases %>% - select(case_id, date_of_infection, date_of_onset) %>% - mutate(incubation_period = date_of_onset - date_of_infection) %>% - mutate(incubation_period_num = as.numeric(incubation_period)) %>% - filter(!is.na(incubation_period_num)) + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% + dplyr::mutate(incubation_period_num = as.numeric(incubation_period)) %>% + dplyr::filter(!is.na(incubation_period_num)) ``` ```{r} -cases_delay %>% pull(incubation_period_num) +cases_delay %>% dplyr::pull(incubation_period_num) ``` 
Try this yourself: @@ -402,16 +427,16 @@ cases_delay$incubation_period_num ```{r} incubation_period_fit <- cases %>% - select(case_id, date_of_infection, date_of_onset) %>% - mutate(incubation_period = date_of_onset - date_of_infection) %>% - mutate(incubation_period_num = as.numeric(incubation_period)) %>% - filter(!is.na(incubation_period_num)) %>% - pull(incubation_period_num) %>% + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% + dplyr::mutate(incubation_period_num = as.numeric(incubation_period)) %>% + dplyr::filter(!is.na(incubation_period_num)) %>% + dplyr::pull(incubation_period_num) %>% fitdistrplus::fitdist(distr = "lnorm") ``` ```{r} -incubation_period_fit %>% pluck("estimate") +incubation_period_fit %>% purrr::pluck("estimate") ``` Try this yourself: @@ -515,3 +540,6 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) :::::::::::::::::::::::::::::::::::::::::::::::: +### References + +- Cori, A. et al. (2019) Real-time outbreak analysis: Ebola as a case study - part 1 · Recon Learn, RECON learn. Available at: https://www.reconlearn.org/post/real-time-response-1 (Accessed: 06 November 2024). From acd30054d4791c404e22dce62d5e1ba4e5677cc3 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 06:49:11 +0000 Subject: [PATCH 13/52] add text to explore data section --- episodes/delays-refresher.Rmd | 78 +++++++++++++++++++++++++++++------ 1 file changed, 66 insertions(+), 12 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 8e799e64..62b0e757 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -84,9 +84,11 @@ library(here) ## Explore data +For the purpose of this episode, we will read a pre-cleaned line list data. Following episodes will tackle how to solve cleaning tasks. 
+ ```{r,eval=FALSE,echo=TRUE,message=FALSE} # Read data -# e.g.: if path to file is data/simulated_ebola_2.csv then: +# e.g.: if path to file is data/linelist.rds then: cases <- read_rds( here::here("data", "linelist.rds") ) @@ -99,10 +101,6 @@ cases <- read_rds( ) ``` -```{r, message=FALSE} -cases -``` - :::::::::::::::::::: checklist ### Why should we use the {here} package? @@ -118,24 +116,80 @@ The `{here}` package is ideal for adding one more layer of reproducibility to yo :::::::::::::::::::: -```{r} -# quality evaluation ------------------------------------------------------ +```{r, message=FALSE} +# Print line list data +cases +``` -cases %>% - dplyr::glimpse() +:::::::::::::: discussion + +Take some time to look at the data and structure here. + +- Are the data and format similar to line lists that you have seen in the past? +- If you were part of the outbreak investigation team, what other information might you want to collect? + +:::::::::::::: + +:::::::::::: instructor + +The information to collect will depend on the questions we need to give a response. + +At the beginning of an outbreak, we need data to give a response to questions like: + +- How fast does an epidemic grow? +- What is the risk of death? +- How many cases can I expect in the coming days? + +Informative indicators are: + +- growth rate, reproduction number. +- case fatality risk, hospitalization fatality risk. +- projection or forecast of cases. +Useful data are: + +- date of onset, date of death. +- delays from infection to onset, from onset to death. +- percentage of observations detected by surveillance system. +- subject characteristics to stratify the analysis by person, place, time. + +:::::::::::: + +You may notice that there are missing entries. +An important step in analysis is to identify any mistakes in data entry. +Although it can be difficult to assess errors in hospital names, we +would expect the date of infection to always be before the date of +symptom onset. 
+ +```{r} cases %>% cleanepi::scan_data() +``` + +:::::::::::: discussion + +Why do we have more missings on date of infection or date of outcome? -# why do we have missing on infection date or outcome? ------------------- +:::::::::::: + +::::::::::::: instructor + +- date of infection: mostly unknown, depended on limited coverage of contact tracing or outbreak research, and sensitive to recall bias from subjects. +- date of outcome: reporting delay + +::::::::::::: -#' date of infection: unknown, contact tracing research, recall bias -#' outcome: reporting delay +::::::::::::: spoiler +We can also explore missing data with summary visualizations: + +```{r} cases %>% visdat::vis_miss() ``` +::::::::::::: + ## Calculate severity ```{r} From cb62acf0b3452833f76ca2fffd80dbc447e12456 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 06:50:20 +0000 Subject: [PATCH 14/52] remove namespace only packages --- episodes/delays-refresher.Rmd | 1 - 1 file changed, 1 deletion(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 62b0e757..3584ba6f 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -79,7 +79,6 @@ Let's start by loading the package `{dplyr}` to manipulate data, `{tidyr}` to re ```{r,eval=TRUE,message=FALSE,warning=FALSE} # Load packages library(tidyverse) # loads dplyr, tidyr and ggplot2 -library(here) ``` ## Explore data From 7c812fc8c2c6b03155d1caba9470ac234c512713 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 08:05:12 +0000 Subject: [PATCH 15/52] replace callout boxes headings and content --- episodes/delays-refresher.Rmd | 61 ++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 26 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 3584ba6f..3ee16ce9 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -28,7 +28,7 @@ exercises: 2 :::::::::: prereq -### Setup a 
project and folder +**Setup an RStudio project and folder** - Create an RStudio project. If needed, follow this [how-to guide on "Hello RStudio Projects"](https://docs.posit.co/ide/user/ide/get-started/#hello-rstudio-projects) to create one. - Inside the RStudio project, create the `data/` folder. @@ -38,7 +38,7 @@ exercises: 2 ::::::::::::::: checklist -### RStudio projects +**RStudio projects** The directory of an RStudio Project named, for example `training`, should look like this: @@ -64,8 +64,6 @@ Let's starts by creating `New Quarto Document`! 3. Save the file with the name `01-report.qmd` 4. Use the `Render` button to render the file and preview the output. -To learn more about Quarto, follow their tutorial: - ::::::::::::::: @@ -102,7 +100,7 @@ cases <- read_rds( :::::::::::::::::::: checklist -### Why should we use the {here} package? +**Why should we use the {here} package?** The `{here}` package is designed to simplify file referencing in R projects by providing a reliable way to construct file paths relative to the project root. The main reason to use it is **Cross-Environment Compatibility**. @@ -245,6 +243,22 @@ cases %>% This calculation is _naive_ because it tends to yield a biased and mostly underestimated CFR due to the time-delay from onset to death, only stabilising at the later stages of the outbreak. +Now, as a comparison, how much a CFR estimate changes if we include unknown outcomes in the denominator? + +:::::::::::: + +:::::::::::: solution + +```{r} +cases %>% + dplyr::count(outcome) %>% + tidyr::pivot_wider(names_from = outcome, values_from = n) %>% + cleanepi::standardize_column_names() %>% + dplyr::mutate(cfr = death / (death + recover + na)) +``` + +Considering unknown outcomes underestimates the CFR calculation. 
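To make the effect of the denominator concrete, here is a minimal sketch with made-up counts. The values 40, 60, and 20 are illustrative assumptions, not numbers taken from `linelist.rds`; the column names `death`, `recover`, and `na` mirror those produced by `cleanepi::standardize_column_names()` in the chunk above.

```r
# Hypothetical outcome counts, for illustration only (not from the line list)
outcome_counts <- data.frame(death = 40, recover = 60, na = 20)

# Naive CFR: deaths among cases with a known outcome (40 / 100)
cfr_known <- outcome_counts$death /
  (outcome_counts$death + outcome_counts$recover)

# CFR with unknown outcomes added to the denominator (40 / 120)
cfr_with_unknown <- outcome_counts$death /
  (outcome_counts$death + outcome_counts$recover + outcome_counts$na)

cfr_known
cfr_with_unknown
```

Cases with an unknown outcome can only enlarge the denominator, so the second value can never exceed the first.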
+ :::::::::::: :::::::::::: @@ -452,7 +466,7 @@ cases %>% ::::::::::::::: callout -### the dollar sign `$` can `pull()` +**The dollar sign `$` can `pull()`** ```{r} cases_delay <- cases %>% @@ -476,7 +490,7 @@ cases_delay$incubation_period_num ::::::::::::::: callout -### the dollar sign `$` can `pluck()` +**The dollar sign `$` can `pluck()`** ```{r} incubation_period_fit <- cases %>% @@ -504,7 +518,7 @@ But, how do you access to the specific parameter? :::::::::::::::::::::::::::::: testimonial -### A code completion tip +**A code completion tip** If we write the **square brackets** `[]` next to the object `incubation_period_fit$estimate[]`, within `[]` we can use the Tab key ↹ @@ -547,22 +561,15 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) ::::::::::::::: testimonial -### What to do if we do not have enough data? +**What to do if we do not have enough data?** -```{r} -# but this fitting step requires account for biases! ---------------------- - -# what to do when we do not have all these dates or info on biases? ------- - -#' we are going to clean and standardize data on -#' DAY 2 afternoon -#' validate, rearrange, visualize epicurve data on -#' DAY 3 morning -#' we can reuse data from past outbreaks! -#' DAY 3 afternoon -#' and took them to account for delays for transmission and severity -#' DAY 4 morning and afternoon -``` +At the beginning of an outbreak, limited data or resources exist to estimate delays accounting for its biases. Until we have more appropriate data for the specific disease and region for the ongoing outbreak, we can reuse delays from past outbreaks from the same pathogens or close in its phylogeny, independent of the area of origin. + +In the next tutorial episodes, we will: + +- Access and analyse common epidemiological parameters from open-source literature search databases. +- Estimate key transmission metrics, such as the reproduction number, from case or death data, adjusting for incubation period and reporting delays. 
+- Estimate the case fatality risk (CFR) from individual-level and aggregated incidence case and death data, adjusting for delays between onset of symptoms and disease outcome. ::::::::::::::: @@ -579,9 +586,11 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) ::::::::::::::: checklist - +We invite you to: + +- Create **reproducible examples (reprex)**. A reprex help us to communicate our coding problems with software developers. Explore this Applied Epi entry: - +- Keep using **Quarto**. Follow their Get Started tutorial: ::::::::::::::: @@ -595,4 +604,4 @@ qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226) ### References -- Cori, A. et al. (2019) Real-time outbreak analysis: Ebola as a case study - part 1 · Recon Learn, RECON learn. Available at: https://www.reconlearn.org/post/real-time-response-1 (Accessed: 06 November 2024). +- Cori, A. et al. (2019) Real-time outbreak analysis: Ebola as a case study - part 1 · Recon Learn, RECON learn. Available at: https://www.reconlearn.org/post/real-time-response-1 (Accessed: 06 November 2024). 
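The hunk context in this patch repeatedly shows `qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226)` with the parameters typed by hand. As a hedged sketch, the same quantile can be computed by indexing a named vector with square brackets, which is the shape `incubation_period_fit$estimate` is expected to have. The vector `estimate` below is a hand-built stand-in (an assumption), filled with the two values quoted in that call.

```r
# Stand-in for incubation_period_fit$estimate (assumed to be a named
# numeric vector); values copied from the qlnorm() call in the lesson
estimate <- c(meanlog = 1.995979, sdlog = 0.776226)

# Square brackets select one parameter by name
estimate["meanlog"]
estimate["sdlog"]

# 99th percentile of the log-normal delay, in days
q99 <- qlnorm(
  p = 0.99,
  meanlog = estimate["meanlog"],
  sdlog = estimate["sdlog"]
)
q99

# Check against the call with the numbers typed by hand
all.equal(
  unname(q99),
  qlnorm(p = 0.99, meanlog = 1.995979, sdlog = 0.776226)
)
```

Selecting parameters by name rather than retyping them keeps the quantile in sync with the fitted object if the fit is rerun on updated data.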
From f042498142fd817c250dca31ad670c8130c2fe2a Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 08:09:05 +0000 Subject: [PATCH 16/52] add episode title --- episodes/delays-refresher.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 3ee16ce9..7807cd3d 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -1,5 +1,5 @@ --- -title: 'delays-refresher' +title: 'Introduction to outbreak analytics using R' teaching: 10 exercises: 2 --- From f3491418798ac2d3a39544a6be2dfcddc2e53e41 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 09:50:38 +0000 Subject: [PATCH 17/52] complete section on severity --- episodes/delays-refresher.Rmd | 106 +++++++++++++++++++++++++--------- 1 file changed, 80 insertions(+), 26 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 7807cd3d..9b44c78a 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -1,5 +1,5 @@ --- -title: 'Introduction to outbreak analytics using R' +title: 'Introduction to outbreak analytics' teaching: 10 exercises: 2 --- @@ -64,15 +64,17 @@ Let's starts by creating `New Quarto Document`! 3. Save the file with the name `01-report.qmd` 4. Use the `Render` button to render the file and preview the output. + + ::::::::::::::: ## Introduction -A new Ebola Virus Disease (EVD) outbreak has been notified in a fictional country in West Africa. The Ministry of Health is in charge of coordinating the outbreak response, and have contracted you as a consultant in epidemic analysis to inform the response in real time. +A new Ebola Virus Disease (EVD) outbreak has been notified in a fictional country in West Africa. The Ministry of Health is coordinating the outbreak response and has contracted you as a consultant in epidemic analysis to inform the response in real-time. 
The available report of cases is coming from hospital admissions. -Let's start by loading the package `{dplyr}` to manipulate data, `{tidyr}` to rearrange it, and `{here}` to write file paths within your RStudio project. We'll use the pipe `%>%` to connect some of their functions, including others from the package `{ggplot2}`, so let's also call to the package `{tidyverse}` that loads them all: +Let's start by loading the package `{dplyr}` to manipulate data, `{tidyr}` to rearrange it, and `{here}` to write file paths within your RStudio project. We'll use the pipe `%>%` to connect some of their functions, including others from the package `{ggplot2}`, so let's call to the package `{tidyverse}` that loads them all: ```{r,eval=TRUE,message=FALSE,warning=FALSE} # Load packages @@ -152,11 +154,9 @@ Useful data are: :::::::::::: -You may notice that there are missing entries. -An important step in analysis is to identify any mistakes in data entry. -Although it can be difficult to assess errors in hospital names, we -would expect the date of infection to always be before the date of -symptom onset. +You may notice that there are **missing** entries. + +An important step in analysis is to identify any mistakes in data entry. The package `{cleanepi}` includes one function called `scan_data()` to get the percentage of missing observations per variable: ```{r} cases %>% @@ -178,7 +178,7 @@ Why do we have more missings on date of infection or date of outcome? ::::::::::::: spoiler -We can also explore missing data with summary visualizations: +We can also explore missing data with summary a visualization: ```{r} cases %>% @@ -187,18 +187,62 @@ cases %>% ::::::::::::: +::::::::::::::::::: checklist + +**The double-colon** + +The double-colon `::` in R let you call a specific function from a package without loading the entire package into the current environment. + +For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package. 
+ +This help us remember package functions and avoid namespace conflicts. + +::::::::::::::::::: + ## Calculate severity -```{r} -# case fatality risk ----------------------------------------------------- +A frequent indicator for severity is the case fatality risk (CFR). +CFR is defined as the conditional probability of death given confirmed diagnosis, calculated as the cumulative number of deaths from an infectious disease over the number of confirmed diagnosed cases. + +We can use the function `dplyr::count()` to count the observations in each group of the variable `outcome`: + +```{r} cases %>% dplyr::count(outcome) ``` -```{r} -# should I consider missing outcomes for CFR? ----------------------------- +:::::::::::: discussion +Report: + +- What to do with cases whose outcome is `NA`? + +- Should we consider missing outcomes to calculate the CFR? + +:::::::::::: + +:::::::::::: instructor + +CFR estimation is sensitive to: + +- **Right-censoring bias**. If we include observations with unknown final status we can underestimate the true CFR. + +- **Selection bias**. At the beginning of an outbreak, given that health systems collect most clinically severe cases, an early estimate of the CFR can overestimate the true CFR. + +:::::::::::: + +To calculate the CFR we can add more functions using the pipe `%>%` and structure sequences of data operations left-to-right. + +From the `cases` object we will use: + +- `dplyr::count()` to count the observations in each group of the variable `outcome`, +- `tidyr::pivot_wider()` to pivot the data long-to-wide with names from `outcome` and values from `n` columns, +- `cleanepi::standardize_column_names()` to standardize column names, +- `dplyr::mutate()` to create one new column `cases_known_outcome` as a function of existing variables `death` and `recover`. 
+ +```{r} +# calculate the number of cases with known outcome cases %>% dplyr::count(outcome) %>% tidyr::pivot_wider(names_from = outcome, values_from = n) %>% @@ -206,20 +250,22 @@ cases %>% dplyr::mutate(cases_known_outcome = death + recover) ``` +This way of writing almost look like writing a recipe! :::::::::::: challenge -Calculate the CFR as the division of known deaths among known outcomes. Do this by adding one more pipe `%>%` in the last code chunk. +Calculate the CFR as the division of the number of **deaths** among **known outcomes**. Do this by adding one more pipe `%>%` in the last code chunk. Report: -- What is the value of the _naive_ CFR? +- What is the value of the CFR? :::::::::::: hint -You can use the column names of the reminder data set to create a new column. +You can use the column names of variables to create one more column: ```{r,eval=FALSE,echo=TRUE} +# calculate the naive CFR cases %>% count(outcome) %>% pivot_wider(names_from = outcome, values_from = n) %>% @@ -233,6 +279,7 @@ cases %>% :::::::::::: solution ```{r} +# calculate the naive CFR cases %>% dplyr::count(outcome) %>% tidyr::pivot_wider(names_from = outcome, values_from = n) %>% @@ -250,6 +297,7 @@ Now, as a comparison, how much a CFR estimate changes if we include unknown outc :::::::::::: solution ```{r} +# underestimate the naive CFR cases %>% dplyr::count(outcome) %>% tidyr::pivot_wider(names_from = outcome, values_from = n) %>% @@ -257,17 +305,17 @@ cases %>% dplyr::mutate(cfr = death / (death + recover + na)) ``` -Considering unknown outcomes underestimates the CFR calculation. +Considering unknown outcomes underestimates the _naive_ CFR calculation. :::::::::::: :::::::::::: -However, how much time it would take for us register those outcomes? For this we need to calculate delays! +However, in average, how much time it would take to know the outcomes of those cases? For this we can calculate **delays**! 
## Calculate delays -The time between sequence of dated events can vary between subjects. +The time between sequence of dated events can vary between subjects. For example, we would expect the date of infection to always be before the date of symptom onset, and the later always before the date of hospitalization. ```{r,echo=FALSE,eval=TRUE} # demo code not to run by learner @@ -307,9 +355,9 @@ ggplot() + colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") ``` -```{r,warning=FALSE,message=FALSE} -#' date of hospitalization means the date of report +Given that the date of hospitalization means the date of report +```{r,warning=FALSE,message=FALSE} cases %>% dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>% dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% @@ -584,19 +632,25 @@ In the next tutorial episodes, we will: #' expand the number of pre-days to include more backward contacts ``` -::::::::::::::: checklist +::::::::::::::: challenge + +Let's create **reproducible examples (`reprex`)**. A reprex help us to communicate our coding problems with software developers. Explore this Applied Epi entry: -We invite you to: +Create a `reprex` with your answer: -- Create **reproducible examples (reprex)**. A reprex help us to communicate our coding problems with software developers. Explore this Applied Epi entry: +- What is the value of the CFR from the data set in the chuck below? -- Keep using **Quarto**. Follow their Get Started tutorial: +```{r} +outbreaks::ebola_sim_clean %>% + pluck("linelist") %>% + as_tibble() +``` ::::::::::::::: ::::::::::::::::::::::::::::::::::::: keypoints -- Use packages from the `tidyverse` like `{dplyr}`, `{tidyr}`, and `{ggplot}` for exploratory data analysis. +- Use packages from the `tidyverse` like `{dplyr}`, `{tidyr}`, and `{ggplot2}` for exploratory data analysis. - Epidemiological delays conditions the estimation of indicators for severity or transmission. 
- Fit probability distribution to delays to make inferences from them for decision making. From c57c9d7e9403f9088149c53eb309876dacc22c43 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 10:08:43 +0000 Subject: [PATCH 18/52] complete calculate delays section --- episodes/delays-refresher.Rmd | 47 +++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 10 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 9b44c78a..731b427d 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -26,6 +26,16 @@ exercises: 2 :::::::::::::::::::::::::::::::::::::::::::::::: +:::::::::: instructor + +Useful concepts maps to teach this episode are + +- +- +- + +:::::::::: + :::::::::: prereq **Setup an RStudio project and folder** @@ -341,21 +351,29 @@ cases_long <- cases_select %>% dplyr::mutate(date_type = fct_relevel(date_type, "date_of_onset")) ggplot() + + geom_point( + data = cases_long, + aes(x = date, y = case_id, color = date_type), + alpha = 0.5, size = 3 + ) + geom_segment( data = cases_select, aes(x = date_of_onset, y = case_id, xend = date_of_hospitalisation, yend = case_id), color = "grey" ) + - geom_point( - data = cases_long, - aes(x = date, y = case_id, color = date_type), - alpha = 0.5, size = 3 - ) + colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") ``` -Given that the date of hospitalization means the date of report +Given that the date of hospitalization means the date of report, we can calculate the **reporting delay** from this line list data. 
+ +From the `cases` object we will use: + +- `dplyr::select()` to keep columns using their names, +- `dplyr::mutate()` to create one new column `reporting_delay` as a function of existing variables `date_of_hospitalisation` and `date_of_onset`, +- `ggplot()` to declare the input data frame for a graphic, +- `aes()` to describe how variables in the data are mapped to visual properties (aesthetics) of `geoms`, +- `geom_histogram()` to visualise the distribution of a single continuous variable by dividing the x axis into `bins` and counting the number of observations in each `bin`. ```{r,warning=FALSE,message=FALSE} cases %>% @@ -368,7 +386,11 @@ cases %>% ::::::::::::::::: challenge -Calculate the delay from onset to death. +To calculate a _delay-adjusted_ CFR, we need to assume a known the delay from onset to death. + +Using the `cases` object: + +- Calculate and visualize the delay from onset to death. ::::::::::::: hint @@ -392,10 +414,16 @@ cases %>% geom_histogram(binwidth = 1) ``` -Wait! Is is consistent to have negative time delays from primary to secondary observations, like from date of onset to date of death? +Wait! Is is consistent to have negative time delays from primary to secondary observations, i.e., from date of onset to date of death? In the next episode we will learn how to check sequence of dated-events and more inconsistencies! +But, how would you keep the rows with negative delay values? Try this out. 
+ +::::::::::::: + +:::::::::::: solution + We can use `dplyr::filter()` again to identify the inconsistent observations: ```{r} @@ -406,8 +434,7 @@ cases %>% dplyr::filter(delay_onset_death < 1) ``` - -::::::::::::: +:::::::::::: ::::::::::::::::: From 3e931001aabc3b7be01990e008363de5f9b89a26 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 10:09:20 +0000 Subject: [PATCH 19/52] fix lintr checks --- episodes/delays-refresher.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 731b427d..9b75d271 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -668,8 +668,8 @@ Create a `reprex` with your answer: - What is the value of the CFR from the data set in the chuck below? ```{r} -outbreaks::ebola_sim_clean %>% - pluck("linelist") %>% +outbreaks::ebola_sim_clean %>% + pluck("linelist") %>% as_tibble() ``` From fed06822667095f1dff3403ba99e1fc30d2fd823 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 13:34:33 +0000 Subject: [PATCH 20/52] fix content after rendering review --- episodes/delays-refresher.Rmd | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 9b75d271..a9ccfdd5 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -166,12 +166,25 @@ Useful data are: You may notice that there are **missing** entries. + + +::::::::::::: spoiler + +We can also explore missing data with summary a visualization: + +```{r} +cases %>% + visdat::vis_miss() +``` + +::::::::::::: :::::::::::: discussion @@ -186,17 +199,6 @@ Why do we have more missings on date of infection or date of outcome? 
::::::::::::: -::::::::::::: spoiler - -We can also explore missing data with summary a visualization: - -```{r} -cases %>% - visdat::vis_miss() -``` - -::::::::::::: - ::::::::::::::::::: checklist **The double-colon** @@ -315,13 +317,13 @@ cases %>% dplyr::mutate(cfr = death / (death + recover + na)) ``` -Considering unknown outcomes underestimates the _naive_ CFR calculation. +Due to **right-censoring bias**, if we include observations with unknown final status we can underestimate the true CFR. :::::::::::: :::::::::::: -However, in average, how much time it would take to know the outcomes of those cases? For this we can calculate **delays**! +Data today will not include outcomes from patients that are still hospitalised. Then, one relevant question to ask is: In average, how much time it would take to know the outcomes of those cases? For this we can calculate **delays**! ## Calculate delays From e1f0ccc52fce25ffb0b14226da5ebbca11acd10f Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 13:52:02 +0000 Subject: [PATCH 21/52] add list of operators --- episodes/delays-refresher.Rmd | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index a9ccfdd5..95809c6f 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -541,9 +541,21 @@ cases %>% #' for gamma, weibull, lnorm parameters ``` -::::::::::::::: callout +Let's review some operators used until now: -**The dollar sign `$` can `pull()`** +- double colon `::` +- assignment `<-` +- pipe `%>%` +- negation `!` + +We need to add two more to the list: + +- dollar sign `$` +- square brackets `[]` + +::::::::::::::: tab + +### The dollar sign `$` can `pull()` ```{r} cases_delay <- cases %>% @@ -554,7 +566,8 @@ cases_delay <- cases %>% ``` ```{r} -cases_delay %>% dplyr::pull(incubation_period_num) +cases_delay %>% + dplyr::pull(incubation_period_num) ``` Try this 
yourself: @@ -563,11 +576,8 @@ Try this yourself: cases_delay$incubation_period_num ``` -::::::::::::::: - -::::::::::::::: callout -**The dollar sign `$` can `pluck()`** +### The dollar sign `$` can `pluck()` ```{r} incubation_period_fit <- cases %>% @@ -580,7 +590,8 @@ incubation_period_fit <- cases %>% ``` ```{r} -incubation_period_fit %>% purrr::pluck("estimate") +incubation_period_fit %>% + purrr::pluck("estimate") ``` Try this yourself: @@ -669,7 +680,7 @@ Create a `reprex` with your answer: - What is the value of the CFR from the data set in the chuck below? -```{r} +```{r,eval=FALSE,echo=TRUE} outbreaks::ebola_sim_clean %>% pluck("linelist") %>% as_tibble() From d4329dce45eb953582371f223ed2a9e33f1affb7 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 16:32:49 +0000 Subject: [PATCH 22/52] write partial visualization of transmission --- episodes/delays-refresher.Rmd | 169 +++++++++++++++++++++++++++------- 1 file changed, 134 insertions(+), 35 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 95809c6f..52619186 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -374,8 +374,8 @@ From the `cases` object we will use: - `dplyr::select()` to keep columns using their names, - `dplyr::mutate()` to create one new column `reporting_delay` as a function of existing variables `date_of_hospitalisation` and `date_of_onset`, - `ggplot()` to declare the input data frame for a graphic, -- `aes()` to describe how variables in the data are mapped to visual properties (aesthetics) of `geoms`, -- `geom_histogram()` to visualise the distribution of a single continuous variable by dividing the x axis into `bins` and counting the number of observations in each `bin`. 
+- `aes()` to describe how the variable `reporting_delay` will be mapped to visual properties (aesthetics) of `geoms`, +- `geom_histogram()` to visualise the distribution of a single continuous variable `reporting_delay` by dividing the x axis into `bins` and counting the number of observations in each `bin`, with `binwidth` equal to 1 day. ```{r,warning=FALSE,message=FALSE} cases %>% @@ -436,6 +436,8 @@ cases %>% dplyr::filter(delay_onset_death < 1) ``` +More on estimating a _delay-adjusted_ CFR on the episode about Estimating outbreak severity. + :::::::::::: ::::::::::::::::: @@ -443,42 +445,69 @@ cases %>% ## Visualize transmission -- Incidence curve - -aggregate the date by intervals of 7 days +The first question we want to know is simply: how bad is it?. The first step of the analysis is descriptive - we want to draw an epidemic curve or epicurve. This visualises the incidence over time by date of symptom onset. -```{r} - -# transmission ------------------------------------------------------------ +From the `cases` object we will use: -# incidence curve --------------------------------------------------------- +- `ggplot()` to declare the input data frame, +- `aes()` for the variable `date_of_onset` to map to `geoms`, +- `geom_histogram()` to visualise the distribution of a single continuous variable with a `binwidth` equal to 7 days. +```{r} +# incidence curve cases %>% ggplot(aes(x = date_of_onset)) + geom_histogram(binwidth = 7) +``` -# what transmission indicator can we estimate from the incidence curve? --- +:::::::::::: discussion -#' the growth rate! by fitting a linear model -#' more on that on DAY 3-4 +The early phase of an outbreak usually growths exponentially. -# what is the name of the delay from infection to symptom onset? ---------- -``` +- Why exponential growth may not be observed in the most recent weeks? -However, as seen before, the date of onset of symptoms is a delayed measurement with respect to the date of infection. 
+:::::::::::: + +::::::::::: instructor + +Close inspection of the line list shows that the last date of any entry (by date of hospitalization) is a bit later than the last date of symptom onset. + +From the `cases` object we can use: -On the last date of hospitalisation, which is the date when the case is registered in the data collection system, the last date of onset of symptoms happened some days ago: +- `dplyr::summarise()` to summarise each group down to one row, +- `base::max` to calculate the maximum dates of onset and hospitalisation. ```{r} cases %>% dplyr::summarise( - max(date_of_onset), - max(date_of_hospitalisation) + max_onset = max(date_of_onset), + max_hospital = max(date_of_hospitalisation) ) ``` +::::::::::: + +You may want to examine how long after onset of symptoms cases are hospitalised; this may inform the threshold date you choose, as follows: + +```{r} +cases %>% + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% + dplyr::mutate(incubation_period_num = as.numeric(incubation_period)) %>% + skimr::skim(incubation_period_num) +``` + + +```{r,warning=FALSE,message=FALSE} +cases %>% + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% + ggplot(aes(x = incubation_period)) + + geom_histogram(binwidth = 1) +``` + + -There is an average time delay between the date of infection, date of onset, date of hospitalisation, and date of outcome. ```{r,eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE} cases %>% @@ -501,7 +530,82 @@ cases %>% labs(fill = "Observed cases") ``` -In order to account for these time delays when estimating indicators of severity or transmission, we need to input delays as **Probability Distributions**! +:::::::::::::: challenge + +Report: + +- What transmission indicator can we estimate from the incidence curve? + +::::::::: solution + +- The growth rate! 
by fitting a linear model. +- Th Rt + +More on that on episodes about quantifying transmission. + +```{r,eval=TRUE,echo=FALSE} +dat <- cases %>% + incidence2::incidence( + date_index = "date_of_onset", + interval = "week", + complete_dates = TRUE + ) + +fitted <- dat %>% + # truncate curve to fit withou delays + filter(date_index% + nest() %>% + mutate( + model = lapply( + data, + function(x) glm(count ~ date_index, data = x, family = poisson) + ) + ) + +intervals <- + fitted %>% + mutate(result = Map( + function(data, model) { + data %>% + ciTools::add_ci( + model, + alpha = 0.05, + names = c("lower_ci", "upper_ci") + ) %>% + as_tibble() + }, + data, + model + )) %>% + unnest(result) + +plot(dat, angle = 45) + + ggplot2::geom_line( + ggplot2::aes(date_index, y = pred), + data = intervals, + inherit.aes = FALSE + ) + + ggplot2::geom_ribbon( + ggplot2::aes(date_index, ymin = lower_ci, ymax = upper_ci), + alpha = 0.2, + data = intervals, + inherit.aes = FALSE, + fill = "#BBB67E" + ) +``` + + +::::::::: + +:::::::::::::: + +In order to account for these time delays when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**! + +::::::::::: challenge + +- What is the name of the delay from infection to symptom onset? 
+ +::::::::::: ## Fit a probability distribution to delays @@ -518,15 +622,6 @@ Available at: -```{r,warning=FALSE,message=FALSE} -cases %>% - dplyr::select(case_id, date_of_infection, date_of_onset) %>% - dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% - ggplot(aes(x = incubation_period)) + - geom_histogram(binwidth = 1) -``` - - ```{r} cases %>% dplyr::select(case_id, date_of_infection, date_of_onset) %>% @@ -541,17 +636,21 @@ cases %>% #' for gamma, weibull, lnorm parameters ``` +::::::::::::: checklist + Let's review some operators used until now: -- double colon `::` -- assignment `<-` -- pipe `%>%` -- negation `!` +- Assignment `<-` assigns a value to a variable from right to left. +- Double colon `::` to call a function from a specific package. +- Pipe `%>%` to structure sequences of data operations left-to-right +- Logical negation `!` to indicate a logical negation (NOT). + +::::::::::::: We need to add two more to the list: -- dollar sign `$` -- square brackets `[]` +- Dollar sign `$` +- Square brackets `[]` ::::::::::::::: tab From 4c3e6ccee7c73de8250cb1db90b759f22831ff52 Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 19:07:35 +0000 Subject: [PATCH 23/52] change content in delays as a follow up from cfr --- episodes/delays-refresher.Rmd | 99 +++++++++++++++++++++++++---------- 1 file changed, 71 insertions(+), 28 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 52619186..4bbe9dc9 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -329,6 +329,8 @@ Data today will not include outcomes from patients that are still hospitalised. The time between sequence of dated events can vary between subjects. For example, we would expect the date of infection to always be before the date of symptom onset, and the later always before the date of hospitalization. 
+In a random sample of 30 observations from the `cases` data frame we observe variability between the date of hospitalization and date of outcome: + ```{r,echo=FALSE,eval=TRUE} # demo code not to run by learner set.seed(99) @@ -336,12 +338,13 @@ set.seed(99) cases_select <- cases %>% dplyr::slice_sample(n = 30) %>% dplyr::arrange(date_of_onset) %>% - dplyr::mutate(case_id = fct_inorder(case_id)) %>% - # slice(10:40) %>% + dplyr::mutate(case_id = fct_inorder(case_id)) %>% + dplyr::mutate(outcome_delay = date_of_outcome - date_of_hospitalisation) %>% + dplyr::filter(outcome_delay > 0) %>% dplyr::select( case_id, - date_of_onset, - date_of_hospitalisation + date_of_hospitalisation, + date_of_outcome ) cases_long <- cases_select %>% @@ -350,7 +353,7 @@ cases_long <- cases_select %>% names_to = "date_type", values_to = "date" ) %>% - dplyr::mutate(date_type = fct_relevel(date_type, "date_of_onset")) + dplyr::mutate(date_type = fct_relevel(date_type, "date_of_hospitalisation")) ggplot() + geom_point( @@ -360,31 +363,39 @@ ggplot() + ) + geom_segment( data = cases_select, - aes(x = date_of_onset, y = case_id, - xend = date_of_hospitalisation, yend = case_id), + aes(x = date_of_hospitalisation, y = case_id, + xend = date_of_outcome, yend = case_id), color = "grey" ) + colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") ``` -Given that the date of hospitalization means the date of report, we can calculate the **reporting delay** from this line list data. +We can calculate the average time from hospitalisation to outcome from the line list. 
From the `cases` object we will use: - `dplyr::select()` to keep columns using their names, -- `dplyr::mutate()` to create one new column `reporting_delay` as a function of existing variables `date_of_hospitalisation` and `date_of_onset`, -- `ggplot()` to declare the input data frame for a graphic, -- `aes()` to describe how the variable `reporting_delay` will be mapped to visual properties (aesthetics) of `geoms`, -- `geom_histogram()` to visualise the distribution of a single continuous variable `reporting_delay` by dividing the x axis into `bins` and counting the number of observations in each `bin`, with `binwidth` equal to 1 day. +- `dplyr::mutate()` to create one new column `outcome_delay` as a function of existing variables `date_of_outcome` and `date_of_hospitalisation`, +- `dplyr::filter()` to keep the rows that match a condition like `outcome_delay > 0`, +- `skimr::skim()` to get useful summary statistics -```{r,warning=FALSE,message=FALSE} +```{r} cases %>% - dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>% - dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% - ggplot(aes(x = reporting_delay)) + - geom_histogram(binwidth = 1) + dplyr::select(case_id, date_of_hospitalisation, date_of_outcome) %>% + dplyr::mutate(outcome_delay = date_of_outcome - date_of_hospitalisation) %>% + dplyr::filter(outcome_delay > 0) %>% + skimr::skim(outcome_delay) ``` +::::::::::::::::: callout + +**Consistency among sequence of dated-events** + +Wait! Is is consistent to have negative time delays from primary to secondary observations, i.e., from hospitalisation to death? + +In the next episode called **Clean data** we will learn how to check sequence of dated-events and more inconsistencies! + +::::::::::::::::: ::::::::::::::::: challenge @@ -392,17 +403,19 @@ To calculate a _delay-adjusted_ CFR, we need to assume a known the delay from on Using the `cases` object: -- Calculate and visualize the delay from onset to death. 
+- Calculate the summary statistics of the delay from onset to death. ::::::::::::: hint -We can keep the rows that match a given logical statement, like `outcome == "Death"`, using the function `dplyr::filter()`: +Keep the rows that match a condition like `outcome == "Death"`: ```{r,eval=FALSE,echo=TRUE} cases %>% dplyr::filter(outcome == "Death") ``` +Is it consistent to have negative delays from onset of symptoms to death? + ::::::::::::: ::::::::::::: solution @@ -412,15 +425,11 @@ cases %>% dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>% dplyr::filter(outcome == "Death") %>% dplyr::mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% - ggplot(aes(x = delay_onset_death)) + - geom_histogram(binwidth = 1) + dplyr::filter(delay_onset_death > 0) %>% + skimr::skim(delay_onset_death) ``` -Wait! Is is consistent to have negative time delays from primary to secondary observations, i.e., from date of onset to date of death? - -In the next episode we will learn how to check sequence of dated-events and more inconsistencies! - -But, how would you keep the rows with negative delay values? Try this out. +Now, let's say you want to keep the rows with negative delay values to inspect wherethe source of the inconsistency. How would you do it? ::::::::::::: @@ -433,7 +442,7 @@ cases %>% dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>% dplyr::filter(outcome == "Death") %>% dplyr::mutate(delay_onset_death = date_of_outcome - date_of_onset) %>% - dplyr::filter(delay_onset_death < 1) + dplyr::filter(delay_onset_death < 0) ``` More on estimating a _delay-adjusted_ CFR on the episode about Estimating outbreak severity. 
@@ -489,6 +498,40 @@ cases %>% You may want to examine how long after onset of symptoms cases are hospitalised; this may inform the threshold date you choose, as follows: +```{r} +cases %>% + dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>% + dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% + dplyr::mutate(reporting_delay_num = as.numeric(reporting_delay)) %>% + skimr::skim(reporting_delay_num) +``` + + + + + +Given that the date of hospitalization means the date of report, we can calculate the **reporting delay** from this line list data. + +From the `cases` object we will use: + +- `dplyr::select()` to keep columns using their names, +- `dplyr::mutate()` to create one new column `reporting_delay` as a function of existing variables `date_of_hospitalisation` and `date_of_onset`, +- `ggplot()` to declare the input data frame for a graphic, +- `aes()` to describe how the variable `reporting_delay` will be mapped to visual properties (aesthetics) of `geoms`, +- `geom_histogram()` to visualise the distribution of a single continuous variable `reporting_delay` by dividing the x axis into `bins` and counting the number of observations in each `bin`, with `binwidth` equal to 1 day. + +```{r,warning=FALSE,message=FALSE} +cases %>% + dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>% + dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% + ggplot(aes(x = reporting_delay)) + + geom_histogram(binwidth = 1) +``` + + + + + ```{r} cases %>% dplyr::select(case_id, date_of_infection, date_of_onset) %>% @@ -539,7 +582,7 @@ Report: ::::::::: solution - The growth rate! by fitting a linear model. -- Th Rt +- The reproduction number More on that on episodes about quantifying transmission. 
From 4227037bb2705d5d29a8f5ba8ab18f6330cc079d Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 20:30:35 +0000 Subject: [PATCH 24/52] complete visualization section --- episodes/delays-refresher.Rmd | 116 +++++++++++++++++++--------------- 1 file changed, 66 insertions(+), 50 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 4bbe9dc9..25283531 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -338,7 +338,7 @@ set.seed(99) cases_select <- cases %>% dplyr::slice_sample(n = 30) %>% dplyr::arrange(date_of_onset) %>% - dplyr::mutate(case_id = fct_inorder(case_id)) %>% + dplyr::mutate(case_id = fct_inorder(case_id)) %>% dplyr::mutate(outcome_delay = date_of_outcome - date_of_hospitalisation) %>% dplyr::filter(outcome_delay > 0) %>% dplyr::select( @@ -389,7 +389,7 @@ cases %>% ::::::::::::::::: callout -**Consistency among sequence of dated-events** +**Inconsistencies among sequence of dated-events?** Wait! Is is consistent to have negative time delays from primary to secondary observations, i.e., from hospitalisation to death? @@ -454,7 +454,7 @@ More on estimating a _delay-adjusted_ CFR on the episode about Estimating outbre ## Visualize transmission -The first question we want to know is simply: how bad is it?. The first step of the analysis is descriptive - we want to draw an epidemic curve or epicurve. This visualises the incidence over time by date of symptom onset. +The first question we want to know is simply: how bad is it? The first step of the analysis is descriptive - we want to draw an epidemic curve or epicurve. This visualises the incidence over time by date of symptom onset. 
From the `cases` object we will use: @@ -496,21 +496,7 @@ cases %>% ::::::::::: -You may want to examine how long after onset of symptoms cases are hospitalised; this may inform the threshold date you choose, as follows: - -```{r} -cases %>% - dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>% - dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% - dplyr::mutate(reporting_delay_num = as.numeric(reporting_delay)) %>% - skimr::skim(reporting_delay_num) -``` - - - - - -Given that the date of hospitalization means the date of report, we can calculate the **reporting delay** from this line list data. +You may want to examine how long after onset of symptoms cases are hospitalised; this may inform the **reporting delay** from this line list data: From the `cases` object we will use: @@ -528,29 +514,19 @@ cases %>% geom_histogram(binwidth = 1) ``` - - - - + +The distribution of the reporting delay in day units is heavily skewed. Symptomatic cases can take almost **two weeks** to be reported. - +From cases reported today, we completed the exponential growth trend of incidence cases within the last two weeks: ```{r,eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE} cases %>% @@ -558,7 +534,7 @@ cases %>% delayed = dplyr::case_when( # date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 5) ~ # "5 weeks before", - # date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 4) ~ + # date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 3) ~ # "4 weeks before", date_of_hospitalisation < max(date_of_hospitalisation) - (7 * 2) ~ "2 week before", @@ -570,21 +546,21 @@ cases %>% ) %>% ggplot(aes(date_of_onset, fill = delayed)) + geom_histogram(binwidth = 7) + - labs(fill = "Observed cases") + labs(fill = "Reported cases") ``` :::::::::::::: challenge Report: -- What transmission indicator can we estimate from the incidence curve? +- What indicator can we use to estimate transmission from the incidence curve? 
::::::::: solution - The growth rate! by fitting a linear model. -- The reproduction number +- The reproduction number accounting for delays from secondary observations to infection. -More on that on episodes about quantifying transmission. +More on this topic on episodes about **Aggregate and visualize** and **Quantifying transmission**. ```{r,eval=TRUE,echo=FALSE} dat <- cases %>% @@ -596,7 +572,7 @@ dat <- cases %>% fitted <- dat %>% # truncate curve to fit withou delays - filter(date_index% + filter(date_index < grates::as_isoweek(ymd(20140625))) %>% nest() %>% mutate( model = lapply( @@ -622,7 +598,7 @@ intervals <- )) %>% unnest(result) -plot(dat, angle = 45) + +plot(dat) + ggplot2::geom_line( ggplot2::aes(date_index, y = pred), data = intervals, @@ -637,18 +613,19 @@ plot(dat, angle = 45) + ) ``` +```{r,eval=TRUE,echo=FALSE} +fitted %>% + mutate(fit_tidy = map(.x = model, .f = broom::tidy)) %>% + unnest(fit_tidy) %>% + select(-data, -model) +``` + ::::::::: :::::::::::::: -In order to account for these time delays when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**! - -::::::::::: challenge - -- What is the name of the delay from infection to symptom onset? - -::::::::::: +Lastly, in order to account for these time delays when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**! ## Fit a probability distribution to delays @@ -814,6 +791,45 @@ In the next tutorial episodes, we will: #' expand the number of pre-days to include more backward contacts ``` +:::::::::::::::::::::::: challenge + + + +**Relevant delays when estimating transmission** + +- Review the definition of the [incubation period](reference.md#incubation) in our glossary page. + +- Calculate the summary statistics of the incubation period distribution observed in the line list data. 
+ +- Visualize the distribution of the incubation period distribution observed in the line list data. + +::::::::::::: solution + +Calculate the summary statistics: + +```{r} +cases %>% + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% + skimr::skim(incubation_period) +``` + +If you want to get the interquartile range (IQR) you can transform the time to numeric adding one step to the pipeline: `dplyr::mutate(incubation_period_num = as.numeric(incubation_period))` + +Visualize the distribution: + +```{r,warning=FALSE,message=FALSE} +cases %>% + dplyr::select(case_id, date_of_infection, date_of_onset) %>% + dplyr::mutate(incubation_period = date_of_onset - date_of_infection) %>% + ggplot(aes(x = incubation_period)) + + geom_histogram(binwidth = 1) +``` + +::::::::::::: + +:::::::::::::::::::::::::: + ::::::::::::::: challenge Let's create **reproducible examples (`reprex`)**. A reprex help us to communicate our coding problems with software developers. Explore this Applied Epi entry: From 16dcc7134803cbba5dc2d979e7607809b7f28fdf Mon Sep 17 00:00:00 2001 From: Andree Valle Campos Date: Thu, 7 Nov 2024 23:06:47 +0000 Subject: [PATCH 25/52] complete fit section --- episodes/delays-refresher.Rmd | 234 ++++++++++++++++------------------ 1 file changed, 113 insertions(+), 121 deletions(-) diff --git a/episodes/delays-refresher.Rmd b/episodes/delays-refresher.Rmd index 25283531..1cc21b43 100644 --- a/episodes/delays-refresher.Rmd +++ b/episodes/delays-refresher.Rmd @@ -323,7 +323,7 @@ Due to **right-censoring bias**, if we include observations with unknown final s :::::::::::: -Data today will not include outcomes from patients that are still hospitalised. Then, one relevant question to ask is: In average, how much time it would take to know the outcomes of those cases? For this we can calculate **delays**! +Data today will not include outcomes from patients that are still hospitalised. 
Then, one relevant question to ask is: In average, how much time it would take to know the outcomes of those cases? For this we can calculate **delays**! +Data today will not include outcomes from patients that are still hospitalised. Then, one relevant question to ask is: On average, how much time would it take to know the outcomes of hospitalised cases? For this we can calculate **delays**! ## Calculate delays @@ -380,6 +380,7 @@ From the `cases` object we will use: - `skimr::skim()` to get useful summary statistics ```{r} +# delay from report to outcome cases %>% dplyr::select(case_id, date_of_hospitalisation, date_of_outcome) %>% dplyr::mutate(outcome_delay = date_of_outcome - date_of_hospitalisation) %>% @@ -391,15 +392,15 @@ cases %>% **Inconsistencies among sequence of dated-events?** -Wait! Is is consistent to have negative time delays from primary to secondary observations, i.e., from hospitalisation to death? +Wait! Is it consistent to have negative time delays from primary to secondary observations, i.e., from hospitalisation to death? -In the next episode called **Clean data** we will learn how to check sequence of dated-events and more inconsistencies! +In the next episode called **Clean data** we will learn how to check sequence of dated-events and other frequent and challenging inconsistencies! ::::::::::::::::: ::::::::::::::::: challenge -To calculate a _delay-adjusted_ CFR, we need to assume a known the delay from onset to death. +To calculate a _delay-adjusted_ CFR, we need to assume a known delay from onset to death. Using the `cases` object: @@ -410,8 +411,10 @@ Using the `cases` object: Keep the rows that match a condition like `outcome == "Death"`: ```{r,eval=FALSE,echo=TRUE} +# delay from onset to death cases %>% - dplyr::filter(outcome == "Death") + dplyr::filter(outcome == "Death") %>% + ...() # replace ... with downstream code ``` Is it consistent to have negative delays from onset of symptoms to death? @@ -421,6 +424,7 @@ Is it consistent to have negative delays from onset of symptoms to death? 
::::::::::::: solution ```{r,warning=FALSE,message=FALSE} +# delay from onset to death cases %>% dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>% dplyr::filter(outcome == "Death") %>% @@ -429,7 +433,7 @@ cases %>% skimr::skim(delay_onset_death) ``` -Now, let's say you want to keep the rows with negative delay values to inspect wherethe source of the inconsistency. How would you do it? +Where is the source of the inconsistency? Let's say you want to keep the rows with negative delay values to investigate them. How would you do it? ::::::::::::: @@ -438,6 +442,7 @@ Now, let's say you want to keep the rows with negative delay values to inspect w We can use `dplyr::filter()` again to identify the inconsistent observations: ```{r} +# keep negative delays cases %>% dplyr::select(case_id, date_of_onset, date_of_outcome, outcome) %>% dplyr::filter(outcome == "Death") %>% @@ -445,7 +450,7 @@ cases %>% dplyr::filter(delay_onset_death < 0) ``` -More on estimating a _delay-adjusted_ CFR on the episode about Estimating outbreak severity. +More on estimating a _delay-adjusted_ CFR on the episode about **Estimating outbreak severity**! :::::::::::: @@ -454,7 +459,7 @@ More on estimating a _delay-adjusted_ CFR on the episode about Estimating outbre ## Visualize transmission -The first question we want to know is simply: how bad is it? The first step of the analysis is descriptive - we want to draw an epidemic curve or epicurve. This visualises the incidence over time by date of symptom onset. +The first question we want to know is simply: how bad is it? The first step of the analysis is descriptive. We want to draw an epidemic curve or epicurve. This visualises the incidence over time by date of symptom onset. 
From the `cases` object we will use: @@ -498,15 +503,8 @@ cases %>% You may want to examine how long after onset of symptoms cases are hospitalised; this may inform the **reporting delay** from this line list data: -From the `cases` object we will use: - -- `dplyr::select()` to keep columns using their names, -- `dplyr::mutate()` to create one new column `reporting_delay` as a function of existing variables `date_of_hospitalisation` and `date_of_onset`, -- `ggplot()` to declare the input data frame for a graphic, -- `aes()` to describe how the variable `reporting_delay` will be mapped to visual properties (aesthetics) of `geoms`, -- `geom_histogram()` to visualise the distribution of a single continuous variable `reporting_delay` by dividing the x axis into `bins` and counting the number of observations in each `bin`, with `binwidth` equal to 1 day. - ```{r,warning=FALSE,message=FALSE} +# reporting delay cases %>% dplyr::select(case_id, date_of_onset, date_of_hospitalisation) %>% dplyr::mutate(reporting_delay = date_of_hospitalisation - date_of_onset) %>% @@ -514,31 +512,17 @@ cases %>% geom_histogram(binwidth = 1) ``` - - -The distribution of the reporting delay in day units is heavily skewed. Symptomatic cases can take almost **two weeks** to be reported. +The distribution of the reporting delay in day units is heavily skewed. Symptomatic cases may take up to **two weeks** to be reported. 
-From cases reported today, we completed the exponential growth trend of incidence cases within the last two weeks: +From reports (hospitalisations) in the most recent two weeks, we completed the exponential growth trend of incidence cases within the last four weeks: ```{r,eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE} cases %>% dplyr::mutate( delayed = dplyr::case_when( - # date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 5) ~ - # "5 weeks before", - # date_of_hospitalisation < max(date_of_hospitalisation)-(7 * 3) ~ - # "4 weeks before", date_of_hospitalisation < max(date_of_hospitalisation) - (7 * 2) ~ - "2 week before", - TRUE ~ "Today" + "Two weeks ago", + TRUE ~ "Most recent two weeks" ) ) %>% mutate( @@ -549,6 +533,8 @@ cases %>% labs(fill = "Reported cases") ``` +Owing to reporting delays during this outbreak, two weeks ago it seemed that we had a three-week decline in cases. We needed to wait a couple of weeks for the incidence in each week to be complete. + :::::::::::::: challenge Report: @@ -625,7 +611,7 @@ fitted %>% :::::::::::::: -Lastly, in order to account for these time delays when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**! +Lastly, in order to account for these _epidemiological delays_ when estimating indicators of severity or transmission, in our analysis we need to input delays as **Probability Distributions**! ## Fit a probability distribution to delays @@ -640,22 +626,55 @@ Available at: ::::::::::::::::::: +We fit a probability distribution to data (like delays) to make inferences about it. These inferences can be useful for public health interventions and decision making. For example, from the [incubation period](reference.md#incubation) distribution we can inform the length of active monitoring or quarantine by inferring the time by which 99% of infected individuals are expected to show symptoms. 
+ +:::::::::::: checklist + +### Functions for the Normal distribution + +If you need a refresher, read in detail about the [R probability functions for the normal distribution](https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm#probfunc), review each of their definitions, and identify which part of a distribution each one describes! +![The four probability functions for the normal distribution ([Jack Weiss, 2012](https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm#probfunc))](fig/fig5a-normaldistribution.png) + +:::::::::::::::::::: + +If you look at `?stats::Distributions`, each type of distribution has a unique set of functions and different parameters. To relate distribution functions to their parameters, we suggest exploring a Shiny app called **The Distribution Zoo**: + +From the `cases` object we can use: + +- `dplyr::mutate()` to transform the `reporting_delay` class object from `