-
Notifications
You must be signed in to change notification settings - Fork 0
/
04-model-description.Rmd
1050 lines (774 loc) · 77.6 KB
/
04-model-description.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Model description
Here we describe the methods of MOSAIC beta version 0.1. This model version provides a starting point for understanding cholera transmission in Sub-Saharan Africa, incorporating important drivers of disease dynamics such as human mobility, environmental conditions, and vaccination schedules. As MOSAIC continues to evolve, future iterations will refine model components based on available data and improved model mechanisms, which we hope will increase its applicability to real-world scenarios.
The model operates on weekly time steps from January 2023 to August 2024 and includes 41 countries in Sub-Saharan Africa (SSA), see Figure \@ref(fig:map).
```{r map, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', out.width="100%", fig.cap="A map of Sub-Saharan Africa with countries that have experienced a cholera outbreak in the past 5 and 10 years highlighted in green."}
knitr::include_graphics("figures/africa_map.png")
```
## Transmission dynamics
The model has a metapopulation structure with familiar compartments for Susceptible, Infected, and Recovered individuals with SIRS dynamics. The model also contains compartments for vaccinated individuals (V) and Water & environment based transmission (W) which we refer to as SVIWRS.
```{r diagram, echo=FALSE, fig.align='center', out.width="95%", fig.cap="This diagram of the SVIWRS (Susceptible-Vaccinated-Infected-Water/environmental-Recovered-Susceptible) model shows model compartments as circles with rate parameters displayed. The primary data sources the model is fit to are shown as square nodes (Vaccination data, and reported cases and deaths)."}
knitr::include_graphics("diagrams/v_0_1.drawio.png")
```
The SVIWRS metapopulation model, shown in Figure \@ref(fig:diagram), is governed by the following difference equations:
\begin{equation}
\begin{aligned}
S_{j,t+1} &= b_j N_{jt} + S_{jt} - \phi \nu_{jt} S_{jt} + \omega V_{jt} - \Lambda_{j,t+1} - \Psi_{j,t+1} + \varepsilon R_{jt} - d_j S_{jt}\\[11pt]
V_{j,t+1} &= V_{jt} + \phi \nu_{jt} S_{jt} - \omega V_{jt} - d_j V_{jt}\\[11pt]
I_{j,t+1} &= I_{jt} + \Lambda_{j,t+1} + \Psi_{j,t+1} - \gamma I_{jt} - \mu \sigma I_{jt} - d_j I_{jt}\\[11pt]
W_{j,t+1} &= W_{jt} + \zeta I_{jt} - \delta_{jt} W_{jt}\\[11pt]
R_{j,t+1} &= R_{jt} + \gamma I_{jt} - \varepsilon R_{jt} - d_j R_{jt}\\[11pt]
\end{aligned}
(\#eq:system)
\end{equation}
For descriptions of all parameters in Equation \@ref(eq:system), see Table (\@ref(tab:params)). Transmission dynamics are driven by the two force of infection terms, $\Lambda_{jt}$ and $\Psi_{jt}$. The force of infection due to human-to-human ($\Lambda_{jt}$) is:
\begin{equation}
\begin{aligned}
\Lambda_{j,t+1} &= \frac{
\beta_{jt}^{\text{hum}} \Big(\big(S_{jt}(1-\tau_{j})\big) \big(I_{jt}(1-\tau_{j}) + \sum_{\forall i \not= j} (\pi_{ij}\tau_iI_{it}) \big)\Big)^\alpha}{N_{jt}}.\\[11pt]
\end{aligned}
(\#eq:foi-human)
\end{equation}
Where $\beta_{jt}^{\text{hum}}$ is the rate of human-to-human transmission. Movement within and among metapopulations is governed by $\tau_i$, the probability of departing origin location $i$, and $\pi_{ij}$ is the relative probability of travel from origin $i$ to destination $j$ (see section on [spatial dynamics][Spatial dynamics]). To include environmental effects, the force of infection due to environment-to-human transmission ($\Psi_{jt}$) is defined as:
\begin{equation}
\begin{aligned}
\Psi_{j,t+1} &= \frac{\beta_{jt}^{\text{env}} \big(S_{jt}(1-\tau_{j})\big) (1-\theta_j) W_{jt}}{\kappa+W_{jt}},\\[11pt]
\end{aligned}
(\#eq:foi-environment)
\end{equation}
where $\beta_{jt}^{\text{env}}$ is the rate of environment-to-human transmission and $\theta_j$ is the proportion of the population at location $j$ that at least basic access to Water, Sanitation, and Hygiene (WASH). The environmental compartment of the model is also scaled by the concentration (cells per mL) of *V. cholerae* that is required for a 50% probability of infection [Fung 2014](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926264/). See the section on [environmental transmission][Environmental transmission] below for more on the water/environment compartment and climatic drivers of transmission.
Note that all model processes are stochastic. Transition rates are converted to probabilities with the commonly used formula $p(t) = 1-e^{-rt}$ (see Ross 2007), and then integer quantities are moved between model compartments at each time step according to a binomial process like the example below for the recovery of infected individuals ($\gamma I_{jt}$):
```{r, eval=FALSE, include=FALSE}
#\Lambda_{j,t+1} \sim \text{Binom}\bigg((1-\tau_{j})S_{jt},\ 1 - \exp\big( -\beta_{jt}^{\text{hum}} \times \big((I_{jt} - \tau_{j}I_{jt} ) + \sum_{\forall i \not= j} (\pi_{ij}\tau_iI_{it}) \big)^\alpha\bigg)\\[11pt]
```
\begin{equation}
\frac{\partial R}{\partial t} \sim \text{Binom}(I_{jt}, 1-\exp(-\gamma))
(\#eq:stoch)
\end{equation}
## Seasonality
Cholera transmission is seasonal and is typically associated with the rainy season, so both transmission rate terms $\beta_{jt}^{\text{*}}$ are temporally forced. For human-to-human transmission we used a truncated sine-cosine form of the [Fourier series](https://en.wikipedia.org/wiki/Fourier_series) with two harmonic features which has the flexibility to capture seasonal transmission dynamics driven by extended rainy seasons and/or biannual trends:
\begin{equation}
\beta_{jt}^{\text{hum}} = \beta_{j0}^{\text{hum}} + a_1 \cos\left(\frac{2\pi t}{p}\right) + b_1 \sin\left(\frac{2\pi t}{p}\right) + a_2 \cos\left(\frac{4\pi t}{p}\right) + b_2 \sin\left(\frac{4\pi t}{p}\right)
(\#eq:beta1)
\end{equation}
Where, $\beta_{j0}^{\text{hum}}$ is the mean human-to-human transmission rate at location $j$ over all time steps. Seasonal dynamics are determined by the parameters $a_1$, $b_1$ and $a_2$, $b_2$ which gives the amplitude of the first and second waves respectively. The periodic cycle $p$ is 52, so the function controls the temporal variation in $\beta_{jt}^{\text{hum}}$ over the 52 weeks of the year.
```{r seasonal-concept, echo=FALSE, warning=FALSE, cache=TRUE, fig.align='center', out.width="85%", fig.cap="An example of the temporal distribution of the human-to-human transmission rate across each of the 52 weeks of the year given by the cosine wave function. The wave function is fitted to each country and is designed to align with the rainy season as indicated by the shaded region in this figure."}
knitr::include_graphics("figures/seasonal_transmission_concept.png")
```
We estimated the parameters in the Fourier series ($a_1$, $b_1$, $a_2$, $b_2$) using the [Levenberg–Marquardt](https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm) algorithm in the [`minpack.lm`](https://rdrr.io/cran/minpack.lm/) R library. Given the lack of reported cholera case data for many countries in SSA and the association between cholera transmission and the rainy season, we leveraged seasonal precipitation data to help fit the Fourier wave function to all countries. We first gathered weekly precipitation values from 1994 to 2024 for 30 uniformly distributed points within each country from the [Open-Meteo Historical Weather Data API](https://open-meteo.com/en/docs/historical-weather-api). Then we fit the Fourier series to the weekly precipitation data and used these parameters as the starting values when fitting the model to the more sparse cholera case data.
```{r seasonal-example, echo=FALSE, warning=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="Example of a grid of 30 uniformly distributed points within Mozambique (A). The scatterplot shows weekly summed precipitation values at those 30 grid points and cholera cases plotted on the same scale of the Z-Score which shows the variance around the mean in terms of the standard deviation. Fitted Fourier series fucntions are shown as blue (fit precipitation data) and red (fit to cholera case data) lines."}
knitr::include_graphics("figures/seasonal_transmission_example_MOZ.png")
```
For countries with no reported case data, we inferred seasonal dynamics using the fitted wave function of a neighboring country with available case data. The selected neighbor was chosen from the same cluster of countries (grouped hierarchically into four clusters based on precipitation seasonality using [Ward's method](https://en.wikipedia.org/wiki/Ward%27s_method); see Figure \@ref(fig:seasonal-cluster)) that had the highest correlation in seasonal precipitation with the country lacking case data. In the rare event that no country with reported case data was found within the same seasonal cluster, we expanded the search to the 10 nearest neighbors and continued expanding by adding the next nearest neighbor until a match was found.
```{r seasonal-cluster, echo=FALSE, warning=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="A) Map showing the clustering of African countries based on their seasonal precipitation patterns (2014-2024). Countries are colored according to their cluster assignments, identified using hierarchical clustering. B) Fourier series fitted to weekly precipitation for each country. Each line plot shows the seasonal pattern for countries within a given cluster. Clusteres are used to infer the seasonal transmission dynamics for countries where there are no reported cholera cases."}
knitr::include_graphics("figures/seasonal_precip_ward.D2_cluster.png")
```
Using the model fitting methods described above, and the cluster-based approach for inferring the seasonal Fourier series pattern in countries without reported cholera cases, we modeled the seasonal dynamics for all 41 countries in the MOSAIC framework. These dynamics are visualized in Figure \@ref(fig:seasonal-all), with the corresponding Fourier model coefficients presented in Table \@ref(tab:seasonal-table).
```{r seasonal-all, echo=FALSE, warning=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="Seasonal transmission patterns for all countries modeled in MOSAIC as modeled by the truncated Fourier series in Equation \\@ref(eq:beta1). Blues lines give the Fourier series model fits for precipitation (1994-2024) and the red lines give models fits to reported cholera cases (2023-2024). For countries where reported case data were not available, the Fourier model was inferred by the nearest country with the most similar seasonal precipitation patterns as determined by the hierarchical clustering. Countries with inferred case data from neighboring locations are annotated in red. The X-axis represents the weeks of the year (1-52), while the Y-axis shows the Z-score of weekly precipitation and cholera cases."}
knitr::include_graphics("figures/seasonal_transmission_all.png")
```
```{r seasonal-table, echo=FALSE, message=FALSE, warning=FALSE, tab.cap="Estimated coefficients for the truncated Fourier model in Equation \\@ref(eq:beta1) fit to countries with reported cholera cases. Model fits are shown in Figure \\@ref(fig:seasonal-all)."}
library(readr)
library(knitr)
library(kableExtra)
library(dplyr)
library(tidyr)
combined_param_values <- read_csv("./tables/combined_param_values.csv")
combined_param_values$country_name[combined_param_values$country_name == "Democratic Republic of Congo"] <- "DRC"
# Filter the data to only include 'cases' response and select relevant parameters
filtered_param_values_cases <- combined_param_values %>%
filter(response == "cases", parameter %in% c("a1", "a2", "b1", "b2"), is.na(inferred_from_neighbor)) %>%
select(country_name, country_iso_code, parameter, mean, ci_lo, ci_hi)
# Spread the parameters to create columns for a1, a2, b1, b2
table_data_cases <- filtered_param_values_cases %>%
mutate(ci = paste0(round(mean, 2), " (", round(ci_lo, 2), " to ", round(ci_hi, 2), ")")) %>%
select(country_name, parameter, ci) %>%
spread(key = parameter, value = ci) %>%
rename(a1 = a1, a2 = a2, b1 = b1, b2 = b2)
# Display the data in the desired column order
final_table_data_cases <- table_data_cases %>%
select(country_name, a1, a2, b1, b2)
# Display the final data as a kable table with reduced text size, math fonts for parameters, and row shading
kable(final_table_data_cases,
col.names = c("Country", "$a_1$", "$a_2$", "$b_1$", "$b_2$"),
escape = FALSE) %>%
kable_styling(font_size = 11.75, latex_options = "hold_position", full_width = FALSE) %>%
add_header_above(c(" ", "Fourier Coefficients" = 4))
```
## Environmental transmission
Environmental transmission is a critical factor in cholera spread and consists of several key components: the rate at which infected individuals shed *V. cholerae* into the environment, the pathogen's survival rate in environmental conditions, and the overall suitability of the environment for sustaining the bacteria over time.
### Climate-driven transmission
To capture the impacts of climate-drivers on cholera transmission, we have included the parameter $\psi_{jt}$, which represents the current state of environmental suitability with respect to: *i*) the survival time of *V. cholerae* in the environment and, *ii*) the rate of environment-to-human transmission which contributes to the overall force of infection.
\begin{equation}
\beta_{jt}^{\text{env}} = \beta_{j0}^{\text{env}} \Bigg(1 + \frac{\psi_{jt}-\bar\psi_j}{\bar\psi_j} \Bigg) \quad \text{and} \quad \bar\psi_j = \frac{1}{T} \sum_{t=1}^{T} \psi_{jt}
(\#eq:beta2)
\end{equation}
This formulation effectively scales the base environmental transmission rate $\beta_{jt}^{\text{env}}$ so that it varies over time according to the climatically driven model of suitability. Note that, unlike the the cosine wave function of $\beta_{jt}^{\text{hum}}$, this temporal term can increase or decrease over time following multi-annual cycles.
Environmental suitability ($\psi_{jt}$) also impacts the survival rate of *V. cholerae* in the environment ($\delta_{jt}$) with the form:
\begin{equation}
\delta_{jt} = \delta_{\text{min}} + \psi_{jt} \times (\delta_{\text{max}} - \delta_{\text{min}})
(\#eq:delta)
\end{equation}
which normalizes the variance of the suitability parameter to be bounded within the minimum ($\delta_{\text{min}}$) and maximum ($\delta_{\text{max}}$) survival times of *V. cholerae*.
```{r, echo=FALSE, cache=TRUE, fig.align='center', out.width="80%", fig.cap="Relationship between environmental suitability ($\\psi_{jt}$) and the rate of *V. cholerae* decay in the environment ($\\delta_j$). The green line shows the mildest penalty on *V. cholerae* survival, where survival in the environment is $1/\\delta_{\\text{min}}$ = 3 days when suitability = 0 and $1/\\delta_{\\text{max}}$ = 90 days when suitability = 1."}
knitr::include_graphics("figures/shedding_rate.png")
```
### Modeling environmental suitability
#### Environmental data
The mechanism for environment-to-human transmission (Equation \@ref(eq:beta2)) and rate of decay of *V. cholerae* in the environment (Equation \@ref(eq:delta)) is driven by the parameter $\psi_{jt}$, which we refer to as environmental suitability. The parameter $\psi_{jt}$ is modeled as a time series for each location using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) model and a suite of 24 covariates which include 19 historical and forecasted climate variables under the [MRI-AGCM3-2-S](https://www.wdc-climate.de/ui/cmip6?input=CMIP6.HighResMIP.MRI.MRI-AGCM3-2-S.highresSST-present) climate model. Covariates also include 4 large-scale climate drivers such as the Indian Ocean Dipole Mode Index (DMI), and the El Niño Southern Oscillation (ENSO) from 3 different Pacific Ocean regions. We also included a location specific variable giving the mean elevation for each country. See example time series of climate variables from one country (Mozambique) in Figure \@ref(fig:climate-data-moz) and DMI and ENSO variables in Figure \@ref(fig:climate-data-enso). A list of all covariates and their sources can be seen in Table \@ref(tab:climate-data-variables).
Note that while the 19 climate variables offer forecasts up to 2030 and beyond, the forecasts of the DMI and ENSO variables are limited to 5 months into the future. So, environmental suitability model predictions are currently limited to a 5 month time horizon but future iterations may allow for longer forecasts. Additional data sources will be integrated into subsequent versions of the suitability model. For instance, flood and cyclone data will likely be incorporated later, though not in the initial version of the model.
```{r climate-data-moz, echo=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="Climate data acquired from the OpenMeteo data API. Data were collected from 30 uniformly distributed points across each country and then aggregated to give weekly values of 17 climate variable from 1970 to 2030."}
knitr::include_graphics("figures/climate_data_MOZ_weekly.png")
```
```{r climate-data-enso, echo=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="Historical and forecasted values of the Indian Ocean Dipole Mode Index (DMI) and the El Niño Southern Oscillation (ENSO) from 2015 to 2025. The ENSO values come from three different regions: Niño3 (central to eastern Pacific), Niño3.4 (central Pacific), and Niño4 (western-central Pacifi). Data are from National Oceanic and Atmospheric Administration (NOAA) and Bureau of Meteorology (BOM)."}
knitr::include_graphics("figures/climate_data_ENSO_weekly.png")
```
```{r climate-data-variables, echo=FALSE, message=FALSE, warning=FALSE}
library(knitr)
covariates <- c(
"temperature_2m_mean", "temperature_2m_max", "temperature_2m_min",
"wind_speed_10m_mean", "wind_speed_10m_max", "cloud_cover_mean",
"shortwave_radiation_sum", "relative_humidity_2m_mean",
"relative_humidity_2m_max", "relative_humidity_2m_min",
"dew_point_2m_mean", "dew_point_2m_min", "dew_point_2m_max",
"precipitation_sum", "pressure_msl_mean",
"soil_moisture_0_to_10cm_mean", "et0_fao_evapotranspiration_sum",
"DMI", "ENSO3", "ENSO34", "ENSO4", "elevation"
)
# Define descriptions for each covariate
descriptions <- c(
"Average temperature at 2 meters",
"Maximum temperature at 2 meters",
"Minimum temperature at 2 meters",
"Average wind speed at 10 meters",
"Maximum wind speed at 10 meters",
"Mean cloud cover",
"Total shortwave radiation",
"Mean relative humidity at 2 meters",
"Maximum relative humidity at 2 meters",
"Minimum relative humidity at 2 meters",
"Mean dew point at 2 meters",
"Minimum dew point at 2 meters",
"Maximum dew point at 2 meters",
"Total precipitation",
"Mean sea level pressure",
"Mean soil moisture at 0 to 10 cm",
"Total evapotranspiration (FAO method)",
"Dipole Mode Index (DMI)",
"El Niño Southern Oscillation (ENSO) - Region 3",
"ENSO - Region 3.4",
"ENSO - Region 4",
"Mean elevation"
)
# Define sources for each covariate with manual links
source_links <- c(
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # temperature_2m_mean
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # temperature_2m_max
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # temperature_2m_min
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # wind_speed_10m_mean
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # wind_speed_10m_max
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # cloud_cover_mean
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # shortwave_radiation_sum
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # relative_humidity_2m_mean
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # relative_humidity_2m_max
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # relative_humidity_2m_min
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # dew_point_2m_mean
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # dew_point_2m_min
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # dew_point_2m_max
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # precipitation_sum
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # pressure_msl_mean
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # soil_moisture_0_to_10cm_mean
"OpenMeteo [Historical Weather](https://open-meteo.com/en/docs/historical-weather-api) and [Climate Change](https://open-meteo.com/en/docs/climate-api) APIs", # et0_fao_evapotranspiration_sum
"[NOAA](https://psl.noaa.gov/enso/) and [BOM](http://www.bom.gov.au/climate/ocean/outlooks/#region=NINO4®ion=NINO3®ion=NINO34)", # DMI
"[NOAA](https://psl.noaa.gov/enso/) and [BOM](http://www.bom.gov.au/climate/ocean/outlooks/#region=NINO4®ion=NINO3®ion=NINO34)", # ENSO3
"[NOAA](https://psl.noaa.gov/enso/) and [BOM](http://www.bom.gov.au/climate/ocean/outlooks/#region=NINO4®ion=NINO3®ion=NINO34)", # ENSO34
"[NOAA](https://psl.noaa.gov/enso/) and [BOM](http://www.bom.gov.au/climate/ocean/outlooks/#region=NINO4®ion=NINO3®ion=NINO34)", # ENSO4
"[Amazon Web Services Terrain Tiles](https://registry.opendata.aws/terrain-tiles/)" # elevation
)
# Create a data frame with covariates, descriptions, and sources with links
covariate_table <- data.frame(
Covariate = covariates,
Description = descriptions,
Source = source_links,
stringsAsFactors = FALSE
)
# Create the table using knitr::kable()
kable(covariate_table,
escape = TRUE,
col.names = c("Covariate", "Description", "Source"),
caption = "A full list of covariates and their sources used in the LSTM RNN model to predict the environmental suitability of *V. cholerae* ($\\psi_{jt}$).")
```
#### Deep learning neural network model
As mentioned above, we model environmental suitability $\psi_{jt}$ using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) model. The LSTM model was developed using [`keras`](https://cran.r-project.org/package=keras) and [`tensorflow`](https://cran.r-project.org/package=tensorflow) in R to predict binary outcomes. Thus the modeled quantity $\psi_{jt}$ is a proportion implying unsuitable conditions at 0 and perfectly suitable conditions at 1.
The model was fitted to reported case counts that were converted to a binary variable using a threshold of 200 reported cases per week. Given delays in reporting and likely lead times for environmental suitability ahead of transmission and case reporting, we also set the preceding one week to be suitable and in cases where there were two consecutive weeks of >200 cases per week, we assumed that the preceding two weeks were also suitable. See Figure \@ref(fig:cases-binary) for an example of how reported case counts are converted to a binary variable representing presumed environmental suitability for *V. cholerae*.
```{r cases-binary, echo=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="Reported cases converted to binary variable for modeling environmental suitability."}
knitr::include_graphics("figures/cases_binary.png")
```
The model is a Long Short-Term Memory (LSTM) neural network designed for binary classification, where environmental suitability, $\psi_{jt}$, is modeled as a function of the hidden state $h_t$ and hidden bias term $b_h$. Specifically, $\psi_{jt}$ is defined by a sigmoid activation function applied to the linear combination of the hidden state $h_t$ and the bias $b_h$ which if given by the 3 layers of the LSTM model:
\begin{equation}
\psi_{jt} \sim \text{Sigmoid}(w_h \cdot h_t + b_h)
(\#eq:psi)
\end{equation}
\begin{equation}
h_t = \text{LSTM}\big(\text{temperature}_{jt}, \ \text{precipitation}_{jt}, \ \text{ENSO}_{t}, \dots \big)
\end{equation}
In this formulation, $h_t$ represents the hidden state generated by the LSTM network based on input variables such as temperature, precipitation, and ENSO conditions, while $b_h$ is a bias term added to the output of the hidden state transformation.
The deep learning LSTM model consists of three stacked LSTM-RNN layers. The first LSTM layer has 500 units and the second and third LSTM layers have 250 and 100 units respectively. The architecture the LSTM model is configured to pass node values to subsequent LSTM layers allowing deep learning of more the complex interactions among the climate variable over time. We enforced model sparsity for each LSTM layer using L2 regularization (penalty = 0.001) and used a dropout rate of 0.5 for each LSTM layer to further prevent overfitting on the limited amount of data. The final output layer was a dense layer with a single unit and a sigmoid activation function to produce a probability value for binary classification, i.e. a prediction of environmental suitability $\psi_{jt}$ on a scale of 0 to 1.
To fit the LSTM model to data, we modified the learning rate by applying an exponential decay schedule that started at 0.001 and decayed by a factor of 0.9 every 10,000 steps to enable smoother convergence. The model was compiled using the Adam optimizer with this learning rate schedule, along with binary cross-entropy as the loss function and accuracy as the evaluation metric. The model was trained for a maximum of 200 epochs with a batch size of 1024. We allowed model fitting to stop early with a patience parameter of 10 which halts training if no improvement is observed in validation accuracy for 10 consecutive epochs. To train the model we set aside 20% of the observed data for validation and also used 20% of the training data for model fitting. The training history, including loss and accuracy, was monitored over the course of training and gave a final test accuracy of 0.73 and a final test loss of 0.56 (see Figure \@ref(fig:lstm-model-fit)).
```{r lstm-model-fit, echo=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="Model performance on training and validation data."}
knitr::include_graphics("figures/suitability_LSTM_fit.png")
```
After model training was completed, we predicted the values of environmental suitability $\psi_{jt}$ across all time steps for each location. Predictions start in January 1970 and go up to 5 months past the present date (currently February 2025). Given the amount of noise in the model predictions, we added a simple LOESS spline with logit transformation to smooth model predictions over time and give a more stable value of $\psi_{jt}$ when incorporating it into other model features (e.g. Equations \@ref(eq:beta2) and \@ref(eq:delta)). The resulting model predictions are shown for an example country such as Mozambique in Figure \@ref(fig:psi-prediction-data) which compares model predictions to the original case counts and the binary classification. Predicitons for all model locations are shown in a simplified view in Figure \@ref(fig:psi-prediction-countries).
*Also, please note that this initial version of the model is fitted to a rather small amount of data. Model hyper parameters were specifically chosen to reduce overfitting. Therefore, we recommend to not over-interpret the time series predictions of the model at this early stage since they are likely to change and improve as more historical incidence data is included in future versions.*
```{r psi-prediction-data, echo=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="The LSTM model predictions over time and reported cases for an example country such as Mozambique. Reported cases are shown in the top panel and tje shaded areas show the binary classification used to characterize environmental suitability. Raw model predicitons are shown in the transparent brown line with the solid black line showing the LOESS smoothing. Forecasted values beyond the current time point are shown in orange and are limited to 5 month time horizon."}
knitr::include_graphics("figures/suitability_cases_MOZ.png")
```
```{r psi-prediction-countries, echo=FALSE, cache=TRUE, fig.align='center', out.width="100%", fig.cap="The smoothed LSTM model predictions (lines) and binary suitability classification (shaded areas) over time for all countries in the MOSAIC framework. Orange lines show forecasts beyond the current date. With ENSO and DMI covariates included in the model, forecasts are limited to 5 months."}
knitr::include_graphics("figures/suitability_by_country.png")
```
### Shedding
The rate at which infected individuals shed *V. cholerae* into the environment ($\zeta$) is a critical factor influencing cholera transmission. Shedding rates can vary widely depending on the severity of the infection, the immune response of the individual, and environmental factors. According to [Fung 2014](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926264/), the shedding rate is estimated to range from 0.01 to 10 cells per mL per person per day.
Further studies support these findings, indicating that shedding rates can indeed fluctuate significantly. For instance, [Nelson et al (2009)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842031/) note that during the, depending on the phase of infection, individuals can shed $10^3$ (asymptomatic cases) to $10^{12}$ (severe cases) *V. cholerae* cells per gram of stool. Future version of the model may attempt to capture the nuances of shedding dynamics, but here we make the simplifying assumption that shedding is constant across infected individuals and has a wide range of variability with no prior distributional assumptions:
$$
\zeta \sim \text{Uniform}(0.01, 10).
$$
### WAter, Sanitation, and Hygiene (WASH)
Since *V. cholerae* is transmitted through fecal contamination of water and other consumables, the level of exposure to contaminated substrates significantly impacts transmission rates. Interventions involving Water, Sanitation, and Hygiene (WASH) have long been a first line of defense in reducing cholera transmission, and in this context, WASH variables can serve as proxy for the rate of contact with environmental risk factors. In the MOSAIC model, WASH variables are incorporated mechanistically, allowing for intervention scenarios that include changes to WASH. However, it is necessary to distill available WASH variables into a single parameter that represents the WASH-determined contact rate with contaminated substrates for each location $j$, which we define as $\theta_j$.
To parameterize $\theta_j$, we calculated a weighted mean of the 8 WASH variables in [Sikder et al 2023](https://doi.org/10.1021/acs.est.3c01317) and originally modeled by the [Local Burden of Disease WaSH Collaborators 2020](https://www.thelancet.com/journals/langlo/article/PIIS2214-109X(20)30278-3/fulltext). The 8 WASH variables (listed in Table \@ref(tab:wash-weights)) provide population-weighted measures of the proportion of the population that either: *i*) have access to WASH resources (e.g., piped water, septic or sewer sanitation), or *ii*) are exposed to risk factors (e.g. surface water, open defecation). For risk associated WASH variables, we used the complement ($1-\text{value}$) to give the proportion of the population *not* exposed to each risk factor. We used the [`optim`](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/optim) function in R and the [L-BFGS-B](https://en.wikipedia.org/wiki/Limited-memory_BFGS) algorithm to estimate the set of optimal weights (Table \@ref(tab:wash-weights)) that maximize the correlation between the weighted mean of the 8 WASH variables and reported cholera incidence per 1000 population across 40 SSA countries from 2000 to 2016. The optimal weighted mean had a correlation coefficient of $r =$ -0.33 (-0.51 to -0.09 95% CI) which was higher than the basic mean and all correlations provided by the individual WASH variables (see Figure \@ref(fig:wash-incidence)). The weighted mean then provides a single variable between 0 and 1 that represents the overall proportion of the population that has access to WASH and/or is not exposed to environmental risk factors. Thus, the WASH-mediated contact rate with sources of environmental transmission is represented as ($1-\theta_j$) in the environment-to-human force of infection ($\Psi_{jt}$). Values of $\theta_j$ for all countries are shown in Figure \@ref(fig:wash-country).
```{r wash-weights, echo=FALSE}
library(kableExtra)
df <- read.csv(file.path(getwd(), "tables/WASH_data_weights.csv"), stringsAsFactors = FALSE)
knitr::kable(
df,
col.names = c("WASH variable", "Optimized weight"),
caption = "Table of optimized weights used to calculate the single mean WASH index for all countries."
) %>%
kable_styling(full_width = FALSE,
bootstrap_options = c("striped", "hover", "condensed"))
```
```{r wash-incidence, echo=FALSE, fig.align='center', out.width="100%", fig.cap="Relationship between WASH variables and cholera incidences."}
knitr::include_graphics("figures/wash_incidence_correlation.png")
```
```{r wash-country, echo=FALSE, fig.align='center', out.width="100%", fig.cap="The optimized weighted mean of WASH variables for AFRO countries. Countries labeled in orange denote countries with an imputed weighted mean WASH variable. Imputed values are the weighted mean from the 3 most similar countries."}
knitr::include_graphics("figures/wash_index_by_country.png")
```
## Immune dynamics
Aside from the current number of infections, population susceptibility is one of the key factors influencing the spread of cholera. Further, since immunity from both vaccination and natural infection provides long-lasting protection, it's crucial to quantify not only the incidence of cholera but also the number of past vaccinations. Additionally, we need to estimate how many individuals with immunity remain in the population at any given time step in the model.
To achieve this, we estimate the vaccination rate over time ($\nu_{jt}$) based on historical vaccination campaigns and incorporate a model of vaccine effectiveness ($\phi$) and immune decay post-vaccination ($\omega$) to estimate the current number of individuals with vaccine-derived immunity. We also account for the immune decay rate from natural infection ($\varepsilon$), which is generally considered to last longer than immunity from vaccination.
### Estimating Vaccination Rates
To estimate the past and current vaccination rates, we sourced data on reported OCV vaccinations from the WHO [International Coordinating Group](https://www.who.int/groups/icg) (ICG) [Cholera vaccine dashboard](https://app.powerbi.com/view?r=eyJrIjoiYmFmZTBmM2EtYWM3Mi00NWYwLTg3YjgtN2Q0MjM5ZmE1ZjFkIiwidCI6ImY2MTBjMGI3LWJkMjQtNGIzOS04MTBiLTNkYzI4MGFmYjU5MCIsImMiOjh9). This resource lists all reactive OCV campaigns conducted from 2016 to the present, with approximately 103 million OCV doses shipped to Sub-Saharan African (SSA) countries as of October 9, 2024. However, these data only capture reactive vaccinations in emergency settings and do not include preventive campaigns organized by GAVI and in-country partners.
*As a result, our current estimates of the OCV vaccination rate likely underestimate total OCV coverage. We are working to expand our data sources to better reflect the full number of OCV doses distributed in SSA and will update the results here as soon as these are available.*
To translate the reported number of OCV doses into the model parameter $\nu_{jt}$, we take the number of doses shipped and the reported start date of the vaccination campaign, distributing the doses over subsequent days according to a maximum daily vaccination rate. Therefore, the vaccination rate $\nu_t$ is not an estimated quantity, it is defined by the reported number of OCV doses administered with a assumption about the daily rate of distribution for an OCV campaign:
$$
\nu_{jt} = f\big(\text{reported OCV doses distributed}_{jt} \ | \ \text{daily distribution rate}\big).
$$
See Figure \@ref(fig:vaccination-example) for an example of OCV distribution using a maximum daily vaccination rate of 100,000. The resulting time series for each country is shown in Figure \@ref(fig:vaccination-countries), with current totals based on the WHO ICG data displayed in Figure \@ref(fig:vaccination-maps).
```{r vaccination-example, echo=FALSE, fig.align='center', out.width="100%", fig.cap="Example of the estimated vaccination rate during an OCV campaign."}
knitr::include_graphics("figures/vaccination_example_ZMB.png")
```
```{r vaccination-countries, echo=FALSE, fig.align='center', out.width="100%", fig.cap="The estimated vaccination coverage across all countries with reported vaccination data one the WHO ICG dashboard."}
knitr::include_graphics("figures/vaccination_by_country.png")
```
```{r vaccination-maps, echo=FALSE, fig.align='center', out.width="100%", fig.cap="The total cumulative number of OCV doses distributed through the WHO ICG from 2016 to present day."}
knitr::include_graphics("figures/vaccination_maps.png")
```
### Immunity from vaccination
The impacts of Oral Cholera Vaccine (OCV) campaigns is incorporated into the model through the Vaccinated compartment (V). The rate that individuals are effectively vaccinated is defined as $\phi\nu_t$, where $\nu_t$ is the number of OCV doses administered in location $j$ at time $t$ and $\phi$ is the estimated vaccine effectiveness. The vaccination rate $\nu_{jt}$ is not an estimated quantity. Rather, it is directly defined by the reported number of OCV doses administered as described above. Note that there is just one vaccinated compartment at this time, though future model versions may include $V_1$ an $V_2$ compartments to explore two dose vaccination strategies or to emulate more complex waning patterns.
The evidence for waning immunity comes from 4 cohort studies (Table \@ref(tab:effectiveness-papers)) from Bangladesh ([Qadri et al 2016](https://www.nejm.org/doi/full/10.1056/NEJMoa1510330) and [2018](https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(18)30108-7/fulltext)), South Sudan ([Azman et al 2016](https://www.thelancet.com/journals/langlo/article/PIIS2214-109X(16)30211-X/fulltext)), and Democratic Republic of Congo ([Malembaka et al 2024](https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(23)00742-9/fulltext)).
```{r effectiveness-papers, echo=FALSE}
df <- read.csv(file.path(getwd(), "tables/vaccine_effectiveness_data.csv"), stringsAsFactors = FALSE)
# Display the table using knitr::kable
knitr::kable(
df,
col.names = c("Effectiveness", "Upper CI", "Lower CI", "Day (midpoint)", "Day (min)", "Day (max)", "Source"),
caption = "Summary of Effectiveness Data"
)
```
We estimated vaccine effectiveness and waning immunity by fitting an exponential decay model to the reported effectiveness of one dose OCV in these studies using the following formulation:
\begin{equation}
\text{Proportion immune}\ t \ \text{days after vaccination} = \phi \times (1 - \omega) ^ {t-t_{\text{vaccination}}}
(\#eq:omega)
\end{equation}
Where $\phi$ is the effectiveness of one dose OCV, and the based on this specification, it is also the initial proportion immune directly after vaccination. The decay rate parameter $\omega$ is the rate at which initial vaccine derived immunity decays per day post vaccination, and $t$ and $t_{\text{vaccination}}$ are the time (in days) the function is evaluated at and the time of vaccination respectively. When we fitted the model to the data from the cohort studies shown in Table (\@ref(tab:effectiveness-papers)) we found that $\omega = 0.00057$ ($0-0.0019$ 95% CI), which gives a mean estimate of 4.8 years for vaccine derived immune duration with unreasonably large confidence intervals (1.4 years to infinite immunity). However, the point estimate of 4.8 years is consistent with anecdotes that one dose OCV is effective for up to at least 3 years.
The wide confidence intervals are likely due to the wide range of reported estimates for proportion immune after a short duration in the 7--90 days range ([Azman et al 2016](https://www.thelancet.com/journals/langlo/article/PIIS2214-109X(16)30211-X/fulltext) and [Qadri et al 2016](https://www.nejm.org/doi/full/10.1056/NEJMoa1510330)). Therefore, we chose to use the point estimate of $\omega$ and incorporate uncertainty based on the initial proportion immune (i.e. vaccine effectiveness $\phi$) shortly after vaccination. Using the decay model in Equation \@ref(eq:omega) we estimated $\phi$ to be $0.64$ ($0.32-0.96$ 95% CI). We then fit a Beta distribution to the quantiles of $\phi$ by minimizing the sums of squares using the Nelder-Mead optimization algorithm to render the following distribution (shown in Figure \@ref(fig:effectiveness)B):
\begin{equation}
\phi \sim \text{Beta}(4.57, 2.41).
(\#eq:effectiveness)
\end{equation}
```{r effectiveness, echo=FALSE, fig.align='center', out.width="102%", fig.cap="This is vaccine effectiveness"}
knitr::include_graphics("figures/vaccine_effectiveness.png")
```
### Immunity from natural infection
The duration of immunity after a natural infection is likely to be longer lasting than that from vaccination with OCV (especially given the current one dose strategy). As in most SIR-type models, the rate at which individuals leave the Recovered compartment is governed by the immune decay parameter $\varepsilon$. We estimated the durability of immunity from natural infection based on two cohort studies and fit the following exponential decay model to estimate the rate of immunity decay over time:
$$
\text{Proportion immune}\ t \ \text{days after infection} = 0.99 \times (1 - \varepsilon) ^ {t-t_{\text{infection}}}
$$
Where we make the necessary and simplifying assumption that within 0--90 days after natural infection with *V. cholerae*, individuals are 95--99% immune. We fit this model to reported data from [Ali et al (2011)](https://doi.org/10.1093/infdis/jir416) and [Clemens et al (1991)](https://www.sciencedirect.com/science/article/pii/0140673691902076) (see Table \@ref(tab:immunity-sources)).
```{r immunity-sources, echo=FALSE}
df <- data.frame(
day = c(90, 30*36, 30*42),
effectiveness = c(0.95, 0.65, 0.61),
effectiveness_hi = c(0.95, 0.81, 0.81),
effectiveness_lo = c(0.95, 0.37, 0.21),
source = c("Assumption",
"[Ali et al (2011)](https://doi.org/10.1093/infdis/jir416)",
"[Clemens et al (1991)](https://www.sciencedirect.com/science/article/pii/0140673691902076)")
)
knitr::kable(df,
col.names = c("Day", "Effectiveness", "Upper CI", "Lower CI", "Source"),
caption = "Sources for the duration of immunity fro natural infection.")
```
We estimated the mean immune decay to be $\bar\varepsilon = 3.9 \times 10^{-4}$ ($1.7 \times 10^{-4}-1.03 \times 10^{-3}$ 95% CI) which is equivalent to an immune duration of $7.21$ years ($2.66-16.1$ years 95% CI) as shown in Figure \@ref(fig:immune-decay)A. This is slightly longer than previous modeling work estimating the duration of immunity to be ~5 years ([King et al 2008](https://www.nature.com/articles/nature07084)). Uncertainty around $\varepsilon$ in the model is then represented by a Log-Normal distribution as shown in Figure \@ref(fig:immune-decay)B:
$$
\varepsilon \sim \text{Lognormal}(\bar\varepsilon+\frac{\sigma^2}{2}, 0.25)
$$
```{r immune-decay, echo=FALSE, fig.align='center', out.width="100%", fig.cap="The duration of immunity after natural infection with *V. cholerae*."}
knitr::include_graphics("figures/immune_decay.png")
```
## Spatial dynamics
The parameters in the model diagram in Figure \@ref(fig:diagram) that have a $jt$ subscript denote the spatial structure of the model. Each country is modeled as an independent metapopulation that is connected to all others via the spatial force of infection $\Lambda_{jt}$ which moves contagion among metapopulations according to the connectivity provided by parameters $\tau_i$ (the probability departure) and $\pi_{ij}$ (the probability of diffusion to destination $j$). Both parameters are estimated using the departure-diffusion model below which is fitted to average weekly air traffic volume between all of the 41 countries included in the MOSAIC framework (Figure \@ref(fig:mobility-data)).
```{r mobility-data, echo=FALSE, fig.align='center', out.width="100%", fig.cap="The average number of air passengers per week in 2017 among all countries."}
knitr::include_graphics("figures/mobility_flight_data.png")
```
```{r mobility-network, echo=FALSE, fig.align='center', out.width="100%", fig.cap="A network map showing the average number of air passengers per week in 2017."}
knitr::include_graphics("figures/mobility_network.png")
```
### Human mobility model
The departure-diffusion model estimates diagonal and off-diagonal elements in the mobility matrix ($M$) separately and combines them using conditional probability rules. The model first estimates the probability of travel outside the origin location $i$---the departure process---and then the distribution of travel from the origin location $i$ by normalizing connectivity values across all $j$ destinations---the diffusion process. The values of $\pi_{ij}$ sum to unity along each row, but the diagonal is not included, indicating that this is a relative quantity. That is to say, $\pi_{ij}$ gives the probability of going from $i$ to $j$ given that travel outside origin $i$ occurs. Therefore, we can use basic conditional probability rules to define the travel routes in the diagonal elements (trips made within the origin $i$) as
$$
\Pr( \neg \text{depart}_i ) = 1 - \tau_i
$$
and the off-diagonal elements (trips made outside origin $i$) as
$$
\Pr( \text{depart}_i, \text{diffuse}_{i \rightarrow j}) = \Pr( \text{diffuse}_{i \rightarrow j} \mid \text{depart}_i ) \Pr(\text{depart}_i ) = \pi_{ij} \tau_i.
$$
The expected mean number of trips for route $i \rightarrow j$ is then:
\begin{equation}
M_{ij} =
\begin{cases}
\theta N_i (1-\tau_i) \ & \text{if} \ i = j \\
\theta N_i \tau_i \pi_{ij} \ & \text{if} \ i \ne j.
\end{cases}
(\#eq:M)
\end{equation}
Where, $\theta$ is a proportionality constant representing the overall number of trips per person in an origin population of size $N_i$, $\tau_i$ is the probability of leaving origin $i$, and $\pi_{ij}$ is the probability of travel to destination $j$ given that travel outside origin $i$ occurs.
### Estimating the departure process
The probability of travel outside the origin is estimated for each location $i$ to give the location-specific departure probability $\tau_i$.
$$
\tau_i \sim \text{Beta}(1+s, 1+r)
$$
Binomial probabilities for each origin $\tau_i$ are drawn from a Beta distributed prior with shape ($s$) and rate ($r$) parameters.
$$
\begin{aligned}
s &\sim \text{Gamma}(0.01, 0.01)\\
r &\sim \text{Gamma}(0.01, 0.01)
\end{aligned}
$$
### Estimating the diffusion process
We use a normalized formulation of the power law gravity model to defined the diffusion process, the probability of travelling to destination $j$ given travel outside origin $i$ ($\pi_{ij}$) which is defined as:
\begin{equation}
\pi_{ij} = \frac{
N_j^\omega d_{ij}^{-\gamma}
}{
\sum\limits_{\forall j \ne i} N_j^\omega d_{ij}^{-\gamma}
}
(\#eq:gravity)
\end{equation}
Where, $\omega$ scales the attractive force of each $j$ destination based on its population size $N_j$. The kernel function $d_{ij}^{-\gamma}$ serves as a penalty on the proportion of travel from $i$ to $j$ based on distance. Prior distributions of diffusion model parameters are defined as:
$$
\begin{aligned}
\omega &\sim \text{Gamma}(1, 1)\\
\gamma &\sim \text{Gamma}(1, 1)
\end{aligned}
$$
The models for $\tau_i$ and $\pi_{ij}$ were fitted to air traffic data from [OAG](https://www.oag.com/flight-data-sets) using the `mobility` R package ([Giles 2020](https://covid-19-mobility-data-network.github.io/mobility/)). Estimates for mobility model parameters are shown in Figures \@ref(fig:mobility-departure) and \@ref(fig:mobility-diffusion).
```{r mobility-departure, echo=FALSE, fig.align='center', out.width="100%", fig.cap="The estimated weekly probability of travel outside of each origin location $\\tau_i$ and 95% confidence intervals is shown in panel A with the population mean indicated as a red dashed line. Panel B shows the estimated total number of travelers leaving origin $i$ each week."}
knitr::include_graphics("figures/mobility_travel_prob_tau.png")
```
```{r mobility-diffusion, echo=FALSE, fig.align='center', out.width="100%", fig.cap="The diffusion process $\\pi_{ij}$ which gives the estimated probability of travel from origin $i$ to destination $j$ given that travel outside of origin $i$ has occurred."}
knitr::include_graphics("figures/mobility_diffusion_pi.png")
```
### The probability of spatial transmission
The likelihood of introductions of cholera from disparate locations is a major concern during cholera outbreaks. However, this can be difficult to characterize given the endemic dynamics and patterns of human movement. We include a few measures of spatial heterogeneity here and the first is a simple importation probability based on connectivity and the possibility of incoming infections. The basic probability of transmission from an origin $i$ to a particular destination $j$ and time $t$ is defined as:
\begin{equation}
p(i,j,t) = 1 - e^{-\beta_{jt}^{\text{hum}} (((1-\tau_j)S_{jt})/N_{jt}) \pi_{ij}\tau_iI_{it}}
(\#eq:prob)
\end{equation}
### The spatial hazard
Although we are more concerned with endemic dynamics here, there are likely to be periods of time early in the rainy season where cholera cases and the rate of transmission is low enough for spatial spread to resemble epidemic dynamics for a time. During such times periods, we can estimate the arrival time of contagion for any location where cases are yet to be reported. We do this be estimating the spatial hazard of transmission:
\begin{equation}
h(j,t) = \frac{
\beta_{jt}^{\text{hum}} \Big(1 - \exp\big(-((1-\tau_j)S_{jt}/N_{jt}) \sum_{\forall i \not= j} \pi_{ij}\tau_i (I_{it}/N_{it}) \big) \Big)
}{
1/\big(1 + \beta_{jt}^{\text{hum}} (1-\tau_j)S_{jt}\big)
}.
(\#eq:hazard)
\end{equation}
And then normalizing to give the waiting time distribution for all locations:
\begin{equation}
w(j,t) = h(j,T) \prod_{t=1}^{T-1}1-h(j,t).
(\#eq:waiting)
\end{equation}
### Coupling among locations
Another measure of spatial heterogeneity is to quantify the coupling of disease dynamics among metapopulations using a correlation coefficient. Here, we use the definition of spatial correlation between locations $i$ and $j$ as $C_{ij}$ described in [Keeling and Rohani (2002)](https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1461-0248.2002.00268.x), which gives a measure of how similar infection dynamics are between locations.
\begin{equation}
C_{ij} = \frac{
( y_{it} - \bar{y}_i )( y_{jt} - \bar{y}_j )
}{
\sqrt{\text{var}(y_i) \text{var}(y_j)}
}
(\#eq:correlation)
\end{equation}
Where $y_{it} = I_{it}/N_i$ and $y_{jt} = I_{jt}/N_j$. Mean prevalence in each location is $\bar{y_i} = \frac{1}{T} \sum_{t=1}^{T} y_{it}$ and $\bar{y_j} = \frac{1}{T} \sum_{t=1}^{T} y_{jt}$.
## The observation process
### Rate of symptomatic infection
The presentation of infection with *V. cholerae* can be extremely variable. The severity of infection depends many factors such as the amount of the infectious dose, the age of the host, the level of immunity of the host either through vaccination or previous infection, and naivety to the particular strain of *V. cholerae*. Additional circumstantial factors such as nutritional status and overall pathogen burden may also impact infection severity. At the population level, the observed proportion of infections that are symptomatic is also dependent on the endemicity of cholera in the region. Highly endemic areas (e.g. parts of Bangladesh; [Hegde et al 2024](https://www.nature.com/articles/s41591-024-02810-4)) may have a very low proportion of symptomatic infections due to many previous exposures. Inversely, populations that are largely naive to *V. cholerae* will exhibit a relatively higher proportion of symptomatic infections (e.g. Haiti; [Finger et al 2024](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635253/)).
Accounting for all of these nuances in the first version of this model not possible, but we can past studies do contain some information that can help to set some sensible bounds on our definition for the proportion of infections that are symptomatic ($\sigma$). So we have compiled a short list of studies that have done sero-surveys and cohort studies to assess the likelihood of symptomatic infections in different locations and displayed those results in Table (\@ref(tab:symptomatic-table)).
To provide a reasonably informed prior for the proportion of infections that are symptomatic, we calculated the combine mean and confidence intervals of all studies in Table \@ref(tab:symptomatic-table) and fit a Beta distribution that corresponds to these quantiles using least-squares and a Nelder-Mead algorithm. The resulting prior distribution for the symptomatic proportion $\sigma$ is:
\begin{equation}
\sigma \sim \text{Beta}(4.30, 13.51)
\end{equation}
```{r symptomatic-table, echo=FALSE, message=FALSE, warning=FALSE}
df <- data.frame(
mean = c(0.430, # Nelson et al (2009)
NA, # Lueng and Matrajt (2021)
NA, # Harris et al (2012)
0.762, # Finger et al (2024)
0.787, # Jackson et al (2013)
0.796, # Bart et al (1970)
0.629, # Bart et al (1970)
1-0.184, # Harris et al (2008)
1-0.0005996 # Hegde et al (2023)
),
ci_lo = c(NA, # Nelson et al (2009)
0, # Lueng and Matrajt (2021)
0.4, # Harris et al (2012)
0.75, # Finger et al (2024)
0.769, # Jackson et al (2013)
NA, # Bart et al (1970)
NA, # Bart et al (1970)
1-0.256, # Harris et al (2008)
1-0.0004998 # Hegde et al (2023)
),
ci_hi = c(NA, # Nelson et al (2009)
0.75, # Lueng and Matrajt (2021)
0.8, # Harris et al (2012)
0.773, # Finger et al (2024)
0.806, # Jackson et al (2013)
NA, # Bart et al (1970)
NA, # Bart et al (1970)
1-0.112, # Harris et al (2008)
1-0.0007994 # Hegde et al (2023)
),
source = c("[Nelson et al (2009)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842031/)", # Nelson et al (2009)
"[Lueng & Matrajt (2021)](https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0009383)", # Lueng and Matrajt (2021)
"[Harris et al (2012)](https://www.sciencedirect.com/science/article/pii/S014067361260436X)", # Harris et al (2012)
"[Finger et al (2024)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635253/)", # Finger et al (2024)
"[Jackson et al (2013)](https://www.ajtmh.org/view/journals/tpmd/89/4/article-p654.xml)", # Jackson et al (2013)
"[Bart et al (1970)](https://doi.org/10.1093/infdis/121.Supplement.S17)", # Bart et al (1970)
"[Bart et al (1970)](https://doi.org/10.1093/infdis/121.Supplement.S17)", # Bart et al (1970)
"[Harris et al (2008)](https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0000221)", # Harris et al (2008)
"[Hegde et al (2024)](https://www.nature.com/articles/s41591-024-02810-4)" # Hegde et al (2023)
),
location = c(NA, # Nelson et al (2009)
NA, # Lueng and Matrajt (2021)
"Endemic regions", # Harris et al (2012)
"Haiti", # Finger et al (2024)
"Haiti", # Jackson et al (2013)
"Pakistan", # Bart et al (1970)
"Pakistan", # Bart et al (1970)
"Bangladesh", # Harris et al (2008)
"Bangladesh" # Hegde et al (2023)
),
year = c(2009, # Nelson et al (2009)
2021, # Lueng and Matrajt (2021)
2012, # Harris et al (2012)
2024, # Finger et al (2024)
2013, # Jackson et al (2013)
1970, # Bart et al (1970)
1970, # Bart et al (1970)
2008, # Harris et al (2008)
2023 # Hegde et al (2023)
),
note = c("Review", # Nelson et al (2009)
"Review", # Lueng and Matrajt (2021)
"Review", # Harris et al (2012)
"Sero-survey and clinical data", # Finger et al (2024)
"Cross-sectional sero-survey", # Jackson et al (2013)
"Sero-survey during epidemic; El Tor Ogawa strain", # Bart et al (1970)
"Sero-survey during epidemic; Inaba strain", # Bart et al (1970)
"Household cohort; mean of all age groups", # Harris et al (2008)
"Sero-survey and clinical data" # Hegde et al (2023)
)
)
# Change to symptomatic definition
df$mean <- 1-df$mean
df$ci_lo <- 1-df$ci_lo
df$ci_hi <- 1-df$ci_hi
df <- df[, -which(names(df) == "year")]
df <- df[,c(1:3,5,4,6)]
knitr::kable(df, caption = "Summary of Studies on Cholera Immunity", digits = 3,
col.names = c("Mean", "Low CI", "High CI", "Location", "Source", "Note"))
```
The prior distribution for $\sigma$ is plotted in Figure \@ref(fig:symptomatic-fig)A with the reported values of the proportion symptomatic from previous studies shown in \@ref(fig:symptomatic-fig)B.
```{r symptomatic-fig, echo=FALSE, fig.align='left', out.width="103%", fig.cap="Proportion of infections that are symptomatic."}
knitr::include_graphics("figures/proportion_symptomatic.png")
```
### Suspected cases
The clinical presentation of diarrheal diseases is often similar across various pathogens, which can lead to systematic biases in the reported number of cholera cases. It is anticipated that the number of suspected cholera cases is related to the actual number of infections by a factor of \(1/\rho\), where \(\rho\) represents the proportion of suspected cases that are true infections. To adjust for this bias, we use estimates from the meta-analysis by [Weins et al. (2023)](https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1004286), which suggests that suspected cholera cases outnumber true infections by approximately 2 to 1, with a mean across studies indicating that 52% (24-80% 95% CI) of suspected cases are actual cholera infections. A higher estimate was reported for ourbreak settings (78%, 40-99% 95% CI). To account for the variability in this estimate, we fit a Beta distribution to the reported quantiles using a least squares approach and the Nelder-Mead algorithm, resulting in the prior distribution shown in Figure \@ref(fig:rho)B:
\begin{equation}
\rho \sim \text{Beta}(4.79, 1.53).
\end{equation}
```{r rho, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', out.width="100%", fig.cap="Proportion of suspected cholera cases that are true infections. Panel A shows the 'low' assumption which estimates across all settings: $\\rho \\sim \\text{Beta}(5.43, 5.01)$. Panel B shows the 'high' assumption where the estimate reflects high-quality studies during outbreaks: $\\rho \\sim \\text{Beta}(4.79, 1.53)$"}
knitr::include_graphics("figures/suspected_cases.png")
```
### Case fatality rate
The Case Fatality Rate (CFR) among symptomatic infections was calculated using reported cases and deaths data from January 2021 to August 2024. The data were collated from various issues of the WHO Weekly Epidemiological Record the Global Cholera and Acute Watery Diarrhea (AWD) Dashboard (see Data section) which provide annual aggregations of reported cholera cases and deaths. We then used the Binomial exact test ([`binom.test`](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/binom.test) in R) to calculate the mean probability for the number of deaths (successes) given the number of reported cases (sample size), and the [Clopper-Pearson method](https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper%E2%80%93Pearson_interval) for calculating the binomial confidence intervals. We then fit Beta distributions to the mean CFR and 95% confidence intervals calculated for each country using least squares and the Nelder-Mead algorithm to give the distributional uncertainty around the CFR estimate for each country ($\mu_j$).
$$
\mu_j \sim \text{Beta}(s_{1,j}, s_{2,j})
$$
Where $s_{1,i}$ and $s_{2,j}$ are the two positive shape parameters of the Beta distribution estimated for destination $j$. By definition $\mu_j$ is the CFR for reported cases which are a subset of the total number of infections. Therefore, to infer the total number of deaths attributable to cholera infection, we assume that the CFR of observed cases is proportionally equivalent to the CFR of all cases and then calculate total deaths $D$ as follows:
\begin{equation}
\begin{aligned}
\text{CFR}_{\text{observed}} &= \text{CFR}_{\text{total}}\\
\\[3pt]
\frac{[\text{observed deaths}]}{[\text{observed cases}]} &=
\frac{[\text{total deaths}]}{[\text{all infections}]}\\
\\[3pt]
\text{total deaths} &= \frac{[\text{observed deaths}] \times [\text{true infections}]}{[\text{observed cases}]}\\
\\[3pt]
D_{jt} &= \frac{ [\sigma\rho\mu_j I_{jt}] \times [I_{jt}] }{ [\sigma\rho I_{jt}] }
\end{aligned}
\end{equation}
```{r cfr, echo=FALSE, message=FALSE, warning=FALSE}
library(kableExtra)
cholera_data_recent <- read.csv(file.path(getwd(), "tables/case_fatality_ratio_2014_2024.csv"), stringsAsFactors = FALSE)
filtered_data <- cholera_data_recent[!is.na(cholera_data_recent$cases_total), ]
cfr_table <- filtered_data[, c("country", "cases_total", "deaths_total", "cfr", "cfr_lo", "cfr_hi", "shape1", "shape2")]
# Round all numeric values to 2 decimal places
cfr_table[, c("cases_total", "deaths_total", "cfr", "cfr_lo", "cfr_hi", "shape1", "shape2")] <-
round(cfr_table[, c("cases_total", "deaths_total", "cfr", "cfr_lo", "cfr_hi", "shape1", "shape2")], 3)
knitr::kable(cfr_table,
caption = "CFR Values and Beta Shape Parameters for AFRO Countries",
col.names = c("Country", "Cases (2014-2024)", "Deaths (2014-2024)", "CFR", "CFR Lower", "CFR Upper", "Beta Shape1", "Beta Shape2")) %>%
kable_styling(full_width = FALSE,
bootstrap_options = c("hover", "condensed"))
```
```{r cfr-cases, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', out.width='100%', fig.cap='Case Fatality Rate (CFR) and Total Cases by Country in the AFRO Region from 2014 to 2024. Panel A: Case Fatality Ratio (CFR) with 95% confidence intervals. Panel B: total number of cholera cases. The AFRO Region is highlighted in black, all countries with less than 3/0.2 = 150 total reported cases are assigned the mean CFR for AFRO.'}
knitr::include_graphics("figures/case_fatality_ratio_and_cases_total_by_country.png")
```
```{r cfr-beta, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', out.width='95%', fig.cap='Beta distributions of the overall Case Fatality Rate (CFR) from 2014 to 2024. Examples show the overall CFR for the AFRO region (2%) in black, Congo with the highest CFR (7%) in red, and South Sudan with the lowest CFR (0.1%) in blue.'}
knitr::include_graphics("figures/case_fatality_ratio_beta_distributions.png")
```
## Demographics
The model includes basic demographic change by using reported birth and death rates for each of the $j$ countries, $b_j$ and $d_j$ respectively. These rates are static and defined by the United Nations Department of Economic and Social Affairs Population Division [World Population Prospects 2024](https://population.un.org/wpp/Download/Standard/CSV/). Values for $b_j$ and $d_j$ are derived from crude rates and converted to birth rate per day and death rate per day (shown in Table \@ref(tab:demographics)).
```{r demographics, echo=FALSE, message=FALSE, warning=FALSE}
library(kableExtra)
dem <- read.csv(file.path(getwd(), "data/demographics_afro_2023.csv"), stringsAsFactors = FALSE)
dem <- dem[order(dem$country),]
dem <- dem[,c("country", "population", "birth_rate_per_day", "death_rate_per_day")]
row.names(dem) <- NULL
knitr::kable(
dem,
caption = "Demographic for AFRO countries in 2023. Data include: total population as of January 1, 2023, daily birth rate, and daily death rate. Values are calculate from crude birth and death rates from UN World Population Prospects 2024.",
col.names = c("Country", "Population", "Birth rate", "Death rate")
) %>%
kable_styling(full_width = FALSE,
bootstrap_options = c("hover", "condensed"))
```
## The reproductive number
The reproductive number is a common metric of epidemic growth that represents the average number of secondary cases generated by a primary case at a specific time during an epidemic. We track how $R$ changes over time by estimating the instantaneous reproductive number $R_t$ as described in [Cori et al 2013](https://academic.oup.com/aje/article/178/9/1505/89262). We track $R_t$ across all metapopulations in the model to give $R_{jt}$ using the following formula:
\begin{equation}
R_{jt} = \frac{I_{jt}}{\sum_{\Delta t=1}^{t} g(\Delta t) I_{j,t-\Delta t}}
(\#eq:R)
\end{equation}
Where $I_{jt}$ is the number of new infections in destination $j$ at time $t$, and
$g(\Delta t)$ represents the probability value from the generation time distribution of cholera. This is accomplished by using the weighed sum in the denominator which is highly influenced by the generation time distribution.
### The generation time distribution
The generation time distribution gives the time between when an individual is infected and when they infect subsequent individuals. We parameterized this quantity using a Gamma distribution with a mean of 5 days:
\begin{equation}
g(\cdot) \sim \text{Gamma}(0.5, 0.1).
(\#eq:generation-time)
\end{equation}
Here, shape=0.5, rate=0.1, and the mean if given by shape/rate. Previous studies use a mean of 5 days ([Kahn et al 2020](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7060667/#:~:text=The%20serial%20interval%20for%20cholera,routes%20(45%2C%2046).) and [Azman 2016](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4880069/)), however a mean of 3, 5, 7, or 10 days may be admissible ([Azman 2012](https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0001901)).
```{r generation, echo=FALSE, fig.align='center', out.width="95%", fig.cap="This is generation time"}
knitr::include_graphics("figures/generation_time.png")
```
```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(knitr)
library(kableExtra)
df <- read.csv("./tables/generation_time_weeks.csv")
df$Probability <- round(df$Probability, 3)
knitr::kable(df, caption = "Generation Time in Weeks") %>%
kable_styling(full_width = FALSE,
bootstrap_options = c("hover", "condensed"))
```
## Initial conditions
Since this first version of the model will begin on Jan 2023 (to take advantage of available weekly data), the initial conditions surrounding population immunity must be estimated. To set these initial conditions, we use historical data to find the total number of reported cases for a location over the previous X years, multiply by $1/\sigma$ to estimate total infections from those symptomatic cases that are reported, and then adjust based on waning immunity. We also sum the total number of vaccinations over the past X years and adjust for vaccine efficacy $\phi$ and waning immunity from vaccination $\omega$.
* total number infected? From reported cases... back out symptomatic and asymptomatic
* Total number immune due to natural infections in the past X years
* total number immune due to past vaccinations in the X years
Use deconvolution based on immune decay estimated in vaccine section
## Model calibration
* The model will be calibrated using Latin hypercube sampling for hyper-parameters and model likelihoods fit to incidence and deaths.
* An important challenge is flexibly fitting to data that are often missing or only available in aggregated forms.
[Fig: different spatial and temporal scales of available data]
## Caveats
* Simplest model to start. Easier for initial spatial structure but with minimum additional compartments to calibrate to available data (vaccination, cases, deaths).
* Country level aggregations. First generation data is 2023/24...
* Assumes vaccinating susceptible only individuals.
* For climate, summarizing for whole country.
## Table of parameters
```{r params, echo=FALSE, message=FALSE, warning=FALSE}
tbl <- data.frame(
Parameter = c(
"$i$", # Index i
"$j$", # Index j
"$t$", # Index t
"$b_j$", # Birth rate of population j
"$d_j$", # Mortality rate of population j
"$N_{jt}$", # Population size of destination j at time t
"$S_{jt}$", # Susceptible individuals in destination j at time t
"$V_{jt}$", # Vaccinated individuals in destination j at time t
"$I_{jt}$", # Infected individuals in destination j at time t
"$W_{jt}$", # V. cholerae in the environment in destination j at time t
"$R_{jt}$", # Recovered (immune) individuals in destination j at time t
"$\\Lambda_{j,t+1}$", # Human-to-human force of infection in destination j at time t+1
"$\\Psi_{j,t+1}$", # Environment-to-human force of infection in destination j at time t+1
"$\\phi$", # Effectiveness of Oral Cholera Vaccine (OCV)
"$\\nu_{jt}$", # Rate of OCV vaccination in destination j at time t
"$\\omega$", # Waning immunity rate of vaccinated individuals
"$\\varepsilon$", # Waning immunity rate of recovered individuals
"$\\gamma$", # Recovery rate of infected individuals
"$\\mu$", # Mortality rate due to V. cholerae infection
"$\\sigma$", # Proportion of symptomatic V. cholerae infections
"$\\rho$", # Proportion of suspected cholera cases that are true infections
"$\\zeta$", # Shedding rate of V. cholerae into the environment
"$\\delta$", # Environmental decay rate of V. cholerae
"$\\delta_{\\text{min}}$", # Minimum environmental decay rate of V. cholerae
"$\\delta_{\\text{max}}$", # Maximum environmental decay rate of V. cholerae
"$\\psi_{jt}$", # Environmental suitability of V. cholerae in destination j at time t
"$\\beta_{j0}^{\\text{hum}}$", # Baseline human-to-human transmission rate in destination j
"$\\beta_{jt}^{\\text{hum}}$", # Seasonal human-to-human transmission rate in destination j at time t
"$\\beta_{j0}^{\\text{env}}$", # Baseline environment-to-human transmission rate in destination j
"$\\beta_{jt}^{\\text{env}}$", # Environment-to-human transmission rate in destination j at time t
"$\\alpha$", # Population mixing parameter
"$\\tau_i$", # Probability of departure from origin i
"$\\pi_{ij}$", # Probability of travel from origin i to destination j
"$\\theta_{j}$", # Proportion with adequate WASH in destination j
"$\\kappa$" # Concentration of V. cholerae required for 50% infection probability
),
Description = c(
"Index $i$ represents the origin metapopulation.", # i
"Index $j$ represents the destination metapopulation.", # j
"Index $t$ is the time step which is one week (7 days).", # t
"Birth rate of population $j$.", # b_j
"Overall mortality rate of population $j$.", # d_j
"Total population size of destination $j$ at time $t$.", # N_{jt}
"Number of susceptible individuals in destination $j$ at time $t$.", # S_{jt}
"Number of effectively vaccinated individuals in destination $j$ at time $t$.", # V_{jt}
"Number of infected individuals in destination $j$ at time $t$.", # I_{jt}
"Total amount of V. cholerae in the environment in destination $j$ at time $t$.", # W_{jt}
"Number of recovered (and therefore immune) individuals in destination $j$ at time $t$.", # R_{jt}
"The force of infection due to human-to-human transmission in destination $j$ at time step $t+1$.", # Lambda_{j,t+1}
"The force of infection due to environment-to-human transmission in destination $j$ at time step $t+1$.", # Psi_{j,t+1}
"The effectiveness of Oral Cholera Vaccine (OCV).", # phi
"The reported rate of vaccination with OCV in destination $j$ at time $t$.", # nu_{jt}
"Rate of waning immunity of vaccinated individuals.", # omega
"Rate of waning immunity of recovered individuals.", # varepsilon
"Recovery rate of infected individuals.", # gamma
"Mortality rate due to infection with *V. cholerae*.", # mu
"Proportion of *V. cholerae* infections that are symptomatic.", # sigma
"The proportion of suspected cholera cases that are true infections.", # rho
"Rate that infected individuals shed *V. cholerae* into the environment.", # zeta
"The environmental suitability dependent decay rate of *V. cholerae* in the environment.", # delta
"The minimum decay rate of *V. cholerae* in the environment obtained when environmental suitability ($\\psi_{jt}$) = 0.", # delta_{min}
"The maximum decay rate of *V. cholerae* in the environment obtained when environmental suitability ($\\psi_{jt}$) = 1.", # delta_{max}
"The climatically driven environmental suitability of *V. cholerae* in destination $j$ at time $t$.", # psi_{jt}
"The baseline rate of human-to-human transmission in destination $j$.", # beta_{j0}^{hum}
"The seasonal rate of human-to-human transmission in destination $j$ at time $t$.", # beta_{jt}^{hum}
"The baseline rate of environment-to-human transmission in destination $j$.", # beta_{j0}^{env}
"The rate of environment-to-human transmission in destination $j$ at time $t$.", # beta_{jt}^{env}
"The overall population mixing parameter.", # alpha
"The probability that an individual departs origin $i$.", # tau_i
"The probability that an individual travels from origin $i$ to destination $j$ given that they have departed origin $i$.", # pi_{ij}
"The proportion of the population that have adequate Water, Sanitation and Hygiene (WASH).", # theta_j
"The concentration (number of cells per mL) of *V. cholerae* required for a 50% probability of infection." # kappa
),
Distribution = c(
"", # i
"", # j
"", # t
"", # b_j
"", # d_j
"", # N_{jt}
"", # S_{jt}
"", # V_{jt}
"", # I_{jt}
"", # W_{jt}
"", # R_{jt}
"", # Lambda_{j,t+1}
"", # Psi_{j,t+1}
"", # phi
"", # nu_{jt}
"", # omega
"", # varepsilon
"", # gamma
"", # mu
"", # sigma
"", # rho
"$0.1\\mbox{-}10$", # zeta
"", # delta
"", # delta_{min}
"", # delta_{max}
"", # psi_{jt}
"", # beta_{j0}^{hum}
"", # beta_{jt}^{hum}
"", # beta_{j0}^{env}
"", # beta_{jt}^{env}
"0.95", # alpha
"", # tau_i