diff --git a/ArboMAP_forecast.html b/ArboMAP_forecast.html
new file mode 100644
index 0000000..3d69a92
--- /dev/null
+++ b/ArboMAP_forecast.html
@@ -0,0 +1,2796 @@

West Nile Virus Forecast Report for 2018-08-15 South Dakota

1 Forecast results

The Arbovirus Monitoring and Prediction (ArboMAP) system produces a weekly, county-level forecast of human West Nile virus (WNV) cases using environmental data combined with entomological data.

Modeling overview: The transmission of mosquito-borne diseases, such as WNV, is influenced by environmental conditions that affect many aspects of the disease transmission system. ArboMAP uses an ensemble of different mathematical models, each predicting whether a county will report at least one case in a given week (a ‘positive county-week’). The results presented are the average of the models, with ranges shown where appropriate. As part of the process, the mosquito infection rate is also modeled from the mosquito pool data and is included in the default modeling. ArboMAP uses generalized additive models (GAMs) with smooths for seasonality, together with lagged weather data, which allows it to model the time-delayed effects of weather conditions. The appendix breaks out all results for each individual model.
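
The report only says that results are "an average of the models with ranges". The sketch below shows one minimal way to produce such a summary: for each county-week it takes the mean, minimum, and maximum of the per-model probabilities. The input layout (a dict mapping model name to a list of probabilities aligned on the same county-weeks) and the function name are hypothetical illustrations, not ArboMAP's internal code.

```python
from statistics import mean


def ensemble_summary(predictions):
    """Summarize per-model predictions for the same county-weeks.

    predictions: dict mapping model name -> list of probabilities, with all
    lists aligned on the same county-weeks (hypothetical layout).
    Returns one (mean, min, max) tuple per county-week.
    """
    lengths = {len(v) for v in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must cover the same county-weeks")
    # zip(*...) regroups the values so each tuple holds every model's
    # prediction for one county-week.
    columns = list(zip(*predictions.values()))
    return [(mean(col), min(col), max(col)) for col in columns]


preds = {
    "cub-fx-anom": [0.10, 0.40],  # made-up probabilities for two county-weeks
    "tp-sv-anom": [0.20, 0.30],
}
print(ensemble_summary(preds))
```

The min and max give the shaded "range of all models" ribbon. The mean gives the dark red line used in the charts.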

1.1 Forecast week WNV absolute risk

The following map displays, for each county, the absolute risk of being a positive county-week during epidemiological week 33.

This map can be used in conjunction with the relative risk map. The absolute risk map shows the risk of a county reporting at least one WNV-positive human case during this week, and the relative risk map shows whether this risk is elevated compared with previous years.

1.2 Forecast week WNV relative risk

In the forecast week, there are 0 counties with higher-than-average risk compared with the same epidemiological week in previous years (2004 through 2018).

This relative risk map may be used in conjunction with the absolute risk map. The absolute risk map shows the risk of a county reporting at least one WNV-positive human case during this week, and the relative risk map shows whether this risk is elevated compared with previous years.
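
The report does not define relative risk precisely. The sketch below assumes it is the ratio of a county's forecast probability to that county's mean predicted probability for the same epiweek in earlier years, and it counts counties whose ratio exceeds 1. The definition, the function names, and the input dicts are all assumptions made for illustration.

```python
from statistics import mean


def relative_risk(forecast, history):
    """Ratio of forecast risk to historical average risk, per county.

    forecast: dict county -> predicted probability for the forecast week.
    history: dict county -> list of predicted probabilities for the same
    epiweek in previous years (hypothetical inputs).
    """
    rr = {}
    for county, p in forecast.items():
        baseline = mean(history[county])
        if baseline > 0:
            rr[county] = p / baseline
        else:
            # No historical risk: any positive forecast is unbounded;
            # 0 versus 0 is treated as "no change".
            rr[county] = float("inf") if p > 0 else 1.0
    return rr


def count_elevated(rr):
    """Number of counties with higher-than-average risk (ratio above 1)."""
    return sum(1 for v in rr.values() if v > 1.0)


forecast = {"Beadle": 0.45, "Brown": 0.05}            # made-up values
history = {"Beadle": [0.20, 0.40], "Brown": [0.10, 0.20]}
rr = relative_risk(forecast, history)
print(rr, count_elevated(rr))
```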


1.3 Forecast year

The following graph is the predicted epicurve of the forecast year. The average of all models is shown as a dark red line, with the range of all models in the shaded ribbon. Forecasts are shown as a dotted line, and the predicted values from before the current forecast week (‘backcast’) are shown as a solid line. The appendix includes a version of this chart with a series for each model rather than the average.

The historical observed proportion of counties positive, averaged over all known years, is also shown as a dark blue line. This curve excludes human cases that occurred very early or very late in the season (temporal outliers), based on the cut-off set in the parameters (case_trim_alpha = 0.02). The curve allows the timing and height of the predicted peak to be compared with those of the averaged historical year. In the averaged year, 49% of the yearly cases would have been observed by this forecast week.


1.4 Case estimation

ArboMAP models predict ‘positive county-weeks’, that is, the probability that a county has at least one human WNV case in a given week. These values can be used to estimate the total number of cases, shown in the table below.

Estimated number of WNV cases

| Year | Predicted positive county-weeks | Average estimated cases (standard deviation) | Range of estimated cases |
|---|---|---|---|
| 2018 | 63 | 75 (+/-15) | 56 - 89 |
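
The average, standard deviation, and range in this table appear to summarize the per-model estimates listed in the appendix (section 3.1.4 Case estimations). The sketch below recomputes those statistics from the rounded appendix values. It gives about 74 (+/-15) with a range of 55 to 89, close to the reported figures; the small differences presumably come from rounding. It does not attempt to reproduce how ArboMAP converts county-weeks into cases.

```python
from statistics import mean, stdev

# Per-model estimated cases for 2018, copied from the appendix table.
estimated_cases = {
    "cub-fx-anom": 86, "cub-fx-nonanom": 89,
    "cub-sv-anom": 56, "cub-sv-nonanom": 66,
    "tp-fx-anom": 86, "tp-fx-nonanom": 89,
    "tp-sv-anom": 55, "tp-sv-nonanom": 66,
}


def summarize(values):
    """Mean, sample standard deviation, and (min, max) across models."""
    vals = list(values)
    return mean(vals), stdev(vals), (min(vals), max(vals))


avg, sd, rng = summarize(estimated_cases.values())
print(f"{avg:.1f} (+/-{sd:.0f}), range {rng[0]} - {rng[1]}")
# prints: 74.1 (+/-15), range 55 - 89
```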

1.5 Model fit statistics

The following table summarizes how well the models fit the historical years. The Area Under the ROC curve (AUC) ranges from 0 to 1. It is the probability that the model assigns a higher predicted risk to a randomly chosen positive county-week than to a randomly chosen negative one. A score of 0.5 is no better than a random model; scores above 0.7 are generally considered acceptable, and scores above 0.8 good.

Area Under Curve (AUC) statistics of all model fits

| Model | Average AUC | Min AUC | Max AUC |
|---|---|---|---|
| Average of all models | 0.84 | 0.84 | 0.85 |
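
The AUC definition can be made concrete by computing it directly. For every pair made of one positive county-week and one negative county-week, count how often the positive one has the higher predicted probability, with ties counting as one half. This is the Mann-Whitney formulation of the AUC. The pairwise loop is O(n·m), so it is meant for illustration rather than for the full data set, and it is not ArboMAP's own implementation.

```python
def auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney formulation).

    scores: predicted probabilities; labels: 1 for a positive
    county-week, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 3 of 4 pairs ranked correctly
```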

1.6 Multi-year forecast

The following chart shows the model results for the entire modeled period from 2004 through 2018. Years prior to the forecast year that had human case data were used for fitting the model.

As in the forecast-year chart above, the average of all models is shown as a dark red line, with the range of all models as the shaded ribbon. Forecasts are shown as a dotted line, and predicted values from before the current forecast week (‘backcast’) are shown as a solid line. The historical observed values are shown in black. The appendix includes a version of this chart with a series for each model rather than the average.

2 Input data summaries

The report was requested for 2018-08-15, which is CDC/MMWR epiweek 33 in epiyear 2018.
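
For reference, CDC/MMWR weeks run from Sunday through Saturday. Week 1 of an epiyear is the first such week that has at least four days in the calendar year. The small standard-library sketch below applies that rule and reproduces the epiweek quoted above. It is an independent illustration, not ArboMAP's code.

```python
from datetime import date, timedelta


def mmwr_week_start(year):
    """Sunday that starts MMWR week 1 of the given year."""
    jan1 = date(year, 1, 1)
    # date.weekday(): Monday=0 ... Sunday=6; count days back to the prior Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    start = jan1 - timedelta(days=days_since_sunday)
    # The week holding Jan 1 is week 1 only if at least 4 of its days fall in
    # the new year, i.e. Jan 1 is a Sunday, Monday, Tuesday or Wednesday.
    if days_since_sunday > 3:
        start += timedelta(days=7)
    return start


def mmwr_week(d):
    """Return (epiyear, epiweek) for a datetime.date."""
    year = d.year
    if d < mmwr_week_start(year):
        year -= 1  # early-January dates can belong to the prior epiyear
    elif d >= mmwr_week_start(year + 1):
        year += 1  # late-December dates can belong to the next epiyear
    return year, (d - mmwr_week_start(year)).days // 7 + 1


print(mmwr_week(date(2018, 8, 15)))  # (2018, 33)
```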


2.1 Human cases

After data processing, the human case data contained a total of 1300 rows, with data from the years 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, and 2017. Parameters were set to include human data from 2004 through 2017. Data from all years were found in the data file.

The human case data entries that could not be matched to spatial data during processing are listed in the table below. Please check for misspellings in the original file. Internal county IDs are shown in lower case, with the other formatting that was applied during attempted matching.

Unmatched human case entries

| arbo_ID | date |
|---|---|
| greg0ry | 7/4/2004 |

Over all years, the state saw a cumulative total of 1300 human cases, representing 1141 positive county-weeks, from a total of 66 counties.
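
There are fewer positive county-weeks than cases because several cases in the same county and week collapse into a single positive county-week. The minimal sketch below shows that collapse. It assumes each case record is already a (county, epiyear, epiweek) tuple; this record layout is hypothetical.

```python
def positive_county_weeks(cases):
    """Collapse case records into the set of positive county-weeks.

    cases: iterable of (county, epiyear, epiweek) tuples, one per human
    case (hypothetical record layout).
    """
    return {(county, year, week) for county, year, week in cases}


cases = [
    ("Brown", 2012, 33),
    ("Brown", 2012, 33),   # second case in the same county-week
    ("Brown", 2012, 34),
    ("Beadle", 2012, 33),
]
cw = positive_county_weeks(cases)
print(len(cases), "cases ->", len(cw), "positive county-weeks")
# prints: 4 cases -> 3 positive county-weeks
```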

To compare the epicurve of human cases in each year, the heatmap below shows when in each year the cases occurred.


2.2 Mosquito pools

After data processing, the mosquito pool data contained a total of 16274 rows, with data from the years 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, and 2018. Overall, during this time frame a total of 29 counties reported mosquito data.

Parameters were set to include mosquito data from 2004 through 2018. Data from all years were found in the data file.

Note that even if a year had no positive pools, any pools that were tested are still useful: zero infection rates do predict low-risk years and should be used.

Parameters were set to include mosquito data from day of year 140 through 366. The mosquito infection rate modeling is very sensitive to early mosquito pool results, which is why a cut-off is used. Sensitivity analyses indicate that a start day of year of 140 is a reasonable cut-off for high modeling accuracy.

Modeling was done from 2004 through 2018. Years in this period without mosquito data are assumed to have average mosquito infection rates. This allows relationships with environmental data to be estimated even when mosquito data are not available. There were 6 years where mosquito infection rates were imputed: 2004, 2007, 2009, 2010, 2012, and 2013. In modeling years where sufficient mosquito data were present, the mosquito infection statistic was created using the model specified in the input parameter: stratifiedMIGR.

In the forecast year to date, there were 1697 pools reported from 12 counties. Of these pools, 66 (4%) were reported WNV positive.

Pool statistics for the past two weeks are also included. If pool data exist for the forecast epiweek, the two weeks are the forecast week and the week prior. If data do not yet exist for the requested forecast epiweek, the weeks shown are the two epiweeks prior to the forecast week. In this report, the two weeks are 07/29/2018 through 08/05/2018 (epiweeks 31 & 32), with mosquito data available from 07/29/2018 through 07/31/2018.

Total reported and WNV-positive mosquito pools: counties with positive pools in the year to the forecast week (YTD) or with any (positive or negative) pools reported in the past two weeks

| County | Pools reported last 2 weeks | Pools positive last 2 weeks (%) | Pools reported YTD | Pools positive YTD (%) |
|---|---|---|---|---|
| Beadle | 30 | 5 (16.7%) | 161 | 11 (6.8%) |
| Brookings | 13 | 0 (0%) | 290 | 7 (2.4%) |
| Brown | 38 | 5 (13.2%) | 556 | 30 (5.4%) |
| Davison | 5 | 1 (20%) | 38 | 2 (5.3%) |
| Edmunds | - | - | 21 | 1 (4.8%) |
| Fall River | 2 | 0 (0%) | 57 | 2 (3.5%) |
| Hughes | 16 | 1 (6.2%) | 76 | 1 (1.3%) |
| Lincoln | - | - | 47 | 1 (2.1%) |
| Minnehaha | - | - | 372 | 10 (2.7%) |
| Stanley | - | - | 29 | 1 (3.4%) |
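
The percentages in this table are positive pools divided by pools reported. The short sketch below reproduces the YTD column from the counts above. The dictionary layout is only illustrative. The statewide figure comes to 3.9%, which the report rounds to 4%.

```python
# YTD (pools reported, pools positive) per county, from the table above.
ytd = {
    "Beadle": (161, 11), "Brookings": (290, 7), "Brown": (556, 30),
    "Davison": (38, 2), "Edmunds": (21, 1), "Fall River": (57, 2),
    "Hughes": (76, 1), "Lincoln": (47, 1), "Minnehaha": (372, 10),
    "Stanley": (29, 1),
}


def pct_positive(reported, positive):
    """Percent of pools positive, rounded to one decimal (None if no pools)."""
    return round(100 * positive / reported, 1) if reported else None


for county, (reported, positive) in ytd.items():
    print(f"{county}: {positive} ({pct_positive(reported, positive)}%)")

# Statewide YTD: 66 positive of 1697 pools (includes two counties not listed).
print(f"Statewide: {pct_positive(1697, 66)}%")
```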

The next graph shows the percentage of predicted positive pools by year, comparing the forecast year (in red) with the requested comparison years (shades of blue) and all other years (gray). If there are sufficient data in the forecast year, the observed pool rates are shown as black dots, binned into a variable number of time points (up to 6) depending on how much data is available.

The last mosquito graph shows the relative risk due to the mosquito infection rate as a time series of all known years. Mosquito strata are shown in different colors.


2.3 Weather

After processing, weather data existed from 2001-01-01 through 2018-08-15 for 66 counties. Environmental data are read and collated from all files in the base data_weather folder. If there are duplicate data entries for any particular day, the value from the latest file is used (i.e. the latest updated value).

Parameters were set to include weather data from 2000 through 2018. All necessary weather data were found in the data file.

The report parameters set the two environmental predictor variables as tmeanc and vpd. The following two graphs show the median state-wide observed weather variables for the forecast year, compared with the historical median. Two or more consecutive days that are greater than the historical median are drawn in red, and consecutive days that are less than the historical median are drawn in blue. Consecutive days that cross the historical median (i.e. one day above and the next below, or the opposite) are in purple. The gray shaded region is a ribbon showing the historical range (min to max). The appendix shows the anomaly graphs (the same time series, but with the weather variable anomalized).


2.4 Reference map

For South Dakota, the spatial data (shapefile) contained 66 counties: Aurora, Beadle, Bennett, Bon Homme, Brookings, Brown, Brule, Buffalo, Butte, Campbell, Charles Mix, Clark, Clay, Codington, Corson, Custer, Davison, Day, Deuel, Dewey, Douglas, Edmunds, Fall River, Faulk, Grant, Gregory, Haakon, Hamlin, Hand, Hanson, Harding, Hughes, Hutchinson, Hyde, Jackson, Jerauld, Jones, Kingsbury, Lake, Lawrence, Lincoln, Lyman, Marshall, McCook, McPherson, Meade, Mellette, Miner, Minnehaha, Moody, Oglala Lakota, Pennington, Perkins, Potter, Roberts, Sanborn, Spink, Stanley, Sully, Todd, Tripp, Turner, Union, Walworth, Yankton, and Ziebach.


2.5 Parameters used


The report was run with the following parameters set.

Parameters used

| Parameter | Value |
|---|---|
| forecast_date | 2018-08-15 |
| state_name | South Dakota |
| state_code | SD |
| predictor_var1 | tmeanc |
| predictor_var2 | vpd |
| mosquito_model | stratifiedMIGR |
| mosquito_doy_start | 140 |
| mosquito_doy_end | 366 |
| file_human | data_human/simulated_human_data.csv |
| file_mosquito | data_mosquito/simulated_mosquito_data.csv |
| file_strata | data_strata/example_strata_SD.csv |
| file_county_sf | data_spatial/sd_counties.RDS |
| file_models | data_models/models.txt |
| folder_weather | data_weather |
| year_human_start | 2004 |
| year_human_end | 2017 |
| year_mosquito_start | 2004 |
| year_mosquito_end | 2018 |
| year_weather_start | 2000 |
| year_weather_end | 2018 |
| year_compare_vis1 | 2012 |
| year_compare_vis2 | 2017 |
| create_appendix | TRUE |
| lag_length | 121 |
| case_trim_alpha | 0.02 |
| version | 4.2 |

3 Appendix

This appendix provides more detail on the underlying forecast modeling and breaks out the results per model, rather than showing an average of all models (as in the main report).


3.1 Forecast results

3.1.1 Current-week WNV absolute risk

The following are the absolute risk maps generated by each model:

3.1.2 Current-week WNV relative risk

The following are the relative risk maps generated by each model:

3.1.3 Current-year forecasts

The graph below shows the current-year forecast, with lines for each model:

3.1.4 Case estimations


The table below lists the estimated case counts per model.

Estimated number of WNV cases

| Year | Model | Predicted positive county-weeks | Estimated cases |
|---|---|---|---|
| 2018 | cub-fx-anom | 73.2 | 86 |
| 2018 | cub-fx-nonanom | 75.7 | 89 |
| 2018 | cub-sv-anom | 46.8 | 56 |
| 2018 | cub-sv-nonanom | 55.7 | 66 |
| 2018 | tp-fx-anom | 73.3 | 86 |
| 2018 | tp-fx-nonanom | 75.7 | 89 |
| 2018 | tp-sv-anom | 46.2 | 55 |
| 2018 | tp-sv-nonanom | 55.7 | 66 |

3.1.5 Additional model fit statistics

The table below gives several fit statistics for each forecast model:

  • AUC : Area Under the ROC Curve; values range from 0 to 1
  • AIC : Akaike information criterion; a relative fit statistic for comparison with other models fitted to the same data (lower is better)
  • Temporal MAE : Mean Absolute Error over weeks, with values collapsed to the state level
  • Spatial MAE : Mean Absolute Error over counties, with values collapsed over all time
Fit statistics by model

| Model | AUC | AIC | MAE Temporal | MAE Spatial |
|---|---|---|---|---|
| cub-fx-nonanom | 0.839 | 6728 | 1.837 | 0.772 |
| cub-fx-anom | 0.838 | 6749 | 1.791 | 0.766 |
| cub-sv-nonanom | 0.846 | 6634 | 1.622 | 0.743 |
| cub-sv-anom | 0.848 | 6645 | 1.539 | 0.729 |
| tp-fx-nonanom | 0.839 | 6728 | 1.837 | 0.772 |
| tp-fx-anom | 0.838 | 6749 | 1.790 | 0.766 |
| tp-sv-nonanom | 0.846 | 6634 | 1.620 | 0.743 |
| tp-sv-anom | 0.848 | 6644 | 1.538 | 0.729 |
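
The two MAE variants differ only in how predictions and observations are aggregated before the absolute errors are taken. The sketch below follows the reading given in the list above. Temporal MAE sums predicted and observed values over counties for each week; spatial MAE sums them over all weeks for each county. Using sums rather than means for the collapse, and the row layout, are assumptions.

```python
from collections import defaultdict


def _collapse(rows, key_index):
    """Sum predicted and observed values grouped by the chosen key."""
    pred, obs = defaultdict(float), defaultdict(float)
    for row in rows:
        key = row[key_index]
        pred[key] += row[2]
        obs[key] += row[3]
    return pred, obs


def mae(rows, by):
    """Mean absolute error after collapsing.

    rows: (county, week, predicted, observed) tuples.
    by='week' gives temporal MAE (collapsed to the state each week);
    by='county' gives spatial MAE (collapsed over all time per county).
    """
    key_index = {"county": 0, "week": 1}[by]
    pred, obs = _collapse(rows, key_index)
    return sum(abs(pred[k] - obs[k]) for k in pred) / len(pred)


rows = [("A", 1, 0.5, 1), ("B", 1, 0.5, 0), ("A", 2, 0.2, 1), ("B", 2, 0.1, 0)]
print(mae(rows, by="week"), mae(rows, by="county"))
```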

3.1.6 Partial effects

Because ArboMAP allows the user to write custom model formulas, the plots below show the partial effects of all the smooth terms for each model. Each plot shows the component effect of that term. Adding all components together (not just the smooths) gives the overall prediction on the model's linear-predictor (link) scale. For a table of all formulas, see the section on “Models and formulas”.

The number after the comma in the s({item}, {number}) labels is the effective degrees of freedom (EDF). The EDF is a measure of the complexity of the smooth term: a value of 1 is a straight line, and higher values are more complex curves.

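A quick, informal check of a smooth term's significance is to see whether a horizontal line can be drawn that stays within the 95% confidence interval across the whole plot. The interval is roughly the value +/- 2 se and is shown as the gray shaded ribbon in the relevant graphs. If no such line can be drawn, the term is likely significant.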

All models with smooths have 1-D graphs. Seasonally-varying models also have 2-D smooths (the components with doymat in the standard models); for these, a subset of y-values has been pulled out and plotted as lines.

3.1.6.1 Non-anomalized weather with fixed cubic splines: “cub-fx-nonanom”

3.1.6.2 Anomalized weather with fixed cubic splines: “cub-fx-anom”

3.1.6.3 Non-anomalized weather with seasonally-varying cubic splines: “cub-sv-nonanom”

3.1.6.4 Anomalized weather with seasonally-varying cubic splines: “cub-sv-anom”

3.1.6.5 Non-anomalized weather with fixed thin plate splines: “tp-fx-nonanom”

3.1.6.6 Anomalized weather with fixed thin plate splines: “tp-fx-anom”

3.1.6.7 Non-anomalized weather with seasonally-varying thin plate splines: “tp-sv-nonanom”

3.1.6.8 Anomalized weather with seasonally-varying thin plate splines: “tp-sv-anom”

3.1.7 Multi-year forecasts

The graph below shows the full forecast for all years, with lines for each model:

3.1.8 Models and formulas

The table below lists the models that were found in the data_models/models.txt file. Standard models have a text description, but every model that was run should appear in the table along with its formula.


The following fields may be present:

  • any_cases : positive county-week
  • arbo_ID : internal field for identifying counties
  • mir_stat : the mosquito infection rate statistic
  • s(lag, by=var...) : fixed smooth term for the environmental variable over the distributed lag period
  • te(lag, doymat, by=var...) : seasonally-varying smooth term for the environmental variable over the distributed lag period
  • var1 : variable for tmeanc, observed value
  • var2 : variable for vpd, observed value
  • var1_anom : variable for tmeanc, anomalized value
  • var2_anom : variable for vpd, anomalized value
  • s(doy,...) : smooth term for day of year, for seasonality
| Model | Description | Formula |
|---|---|---|
| cub-fx-nonanom | Non-anomalized weather with fixed cubic splines | any_cases ~ 0 + arbo_ID + mir_stat + s(lag, by=var1, bs='cr') + s(lag, by=var2, bs='cr') |
| cub-fx-anom | Anomalized weather with fixed cubic splines | any_cases ~ 0 + arbo_ID + mir_stat + s(lag, by=var1_anom, bs='cr') + s(lag, by=var2_anom, bs='cr') + s(doy, bs='cr') |
| cub-sv-nonanom | Non-anomalized weather with seasonally-varying cubic splines | any_cases ~ 0 + arbo_ID + mir_stat + te(lag, doymat, by=var1, bs='cr') + te(lag, doymat, by=var2, bs='cr') |
| cub-sv-anom | Anomalized weather with seasonally-varying cubic splines | any_cases ~ 0 + arbo_ID + mir_stat + te(lag, doymat, by=var1_anom, bs='cr') + te(lag, doymat, by=var2_anom, bs='cr') + s(doy, bs='cr') |
| tp-fx-nonanom | Non-anomalized weather with fixed thin plate splines | any_cases ~ 0 + arbo_ID + mir_stat + s(lag, by=var1, bs='tp') + s(lag, by=var2, bs='tp') |
| tp-fx-anom | Anomalized weather with fixed thin plate splines | any_cases ~ 0 + arbo_ID + mir_stat + s(lag, by=var1_anom, bs='tp') + s(lag, by=var2_anom, bs='tp') + s(doy, bs='tp') |
| tp-sv-nonanom | Non-anomalized weather with seasonally-varying thin plate splines | any_cases ~ 0 + arbo_ID + mir_stat + te(lag, doymat, by=var1, bs='tp') + te(lag, doymat, by=var2, bs='tp') |
| tp-sv-anom | Anomalized weather with seasonally-varying thin plate splines | any_cases ~ 0 + arbo_ID + mir_stat + te(lag, doymat, by=var1_anom, bs='tp') + te(lag, doymat, by=var2_anom, bs='tp') + s(doy, bs='tp') |
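
The s(lag, by=var) and te(lag, doymat, by=var) terms point to a distributed-lag setup. Each row carries a matrix of past weather values, one column per lag; the parameters set lag_length = 121. In mgcv, when matrices are passed for lag and var, the term adds the sum over columns of f(lag) times var to the linear predictor (its summation convention for matrix arguments). The NumPy sketch below builds such a lag matrix and the matching lag-index matrix for one daily series. The function name and the daily layout are assumptions, not ArboMAP's code.

```python
import numpy as np


def lag_matrices(daily_values, target_idx, lag_length):
    """Build distributed-lag inputs for one county's daily series.

    daily_values: 1-D array of a daily weather variable.
    target_idx: indices of the days being predicted; each needs at least
    lag_length - 1 days of prior history.
    Returns (var_mat, lag_mat): var_mat[i, j] is the value j days before
    target day i, and lag_mat[i, j] == j.
    """
    daily_values = np.asarray(daily_values, dtype=float)
    target_idx = np.asarray(target_idx)
    if np.any(target_idx < lag_length - 1):
        raise ValueError("not enough history for the requested lag length")
    lags = np.arange(lag_length)
    # Broadcasting gives one row of day indices per target day.
    var_mat = daily_values[target_idx[:, None] - lags[None, :]]
    lag_mat = np.broadcast_to(lags, var_mat.shape).copy()
    return var_mat, lag_mat


var_mat, lag_mat = lag_matrices(np.arange(10.0), [5, 9], lag_length=3)
print(var_mat)
print(lag_mat)
```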

3.2 Data summaries

3.2.1 Anomalized environmental variables

The report parameters set the two environmental predictor variables as tmeanc and vpd. The following two graphs show the median state-wide anomalized weather variables for the forecast year, compared with the historical median. Anomalies are calculated as the difference between the observed value and the value predicted by a GAM regression model with county and a smooth on day of year (seasonality) as predictors: an anomaly is the observed value minus the predicted value.

Two or more consecutive days with anomalized values greater than the anomalized historical median are drawn in red, and consecutive days with values less than the historical median are drawn in blue. Consecutive days that cross the historical median (i.e. one day above and the next below, or the opposite) are in purple. The gray shaded region is a ribbon showing the historical range (min to max).

3.2.2 Modeled mosquito infection rate

In modeling years where sufficient mosquito data were present, the mosquito infection rate (MIR) statistic was created using the model specified in the input parameter: stratifiedMIGR. The following table presents the calculated centered MIR values that were used in the forecast modeling.

Mosquito model summary statistic

| Year | Stratum | Centered MIR stat |
|---|---|---|
| 2004 | 101 | 0.265 |
| 2004 | 104 | -0.909 |
| 2005 | 101 | 0.635 |
| 2005 | 102 | 0.219 |
| 2005 | 103 | -0.793 |
| 2005 | 104 | -0.505 |
| 2006 | 101 | 1.312 |
| 2006 | 102 | 0.312 |
| 2006 | 103 | 0.224 |
| 2006 | 104 | -0.197 |
| 2007 | 101 | 0.266 |
| 2007 | 102 | 0.629 |
| 2007 | 103 | 0.314 |
| 2008 | 101 | -0.410 |
| 2008 | 102 | -0.610 |
| 2008 | 103 | 0.268 |
| 2008 | 104 | -0.376 |
| 2009 | 101 | -0.380 |
| 2009 | 102 | 0.359 |
| 2009 | 103 | -0.092 |
| 2010 | 101 | -0.964 |
| 2010 | 102 | -0.929 |
| 2010 | 103 | 0.278 |
| 2011 | 101 | -1.700 |
| 2011 | 102 | -1.498 |
| 2011 | 103 | -1.257 |
| 2011 | 104 | -0.469 |
| 2012 | 101 | 0.375 |
| 2012 | 102 | 1.442 |
| 2012 | 103 | 0.853 |
| 2013 | 101 | 1.048 |
| 2013 | 102 | 0.799 |
| 2013 | 103 | 1.268 |
| 2014 | 101 | 0.381 |
| 2014 | 102 | 0.117 |
| 2014 | 103 | -0.158 |
| 2014 | 104 | -0.782 |
| 2015 | 101 | -0.481 |
| 2015 | 102 | 0.261 |
| 2015 | 103 | 0.304 |
| 2015 | 104 | 0.647 |
| 2016 | 101 | -0.014 |
| 2016 | 102 | 0.584 |
| 2016 | 103 | -0.415 |
| 2016 | 104 | -0.171 |
| 2017 | 101 | -0.282 |
| 2017 | 102 | 0.347 |
| 2017 | 103 | 0.073 |
| 2017 | 104 | 0.091 |
| 2018 | 101 | -0.064 |
| 2018 | 102 | 0.320 |
| 2018 | 103 | -0.297 |
| 2018 | 104 | -0.238 |

diff --git a/ArboMAP_forecast.pdf b/ArboMAP_forecast.pdf
index cafb55e..1433e07 100644
Binary files a/ArboMAP_forecast.pdf and b/ArboMAP_forecast.pdf differ