# Species Distribution Model Fitting and Projecting
**About:**
This stage of the workflow is where we fit the seasonal VAST model. After fitting the model, we make statistical inferences and use the fitted model to project species distribution and abundance changes under the future environmental conditions expected from the CMIP6 global climate models. The code for this stage is found within the [TargetsSDM GitHub repository](https://github.com/aallyn/TargetsSDM) and the [vast_functions.R](https://github.com/aallyn/TargetsSDM/blob/main/R/vast_functions.R) script. Importantly, to leverage these functions, you will want to have defined a few core objects: a habitat formula governing the species-environment covariate relationships, the field configuration settings that determine whether spatial or spatio-temporal variability will be turned on, and the rho configuration settings that define whether there is an autoregressive structure on the intercepts or the spatio-temporal variability component. You can see [how we have done this](https://github.com/aallyn/TargetsSDM/blob/main/_targets.R) at the beginning of our code to implement the workflow using the [R Targets](https://github.com/ropensci/targets) functionality.
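For orientation, the three core objects might look something like the sketch below. The covariate names, spline choices, and specific configuration values here are illustrative placeholders rather than our actual choices; see `_targets.R` for the definitions we use.

```r
# Source the workflow functions (path relative to the TargetsSDM repo root).
source("R/vast_functions.R")

# Habitat formula: species-environment relationships as spline smooths.
# "Depth" and "BT_seasonal" are placeholder covariate names.
hab_formula <- ~ splines::bs(Depth, degree = 3) + splines::bs(BT_seasonal, degree = 3)

# Field configuration: turn persistent spatial (Omega) and ephemeral
# spatio-temporal (Epsilon) variability on ("IID") or off (0) for each of
# VAST's two linear predictors.
field_config <- c(Omega1 = "IID", Epsilon1 = "IID", Omega2 = "IID", Epsilon2 = "IID")

# Rho configuration: temporal structure on the intercepts (Beta) and the
# spatio-temporal component (Epsilon); in VAST's coding, 0 = none, 4 = AR1.
rho_config <- c(Beta1 = 0, Beta2 = 0, Epsilon1 = 4, Epsilon2 = 4)
```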
## Steps
This is one of the more complicated stages of the workflow and has a variety of different steps. This complexity arises for two reasons. First, integrating data from two surveys and fitting a seasonal VAST model that includes habitat covariates, catchability covariates, persistent spatial variability, ephemeral spatio-temporal variability, and potential temporal correlations in species occurrence across the study domain in successive seasons is inherently involved. Second, we have broken the entire modeling cycle into incremental steps.
1. VAST objects. There are a variety of objects we create that define how the VAST model is constructed.
1a. Make VAST extrapolation grid. We create a user-specified extrapolation grid that encompasses the survey domain for both the NOAA NEFSC bottom trawl survey and the DFO bottom trawl survey. We do this by providing a shapefile that covers this domain and then use our `vast_make_extrap_grid` function.
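A minimal sketch of this step, assuming `vast_make_extrap_grid` takes a domain polygon and a cell size; the argument names and the shapefile path are ours for illustration, not the function's documented interface.

```r
library(sf)

# Polygon covering the combined NOAA NEFSC + DFO bottom trawl survey domain.
region_shape <- st_read("data/region_shapefile.shp")  # hypothetical path

# Build the user-specified extrapolation grid over that domain.
vast_extrap_grid <- vast_make_extrap_grid(region_shapefile = region_shape,
                                          cell_size = 25000)  # meters, illustrative
```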
1b. Make VAST settings. We use our `vast_make_settings` function, which leverages `FishStatsUtils::make_settings` and has a bit more flexibility to accommodate our own extrapolation grid, while also requiring us to be more explicit about the settings passed to the VAST modeling engine. For example, we could specify `purpose = "index2"` within a `FishStatsUtils::make_settings` call or a `FishStatsUtils::fit_model` call, and this would trigger a specific model configuration for spatial (omega) and spatio-temporal (epsilon) variability and other model parameters. Rather than using those defaults, we define these beforehand and then pass them into our `vast_make_settings` function as shown [here](https://github.com/aallyn/TargetsSDM/blob/main/_targets.R).
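Sketched usage, building on the configuration objects above; the argument names for `vast_make_settings` are assumptions rather than its documented interface.

```r
# Pass the explicitly defined FieldConfig/RhoConfig rather than relying on the
# defaults that a `purpose` shortcut would trigger.
vast_settings <- vast_make_settings(extrap_grid = vast_extrap_grid,
                                    n_knots = 400,  # mesh resolution, illustrative
                                    FieldConfig = field_config,
                                    RhoConfig = rho_config,
                                    bias.correct = FALSE)
```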
1c. Make VAST spatial lists. After creating the extrapolation grid and the settings, we create a "spatial list" object with the `vast_make_spatial_lists` function. This function is a wrapper around the `FishStatsUtils::make_extrapolation_info` and `FishStatsUtils::make_spatial_info` functions. We use it to generate the extrapolation information and the spatial information, which includes details about the INLA mesh and its relationship to the observations and extrapolation grid cells.
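A sketch of the call, assuming the wrapper needs the grid, the settings, and the observation locations; the argument names and the `tow_locations` dataframe (tow latitude/longitude) are our illustrative assumptions.

```r
# Generate the extrapolation info plus the INLA mesh and its mapping to the
# observations and extrapolation grid cells in one call.
vast_spatial_lists <- vast_make_spatial_lists(extrap_grid = vast_extrap_grid,
                                              settings = vast_settings,
                                              obs_locs = tow_locations)
```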
1d. Make VAST covariate effect list. With the VAST seasonal model, we need to make some adjustments to how "season" (and potentially "year") covariates are modeled. In particular, we specify that these are going to be estimated as spatially-varying coefficients with corner constraints to ensure estimability. We use the `vast_make_coveff` function to do this.
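Sketched usage, with assumed argument names:

```r
# Flag the season (and year) factor levels to be estimated as spatially-varying
# coefficients with corner constraints, so all levels remain estimable.
vast_coveff <- vast_make_coveff(hab_formula = hab_formula,
                                season_cov = "Season",
                                year_cov = "Year")
```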
2. VAST dataframes. Before fitting the seasonal VAST model, we generate a few dataframes that are then passed to the main VAST model fitting function.
2a. Make VAST seasonal dataframe. The standard behavior of a single-species VAST species distribution model assumes that observations occur annually. In turn, before fitting the VAST seasonal model, we need to do some reformatting. Specifically, we need to create a new vector that we can use as the "year" vector but that actually represents the season-year increments. We also want to make sure that there is a dummy observation for every season-year of interest, even those where we may not have survey data. To accomplish both of these goals, we use the `vast_make_seasonal_data` function. We also encourage interested readers to look at the [Wiki example in the VAST GitHub repository](https://github.com/James-Thorson-NOAA/VAST/wiki/Seasonal-model) for additional details.
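The core bookkeeping idea, which `vast_make_seasonal_data` handles internally, can be sketched in base R as follows; the season set and year range are illustrative.

```r
# Build a "year_season" vector whose levels enumerate EVERY season-year
# combination in chronological order, so season-years without survey data
# still receive a (dummy) level. This vector then stands in for "Year" in VAST.
seasons <- c("Spring", "Summer", "Fall")
years <- 1985:2019
year_season_levels <- apply(expand.grid(seasons, years)[, 2:1], 1,
                            paste, collapse = "_")
head(year_season_levels)  # "1985_Spring" "1985_Summer" "1985_Fall" "1986_Spring" ...
```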
2b. Make VAST sample dataframe. After creating the VAST seasonal dataframe, we then subset it into three different dataframes. The first is a sample dataframe, which includes the biological catch data information. This is accomplished using the `make_vast_sample_data` function.
2c. Make VAST covariate dataframe. This is the second dataframe we create from the VAST seasonal dataframe, and includes the information for habitat covariates at each of the tow locations. We make this dataframe using the `make_vast_covariate_data` function.
2d. Make VAST catchability dataframe. The final dataframe we create is a catchability dataframe, which includes information on the survey from which each observation was collected.
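Taken together, steps 2b-2d might look like the sketch below. The two named functions take the seasonal dataframe from step 2a; the catchability step is shown with plain column selection since the text does not name a dedicated function, and the column names are assumptions.

```r
# Subset the seasonal dataframe (step 2a) into the three inputs VAST expects.
vast_sample_data    <- make_vast_sample_data(vast_seasonal_data)     # biological catch observations
vast_covariate_data <- make_vast_covariate_data(vast_seasonal_data)  # habitat covariates at tow locations

# Catchability dataframe: which survey (NEFSC vs. DFO) collected each tow.
vast_catchability_data <- vast_seasonal_data[, c("Year", "Lat", "Lon", "Survey")]
```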
3. Fitting the VAST seasonal model. To fit the VAST seasonal model, we first build a base model with `run_model = FALSE`, and then we make some adjustments to accommodate the seasonal structure.
3a. Fitting VAST base model. With the dataframes, the extrapolation grid, and the model settings created and defined, we fit a base VAST model. While you could certainly do this with the `VAST::fit_model` function, we use a function we wrote, `vast_build_sdm`. Along with fitting nicely within our workflow and the objects we have created, this function also allows us to use sf multipolygon shapefiles to specify different regions/strata.
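A sketch of the build step; aside from `run_model = FALSE`, which comes straight from the workflow description, the argument names are assumptions about the `vast_build_sdm` signature.

```r
# Construct the TMB object without optimizing, so the parameter mapping can be
# adjusted for the seasonal structure before the actual fit.
vast_base <- vast_build_sdm(settings = vast_settings,
                            extrap_grid = vast_extrap_grid,
                            sample_data = vast_sample_data,
                            covariate_data = vast_covariate_data,
                            catchability_data = vast_catchability_data,
                            hab_formula = hab_formula,
                            run_model = FALSE)
```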
3b. Making adjustments to accommodate the seasonal model. After fitting the base model, we make some modifications to facilitate fitting the seasonal VAST model. Specifically, this requires adjusting the mapping of the variance components for the season and year terms so that they are pooled, rather than estimating an individual variance for each level of season and year. We do this using the `vast_make_adjustments` function.
3c. Fitting VAST seasonal model. With the adjustments made, we then use the `vast_fit_sdm` function to fit the model. When doing this, we pass in the adjusted `fit_model` object with the correct parameter mapping.
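Steps 3b and 3c together, sketched with assumed signatures:

```r
# Remap the season/year variance components so a single pooled variance is
# shared across factor levels, then fit the adjusted model.
vast_adjusted <- vast_make_adjustments(vast_base)
vast_fit <- vast_fit_sdm(vast_adjusted)
```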
4. Making inferences from the fitted VAST seasonal model.
4a. Validating predictive skill of the model. One of the first things we want to do after fitting the model is assess its reliability. This can include evaluating the model fit to the data, and with more recent versions of VAST, [we can get the deviance explained from the model](https://github.com/James-Thorson-NOAA/VAST/wiki/Percent-deviance-explained), which is particularly helpful when comparing different candidate model structures. Along with evaluating the model, we are especially interested in validating its predictive skill using a holdout testing dataset. We set this up at the beginning of the modeling process by toggling the "Pred_TF" indicator for specific observations, so that they are used in the predictive component of the model but not in the maximum likelihood estimation. We then use our `vast_get_point_preds` function to extract the model predictions for these holdout observations. We can then calculate any number of prediction skill statistics (e.g., AUC, RMSE). To summarize multiple components of model prediction skill, we use our `taylor_diagram_func` function to generate and plot Taylor Diagrams (Taylor 2001).
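A sketch of the holdout workflow; the `Pred_TF` flag comes from the workflow as described, while the split rule, column names, and the output format of `vast_get_point_preds` and `taylor_diagram_func` are illustrative assumptions.

```r
# Before fitting: flag holdout tows so they are predicted by the model but
# excluded from the likelihood (Pred_TF = 1 -> prediction only).
vast_sample_data$Pred_TF <- as.integer(vast_sample_data$Year >= 2015)

# After fitting: extract predictions at the holdout tows and score skill.
holdout_preds <- vast_get_point_preds(vast_fit)
rmse <- sqrt(mean((holdout_preds$obs - holdout_preds$pred)^2, na.rm = TRUE))

# Summarize multiple skill components in one figure.
taylor_diagram_func(obs = holdout_preds$obs, pred = holdout_preds$pred)
```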
4b. Visualizing covariate effects. After checking the model's fit to the data and its predictive skill against holdout testing data, we might want to visualize the fitted smooth functions defining the relationship between species occurrence and the environmental variables. To do that, we can run our `get_vast_covariate_effects` and `plot_vast_covariate_effects` functions.
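Sketched usage, with assumed argument names and the same placeholder covariates as above:

```r
# Extract the fitted smooths for each habitat covariate, then plot the
# marginal effect curves.
cov_effects <- get_vast_covariate_effects(vast_fit = vast_fit,
                                          covariates = c("Depth", "BT_seasonal"))
plot_vast_covariate_effects(cov_effects)
```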
4c. Making projections. The final action in this stage is using the model to make projections under the future environmental conditions characterized by the ensemble of CMIP6 SSP5-8.5 scenario models. To do this, we collect the projected environmental data for each of the covariates using the `vast_post_fit_pred_df` function. We then use the `project.fit_model_wrapper` function to make the projections. This function was designed to mimic [VAST's `project.fit_model` function](https://github.com/James-Thorson-NOAA/VAST/wiki/Projections), with some adjustments to accommodate the seasonal model structure.
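The projection step, sketched with assumed arguments; the scenario label and `n_sims` stand in for however the wrappers actually identify the climate ensemble and control the number of projection simulations.

```r
# Assemble projected covariate values (CMIP6 SSP5-8.5 ensemble) for every
# extrapolation-grid cell and season-year through 2100.
proj_cov_df <- vast_post_fit_pred_df(vast_fit = vast_fit,
                                     climate_scenario = "SSP5_85")

# Project density and abundance forward under those conditions.
vast_projections <- project.fit_model_wrapper(vast_fit = vast_fit,
                                              new_covariate_data = proj_cov_df,
                                              n_sims = 100)
```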
## Output
The output from this stage includes the fitted model object, along with results associated with the inferences we hope to make from the fitted model, including marginal effect plots of fitted covariate smooth functions, maps of predicted density, and time series of total estimated biomass within spatial regions of interest. Most relevant to our specific project goals, one of the outputs from this stage is the projected species density at grid locations within the DFO/NOAA NEFSC spatial domain for the spring, summer, and fall seasons from 1985 to 2100.
## Next stages
The output from this stage is then summarized and used to produce the data and visualizations for the [FishViz RShiny application][Results Visualization and Communication].