tidysdm-04-modeling.qmd

---
title: "Modeling"
---


# Fit the model by cross-validation


```{r setup}
source("setup.R", echo = FALSE)
bb = get_bb(form = 'polygon')
coast = rnaturalearth::ne_coastline(scale = 'large', returnclass = 'sf') |>
  sf::st_geometry()
preds = read_predictors(quick = TRUE)
model_input = read_obs("model_input")
mask = read_mask()
```

# Recipe and models workflow

A recipe provides the scaffolding for subsequent steps.  Note that the output is not a plain vanilla [recipes](https://recipes.tidymodels.org/) recipe.  Instead it is a special [tidysdm](https://evolecolgroup.github.io/tidysdm/articles/a0_tidysdm_overview.html) recipe that is spatially aware - hence the `coords` element.  Also note that we only pass in the head of the `model_input` since the recipe only needs to know what the variable types are.

```{r recipe}
rec <- recipe(head(model_input),
              formula = class ~ .)
rec
```

Here we leverage the recipe to build workflows. Note that the models specified are provided by the `tidysdm` package rather than the standard [parsnip](https://parsnip.tidymodels.org/) models.  Also note that we are specifying a maxnet model, but the engine is [maxnet](https://github.com/mrmaxent/maxnet).

```{r models}
models <-
  # create the workflow_set
  workflow_set(
    preproc = list(default = rec),
    models = list(
      # the standard glm specs
      glm = tidysdm::sdm_spec_glm(),
      # rf specs with tuning
      rf = tidysdm::sdm_spec_rf(),
      # boosted tree model (gbm) specs with tuning
      gbm = tidysdm::sdm_spec_boost_tree(),
      # maxent specs with tuning
      maxent = tidysdm::sdm_spec_maxent()
    ),
    # make all combinations of preproc and models,
    cross = TRUE  ) |>
  # tweak controls to store information needed later to create the ensemble
  option_add(control = tidysdm::control_ensemble_grid())
models
```

Above you can see the models are arranged in a table (with list-columns to hold complex data types.) Currently, this is the skeleton used to guide the tuning step (that comes soon). Once we have tuned the models the `info`, `option` and `result` variables in `model` will be populated; for now the exist but are unpopulated. Also note that we still have not fed the model a complete dataset.

# Cross folding 

Before feed the data to the models we divide it into data set samples. We set up a spatial cross validation with three folds three folds (groups) split across a 5x5 sampling matrix.  The idea behind foldings is to present different sets of data to develop the model, then we can look at the performance mean and variability.

```{r cv_block}
set.seed(100)
input_cv <- spatial_block_cv(data = model_input, v = 3, n = 5)
autoplot(input_cv)
```

# Tuning

Now we can tune the models using the fold-data. The following is a tuning excercise applied to each fold of the sample grouping.

```{r tune_models}
set.seed(1234567)
models <-  models |>
  workflow_map("tune_grid",
    resamples = input_cv, 
    grid = 3,
    metrics = tidysdm::sdm_metric_set(), 
    verbose = TRUE )
models
```

We can generate assessment plots based upon typical model metrics.

```{r model_autoplot}
autoplot(models)
```

At this point you would rightly be wondering about these models.  In our traditional approach we selected one model type, `maxnet`, and ran with that alone.  Here we are trying 4 types of models, and we can see that they perform differently. But the breathtaking speed with which we can get to this step is a hallmark of [tidymodels](https://www.tidymodels.org/). 

# Ensembles

Currently these models are independent of each other, but we can form them into an ensemble.

```{r ensemble}
ensemble = simple_ensemble() |>
  add_member(models, metric = "boyce_cont")
ensemble
```
```{r autoplot_ensemble}
autoplot(ensemble)
```

```{r table_ensemble}
ensemble |> 
  collect_metrics()
```

# Saving models and ensembles

You can save a model or an ensemble of models using the `write_model()`, `read_model()`, `write_ensemble()` and `read_ensemble()` functions.  These are wrapper functions that will hide the details, but allow you to save and restoire portable files.

```{r}
ok = dir.create("data/model/tidysdm", recursive = TRUE)
write_ensemble(ensemble, "data/model/tidysdm/v1_ensemble.rds")
```