-
Notifications
You must be signed in to change notification settings - Fork 5
/
tidysdm-04-modeling.qmd
113 lines (87 loc) · 4.17 KB
/
tidysdm-04-modeling.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
---
title: "Modeling"
---
# Fit the model by cross-validation
```{r setup}
source("setup.R", echo = FALSE)
bb = get_bb(form = 'polygon')
coast = rnaturalearth::ne_coastline(scale = 'large', returnclass = 'sf') |>
sf::st_geometry()
preds = read_predictors(quick = TRUE)
model_input = read_obs("model_input")
mask = read_mask()
```
# Recipe and models workflow
A recipe provides the scaffolding for subsequent steps. Note that the output is not a plain vanilla [recipes](https://recipes.tidymodels.org/) recipe. Instead it is a special [tidysdm](https://evolecolgroup.github.io/tidysdm/articles/a0_tidysdm_overview.html) recipe that is spatially aware - hence the `coords` element. Also note that we only pass in the head of the `model_input` since the recipe only needs to know what the variable types are.
```{r recipe}
rec <- recipe(head(model_input),
formula = class ~ .)
rec
```
Here we leverage the recipe to build workflows. Note that the models specified are provided by the `tidysdm` package rather than the standard [parsnip](https://parsnip.tidymodels.org/) models. Also note that we are specifying a maxnet model, but the engine is [maxnet](https://github.com/mrmaxent/maxnet).
```{r models}
models <-
# create the workflow_set
workflow_set(
preproc = list(default = rec),
models = list(
# the standard glm specs
glm = tidysdm::sdm_spec_glm(),
# rf specs with tuning
rf = tidysdm::sdm_spec_rf(),
# boosted tree model (gbm) specs with tuning
gbm = tidysdm::sdm_spec_boost_tree(),
# maxent specs with tuning
maxent = tidysdm::sdm_spec_maxent()
),
# make all combinations of preproc and models,
cross = TRUE ) |>
# tweak controls to store information needed later to create the ensemble
option_add(control = tidysdm::control_ensemble_grid())
models
```
Above you can see the models are arranged in a table (with list-columns to hold complex data types.) Currently, this is the skeleton used to guide the tuning step (that comes soon). Once we have tuned the models the `info`, `option` and `result` variables in `model` will be populated; for now the exist but are unpopulated. Also note that we still have not fed the model a complete dataset.
# Cross folding
Before feed the data to the models we divide it into data set samples. We set up a spatial cross validation with three folds three folds (groups) split across a 5x5 sampling matrix. The idea behind foldings is to present different sets of data to develop the model, then we can look at the performance mean and variability.
```{r cv_block}
set.seed(100)
input_cv <- spatial_block_cv(data = model_input, v = 3, n = 5)
autoplot(input_cv)
```
# Tuning
Now we can tune the models using the fold-data. The following is a tuning excercise applied to each fold of the sample grouping.
```{r tune_models}
set.seed(1234567)
models <- models |>
workflow_map("tune_grid",
resamples = input_cv,
grid = 3,
metrics = tidysdm::sdm_metric_set(),
verbose = TRUE )
models
```
We can generate assessment plots based upon typical model metrics.
```{r model_autoplot}
autoplot(models)
```
At this point you would rightly be wondering about these models. In our traditional approach we selected one model type, `maxnet`, and ran with that alone. Here we are trying 4 types of models, and we can see that they perform differently. But the breathtaking speed with which we can get to this step is a hallmark of [tidymodels](https://www.tidymodels.org/).
# Ensembles
Currently these models are independent of each other, but we can form them into an ensemble.
```{r ensemble}
ensemble = simple_ensemble() |>
add_member(models, metric = "boyce_cont")
ensemble
```
```{r autoplot_ensemble}
autoplot(ensemble)
```
```{r table_ensemble}
ensemble |>
collect_metrics()
```
# Saving models and ensembles
You can save a model or an ensemble of models using the `write_model()`, `read_model()`, `write_ensemble()` and `read_ensemble()` functions. These are wrapper functions that will hide the details, but allow you to save and restoire portable files.
```{r}
ok = dir.create("data/model/tidysdm", recursive = TRUE)
write_ensemble(ensemble, "data/model/tidysdm/v1_ensemble.rds")
```