cleanup Model/*.qmd w/ subtitles
bbest committed Nov 23, 2023
1 parent b13f652 commit dd99a32
Showing 5 changed files with 21 additions and 65 deletions.
24 changes: 4 additions & 20 deletions calibrate.qmd
@@ -1,24 +1,8 @@
# Calibrate
---
title: "Calibrate"
subtitle: "Calibrate model fit, i.e., model selection"
---

The process of refining the model to only the most relevant environmental predictor terms is commonly called "Model Selection." One of the most cited scientific papers of all time [@akaike1974] takes the most parsimonious approach to this process -- the so-called Akaike Information Criterion (AIC).
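As a rough illustration (a Python sketch with invented log-likelihoods, not the book's actual R workflow), AIC trades goodness of fit against model complexity via AIC = 2k - 2 ln(L):

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2*k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Two hypothetical candidate models: the second adds two predictor
# terms for only a marginal gain in log-likelihood.
aic_simple  = aic(log_likelihood=-100.0, n_params=3)  # 206.0
aic_complex = aic(log_likelihood=-99.5,  n_params=5)  # 209.0
# The lower AIC (the simpler model) is the more parsimonious choice.
```

The extra parameters must "buy" enough likelihood to offset the 2k penalty, which is what makes AIC a parsimony criterion.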

It is important to avoid using environmental predictors that are correlated with each other: the apparent effect of a predictor on the response could be the ecological inverse of its true effect, a result of explaining variance in the residuals of the other correlated predictor.
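A quick collinearity screen is a pairwise correlation coefficient between candidate predictors; this Python sketch uses invented values (the predictor names are hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two predictor vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

sst   = [10, 12, 14, 16, 18]      # hypothetical sea-surface temperature
depth = [-5, -8, -11, -14, -17]   # hypothetical depth; perfectly anti-correlated here
r = pearson_r(sst, depth)         # -1.0: drop one of the pair before fitting
```

A common rule of thumb is to drop one predictor of any pair with |r| above roughly 0.7 before fitting.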

## Predict

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

|          |              | Predicted: 0 (absence) | Predicted: 1 (presence) |
|----------|--------------|------------------------|-------------------------|
| Observed | 0 (absence)  | True absence           | False presence          |
|          | 1 (presence) | False absence          | True presence           |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}
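The threshold-to-confusion-matrix step can be sketched as follows (a Python sketch with invented observations and predicted probabilities); sweeping the threshold and recording the resulting rates is what traces out the ROC curve:

```python
def confusion_matrix(observed, predicted_prob, threshold):
    """Binarize continuous predictions at `threshold`, then tally
    them against the binary observations (1 = presence, 0 = absence)."""
    tp = fp = tn = fn = 0
    for obs, p in zip(observed, predicted_prob):
        pred = 1 if p >= threshold else 0
        if obs == 1 and pred == 1:
            tp += 1  # true presence
        elif obs == 0 and pred == 1:
            fp += 1  # false presence
        elif obs == 0 and pred == 0:
            tn += 1  # true absence
        else:
            fn += 1  # false absence
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

obs  = [1, 1, 0, 0, 1, 0]              # hypothetical test responses
prob = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]  # hypothetical model predictions
cm = confusion_matrix(obs, prob, threshold=0.5)
# {"TP": 2, "FP": 1, "TN": 2, "FN": 1}
```

Raising the threshold trades false presences for false absences, which is why the threshold is chosen against the ROC curve rather than fixed at 0.5.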

![ROC curve showing the true positive vs false positive rate as a function of the changing threshold value (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
5 changes: 4 additions & 1 deletion evaluate.qmd
@@ -1,4 +1,7 @@
# Evaluate
---
title: "Evaluate"
subtitle: "Evaluate performance of the predicted model with the test data"
---

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

31 changes: 5 additions & 26 deletions fit.qmd
@@ -1,30 +1,9 @@
# Fit
---
title: "Fit"
subtitle: "Fit environmental relationship distinguishing presence from absence of species"
---

Model fitting is quite complex in theory but quite simple in practice: simply feed the prepared data into the modeling function.

However, there are MANY modeling techniques from which to choose. For instance, check out the 238 entries in [6 Available Models | The caret Package](https://topepo.github.io/caret/available-models.html).
However, there are MANY modeling techniques from which to choose. For instance, check out the 238 entries in [6 Available Models \| The caret Package](https://topepo.github.io/caret/available-models.html).

## Calibrate

The process of refining the model to only the most relevant environmental predictor terms is commonly called "Model Selection." One of the most cited scientific papers of all time [@akaike1974] takes the most parsimonious approach to this process -- the so-called Akaike Information Criterion (AIC).

It is important to avoid using environmental predictors that are correlated with each other: the apparent effect of a predictor on the response could be the ecological inverse of its true effect, a result of explaining variance in the residuals of the other correlated predictor.

## Predict

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

|          |              | Predicted: 0 (absence) | Predicted: 1 (presence) |
|----------|--------------|------------------------|-------------------------|
| Observed | 0 (absence)  | True absence           | False presence          |
|          | 1 (presence) | False absence          | True presence           |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve showing the true positive vs false positive rate as a function of the changing threshold value (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
20 changes: 4 additions & 16 deletions predict.qmd
@@ -1,18 +1,6 @@
# Predict
---
title: "Predict"
subtitle: "Predict distribution of the species with environmental relationship from fitted model"
---

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

|          |              | Predicted: 0 (absence) | Predicted: 1 (presence) |
|----------|--------------|------------------------|-------------------------|
| Observed | 0 (absence)  | True absence           | False presence          |
|          | 1 (presence) | False absence          | True presence           |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve showing the true positive vs false positive rate as a function of the changing threshold value (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
6 changes: 4 additions & 2 deletions split.qmd
@@ -1,6 +1,8 @@
# Split
---
title: "Split"
subtitle: "Split data into training (to fit) and test (to evaluate prediction)"
---

Data is often split so that \~20% of the observations (presence and absence) are set aside from the model fitting to be used for model evaluation.

K-fold cross-validation is often used to split the data into k groups; the model is then fit k times, each time using a different group as the test data and the remaining groups as the training data.
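The split and k-fold ideas can be sketched in Python (stdlib only; the 80/20 fraction and k = 5 are just the conventional defaults mentioned above, and the data are invented observation ids):

```python
import random

def train_test_split(rows, test_frac=0.2, seed=42):
    """Set aside ~test_frac of the observations for model evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def k_folds(rows, k=5):
    """Yield k (train, test) pairs; each fold serves once as test data."""
    for i in range(k):
        test  = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, test

data = list(range(100))               # hypothetical observation ids
train, test = train_test_split(data)  # ~80 train, ~20 test
folds = list(k_folds(data, k=5))      # 5 pairs, each with a test fold of 20
```

With k-fold, every observation is used for evaluation exactly once, giving a less noisy performance estimate than a single 80/20 split.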
