Showing 5 changed files with 21 additions and 65 deletions.
---
title: "Calibrate"
subtitle: "Calibrate model fit, i.e., model selection"
---
The process of refining the model to only the most relevant environmental predictor terms is commonly called "model selection." One of the most cited scientific papers of all time [@akaike1974] is based on taking a most parsimonious approach to this process -- the so-called Akaike Information Criterion (AIC).
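As an illustrative sketch (in Python, with hypothetical log-likelihood values standing in for real fitted models), AIC trades off goodness of fit against the number of parameters, and the candidate model with the lowest AIC is preferred:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximized log-likelihoods for three nested candidate models;
# in practice these come from the fitted models themselves.
candidates = {
    "temperature only": aic(-130.0, 2),
    "temperature + depth": aic(-121.0, 3),
    "temperature + depth + salinity": aic(-120.5, 4),
}
best = min(candidates, key=candidates.get)  # lowest AIC wins
```

Note how adding salinity improves the log-likelihood slightly but not enough to pay for the extra parameter, so the middle model is the most parsimonious choice.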
It is important to avoid using environmental predictors that are correlated with each other: the apparent effect of a predictor on the response could be the ecological inverse of its true effect, an artifact of explaining variance in the residuals of the other, correlated predictor.
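One simple screen, sketched here in Python with made-up predictor values, is to compute the pairwise Pearson correlation between candidate predictors and drop one of any pair whose correlation magnitude exceeds a chosen cutoff (0.7 is a common rule of thumb, though the exact cutoff is a judgment call):

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cov / (sx * sy)

# Made-up predictor columns: temperature and depth, strongly (negatively)
# correlated, so only one of the pair should enter the model.
sst = [10, 12, 14, 16, 18]
depth = [50, 40, 35, 22, 10]
r = pearson(sst, depth)
keep_both = abs(r) <= 0.7  # common (but arbitrary) collinearity cutoff
```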
## Predict
The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).
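For example, with a logistic-style model the fitted relationship reduces to a set of coefficients that can be applied cell by cell across the seascape. The coefficients and predictor values below are hypothetical, purely to show the mechanics:

```python
import math

# Hypothetical coefficients from a fitted logistic model:
# logit(p) = b0 + b1*temperature + b2*depth
b0, b1, b2 = -8.0, 0.6, -0.02

def predict_presence(temperature, depth):
    """Apply the fitted environmental relationship to new data."""
    z = b0 + b1 * temperature + b2 * depth
    return 1 / (1 + math.exp(-z))  # probability of presence in [0, 1]

# A tiny stand-in seascape: one (temperature, depth) pair per grid cell,
# e.g. from a climatic monthly average.
seascape = [(14.0, 30.0), (18.0, 120.0), (10.0, 200.0)]
surface = [predict_presence(t, d) for t, d in seascape]
```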
## Evaluate
Model evaluation uses the set-aside test data from the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data is binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.
| Observed     | Predicted: 0 (absence) | Predicted: 1 (presence) |
|--------------|------------------------|-------------------------|
| 0 (absence)  | True absence           | False presence          |
| 1 (presence) | False absence          | True presence           |
: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}
![ROC curve showing the true positive rate versus the false positive rate as a function of the changing threshold value (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
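The thresholding and confusion-matrix tally behind a ROC curve can be sketched as follows (Python, with a tiny made-up test set in place of real held-out data); each threshold yields one (false positive rate, true positive rate) point on the curve:

```python
# Hypothetical held-out test set: binary observations and the model's
# continuous predictions for the same sites.
observed = [0, 0, 0, 1, 1, 1, 0, 1]
predicted = [0.1, 0.4, 0.6, 0.3, 0.8, 0.9, 0.2, 0.7]

def confusion(observed, predicted, threshold):
    """Threshold continuous predictions, then tally the confusion matrix."""
    tp = fp = tn = fn = 0
    for obs, p in zip(observed, predicted):
        pred = 1 if p >= threshold else 0
        if obs == 1 and pred == 1:
            tp += 1  # true presence
        elif obs == 0 and pred == 1:
            fp += 1  # false presence
        elif obs == 0 and pred == 0:
            tn += 1  # true absence
        else:
            fn += 1  # false absence
    return tp, fp, tn, fn

# Sweep thresholds to trace the ROC curve: each threshold gives one
# (false positive rate, true positive rate) point.
roc = []
for t in [0.2, 0.4, 0.6, 0.8]:
    tp, fp, tn, fn = confusion(observed, predicted, t)
    roc.append((fp / (fp + tn), tp / (tp + fn)))
```

Raising the threshold trades false presences for false absences, which is exactly the movement along the ROC curve that packages like ROCR visualize.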
---
title: "Fit"
subtitle: "Fit environmental relationship distinguishing presence from absence of species"
---
Model fitting is quite complex in theory, but quite simple in practice: feed the prepared data into the modeling function.
However, there are many modeling techniques from which to choose. For instance, see the 238 entries in [6 Available Models \| The caret Package](https://topepo.github.io/caret/available-models.html).
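To make the "feed the prepared data into the modeling function" step concrete, here is a minimal Python sketch that fits a one-predictor logistic relationship by gradient descent on toy presence/absence data; in practice one would reach for a packaged implementation such as those catalogued by caret:

```python
import math
import random

# Toy presence/absence data driven by one hypothetical predictor
# (sea-surface temperature): presences cluster above ~15 degrees.
rng = random.Random(1)
sst = [rng.uniform(5, 25) for _ in range(200)]
present = [1 if s > 15 else 0 for s in sst]

# Fit logit(p) = b0 + b1*sst by gradient descent on the log-loss;
# a stand-in for handing the prepared data to a modeling function.
b0, b1 = 0.0, 0.0
learning_rate = 0.01
for _ in range(2000):
    g0 = g1 = 0.0
    for x, y in zip(sst, present):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        g0 += p - y        # gradient w.r.t. intercept
        g1 += (p - y) * x  # gradient w.r.t. slope
    b0 -= learning_rate * g0 / len(present)
    b1 -= learning_rate * g1 / len(present)
```

The fitted slope is positive, recovering the built-in relationship that warmer water predicts presence.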
---
title: "Predict"
subtitle: "Predict distribution of the species with environmental relationship from fitted model"
---
---
title: "Split"
subtitle: "Split data into training (to fit) and test (to evaluate prediction)"
---
Data is often split so that ~20% of the observations (presence and absence) are withheld from model fitting and used later for model evaluation.
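A minimal sketch of such a split (Python; the 80/20 proportion and the seed are illustrative):

```python
import random

rng = random.Random(42)  # fixed seed for a reproducible split
observations = list(range(100))  # stand-in for presence/absence records
rng.shuffle(observations)

n_test = int(0.2 * len(observations))  # ~20% withheld for evaluation
test = observations[:n_test]
train = observations[n_test:]
```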
The `k-fold` approach is often used to split the data into *k* groups; the model is then fit *k* times, each time using a different group as the test data and the remaining groups as the training data.
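A minimal sketch of generating k-fold groups (Python; the round-robin fold assignment here is one of several reasonable schemes, and real implementations typically shuffle first):

```python
def k_fold_indices(n, k):
    """Assign each of n observation indices to one of k roughly equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)  # round-robin assignment
    return folds

# Each fold serves once as the test set; the remaining folds are pooled
# into the training set, so every observation is tested exactly once.
folds = k_fold_indices(10, 5)
splits = []
for held_out in range(5):
    test_idx = folds[held_out]
    train_idx = [i for j in range(5) if j != held_out for i in folds[j]]
    splits.append((train_idx, test_idx))
```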