cleanup Model/*.qmd w/ subtitles
bbest committed Nov 23, 2023
1 parent b13f652 commit dd99a32
Showing 5 changed files with 21 additions and 65 deletions.
24 changes: 4 additions & 20 deletions calibrate.qmd
@@ -1,24 +1,8 @@
# Calibrate
---
title: "Calibrate"
subtitle: "Calibrate model fit, i.e., model selection"
---

The process of refining the model to only the most relevant environmental predictor terms is commonly called "Model Selection." One of the most cited scientific papers of all time [@akaike1974] takes the most parsimonious approach to this process -- the so-called Akaike Information Criterion (AIC).
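As a rough illustration (a Python sketch with invented log-likelihoods, not the book's actual R workflow), AIC trades goodness of fit against model complexity via AIC = 2k - 2 ln(L):

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2*k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Two hypothetical candidate models: the second adds two predictor
# terms for only a marginal gain in log-likelihood.
aic_simple  = aic(log_likelihood=-100.0, n_params=3)  # 206.0
aic_complex = aic(log_likelihood=-99.5,  n_params=5)  # 209.0
# The lower AIC (the simpler model) is the more parsimonious choice.
```

The extra parameters must "buy" enough likelihood to offset the 2k penalty, which is what makes AIC a parsimony criterion.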

It is important to avoid using environmental predictors that are correlated with each other: the apparent effect of a predictor on the response could be the ecological inverse of its true effect, a result of explaining variance in the residuals of the other correlated predictor.
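A quick collinearity screen is a pairwise correlation coefficient between candidate predictors; this Python sketch uses invented values (the predictor names are hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two predictor vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

sst   = [10, 12, 14, 16, 18]      # hypothetical sea-surface temperature
depth = [-5, -8, -11, -14, -17]   # hypothetical depth; perfectly anti-correlated here
r = pearson_r(sst, depth)         # -1.0: drop one of the pair before fitting
```

A common rule of thumb is to drop one predictor of any pair with |r| above roughly 0.7 before fitting.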

## Predict

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

|          |              | Predicted: 0 (absence) | Predicted: 1 (presence) |
|----------|--------------|------------------------|-------------------------|
| Observed | 0 (absence)  | True absence           | False presence          |
|          | 1 (presence) | False absence          | True presence           |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}
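The threshold-to-confusion-matrix step can be sketched as follows (a Python sketch with invented observations and predicted probabilities); sweeping the threshold and recording the resulting rates is what traces out the ROC curve:

```python
def confusion_matrix(observed, predicted_prob, threshold):
    """Binarize continuous predictions at `threshold`, then tally
    them against the binary observations (1 = presence, 0 = absence)."""
    tp = fp = tn = fn = 0
    for obs, p in zip(observed, predicted_prob):
        pred = 1 if p >= threshold else 0
        if obs == 1 and pred == 1:
            tp += 1  # true presence
        elif obs == 0 and pred == 1:
            fp += 1  # false presence
        elif obs == 0 and pred == 0:
            tn += 1  # true absence
        else:
            fn += 1  # false absence
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

obs  = [1, 1, 0, 0, 1, 0]              # hypothetical test responses
prob = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]  # hypothetical model predictions
cm = confusion_matrix(obs, prob, threshold=0.5)
# {"TP": 2, "FP": 1, "TN": 2, "FN": 1}
```

Raising the threshold trades false presences for false absences, which is why the threshold is chosen against the ROC curve rather than fixed at 0.5.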

![ROC curve showing the true positive vs false positive rate as a function of the changing threshold value (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
5 changes: 4 additions & 1 deletion evaluate.qmd
@@ -1,4 +1,7 @@
# Evaluate
---
title: "Evaluate"
subtitle: "Evaluate performance of the predicted model with the test data"
---

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

31 changes: 5 additions & 26 deletions fit.qmd
@@ -1,30 +1,9 @@
# Fit
---
title: "Fit"
subtitle: "Fit environmental relationship distinguishing presence from absence of species"
---

Model fitting is quite complex in theory but quite simple in practice: simply feed the prepared data into the modeling function.

However, there are MANY modeling techniques from which to choose. For instance, check out the 238 entries in [6 Available Models | The caret Package](https://topepo.github.io/caret/available-models.html).
However, there are MANY modeling techniques from which to choose. For instance, check out the 238 entries in [6 Available Models \| The caret Package](https://topepo.github.io/caret/available-models.html).

## Calibrate

The process of refining the model to only the most relevant environmental predictor terms is commonly called "Model Selection." One of the most cited scientific papers of all time [@akaike1974] takes the most parsimonious approach to this process -- the so-called Akaike Information Criterion (AIC).

It is important to avoid using environmental predictors that are correlated with each other: the apparent effect of a predictor on the response could be the ecological inverse of its true effect, a result of explaining variance in the residuals of the other correlated predictor.

## Predict

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

|          |              | Predicted: 0 (absence) | Predicted: 1 (presence) |
|----------|--------------|------------------------|-------------------------|
| Observed | 0 (absence)  | True absence           | False presence          |
|          | 1 (presence) | False absence          | True presence           |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve showing the true positive vs false positive rate as a function of the changing threshold value (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
20 changes: 4 additions & 16 deletions predict.qmd
@@ -1,18 +1,6 @@
# Predict
---
title: "Predict"
subtitle: "Predict distribution of the species with environmental relationship from fitted model"
---

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to evaluate how well the model predicts the response of presence or absence. Since the test response data are binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous response to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

|          |              | Predicted: 0 (absence) | Predicted: 1 (presence) |
|----------|--------------|------------------------|-------------------------|
| Observed | 0 (absence)  | True absence           | False presence          |
|          | 1 (presence) | False absence          | True presence           |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve showing the true positive vs false positive rate as a function of the changing threshold value (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
6 changes: 4 additions & 2 deletions split.qmd
@@ -1,6 +1,8 @@
# Split
---
title: "Split"
subtitle: "Split data into training (to fit) and test (to evaluate prediction)"
---

Data is often split so that \~20% of the observations (presence and absence) are set aside from the model fitting to be used for model evaluation.

K-fold cross-validation is often used to split the data into k groups; the model is then fit k times, each time using a different group as the test data and the remaining groups as the training data.
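The split and k-fold ideas can be sketched in Python (stdlib only; the 80/20 fraction and k = 5 are just the conventional defaults mentioned above, and the data are invented observation ids):

```python
import random

def train_test_split(rows, test_frac=0.2, seed=42):
    """Set aside ~test_frac of the observations for model evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def k_folds(rows, k=5):
    """Yield k (train, test) pairs; each fold serves once as test data."""
    for i in range(k):
        test  = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, test

data = list(range(100))               # hypothetical observation ids
train, test = train_test_split(data)  # ~80 train, ~20 test
folds = list(k_folds(data, k=5))      # 5 pairs, each with a test fold of 20
```

With k-fold, every observation is used for evaluation exactly once, giving a less noisy performance estimate than a single 80/20 split.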
