Commit

detail Prepare/Model/Combine sections
bbest committed Nov 23, 2023
1 parent ed174c0 commit b13f652
Showing 19 changed files with 324 additions and 12 deletions.
19 changes: 15 additions & 4 deletions _quarto.yml
Original file line number Diff line number Diff line change
@@ -25,15 +25,26 @@ book:
Built with <a href="https://quarto.org/" target="_blank">Quarto</a>
chapters:
- index.qmd
- part: create.qmd
- part: "Prepare"
chapters:
- prep.qmd
- occ.qmd
- abs.qmd
- env.qmd
- part: "Model"
chapters:
- prep-data.qmd
- model.qmd
- part: combine.qmd
- split.qmd
- fit.qmd
- calibrate.qmd
- predict.qmd
- evaluate.qmd
- part: "Combine"
chapters:
- combine.qmd
- ensemble.qmd
- mosaic.qmd
- group.qmd
- taxa.qmd
- indicators.qmd
- software.qmd
- organize.qmd
21 changes: 21 additions & 0 deletions abs.qmd
@@ -0,0 +1,21 @@
---
title: "Pseudo-absences"
subtitle: "Generate pseudo-absence or background environmental values to compare with occurrence environment"
---

Describe various strategies for generating pseudo-absences.

- [Pseudo-absences • biomod2](https://biomodhub.github.io/biomod2/articles/vignette_pseudoAbsences.html)
- [@barbet-massin2012]

## All background

A common Maxent strategy is to feed all background points into Maxent, and then to use the resulting distribution as a null model. This is the default strategy in Maxent [@phillips2017; @phillips2006; @phillips2008].

## Mask by FAO areas

The FAO areas applicable to each species are included in `aquamapsdata`, presumably derived from OBIS observations and the literature.

## Use occurrences from same Family, different species

By drawing pseudo-absences from other species in the same family, we can reasonably expect them to be ecologically similar to the species of interest.
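
The same-family strategy can be sketched as a simple filter and sample. This is a minimal illustration in plain Python rather than the book's R workflow; the function name `family_pseudo_absences`, the record fields, and the placeholder taxa (`sp_a`, `fam_x`, etc.) are all hypothetical and do not come from OBIS or `aquamapsdata`.

```python
import random


def family_pseudo_absences(occurrences, target_species, family, n, seed=0):
    """Sample up to n records of other species in the same family."""
    # Keep records in the target's family that are NOT the target species.
    candidates = [
        rec for rec in occurrences
        if rec["family"] == family and rec["species"] != target_species
    ]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical records with placeholder names and coordinates (not real data).
occurrences = [
    {"species": "sp_a", "family": "fam_x", "lon": -120.0, "lat": 34.0},
    {"species": "sp_b", "family": "fam_x", "lon": -121.0, "lat": 35.0},
    {"species": "sp_c", "family": "fam_x", "lon": -122.0, "lat": 36.0},
    {"species": "sp_b", "family": "fam_x", "lon": -123.0, "lat": 37.0},
    {"species": "sp_d", "family": "fam_y", "lon": -124.0, "lat": 38.0},
]

absences = family_pseudo_absences(occurrences, "sp_a", "fam_x", n=2)
```

Sampling without replacement and capping at the number of available candidates are design choices of this sketch, not requirements of the strategy.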
24 changes: 24 additions & 0 deletions calibrate.qmd
@@ -0,0 +1,24 @@
# Calibrate

The process of refining the model to only the most relevant environmental predictor terms is commonly called "Model Selection." One of the most cited scientific papers of all time [@akaike1974] takes a parsimonious approach to this process -- the so-called Akaike Information Criterion (AIC).
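
For reference, the criterion in its usual form is given below. The symbols are chosen here for illustration and are not defined elsewhere in this chapter: $k$ is the number of estimated parameters and $\hat{L}$ is the maximized value of the model's likelihood.

$$
\mathrm{AIC} = 2k - 2\ln(\hat{L})
$$

Among candidate models fitted to the same data, the one with the lowest AIC is preferred, so an added predictor term is only worth keeping if it improves the log-likelihood by more than its penalty.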

It is important to avoid using environmental predictors that are correlated with each other: the fitted effect of one predictor on the response can come out as the ecological inverse of its true effect, simply because it is explaining residual variance left by the other correlated predictor.
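
One way to screen for correlated predictors is to compute pairwise Pearson correlations and flag pairs above a cutoff. The sketch below is plain Python with made-up values; the function names and the 0.7 cutoff are assumptions for illustration (a common rule of thumb, not a value taken from this book).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical values at five sites (not real data).
predictors = {
    "depth": [10, 50, 200, 1000, 3000],
    "d2coast": [1, 5, 20, 100, 300],
    "temp": [18, 12, 20, 9, 15],
}

flagged = correlated_pairs(predictors)
```

Here `depth` and `d2coast` are flagged because the toy values are proportional to each other; which member of a flagged pair to drop remains an ecological judgment.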

## Predict

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to assess how well the model predicts the response of presence or absence. Since the test response data is binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous prediction to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

| | | | |
|----------|--------------|---------------|----------------|
| | | Predicted | |
| | | 0 (absence) | 1 (presence) |
| Observed | 0 (absence) | True absence | False presence |
| | 1 (presence) | False absence | True presence |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve generated by plotting the true positive rate against the false positive rate as the threshold value changes (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
3 changes: 2 additions & 1 deletion combine.qmd
@@ -1,5 +1,6 @@
---
title: "Combine SDMs"
title: "Combine"
subtitle: "Combine SDMs from the same or multiple species"
---

We look at combining SDMs to calculate biodiversity in ways that address questions of interest and relevance.
45 changes: 45 additions & 0 deletions create.qmd
@@ -5,3 +5,48 @@
%%| fig-cap: "Diagram of SDM data preparation and model fitting."
%%| file: diagrams/sdm-process.mmd
```

# Prepare Data

```{mermaid}
%%| label: fig-prep
%%| fig-cap: "Diagram of SDM data preparation for model fitting."
%%| file: diagrams/sdm-prep.mmd
```

- **obs**\
observations: occurrences from OBIS; masked by FAO regions defined by AquaMaps [@aquamapsdata]
- **presence**\
OBIS: species occurrence
- **absence**\
OBIS: occurrences of other species in the same family
- **env**\
environment
- **tbl**\
table of observations (presence and absence) with environmental values

## Environmental Predictors

### Physiographic

- `depth`\
Bathymetric Depth

- `d2coast`\
Distance to Coast

- `d2shelf`\
Distance to Shelf

### Time Varying

- `vgpm`\
Vertically Generalized Production Model (primary productivity)

### Depth & Time Varying

- `temp`\
Temperature, either sea-surface temperature (SST) or a modeled product from HYCOM, ROMS or Copernicus

- `salin`\
Salinity
30 changes: 30 additions & 0 deletions env.qmd
@@ -0,0 +1,30 @@
---
title: "Environment"
subtitle: "Extract environmental predictors (static and/or dynamic) from various sources for observations (presence and pseudo-absence)"
---

These data are also used at the prediction step.

### Physiographic

- `depth`\
Bathymetric Depth

- `d2coast`\
Distance to Coast

- `d2shelf`\
Distance to Shelf

### Time Varying

- `vgpm`\
Vertically Generalized Production Model (primary productivity)

### Depth & Time Varying

- `temp`\
Temperature, either sea-surface temperature (SST) or a modeled product from HYCOM, ROMS or Copernicus

- `salin`\
Salinity
14 changes: 14 additions & 0 deletions evaluate.qmd
@@ -0,0 +1,14 @@
# Evaluate

Model evaluation uses the test data set aside during the earlier splitting to assess how well the model predicts the response of presence or absence. Since the test response data is binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous prediction to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

| | | | |
|----------|--------------|---------------|----------------|
| | | Predicted | |
| | | 0 (absence) | 1 (presence) |
| Observed | 0 (absence) | True absence | False presence |
| | 1 (presence) | False absence | True presence |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve generated by plotting the true positive rate against the false positive rate as the threshold value changes (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
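
The thresholding described above can be sketched in a few lines. This is an illustrative plain-Python version, not the ROCR implementation: the function names and toy values are hypothetical, a prediction at or above the threshold is counted as presence (an assumption), and the test data are assumed to contain at least one presence and one absence.

```python
def confusion(observed, predicted, threshold):
    """Return (TP, FP, FN, TN); predicted >= threshold counts as presence."""
    tp = fp = fn = tn = 0
    for obs, p in zip(observed, predicted):
        pred = 1 if p >= threshold else 0
        if obs == 1 and pred == 1:
            tp += 1  # true presence
        elif obs == 0 and pred == 1:
            fp += 1  # false presence
        elif obs == 1 and pred == 0:
            fn += 1  # false absence
        else:
            tn += 1  # true absence
    return tp, fp, fn, tn


def roc_points(observed, predicted):
    """(false positive rate, true positive rate) at each threshold."""
    # Start above every score so the curve begins at (0, 0).
    thresholds = [float("inf")] + sorted(set(predicted), reverse=True)
    points = []
    for t in thresholds:
        tp, fp, fn, tn = confusion(observed, predicted, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical test set: observed presence/absence and model predictions.
observed = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

matrix_at_half = confusion(observed, predicted, 0.5)
area = auc(roc_points(observed, predicted))
```

At a threshold of 0.5 the toy data give two true presences, one false presence, one false absence and two true absences; sweeping the threshold traces the full curve, and the area under it summarizes performance across all thresholds.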
1 change: 1 addition & 0 deletions explorations/sdm-1_predicts.qmd
@@ -6,6 +6,7 @@ url-code: "https://github.com/marinebon/sdm-explore/blob/main/sdm_1.qmd"
categories:
- "data: OBIS"
- "tech: R"
- "model: Maxent"
editor: source
---

Binary file added figures/rocr.png
30 changes: 30 additions & 0 deletions fit.qmd
@@ -0,0 +1,30 @@
# Fit

Model fitting is quite complex in theory but simple in practice: the prepared data are fed into a modeling function.

However, there are MANY modeling techniques from which to choose. For instance, check out the 238 entries in [6 Available Models | The caret Package](https://topepo.github.io/caret/available-models.html).

## Calibrate

The process of refining the model to only the most relevant environmental predictor terms is commonly called "Model Selection." One of the most cited scientific papers of all time [@akaike1974] takes a parsimonious approach to this process -- the so-called Akaike Information Criterion (AIC).

It is important to avoid using environmental predictors that are correlated with each other: the fitted effect of one predictor on the response can come out as the ecological inverse of its true effect, simply because it is explaining residual variance left by the other correlated predictor.

## Predict

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to assess how well the model predicts the response of presence or absence. Since the test response data is binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous prediction to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

| | | | |
|----------|--------------|---------------|----------------|
| | | Predicted | |
| | | 0 (absence) | 1 (presence) |
| Observed | 0 (absence) | True absence | False presence |
| | 1 (presence) | False absence | True presence |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve generated by plotting the true positive rate against the false positive rate as the threshold value changes (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
15 changes: 13 additions & 2 deletions index.qmd
@@ -62,6 +62,14 @@ By definition `r glossary("MBON")` is a network, so this is inclusive of and mea

- The world is quickly moving towards conserving 30% of the oceans by 2030, the so-called "[**30 by 30**](https://en.wikipedia.org/wiki/30_by_30)". In the U.S., this is the [America the Beautiful](https://www.noaa.gov/america-the-beautiful) initiative. We need biodiversity indicators to track progress. This push for conservation is driven by increasing impacts of **climate change**, as evidenced by marine heatwaves and shifts in population distributions.

## Process

```{mermaid}
%%| label: fig-process
%%| fig-cap: "Diagram of SDM data preparation and model fitting."
%%| file: diagrams/sdm-process.mmd
```

## Contribute

We very much welcome your feedback, contributions and collaboration. Here are a few ways from least to most involved:
@@ -84,15 +92,18 @@ We very much welcome your feedback, contributions and collaboration. Here are a

4. If you are a regular contributor, you can be added to the collaborators of this repository to push changes directly (without needing a pull request).

## Features of this Book
## Features

This Quarto book has a few cool features:

- Multiple formats\
From the single set of source Quarto documents (\*.qmd), several output formats are rendered: html, pdf, docx. This is particularly helpful when suggesting changes. It also lends itself well to being carved into manuscripts.

- Self-rendering\
GitHub hosts the web pages (\*.html), which get rendered from the source code (\*.qmd) using a GitHub Action. So edits can be made simply through the web interface and all outputs get updated (html, pdf, docx). It also ensures the reproducibility of the document with a common setup environment.

- Mermaid diagrams
- Mermaid diagrams\
e.g., @fig-process, @fig-prep, @fig-model

- Quarto document listings

3 changes: 2 additions & 1 deletion indicators.qmd
@@ -1,5 +1,6 @@
---
title: "Indicators"
subtitle: "Calculate indicators of ecological or management interest beyond taxonomic groupings"
---

## Diversity
@@ -13,7 +14,7 @@ Here are the classic diversity indices from the R package `vegan`:
> D_2 &= \frac{1}{\sum_{i=1}^S p_i^2} &\text{inverse Simpson}
> \end{aligned}
> $$
>
>
> where $p_i$ is the proportion of species $i$, and $S$ is the number of species so that $\sum_{i=1}^S p_i = 1$, and $b$ is the base of the logarithm.
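
As a worked illustration of these indices, here is a plain-Python sketch rather than the `vegan` implementation. Only the inverse Simpson line is visible in this hunk; the Shannon and Simpson forms below are assumed to follow the definitions in `vegan`'s documentation, and the function names and counts are hypothetical.

```python
from math import e, log


def proportions(counts):
    """Convert per-species counts to proportions p_i that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def shannon(counts, base=e):
    """H = -sum(p_i * log_b(p_i))."""
    return -sum(p * log(p, base) for p in proportions(counts))


def simpson(counts):
    """D_1 = 1 - sum(p_i^2)."""
    return 1 - sum(p ** 2 for p in proportions(counts))


def inverse_simpson(counts):
    """D_2 = 1 / sum(p_i^2)."""
    return 1 / sum(p ** 2 for p in proportions(counts))


# Hypothetical community: four equally abundant species.
counts = [10, 10, 10, 10]

h = shannon(counts)
d1 = simpson(counts)
d2 = inverse_simpson(counts)
```

For a perfectly even community the inverse Simpson index equals the number of species $S$, which makes a convenient sanity check.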
## Endemism
6 changes: 4 additions & 2 deletions model.qmd
@@ -1,8 +1,10 @@
# Model
---
title: "Model"
subtitle: "Model the distribution of a species"
---

```{mermaid}
%%| label: fig-model
%%| fig-cap: "Diagram of SDM Modeling processes."
%%| file: diagrams/sdm-model.mmd
```

21 changes: 21 additions & 0 deletions occ.qmd
@@ -0,0 +1,21 @@
---
title: "Occurrences"
subtitle: "Fetch presence observations and filter for quality control"
---

To describe:

- `robis`

- Filter based on quality flags

- Remove outliers

- [`eks`](https://cran.r-project.org/web/packages/eks/vignettes/tidysf_kde.html)\
*Tidy and Geospatial Kernel Smoothing for spatially filtering outlier observations*

![Source: Kernel density estimates for tidy and geospatial data in the eks package](figures/software/eks.png){#fig-eks}

## Fetch OBIS

## Filter occurrences
18 changes: 18 additions & 0 deletions predict.qmd
@@ -0,0 +1,18 @@
# Predict

The prediction step applies the environmental relationships from the fitted model to a new set of data, typically the seascape of interest, and perhaps with some sort of temporal snapshot (e.g., climatic annual or monthly average).
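
A minimal sketch of this step, assuming for illustration that the fitted model is a logistic regression; the book does not prescribe a model type here. The coefficients, intercept, cell values and function name are all hypothetical, written in plain Python rather than the book's R workflow.

```python
from math import exp


def predict_suitability(coefs, intercept, cell):
    """Apply fitted logistic coefficients to one cell's environment."""
    linear = intercept + sum(coefs[name] * cell[name] for name in coefs)
    return 1 / (1 + exp(-linear))  # inverse logit, bounded in (0, 1)


# Hypothetical fitted relationships (not estimated from real data).
coefs = {"depth": -0.002, "temp": 0.15}
intercept = -1.0

# Hypothetical seascape: environmental values per grid cell for one snapshot.
seascape = [
    {"cell_id": 1, "depth": 50, "temp": 18.0},
    {"cell_id": 2, "depth": 2000, "temp": 6.0},
]

predictions = {
    cell["cell_id"]: predict_suitability(coefs, intercept, cell)
    for cell in seascape
}
```

The predictors supplied for each cell must be the same variables, in the same units, as those used at fitting; a different temporal snapshot simply swaps in a different set of cell values.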

## Evaluate

Model evaluation uses the test data set aside during the earlier splitting to assess how well the model predicts the response of presence or absence. Since the test response data is binary \[0,1\] and the prediction from the model is continuous \[0-1\], a threshold needs to be applied to convert the continuous prediction to binary. This is often performed through a Receiver Operating Characteristic (**ROC**) curve (@fig-rocr), which evaluates the **confusion matrix** (@tbl-confusion-matrix) at each threshold.

| | | | |
|----------|--------------|---------------|----------------|
| | | Predicted | |
| | | 0 (absence) | 1 (presence) |
| Observed | 0 (absence) | True absence | False presence |
| | 1 (presence) | False absence | True presence |

: Confusion matrix to understand predicted versus observed. {#tbl-confusion-matrix}

![ROC curve generated by plotting the true positive rate against the false positive rate as the threshold value changes (rainbow colors). Source: [ROCR: visualizing classifier performance in R](https://cran.rstudio.com/web/packages/ROCR/vignettes/ROCR.html)](figures/rocr.png){#fig-rocr}
7 changes: 6 additions & 1 deletion prep-data.qmd → prep.qmd
@@ -1,4 +1,9 @@
# Prepare Data
---
title: "Prepare"
subtitle: "Prepare observations and environmental data for modeling"
---

# Prepare

```{mermaid}
%%| label: fig-prep