add modeling-02
btupper committed Oct 10, 2023
1 parent f52baa0 commit c87f51c
Showing 34 changed files with 3,884 additions and 28 deletions.
6 changes: 5 additions & 1 deletion covariates.qmd
@@ -79,7 +79,7 @@ We need to create a random sample of background in both time and space.

### Sampling time

- Sampling time requires us to consider that the occurrences are not even distributed through time. We can see that using a histogram of observation dates by month.
+ Sampling time requires us to consider that the occurrences are not evenly distributed through time. We can see that using a histogram of observation dates by month.

```{r}
H = hist(obs$date, breaks = 'month', format = "%Y",
@@ -120,6 +120,10 @@ days_sample = sample(days, size = nback, replace = TRUE, prob = day_probs)

So now we have a sample of dates whose temporal distribution is similar to that of the observations.

+ :::{.callout-warning}
+ It is possible that we may be [overfitting](https://en.wikipedia.org/wiki/Overfitting) by weighting the samples in time. Other time-sampling strategies are available to us, so we are not stuck with the approach we are using and can easily revisit this selection.
+ :::
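
To make the trade-off in the warning concrete, the sketch below draws background dates two ways: weighted by the monthly frequency of the observations, and uniformly across the same period. It is an illustration under assumptions, not the tutorial's own code: `obs_dates` is made-up data standing in for `obs$date`, and the way `day_probs` is built here (each day inherits its month's observation count) is only one plausible weighting.

```r
# Made-up observation dates standing in for obs$date (assumption),
# deliberately skewed toward the second half of the year
set.seed(1)
obs_dates = as.Date("2020-01-01") +
  sample(0:364, size = 200, replace = TRUE, prob = rep(c(1, 4), c(182, 183)))

# Candidate days to sample from, and the number of background dates wanted
days = seq(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "day")
nback = 1000

# Weighted strategy: each day is weighted by the observation count of its month
month_counts = table(factor(format(obs_dates, "%m"), levels = sprintf("%02d", 1:12)))
day_probs = as.numeric(month_counts[format(days, "%m")])
days_weighted = sample(days, size = nback, replace = TRUE, prob = day_probs)

# Uniform strategy: every day is equally likely, regardless of the observations
days_uniform = sample(days, size = nback, replace = TRUE)

# Compare the monthly distribution of the two samples
table(format(days_weighted, "%m"))
table(format(days_uniform, "%m"))
```

Swapping between the two only changes the `prob` argument, which is why the time-sampling choice is easy to revisit later.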

### Sampling space

The [sf](https://CRAN.R-project.org/package=sf) package provides a function, `st_sample()`, for sampling points within a polygon. But what polygon? We have choices: we could use (a) a bounding box around the observations, (b) a convex hull around the observations or (c) a buffered envelope around the observations. Each has its advantages and disadvantages. We show how to make one of each.
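
As a rough preview of the three options, here is a minimal sketch using sf. The points in `pts` are random stand-ins for the observations, the 25 km buffer distance is arbitrary, and buffering the convex hull is just one reading of "buffered envelope"; none of these choices are taken from the tutorial itself.

```r
library(sf)

# Hypothetical stand-in for the observations: 50 random lon/lat points
set.seed(1)
pts = st_as_sf(data.frame(lon = runif(50, -72, -66), lat = runif(50, 40, 45)),
               coords = c("lon", "lat"), crs = 4326)

# (a) bounding box around the observations
poly_bbox = st_as_sfc(st_bbox(pts))

# (b) convex hull; union first so a single hull spans all of the points
poly_hull = st_convex_hull(st_union(pts))

# (c) buffered envelope; here the hull is grown by 25 km (arbitrary distance,
#     interpreted in meters for lon/lat data when s2 is enabled)
poly_buf = st_buffer(poly_hull, dist = 25000)

# Draw random background points from whichever polygon is chosen
bkg = st_sample(poly_buf, size = 100)
```

The bounding box is the simplest but may include large areas far from any observation; the hull and its buffered version hug the data more closely.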
Binary file added data/model/v2/v2.Apr.rds
Binary file not shown.
Binary file added data/model/v2/v2.Aug.rds
Binary file not shown.
Binary file added data/model/v2/v2.Dec.rds
Binary file not shown.
Binary file added data/model/v2/v2.Feb.rds
Binary file not shown.
Binary file added data/model/v2/v2.Jan.rds
Binary file not shown.
Binary file added data/model/v2/v2.Jul.rds
Binary file not shown.
Binary file added data/model/v2/v2.Jun.rds
Binary file not shown.
Binary file added data/model/v2/v2.Mar.rds
Binary file not shown.
Binary file added data/model/v2/v2.May.rds
Binary file not shown.
Binary file added data/model/v2/v2.Nov.rds
Binary file not shown.
Binary file added data/model/v2/v2.Oct.rds
Binary file not shown.
Binary file added data/model/v2/v2.Sep.rds
Binary file not shown.
15 changes: 14 additions & 1 deletion docs/covariates.html
@@ -320,7 +320,7 @@ <h2 data-number="3" class="anchored" data-anchor-id="sampling-background-data"><
<p>We need to create a random sample of background in both time and space.</p>
<section id="sampling-time" class="level3" data-number="3.1">
<h3 data-number="3.1" class="anchored" data-anchor-id="sampling-time"><span class="header-section-number">3.1</span> Sampling time</h3>
- <p>Sampling time requires us to consider that the occurrences are not even distributed through time. We can see that using a histogram of observation dates by month.</p>
+ <p>Sampling time requires us to consider that the occurrences are not evenly distributed through time. We can see that using a histogram of observation dates by month.</p>
<div class="cell" data-hash="covariates_cache/html/unnamed-chunk-3_6c57817ed29ec1b0d03b0ce25e6fe886">
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>H <span class="ot">=</span> <span class="fu">hist</span>(obs<span class="sc">$</span>date, <span class="at">breaks =</span> <span class="st">'month'</span>, <span class="at">format =</span> <span class="st">"%Y"</span>, </span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="at">freq =</span> <span class="cn">TRUE</span>, <span class="at">main =</span> <span class="st">"Observations"</span>,</span>
@@ -363,6 +363,19 @@ <h3 data-number="3.1" class="anchored" data-anchor-id="sampling-time"><span clas
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>days_sample <span class="ot">=</span> <span class="fu">sample</span>(days, <span class="at">size =</span> nback, <span class="at">replace =</span> <span class="cn">TRUE</span>, <span class="at">prob =</span> day_probs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>So now we have a sample of dates whose temporal distribution is similar to that of the observations.</p>
+ <div class="callout callout-style-default callout-warning callout-titled">
+ <div class="callout-header d-flex align-content-center">
+ <div class="callout-icon-container">
+ <i class="callout-icon"></i>
+ </div>
+ <div class="callout-title-container flex-fill">
+ Warning
+ </div>
+ </div>
+ <div class="callout-body-container callout-body">
+ <p>It is possible that we may be <a href="https://en.wikipedia.org/wiki/Overfitting">overfitting</a> by weighting the samples in time. Other time-sampling strategies are available to us, so we are not stuck with the approach we are using and can easily revisit this selection.</p>
+ </div>
+ </div>
</section>
<section id="sampling-space" class="level3" data-number="3.2">
<h3 data-number="3.2" class="anchored" data-anchor-id="sampling-space"><span class="header-section-number">3.2</span> Sampling space</h3>
20 changes: 16 additions & 4 deletions docs/index.html
@@ -332,9 +332,22 @@ <h3 data-number="3.4" class="anchored" data-anchor-id="data-storage"><span class
  │   ├── bkg-covariates.gpkg
  │   └── buffered-polygon.gpkg
  ├── model
- │   └── v1
- │       └── v1.0
- │           └── model_v1.0.rds
+ │   ├── v1
+ │   │   └── v1.0
+ │   │       └── model_v1.0.rds
+ │   └── v2
+ │       ├── v2.Apr.rds
+ │       ├── v2.Aug.rds
+ │       ├── v2.Dec.rds
+ │       ├── v2.Feb.rds
+ │       ├── v2.Jan.rds
+ │       ├── v2.Jul.rds
+ │       ├── v2.Jun.rds
+ │       ├── v2.Mar.rds
+ │       ├── v2.May.rds
+ │       ├── v2.Nov.rds
+ │       ├── v2.Oct.rds
+ │       └── v2.Sep.rds
  ├── nbs
  │   ├── 2000
  │   │   ├── 0101
@@ -1472,7 +1485,6 @@ <h3 data-number="3.4" class="anchored" data-anchor-id="data-storage"><span class
  │   │   ├── uvcomp_SCIMonthlyGlobal.v_wind.2023-07-01.tif
  │   │   └── uvcomp_SCIMonthlyGlobal.windspeed.2023-07-01.tif
  │   └── database.csv.gz
- ├── nbs2
  ├── obis
  │   └── Mola_mola.gpkg
  ├── obs
2 changes: 1 addition & 1 deletion docs/modeling-01.html
@@ -310,7 +310,7 @@ <h2 data-number="5" class="anchored" data-anchor-id="save-the-model"><span class
<p>In this tutorial we are going to build three types of models: basic, monthly and split (between testing and training). We should organize the storage of the models in a way that makes sense. With each model we may generate one or more predictions - for example, for our basic model we might try to hind-cast individual years. That’s a one-to-many relationship between model and predictions. We suggest that you start considering each model a version and store them accordingly. Let’s use a simple numbering scheme…</p>
<ul>
<li><code>v1.0</code> for the basic model</li>
- <li><code>v2.01, v2.01, ..., v2.12</code> for the monthly models</li>
+ <li><code>v2.jan, v2.feb, ..., v2.dec</code> for the monthly models</li>
<li><code>v3.0, ...</code> for the split model(s)</li>
</ul>
<p>The <a href="https://github.com/BigelowLab/maxnetic">maxnetic</a> package provides some convenience functions for working with maxnet models, including file storage functions.</p>
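
The monthly naming scheme can be sketched with base R alone. This is an assumption-laden illustration, not the maxnetic API: it uses `saveRDS()`/`readRDS()` with placeholder lists in place of fitted models, and writes into a temporary directory. Note that R's built-in `month.abb` yields capitalized abbreviations ("Jan", "Feb", ...), which matches the `v2.Jan.rds` style file names added in this commit.

```r
# Work under a temporary root so no real model files are overwritten
root = tempfile("model-demo-")
version_dir = file.path(root, "data", "model", "v2")
dir.create(version_dir, recursive = TRUE)

# One file path per monthly model: v2.Jan.rds, v2.Feb.rds, ..., v2.Dec.rds
model_files = file.path(version_dir, sprintf("v2.%s.rds", month.abb))
names(model_files) = month.abb

# Placeholder lists stand in for fitted models (hypothetical content)
for (mon in month.abb) {
  saveRDS(list(version = paste0("v2.", mon)), model_files[[mon]])
}

# Read one model back by its month abbreviation
jan = readRDS(model_files[["Jan"]])
basename(model_files)
```

Keeping the version string in both the directory and the file name makes each file self-describing if it is ever copied elsewhere.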