This is version “v0.2”, which switches from use of OPeNDAP to PODAAC’s convenience apps. In addition to opening NetCDF connections, version “v0.1” (now defunct) demonstrated how you could extract points or small polygons from the NetCDF. This new package is only about fetching and reading data from PODAAC, not about extracting points or polygons from the data. To learn more about extraction, see the sf and stars tutorials.
Access GHRSST (aka MUR) data and manage it from R. PODAAC provides two nice command-line tools (podaac-data-downloader and podaac-data-subscriber). Our focus here is on podaac-data-downloader.
Currently podaac-data-downloader only downloads the global file, but there are plans afoot to add a --subset argument so that only portions of the global raster are transferred. Until then we download the entire globe, but then provide tools for subsetting.
We made a short video explaining this package (while it was still under development); it’s a decent, if informal, walk-through.
Application from PODAAC
Install data-subscriber from PODAAC. Be sure to set your credentials. Note that we have had good luck installing with pipx rather than pip, as pipx will handle installation into isolated environments.
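Before installing the R package, it can help to confirm from R that the command-line tool is visible; a minimal sketch (the PODAAC tools typically read your Earthdata credentials from a ~/.netrc file; see the podaac-data-subscriber documentation for the expected format):
# check that the downloader is on the PATH;
# Sys.which() returns "" if the command is not found
Sys.which("podaac-data-downloader")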
remotes::install_github("BigelowLab/ghrsst", ref = "data-subscriber")
suppressPackageStartupMessages({
  library(rnaturalearth)
  library(dplyr)
  library(sf)
  library(ghrsst)
  library(stars)
})
Data files are downloaded to a default data path; you can override this as needed. It’s convenient to set it once and then forget it. Below is an example, but you should adjust the path to suit your own needs.
path = "/Users/ben/Library/CloudStorage/Dropbox/data/ghrsst"
ghrsst::make_path(path)
ghrsst::set_root_path(path)
Downloading is mostly about selecting the start and end dates. Accepting the defaults, files will be downloaded to the ghrsst::ghrsst_path("tmp") directory, which is not quite the same as a temporary directory. You’ll want to clean out old files on a regular basis, and we provide a tool to help with that (see below).
ok = ghrsst::podaac_downloader(start_date = as.Date("2020-02-01"), end_date = as.Date("2020-02-03"))
print(ok)
## [1] 0
ff = ghrsst::podaac_list() |>
  print()
## [1] "/Users/ben/Library/CloudStorage/Dropbox/data/ghrsst/tmp/20200201090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
## [2] "/Users/ben/Library/CloudStorage/Dropbox/data/ghrsst/tmp/20200202090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
## [3] "/Users/ben/Library/CloudStorage/Dropbox/data/ghrsst/tmp/20200203090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
## [4] "/Users/ben/Library/CloudStorage/Dropbox/data/ghrsst/tmp/20200204090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
At this point you don’t really need this package, as you could read the downloaded file(s) with your favorite NetCDF reader: ncdf4, RNetCDF, stars, terra, or tidync; any will do nicely. We provide a wrapper around tidync to read the files into stars objects.
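For example, here is a minimal sketch of reading one downloaded file directly with stars::read_ncdf, without the subsetting and binding that the wrapper below performs (note that reading a full global MUR file this way can be memory-hungry):
# read a single variable from one downloaded file, no subsetting
s1 = stars::read_ncdf(ff[1], var = "analysed_sst")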
namibia = get_namibia()
s = read_podaac(ff, bb = namibia, var = c("analysed_sst", "sst_anomaly"))
s
## stars object with 3 dimensions and 2 attributes
## attribute(s), summary of first 1e+05 cells:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## analysed_sst 289.851 293.653 294.663 294.2342406 295.038 296.241 2913
## sst_anomaly -2.020 -0.515 -0.278 -0.2617328 0.009 0.974 2913
## dimension(s):
## from to offset delta refsys point x/y
## x 1 1701 0 0.01 WGS 84 FALSE [x]
## y 1 1301 -29 0.01 WGS 84 FALSE [y]
## time 1 4 2020-02-01 1 days Date NA
coast = rnaturalearth::ne_coastline(scale = "medium", returnclass = "sf") |>
  sf::st_geometry() |>
  sf::st_crop(namibia)
extra_plot = function(...){
  plot(coast, col = "orange", lwd = 2, add = TRUE)
}
plot(s['analysed_sst'], hook = extra_plot)
plot(s['sst_anomaly'], hook = extra_plot)
There are many reasons to want to keep your subset of the data. We provide minimalist tools for doing so, allowing you to read the data back to suit your needs.
We archive into a designated directory so that we can recover the files later. Since they are subsets, they are relatively light to restore at some later date. We build a metadata database, in the form of a data frame, that helps us keep track of what we have downloaded and archived.
path = ghrsst::ghrsst_path("namibia")
newdb = lapply(ff,
  function(filename){
    ghrsst::read_podaac(filename, bb = namibia,
                        var = c("analysed_sst", "sst_anomaly")) |>
      ghrsst::archive_podaac(filename, path = path)
  }) |>
  dplyr::bind_rows() |>
  dplyr::glimpse()
## Rows: 8
## Columns: 12
## $ date <date> 2020-02-01, 2020-02-01, 2020-02-02, 2020-02-02, 2020-02-…
## $ year <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020
## $ month <dbl> 2, 2, 2, 2, 2, 2, 2, 2
## $ time <chr> "090000", "090000", "090000", "090000", "090000", "090000…
## $ rdac <chr> "JPL", "JPL", "JPL", "JPL", "JPL", "JPL", "JPL", "JPL"
## $ level <chr> "L4_GHRSST", "L4_GHRSST", "L4_GHRSST", "L4_GHRSST", "L4_G…
## $ type <chr> "SSTfnd", "SSTfnd", "SSTfnd", "SSTfnd", "SSTfnd", "SSTfnd…
## $ product <chr> "MUR", "MUR", "MUR", "MUR", "MUR", "MUR", "MUR", "MUR"
## $ reg <chr> "GLOB", "GLOB", "GLOB", "GLOB", "GLOB", "GLOB", "GLOB", "…
## $ gds_version <chr> "v02.0", "v02.0", "v02.0", "v02.0", "v02.0", "v02.0", "v0…
## $ file_version <chr> "fv04.1", "fv04.1", "fv04.1", "fv04.1", "fv04.1", "fv04.1…
## $ var <chr> "analysed_sst", "sst_anomaly", "analysed_sst", "sst_anoma…
The database is a very simple decomposition of the original file name, plus the added variable name. Each selected variable from each file has been written to the path as a TIFF file.
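To make that mapping concrete, here is a sketch of how one file name decomposes into those fields (splitting on "-" is our illustration, not necessarily the package’s own parser):
# illustrative only: split a MUR file name into the database fields
parts = strsplit(basename(ff[1]), "-", fixed = TRUE)[[1]]
# "20200201090000" "JPL" "L4_GHRSST" "SSTfnd" "MUR" "GLOB" "v02.0" "fv04.1.nc"
date = as.Date(substr(parts[1], 1, 8), format = "%Y%m%d")   # 2020-02-01
time = substr(parts[1], 9, 14)                              # "090000"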
We write the database to a file. Note that this makes sense for a brand new database, but if you are adding to an existing database you’ll want to check out append_database() (a hedged sketch follows below). For this tutorial we can simply write the new file.
newdb = ghrsst::write_database(newdb, path)
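For the incremental case, the call might look like the following; this is a sketch that assumes append_database() accepts the new rows and the path just as write_database() does (consult the package documentation to confirm):
# hypothetical usage; signature assumed to mirror write_database()
db = ghrsst::append_database(newdb, path)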
Now we can read the database back (pretend you paused for lunch).
path = ghrsst::ghrsst_path("namibia")
db = read_database(path) |>
  print()
## # A tibble: 8 × 12
## date year month time rdac level type product reg gds_version
## <date> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2020-02-01 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## 2 2020-02-01 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## 3 2020-02-02 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## 4 2020-02-02 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## 5 2020-02-03 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## 6 2020-02-03 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## 7 2020-02-04 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## 8 2020-02-04 2020 2 090000 JPL L4_GHRSST SSTfnd MUR GLOB v02.0
## # ℹ 2 more variables: file_version <chr>, var <chr>
Now filter the files for just the one(s) you want; let’s say Feb 1 and 2 for sst_anomaly.
x = db |>
  dplyr::filter(date <= as.Date("2020-02-02"), var == "sst_anomaly") |>
  ghrsst::read_ghrsst(path = path)
x
## stars object with 3 dimensions and 1 attribute
## attribute(s), summary of first 1e+05 cells:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## sst_anomaly -2.02 -0.515 -0.278 -0.2617328 0.009 0.974 2913
## dimension(s):
## from to offset delta refsys point x/y
## x 1 1701 0 0.01 WGS 84 FALSE [x]
## y 1 1301 -29 0.01 WGS 84 FALSE [y]
## time 1 2 2020-02-01 1 days Date NA
plot(x, hook = extra_plot)
## downsample set to 3
You can also request multiple variables.
x = db |>
  dplyr::filter(date <= as.Date("2020-02-02"),
                var %in% c("sst_anomaly", "analysed_sst")) |>
  ghrsst::read_ghrsst(path = path)
x
## stars object with 3 dimensions and 2 attributes
## attribute(s), summary of first 1e+05 cells:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## analysed_sst 289.851 293.653 294.663 294.2342406 295.038 296.241 2913
## sst_anomaly -2.020 -0.515 -0.278 -0.2617328 0.009 0.974 2913
## dimension(s):
## from to offset delta refsys point x/y
## x 1 1701 0 0.01 WGS 84 FALSE [x]
## y 1 1301 -29 0.01 WGS 84 FALSE [y]
## time 1 2 2020-02-01 1 days Date NA
Without intentional cleanup, the download directory may grow to an unreasonable size. We provide a function to purge that directory as needed. Note that this does not delete files from any archive you may have developed.
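If you want to review what will be removed, list the download directory first (using podaac_list(), shown earlier) before purging:
# see what is currently in the download directory
ghrsst::podaac_list()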
purge_podaac()