Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cellarea docs, allow Literate.jl tutorials in the doc pipeline, fix typos #800

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions docs/Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,14 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365"
GBIF2 = "dedd4f52-e074-43bf-924d-d6bce14ad628"
GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f"
GeoMakie = "db073c08-6b98-4ee5-b6a4-5efafb3259c6"
HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
NaturalEarth = "436b0209-26ab-4e65-94a9-6526d86fea76"
NCDatasets = "85f8d34a-cbdd-5861-8df4-14fed0d494ab"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Proj = "c94c279d-25a6-4763-9509-64d165bea63e"
RasterDataSources = "3cb90ccd-e1b6-4867-9617-4276c8b2ca36"
Rasters = "a3a2b9e3-a471-40c9-b274-f788e487c689"
Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4"
7 changes: 2 additions & 5 deletions docs/make.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
using Documenter, Rasters, Plots, Logging, Statistics, Dates,
RasterDataSources, ArchGDAL, NCDatasets, CoordinateTransformations
RasterDataSources, ArchGDAL, Proj, NCDatasets, CoordinateTransformations
import Makie, CairoMakie
using DocumenterVitepress
using DocumenterVitepress, Documenter
using Rasters.LookupArrays, Rasters.Dimensions

# Don't output huge svgs for Makie plots
Expand All @@ -18,7 +18,6 @@ function flush_info_and_warnings()
end
flush_info_and_warnings()


Logging.disable_logging(Logging.Warn)

# Make the docs, without running the tests again
Expand All @@ -44,8 +43,6 @@ makedocs(
devurl = "dev";
),
draft = false,
source = "src",
build = "build",
warnonly=true,
)

Expand Down
7 changes: 6 additions & 1 deletion docs/src/.vitepress/config.mts
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,12 @@ export default defineConfig({
{ text: 'Array Operations', link: '/array_operations' },
]
},
{ text: 'Data Sources',
{text: 'Tutorials',
items: [
{text: 'Computing spatial means', link: '/tutorials/spatial_mean'}
]
},
{ text: 'Data Sources',
items: [
{ text: 'Overview', link: '/data_sources' },
{ text: 'GBIF', link: '/gbif_wflow' }
Expand Down
2 changes: 2 additions & 0 deletions docs/src/array_operations.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
# Array operations

Most base methods work as for regular julia `Array`s, such as `reverse` and
rotations like `rotl90`. Base, statistics and linear algebra methods like `mean`
that take a `dims` argument can also use the dimension name.
Expand Down
15 changes: 11 additions & 4 deletions docs/src/gbif_wflow.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
Load occurrences for the Mountain Pygmy Possum using GBIF.jl
# GBIF workflow

## Load GBIF
In this example, we'll load occurrences for the Mountain Pygmy Possum species using [GBIF2.jl](https://github.com/rafaqz/GBIF2.jl), an interface to the [Global Biodiversity Information Facility](https://www.gbif.org/), and extract environmental variables using BioClim data from [RasterDataSources.jl](https://github.com/EcoJulia/RasterDataSources.jl).

## Load GBIF species data

````@example gbif
using Rasters, GBIF2
Expand Down Expand Up @@ -38,10 +40,15 @@ foreach(i -> scatter!(p, coords; subplot=i, kw...), 1:4)
p
````

Then extract predictor variables and write to CSV.
Then extract predictor variables.
````@example gbif
using CSV
predictors = collect(extract(se_aus, coords))
````

These are recognized as a table format in Julia, so we can write them to file using CSV.jl:

````@example gbif
using CSV
CSV.write("burramys_parvus_predictors.csv", predictors)
````

Expand Down
143 changes: 143 additions & 0 deletions docs/src/tutorials/spatial_mean.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
# Computing spatial means

```@meta
CollapsedDocStrings=true
```

It's very common to want to compute the mean of some value over some area of a raster. The initial approach is to simply average the values, but this will give you the arithmetic mean, not the spatial mean.

The reason for this is that raster cells do not always have the same area, especially over a large region of the Earth where its curvature comes into play.

To compute the spatial mean, you need to weight the values by the area of each cell. You can do this by multiplying the values by the cell area, then summing the values, and dividing that number by the total area. That was the motivation for this example.

Let's get the rainfall over Chile, and compute the average rainfall across the country for the month of June.

## Acquiring the data

We'll get the precipitation data across the globe from [WorldClim](https://www.worldclim.org/data/index.html), via [RasterDataSources.jl](https://github.com/EcoJulia/RasterDataSources.jl), and use the `month` keyword argument to get the June data.

Then, we can get the geometry of Chile from [NaturalEarth.jl](https://github.com/JuliaGeo/NaturalEarth.jl), and use `Rasters.mask` to get the data just for Chile.

````@example cellarea
using Rasters
import Proj # to activate the spherical `cellarea` method

using ArchGDAL, RasterDataSources, NaturalEarth # purely for data loading

using CairoMakie # for plotting

precip = Raster(WorldClim{Climate}, :prec; month = 6)
````

````@example cellarea
all_countries = naturalearth("admin_0_countries", 10)
chile = all_countries.geometry[findfirst(==("Chile"), all_countries.NAME)]
````

Let's plot the precipitation on the world map, and highlight Chile:

````@example cellarea
f, a, p = heatmap(precip; colorrange = Makie.zscale(replace_missing(precip, NaN)), axis = (; aspect = DataAspect()))
p2 = poly!(a, chile; color = (:red, 0.3), strokecolor = :red, strokewidth = 0.5)
f
````

You can see Chile highlighted in red, in the bottom left quadrant.

## Processing the data

First, let's make sure that we only have the data that we care about, and crop and mask the raster so it only has values in Chile.
We can crop by the geometry, which really just generates a view into the raster that is bounded by the geometry's bounding box.

````@example cellarea
cropped_precip = crop(precip; to = chile)
````

Now, we mask the data such that any data outside the geometry is set to `missing`.

````@example cellarea
masked_precip = mask(cropped_precip; with = chile)
heatmap(masked_precip)
````

This is a lot of missing data, but that's mainly because the Chile geometry we have encompasses the Easter Islands as well, in the middle of the Pacific.


```@docs; canonical=false
cellarea
```

`cellarea` computes the area of each cell in a raster.
This is useful for a number of reasons - if you have a variable like
population per cell, or elevation ([spatially extensive variables](https://r-spatial.org/book/05-Attributes.html#sec-extensiveintensive)),
you'll want to account for the fact that different cells have different areas.

You can specify whether you want to compute the area in the plane of your projection
(`Planar()`), or on a sphere of some radius (`Spherical(; radius=...)`).

Now, let's compute the average precipitation per square meter across Chile.
First, we need to get the area of each cell in square meters. We'll use the spherical method, since we're working with a geographic coordinate system. This is the default.

````@example cellarea
areas = cellarea(masked_precip)
masked_areas = mask(areas; with = chile)
heatmap(masked_areas; axis = (; title = "Cell area in square meters"))
````

You can see here that cells are largest towards the equator, and smallest away from it. This means that cells away from the equator should have a smaller contribution to the average than cells nearer the equator.

## Computing the spatial mean

Now we can compute the average precipitation per square meter. First, we compute total precipitation per grid cell:

````@example cellarea
precip_per_area = masked_precip .* masked_areas
````

We can sum this to get the total precipitation per square meter across Chile:

````@example cellarea
total_precip = sum(skipmissing(precip_per_area))
````

We can also sum the areas to get the total area of Chile (in this raster, at least).

````@example cellarea
total_area = sum(skipmissing(masked_areas))
````

And we can convert that to an average by dividing by the total area:

````@example cellarea
avg_precip = total_precip / total_area
````

According to the internet, Chile gets about 100mm of rain per square meter in June, so our statistic seems pretty close.

Let's see what happens if we don't account for cell areas. An equivalent assumption would be that all cells have the same area.

````@example cellarea
bad_total_precip = sum(skipmissing(masked_precip))
bad_avg_precip = bad_total_precip / length(collect(skipmissing(masked_precip)))
````

This is misestimated! This is why it's important to account for cell areas when computing averages.

!!! note
If you made it this far, congratulations!

It's interesting to note that we've replicated the workflow of `zonal` here.
`zonal` is a more general function that can be used to compute any function over geometries,
and it has multithreading built in.

But fundamentally, this is all that `zonal` is doing under the hood -
masking and cropping the raster to the geometry, and then computing the statistic.

## Summary

In this tutorial, we've seen how to compute the spatial mean of a raster, and how to account for the fact that raster cells do not always have the same area.

We've also seen how to use the `cellarea` function to compute the area of each cell in a raster, and how to use the `mask` function to get the data within a geometry.

We've seen that the spatial mean is not the same as the arithmetic mean, and that we need to account for the area of each cell when computing the average.

62 changes: 58 additions & 4 deletions src/extensions.jl
Original file line number Diff line number Diff line change
Expand Up @@ -164,18 +164,21 @@ warp(args...; kw...) = throw_extension_error(warp, "ArchGDAL", :RastersArchGDALE
Gives the approximate area of each gridcell of `x`.
By assuming the earth is a sphere, it approximates the true size to about 0.1%, depending on latitude.

Run `using ArchGDAL` to make this method fully available.
Run `using Proj` to make this method fully available.

`method` can be `Spherical(; radius)` (the default) or `Planar()`.
- `Spherical` will compute cell area on the sphere, by transforming all points back to long-lat. You can specify the radius by the `radius` keyword argument here. By default, this is `6371008.8`, the mean radius of the Earth.
- `Planar` will compute cell area in the plane of the CRS you have chosen. Be warned that this will likely be incorrect for non-equal-area projections.

## Example

```julia
using Rasters, ArchGDAL, Rasters.Lookups
xdim = X(Projected(90.0:10.0:120; sampling=Intervals(Start()), crs=EPSG(4326)))
```jldoctest cellarea
using Rasters, Proj, Rasters.Lookups
# First, we construct a raster with cell-based sampling (`Intervals`),
# where the value represents the start of the cell in each dimension (`Start`).
xdim = X(Projected(90.0:10.0:120; sampling=Intervals(Start()), crs=EPSG(4326))) # construct cell-based sampling,
ydim = Y(Projected(0.0:10.0:50; sampling=Intervals(Start()), crs=EPSG(4326)))
# Now, we construct a raster from these dimensions, and compute the cell area.
myraster = rand(xdim, ydim)
cs = cellarea(myraster)

Expand All @@ -196,6 +199,57 @@ cs = cellarea(myraster)
110.0 1.23017e6 1.19279e6 1.11917e6 1.01154e6 873182.0 708290.0
120.0 1.23017e6 1.19279e6 1.11917e6 1.01154e6 873182.0 708290.0
```

Note how the cell area decreases with increasing latitude, i.e., as we move away from the equator.

Compare this to the output from `cellarea(Planar(), ...)`:
```jldoctest cellarea
cs = cellarea(Planar(), myraster)

# output
╭───────────────────────╮
│ 4×6 Raster{Float64,2} │
├───────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
↓ X Projected{Float64} 90.0:10.0:120.0 ForwardOrdered Regular Intervals{Start},
→ Y Projected{Float64} 0.0:10.0:50.0 ForwardOrdered Regular Intervals{Start}
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
extent: Extent(X = (90.0, 130.0), Y = (0.0, 60.0))

crs: EPSG:4326
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
↓ → 0.0 10.0 20.0 30.0 40.0 50.0
90.0 100.0 100.0 100.0 100.0 100.0 100.0
100.0 100.0 100.0 100.0 100.0 100.0 100.0
110.0 100.0 100.0 100.0 100.0 100.0 100.0
120.0 100.0 100.0 100.0 100.0 100.0 100.0
```

The output here is clearly incorrect - but you can see how it's calculated, and that in an equal-area projection, it could be useful.

Let's look at `cellarea` in the Web Mercator projection:
```jldoctest cellarea
mywebmerc = reproject(myraster; crs = EPSG(3857)) # this is the EPSG code for Web Mercator
cs = cellarea(mywebmerc)

# output
╭───────────────────────╮
│ 4×6 Raster{Float64,2} │
├───────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┐
↓ X Projected{Float64} [1.0018754171394622e7, 1.1131949079327356e7, 1.2245143987260092e7, 1.3358338895192828e7] ForwardOrdered Regular Intervals{Start},
→ Y Projected{Float64} [0.0, 1.1188899748579594e6, …, 4.865942279503176e6, 6.446275841017159e6] ForwardOrdered Irregular Intervals{Start}
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
extent: Extent(X = (1.0018754171394622e7, 1.4471533803125564e7), Y = (0.0, 8.399737889818357e6))

crs: EPSG:3857
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
↓ → 0.0 1.11889e6 2.27303e6 3.50355e6 4.86594e6 6.44628e6
1.00188e7 1.2332e12 1.1952e12 1.12048e12 1.01158e12 8.72085e11 7.06488e11
1.11319e7 1.2332e12 1.1952e12 1.12048e12 1.01158e12 8.72085e11 7.06488e11
1.22451e7 1.2332e12 1.1952e12 1.12048e12 1.01158e12 8.72085e11 7.06488e11
1.33583e7 1.2332e12 1.1952e12 1.12048e12 1.01158e12 8.72085e11 7.06488e11
```


$EXPERIMENTAL
"""
cellarea(x; kw...) = cellarea(Spherical(), x; kw...)
Expand Down
10 changes: 6 additions & 4 deletions src/methods/reproject.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,16 @@
"""
reproject(obj; crs)

Reproject the lookups of `obj` to a different crs.
Reproject the lookups (axes) of `obj` to a different crs.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to maintain the distinction between axes and lookups as different things

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying, the problem is that even people who've used rasters for a while have no clue what "lookups" are. There has to be some frame of reference. "Dimension values" is also super unclear - what does that mean?

Maybe we can rewrite the parentheses to be more clear? Would "axis values" work there?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is meant to be for people who have no clue what cellarea is or why you might want it. So you have to go from the ground up, more or less.

Copy link
Owner

@rafaqz rafaqz Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it, but we just can't use a common base function name that also works on a Raster to explain a DD function that's really a different thing.

It's clearly difficult to find words that describe lookups without semantic overloads (see AxisKeys.jl... keys are a different base method again) but it's important that we try.

I think axis values is helpful, but maybe values is also a pretty empty word. Maybe this needs to be systematic from DD up, we could workshop the language over there.

(FWIW lookup was intentionally chosen to avoid the overloads with base concepts that other packages have. The problem with avoiding overloads is you end up with a less common word)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could say "x and y values" for now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we're back to the problem of "what is a lookup" again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could say "location information". But I would prefer to just use lookups here and have a glossary where we would explain these differences, because otherwise we would have to duplicate this info everywhere we use lookups.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that we should duplicate this information everywhere we use lookups, at least in top-level, user accessible functions. I don't want to overestimate the curiosity of a new user v/s their unwillingness to click a link.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not to say that we shouldn't have a glossary with a more detailed explanation, but there should be something like:

[lookup values](link to DD lookup docs) (the positions of the cells in each dimension)

or something

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually "lookup values (indicating the positions of each cell)" sounds pretty descriptive


This is a lossless operation for the raster data, as only the
lookup values change. This is only possible when the axes of source
and destination projections are aligned: the change is usually from
a [`Regular`](@ref) and an [`Irregular`](@ref) lookup spans.

For converting between projections that are rotated,
skewed or warped in any way, use [`resample`](@ref).
skewed or warped in any way, *or* if you want to re-sample the
_data_, use [`resample`](@ref).

Dimensions without an `AbstractProjected` lookup (such as a `Ti` dimension)
are silently returned without modification.
Expand Down Expand Up @@ -41,8 +42,9 @@ end
"""
reproject(source::GeoFormat, target::GeoFormat, dim::Dimension, val)

`reproject` uses ArchGDAL.reproject, but implemented for a reprojecting
a value array of values, a single dimension at a time.
`reproject` uses Proj.jl's `Transformation` interface,
but implemented for reprojecting a lookup / axis array,
a single dimension at a time.
"""
function reproject(source, target, dim, val)
if source == target
Expand Down
Loading