Add cellarea docs, allow Literate.jl tutorials in the doc pipeline, fix typos #800

Open. asinghvi17 wants to merge 16 commits into `main` from `as/cellarea_docs`.

Commits (all by asinghvi17):
- d63518a Explain what `reproject` does a bit more
- 3990a9c Add Proj.jl to docs project + simplify make.jl
- f06a98c Minor restructuring to GBIF workflow example to make it clearer
- ef18b27 Add a header to array_operations.md
- 0c68d21 Merge remote-tracking branch 'origin/main' into moredocs
- c795d2c Add a brief tutorial for cellarea, with descriptions
- 9e1e86c Add a motivating example to cellarea
- 2b6f33b Add a little note that this is exactly what zonal does
- b57b633 Clean up the docs a bit
- 2aa2186 Add tutorials to top bar
- 06ddbc1 add NaturalEarth to docs project
- 2390aba remove Literate, make the tutorial a full-workflow example
- 1f844bd Fix title + plotting
- e7e3423 Add a bit more text about what we're doing with cellarea
- 5c6c761 Add the examples from the old cellarea tutorial to the docstring
- bbf539a Remove tutorials/methods/cellarea.jl
@@ -0,0 +1,143 @@
# Computing spatial means

```@meta
CollapsedDocStrings=true
```

It's very common to want to compute the mean of some value over an area of a raster. The naive approach is to simply average the cell values, but this gives you the arithmetic mean, not the spatial (area-weighted) mean.

The reason is that raster cells do not always have the same area. This is especially true over a large region of the Earth, where its curvature comes into play: in a longitude/latitude grid, cells of the same angular size cover less ground the further they are from the equator.

To compute the spatial mean, you need to weight the values by the area of each cell: multiply each value by its cell's area, sum those products, and divide the result by the total area. That is what this example walks through.
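
As a minimal illustration (plain Julia arrays with made-up numbers, not a `Raster`), here is the difference between the two means for three hypothetical cells, one of which is twice as large as the others:

```julia
# Three hypothetical cells: their values and (made-up) areas
cell_values = [10.0, 20.0, 30.0]
cell_areas  = [1.0, 1.0, 2.0]   # the third cell is twice as large as the others

# Arithmetic mean: every cell counts equally
arithmetic_mean = sum(cell_values) / length(cell_values)

# Spatial (area-weighted) mean: multiply by area, sum, divide by total area
spatial_mean = sum(cell_values .* cell_areas) / sum(cell_areas)

println("arithmetic: ", arithmetic_mean, ", spatial: ", spatial_mean)
```

The larger third cell pulls the spatial mean up to 22.5, while the arithmetic mean stays at 20.0.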

Let's get the rainfall over Chile and compute the average rainfall across the country for the month of June.

## Acquiring the data

We'll get global precipitation data from [WorldClim](https://www.worldclim.org/data/index.html) via [RasterDataSources.jl](https://github.com/EcoJulia/RasterDataSources.jl), using the `month` keyword argument to select June.

Then we can get the geometry of Chile from [NaturalEarth.jl](https://github.com/JuliaGeo/NaturalEarth.jl), and use `Rasters.mask` to get the data just for Chile.

````@example cellarea
using Rasters
import Proj # to activate the spherical `cellarea` method

using ArchGDAL, RasterDataSources, NaturalEarth # purely for data loading

using CairoMakie # for plotting

precip = Raster(WorldClim{Climate}, :prec; month = 6)
````

````@example cellarea
# Natural Earth country (admin-0) boundaries at 1:10m scale
all_countries = naturalearth("admin_0_countries", 10)
# Pick out the geometry whose NAME is "Chile"
chile = all_countries.geometry[findfirst(==("Chile"), all_countries.NAME)]
````

Let's plot the precipitation on the world map, and highlight Chile:

````@example cellarea
f, a, p = heatmap(precip; colorrange = Makie.zscale(replace_missing(precip, NaN)), axis = (; aspect = DataAspect()))
p2 = poly!(a, chile; color = (:red, 0.3), strokecolor = :red, strokewidth = 0.5)
f
````

You can see Chile highlighted in red, in the bottom left quadrant.

## Processing the data

First, let's make sure that we only have the data we care about, by cropping and masking the raster so that it only has values in Chile.
We can crop by the geometry, which effectively just generates a view into the raster bounded by the geometry's bounding box.
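
Conceptually, cropping to a bounding box amounts to finding the index ranges whose coordinates fall inside the box and taking a `view` of them. The sketch below shows that idea on a hypothetical 1° global grid, with a rough, assumed bounding box for mainland Chile; it illustrates the concept only and is not Rasters.jl's implementation of `crop`.

```julia
# Hypothetical 1-degree global grid, described by its cell-centre coordinates
lons = -179.5:1.0:179.5
lats = -89.5:1.0:89.5
data = rand(length(lons), length(lats))   # placeholder values, one per cell

# Rough, assumed bounding box for mainland Chile (degrees)
bbox = (xmin = -76.0, xmax = -66.0, ymin = -56.0, ymax = -17.0)

# Index ranges of the cells whose centres fall inside the box
xs = findfirst(>=(bbox.xmin), lons):findlast(<=(bbox.xmax), lons)
ys = findfirst(>=(bbox.ymin), lats):findlast(<=(bbox.ymax), lats)

# A view shares memory with `data`: nothing is copied
cropped = view(data, xs, ys)
println(size(cropped))
```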

````@example cellarea
cropped_precip = crop(precip; to = chile)
````

Now we mask the data, so that any cell outside the geometry is set to `missing`.

````@example cellarea
masked_precip = mask(cropped_precip; with = chile)
heatmap(masked_precip)
````
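
Masking, unlike cropping, keeps the array's shape and replaces the values outside the geometry with `missing`. In this plain-Julia sketch, the Boolean `inside` matrix is a hypothetical stand-in for the "is this cell inside the geometry?" test that `mask` performs against `chile`; the `skipmissing` calls used later then simply drop the masked cells.

```julia
data = [1.0 2.0 3.0;
        4.0 5.0 6.0]

# Hypothetical result of testing each cell against a geometry
inside = [true  false true;
          false true  true]

# Keep values inside the geometry, set everything else to `missing`
masked = ifelse.(inside, data, missing)

println(size(masked))                 # the shape is unchanged
println(sum(skipmissing(masked)))     # only the inside cells contribute
```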

Most of the cropped raster is now missing, but that's mainly because the Chile geometry we have also includes Easter Island, in the middle of the Pacific, which stretches the bounding box far to the west.

```@docs; canonical=false
cellarea
```

`cellarea` computes the area of each cell in a raster.
This is useful whenever cells differ in area. For a [spatially extensive variable](https://r-spatial.org/book/05-Attributes.html#sec-extensiveintensive) like population per cell, the value itself depends on how large the cell is; for a spatially intensive variable like elevation or precipitation, averages over many cells need to be weighted by cell area.

You can specify whether you want to compute the area in the plane of your projection
(`Planar()`), or on a sphere of some radius (`Spherical(; radius=...)`).
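
To see why the two choices differ in a geographic coordinate system, here is a sketch using the standard formula for the area of a longitude/latitude cell on a sphere of radius R, namely R² · Δλ · |sin φ₂ − sin φ₁| with Δλ in radians. The helper names and the default radius (the mean Earth radius, about 6371 km) are assumptions for illustration; this is not necessarily how `cellarea` computes `Spherical()` areas.

```julia
# Hypothetical helper: area (m²) of a lon/lat cell on a sphere of the given radius
function spherical_cell_area(lon1, lon2, lat1, lat2; radius = 6_371_008.8)
    Δλ = deg2rad(lon2 - lon1)                       # longitude span in radians
    return radius^2 * Δλ * abs(sind(lat2) - sind(lat1))
end

# Hypothetical helper: planar "area" in the units of the CRS, here square degrees
planar_cell_area(lon1, lon2, lat1, lat2) = abs(lon2 - lon1) * abs(lat2 - lat1)

# Two 1°×1° cells: one on the equator, one at the latitude of southern Chile
for (lat1, lat2) in ((0, 1), (-54, -53))
    planar = planar_cell_area(0, 1, lat1, lat2)
    spherical = spherical_cell_area(0, 1, lat1, lat2)
    println("lat $lat1..$lat2: planar = $planar deg², spherical ≈ $(round(Int, spherical / 1e6)) km²")
end
```

In square degrees both cells look identical, while on the sphere the southern cell covers roughly 60% of the equatorial one.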

Now, let's compute the area-weighted average precipitation across Chile.
First, we need the area of each cell in square meters. We'll use the spherical method (the default), since we're working with a geographic coordinate system.

````@example cellarea
areas = cellarea(masked_precip)          # area of every cell in the cropped raster
masked_areas = mask(areas; with = chile) # keep only the cells inside Chile
heatmap(masked_areas; axis = (; title = "Cell area in square meters"))
````

You can see here that cells are largest towards the equator and shrink away from it, so cells further from the equator should contribute less to the average than cells nearer to it.

## Computing the spatial mean

Now we can compute the area-weighted average precipitation. First, we multiply each cell's precipitation by its area, giving the total precipitation per grid cell (in mm·m², which is equivalent to litres):

````@example cellarea
precip_per_area = masked_precip .* masked_areas
````

We can sum this to get the total precipitation across Chile:

````@example cellarea
total_precip = sum(skipmissing(precip_per_area))
````

We can also sum the areas to get the total area of Chile (in this raster, at least):

````@example cellarea
total_area = sum(skipmissing(masked_areas))
````

Finally, we convert that into an average by dividing by the total area, which gives the spatial mean in millimeters:

````@example cellarea
avg_precip = total_precip / total_area
````

Figures found online put Chile's June rainfall at about 100 mm, so our statistic seems pretty close.

Let's see what happens if we don't account for cell areas, which is equivalent to assuming that all cells have the same area:

````@example cellarea
bad_total_precip = sum(skipmissing(masked_precip))
# Divide by the number of non-missing cells instead of by their total area
bad_avg_precip = bad_total_precip / length(collect(skipmissing(masked_precip)))
````
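
The claim that ignoring areas is equivalent to assuming equal areas can be checked directly: with any constant area, the weights cancel and the area-weighted mean reduces to the plain mean. A small sketch with hypothetical values:

```julia
# Hypothetical precipitation values (mm), including one masked-out cell
precip_values = [12.0, 85.0, 240.0, missing]
present = collect(skipmissing(precip_values))

# Give every cell the same (arbitrary) area
equal_areas = fill(5.0, length(present))

weighted_mean = sum(present .* equal_areas) / sum(equal_areas)
plain_mean = sum(present) / length(present)

println(weighted_mean ≈ plain_mean)   # the constant area cancels out
```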

This gives a noticeably different answer from the area-weighted mean, which is why it's important to account for cell areas when computing averages.

!!! note
    If you made it this far, congratulations!

    It's interesting to note that we've replicated the workflow of `zonal` here.
    `zonal` is a more general function that can be used to compute any function over geometries,
    and it has multithreading built in.

    But fundamentally, this is what `zonal` does under the hood:
    cropping and masking the raster to the geometry, and then computing the statistic.

## Summary

In this tutorial, we've seen how to compute the spatial mean of a raster, accounting for the fact that raster cells do not always have the same area.

We've also seen how to use the `cellarea` function to compute the area of each cell in a raster, and how to use `crop` and `mask` to get the data within a geometry.

Finally, we've seen that the spatial mean is not the same as the arithmetic mean: we need to weight each cell by its area when computing the average.
Review discussion:

- We try to maintain the distinction between axes and lookups as different things.
- I see what you're saying; the problem is that even people who've used rasters for a while have no clue what "lookups" are. There has to be some frame of reference. "Dimension values" is also super unclear: what does that mean?
  Maybe we can rewrite the parentheses to be clearer? Would "axis values" work there?
- FWIW, this is meant for people who have no clue what `cellarea` is or why you might want it. So you have to go from the ground up, more or less.
- I get it, but we just can't use a common Base function name that also works on a `Raster` to explain a DD function that's really a different thing.
  It's clearly difficult to find words that describe lookups without semantic overloads (see AxisKeys.jl: keys are a different Base method again), but it's important that we try.
  I think "axis values" is helpful, but "values" is also a pretty empty word. Maybe this needs to be systematic from DD up; we could workshop the language over there.
  (FWIW, `lookup` was intentionally chosen to avoid the overloads with Base concepts that other packages have. The problem with avoiding overloads is that you end up with a less common word.)
- Maybe we could say "x and y values" for now?
- Then we're back to the problem of "what is a lookup" again?
- Maybe we could say "location information". But I would prefer to just use lookups here and have a glossary where we explain these differences, because otherwise we would have to duplicate this info everywhere we use lookups.
- I would argue that we should duplicate this information everywhere we use lookups, at least in top-level, user-accessible functions. I don't want to overestimate the curiosity of a new user vs. their unwillingness to click a link.
- That's not to say that we shouldn't have a glossary with a more detailed explanation, but there should be something like:
  `[lookup values](link to DD lookup docs) (the positions of the cells in each dimension)`
  or something.
- Actually, "lookup values (indicating the positions of each cell)" sounds pretty descriptive.