Restructured dynamical core, variables array-agnostic and 3-dimensional, modular netCDF output #525
Conversation
Problem: while this works (LowerTriangularMatrix on layer 1, time step 1)

```julia
julia> progn.vor[:,1,1]
65×64 LowerTriangularMatrix{ComplexF32}:
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  …
```

this doesn't

```julia
julia> progn.vor[:,end,1]
ERROR: BoundsError: attempt to access 2144×2×2 Array{ComplexF32, 3} at index [1:2144, 64, 1]
```
This is a consequence of the LTA/LTM being a subtype of AbstractArray{T, 4} in this case. What we did with the two ways of indexing is a bit of a hack, to be fair; the LTA can't be 3-dim and 4-dim at the same time that easily. As far as I know we could maybe redefine the behaviour of …
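The `end` failure above is a lowering issue: `A[:, end, 1]` lowers to `A[:, lastindex(A, 2), 1]`, so `end` is resolved against the declared 4-dim matrix-style axes before the indices are forwarded to the flat parent array. A minimal illustration with plain arrays standing in for the LTA (not SpeedyWeather's actual types):

```julia
# stand-in for the 4-dim matrix-style axes of the LTA (65×64 spectral, 2 layers, 2 steps)
A = zeros(ComplexF32, 65, 64, 2, 2)
println(lastindex(A, 2))  # 64 — this is what `end` lowers to in A[:, end, 1, 1]

# stand-in for the flat 3-dim storage that the colon-indexing method actually accesses
flat = zeros(ComplexF32, 2144, 2, 2)
# forwarding the resolved 64 into `flat` reproduces the BoundsError from above:
# flat[:, 64, 1]   # ERROR: BoundsError
```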
Does this actually appear in the dynamical core? Because the 3d indexing with a leading colon does cause allocations, so it should be avoided anyway, or we can introduce a non-allocating view variant. That's possible as well. In the end, I think we can just avoid using the …
Another alternative would be to do something a bit similar to EndpointRanges.jl and define a custom …
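For reference, EndpointRanges.jl avoids the `end` lowering entirely by exporting placeholder indices (`ibegin`, `iend`) that are resolved against the indexed array's own axes at `getindex` time; a small usage sketch, assuming the package is installed:

```julia
using EndpointRanges  # provides the ibegin and iend placeholder indices

a = collect(1:10)
a[iend]             # last element, resolved by a's own axes, not by lowering
a[ibegin+1:iend-1]  # everything but the first and last element
```

A custom index type along these lines could be resolved against the matrix-style axes of the LTA rather than the flat parent dimensions.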
Okay, that's interesting, thanks for looking into it. No, we wouldn't really need to use … I am generally not super sure about the whole two ways of indexing.
I obviously don't want to keep changing things, but I see some appeal in making our life easier with 2-4. But I'm wondering whether in the end we should first reimplement the gradients and transforms for …
Let's not overthink it, I would say. We defined … Additionally, we defined several functions so that the flat indexing with … The alternative would be to swap the behaviour by making it a subtype of … And so far, the only limitations that I can think of are those connected with …

Edit: Another way to look at it: the way we currently have it implemented gives us the matrix-style indexing and some support for flat indexing. At the same time, though, we can also always directly index the …
As far as I can see, EndpointRanges even directly allows defining those as subtypes of its types. It's a pretty small package, so it wouldn't be too bad to depend on it. I could have a look at it some day.
With closures over functions there's also an easy way to do stuff like this in a really compact way, like:

```julia
set!(simulation, sea_surface_temperature=(λ, ϕ) -> interpolate(ϕ, λ, model.land_sea_mask.mask) < 0.1 ? 273.15 + 30*cosd(ϕ) + 5*sind(λ-30) : NaN)
```

or

```julia
set!(simulation, snow_depth=(λ, ϕ) -> interpolate(ϕ, λ, model.orography.orography) > 1000 ? 100 : 0)
```

Btw, we should be more consistent with the order of longitude and latitude for function calls (see above).
It's …
test/set.jl
Outdated
```julia
u_grid = transform(u, model.spectral_transform)
v_grid = transform(v, model.spectral_transform)

set!(simulation, u=u_grid, v=v_grid, coslat_scaling_included=false)
```
as lf is not set here, this uses lf=1 (default)
```julia
U = similar(u)
V = similar(v)

SpeedyTransforms.UV_from_vordiv!(U, V, prog_new.vor[lf], prog_new.div[lf], model.spectral_transform)
```
however, this calculates U, V from lf, which in this testset is set to lf=2 above
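One way to keep the two steps consistent would be to pass the leapfrog index explicitly in both places; a hypothetical usage, assuming `set!` accepts an `lf` keyword like the other calls discussed in this PR:

```julia
# hypothetical: pin the set! call to the same leapfrog step (lf = 2)
# used by the UV_from_vordiv! call below it
set!(simulation, u=u_grid, v=v_grid, coslat_scaling_included=false, lf=2)
```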
```julia
u = randn(spectral_grid.SpectralVariable3D, trunc+2, trunc+1, nlayers)
v = randn(spectral_grid.SpectralVariable3D, trunc+2, trunc+1, nlayers)
```
For any transform tests I usually start in spectral space, because grid space has more degrees of freedom, meaning that rand of some grid contains a lot of information (high-resolution noise) that's lost in the first transform, as not even the highest wavenumbers can hold it. You can probably circumvent this by using a linear truncation (instead of the default quadratic), but because we always do quadratic, I'd just make sure to start any spec-grid uv-vor/div roundtrip with information that's surely represented in any space.
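Following that advice, a roundtrip test would start from random spectral coefficients so every retained wavenumber survives both transforms; a sketch, assuming the out-of-place `transform` as used elsewhere in this PR (`S`, `trunc` as in the benchmark script below):

```julia
# start in spectral space: all information is representable in both spaces
L = randn(LowerTriangularMatrix{ComplexF32}, trunc+2, trunc+1)
grid = transform(L, S)    # spectral -> grid
L2 = transform(grid, S)   # grid -> spectral
# L ≈ L2 should now hold up to rounding, unlike a roundtrip started from rand on the grid
```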
src/dynamics/prognostic_variables.jl
Outdated
```julia
    curl!(vor, u_new, v_new, S; add)
    divergence!(div, u_new, v_new, S; add)
else
    curl!(vor, u_, v_, S; add)
    divergence!(div, u_, v_, S; add)
```
there's a division by the radius missing here. As discussed somewhere in the docs (sufficiently, I hope), all spectral gradient operators omit the radius scaling as it's not needed in the dynamical core (hence we use vor, div scaled by the radius). I've therefore added a radius keyword (default 1) to a bunch of curl and divergence functions that one needs to pass on. The SpectralTransform doesn't know about the radius (because it's invariant to the radius)
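The resulting call pattern might look like this (a sketch; `model.spectral_grid.radius` as the source of the radius is an assumption):

```julia
# outside the dynamical core: unscaled vorticity by passing the radius explicitly
curl!(vor, u, v, model.spectral_transform; radius = model.spectral_grid.radius)

# inside the dynamical core, vor/div are kept radius-scaled,
# so the default radius = 1 applies
curl!(vor, u, v, model.spectral_transform)
```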
All spectral gradients with c3f7398 can now take an optional radius keyword argument. Otherwise radius=1 is assumed.
Regarding performance, a quick comparison between v0.10 and main on my MacBook Air: …
I did some benchmarks of the transforms and gradients between the two branches:

```julia
import Pkg
Pkg.activate(".")
using SpeedyWeather, BenchmarkTools

NF = Float32
trunc = 63
Grid = FullGaussianGrid
#nlayers = 8
SG = SpectralGrid(; NF, trunc, Grid)
S = SpectralTransform(SG, recompute_legendre=false)
grid = randn(SG.Grid{NF}, SG.nlat_half)
L1 = zeros(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)
L2 = rand(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)
L3 = zeros(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)
L4 = zeros(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)

@benchmark spectral!(L1, grid, S)
@benchmark gridded!(grid, L1, S)
#@benchmark transform!(L1, grid, S)
#@benchmark transform!(grid, L1, S)
@benchmark curl!(L3, L1, L2, S)
@benchmark divergence!(L4, L1, L2, S)
@benchmark SpeedyWeather.UV_from_vordiv!(L1, L2, L3, L4, S)
@benchmark ∇²!(L1, L4, S, inverse=false)
@benchmark ∇²!(L1, L4, S, inverse=true)
```

New branch:

@benchmark transform!(L1, grid, S)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 49.417 μs … 101.292 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 50.417 μs ┊ GC (median): 0.00%
Time (mean ± σ): 50.637 μs ± 1.018 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▆█▇▅▂
▁▂▂▂▂▂▂██████▇▇▄▅▅▄▄▃▃▃▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
49.4 μs Histogram: frequency by time 54.8 μs <
Memory estimate: 12.38 KiB, allocs estimate: 250.
@benchmark transform!(grid, L1, S)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 47.500 μs … 105.000 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 48.667 μs ┊ GC (median): 0.00%
Time (mean ± σ): 48.801 μs ± 1.289 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▇▃ ▁▃▆▆▆▃▁▁
▂▅██████▆▆▆▇████████▅▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▂▂▂▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
47.5 μs Histogram: frequency by time 52.6 μs <
Memory estimate: 12.41 KiB, allocs estimate: 251.
@benchmark curl!(L3, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.532 μs … 3.722 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.727 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.733 μs ± 49.194 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▇█▆▆▄▂ ▁ ▂
▆▅▃▃▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃██████████▆▆▅▃▅▆▅▅▅▄▄▅▅▄▃▅▄▃▄▄▃▃▁▃▄▃▄ █
2.53 μs Histogram: log(frequency) by time 3 μs <
Memory estimate: 32 bytes, allocs estimate: 1.
@benchmark divergence!(L4, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.505 μs … 3.782 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.694 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.703 μs ± 57.143 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▆▆██▇▅▂▁▁ ▂
▄▄▆▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██████████▇▇▅▆▆▅▄▄▆▆▅▅▅▄▃▅▅▃▅▅▄▃▅▁▄▄▄▄▄ █
2.5 μs Histogram: log(frequency) by time 2.99 μs <
Memory estimate: 32 bytes, allocs estimate: 1.
@benchmark SpeedyWeather.UV_from_vordiv!(L1, L2, L3, L4, S)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min … max): 3.146 μs … 16.948 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.391 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.431 μs ± 307.718 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▂ ▂█▆▁▂▃▁ ▃▁ ▁
██▅▃▃▅▅▃▂▂▃▄▃▃███████▇▆███▇▆▆▆▅▆▇▆▅▅▅▅▇▇▇▆▆▅▅▆▆▅▅▅▆▆▆▅▅▆▅▇▅ █
3.15 μs Histogram: log(frequency) by time 4.06 μs <
Memory estimate: 64 bytes, allocs estimate: 1.
@benchmark ∇²!(L1, L4, S, inverse=false)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.446 μs … 5.304 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.621 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.623 μs ± 70.167 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▂▂▃▁▂▁ ▂▆█▇▆▄▃▂ ▂
▇▇▄▆▄▄▄▃▆██▆▃████████████████▇▇▇█▇▆▅▅▁▆▃▆▄▅▅▆▄▄▅▅▄▅▃▅▁▅▅▄▅ █
1.45 μs Histogram: log(frequency) by time 1.87 μs <
Memory estimate: 32 bytes, allocs estimate: 1.
@benchmark ∇²!(L1, L4, S, inverse=true)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.446 μs … 2.358 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.558 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.562 μs ± 22.520 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ █▇
▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃█▁██▄▆▃▂▁▂▂▁▂▂▂▁▂▂▁▂▂▂ ▂
1.45 μs Histogram: frequency by time 1.62 μs <
Memory estimate: 32 bytes, allocs estimate: 1.

Main branch:

@benchmark spectral!(L1, grid, S)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 48.125 μs … 111.625 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 51.958 μs ┊ GC (median): 0.00%
Time (mean ± σ): 52.343 μs ± 1.526 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▆██▆▆▆▆▆▅▅▄▄▃▂▂ ▁ ▂
▄▃▃▃▁▁▃▃▁▁▁▁▁▁▁▁▁▁▁▆██████████████████▇█████▇██▇▆▇▇▆▆▇▇▇▆▆▆▆ █
48.1 μs Histogram: log(frequency) by time 57.9 μs <
Memory estimate: 10.88 KiB, allocs estimate: 202.
@benchmark gridded!(grid, L1, S)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 46.792 μs … 110.000 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 49.209 μs ┊ GC (median): 0.00%
Time (mean ± σ): 49.510 μs ± 1.329 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▅█▂▁
▂▁▁▁▁▁▁▁▂▁▂▃▅▅██████▇█▆▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
46.8 μs Histogram: frequency by time 54.9 μs <
Memory estimate: 10.91 KiB, allocs estimate: 203.
@benchmark curl!(L3, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.477 μs … 4.042 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.671 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.679 μs ± 49.191 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▆▃█
▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▄███▆▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂ ▂
2.48 μs Histogram: frequency by time 2.9 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark divergence!(L4, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.486 μs … 5.880 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.685 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.707 μs ± 101.890 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▆█▇▅▄▂▂ ▂
▇▆▃▁▁▆▃▁▃▁▁▁▁▁████████▇▇▇▇▆▆▇▆▆▅▆▇▅▇▅▆▆▇▆▆█▇▇▆▆▆▆▆▇▇▆▇▆▇▆▇▇ █
2.49 μs Histogram: log(frequency) by time 3.19 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark SpeedyWeather.UV_from_vordiv!(L1, L2, L3, L4, S)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.979 μs … 3.183 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.125 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.132 μs ± 45.261 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁█▆▅▁ ▁▁ ▁
▅▅▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▁▁█████▇██▆▆▆▄▇▅▄▆▅▄▄▅▄▄▅▄▄▄▁▄▅▃▄▄▄▄▅▃▅▃▄ █
1.98 μs Histogram: log(frequency) by time 2.38 μs <
Memory estimate: 16 bytes, allocs estimate: 1.
@benchmark ∇²!(L1, L4, S, inverse=false)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.400 μs … 7.075 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.508 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.522 μs ± 111.858 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅▇█
▂▂▁▂▁▁▁▁▁▁▁▁▂▁▂▄███▄▃▃▂▃▂▃▂▂▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
1.4 μs Histogram: frequency by time 1.75 μs <
Memory estimate: 32 bytes, allocs estimate: 1.
@benchmark ∇²!(L1, L4, S, inverse=true)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.413 μs … 8.100 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.517 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.528 μs ± 130.292 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▂▂▂▂▁▁▁▁▁▁▁▁▁▂▇█▆█▇▇▇▃▂▂▂▄▅▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▂▂▂ ▃
1.41 μs Histogram: frequency by time 1.75 μs <
Memory estimate: 32 bytes, allocs estimate: 1.
Okay, that's very good, because it means that all our new indexing doesn't cause any problems in practice. Yay! I bet it's some type instability, maybe it needs a function barrier or similar. Given the size of this pull request I don't recall all the changes we made in that direction 🙈
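For context, the function barrier mentioned here is the standard fix for this kind of type instability in Julia; a generic sketch, not SpeedyWeather's actual code:

```julia
struct Container
    data::AbstractArray   # abstractly typed field: code touching it directly is type-unstable
end

# the call to the kernel is the "barrier": dispatch recovers the concrete array type
slow_sum(c::Container) = sum_kernel(c.data)

# everything behind the barrier compiles specialized for the concrete element type
function sum_kernel(data::AbstractArray{T}) where T
    s = zero(T)
    for x in data
        s += x   # runs at full speed: T is known here
    end
    return s
end
```

Only the single dynamic dispatch at the barrier remains; the loop itself is compiled for the concrete type.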
I was continuing the benchmarking a bit by dissecting … There are differences in speed between several of the individual functions, but by far the biggest offender, both in terms of absolute time and relative difference, is … There's also e.g. a 4x difference in … Here are detailed results (benchmarked in blocks that are directly comparable):
Awesome! Yeah, the vertical integration and advection did change a bit in terms of structure without the vertical layering. Maybe I haven't been careful there.
I had removed all …
Was (T31L8, with …):

```julia
julia> @btime SpeedyWeather.vertical_advection!($diagn, $model)
  10.062 ms (202808 allocations: 15.47 MiB)
```

now

```julia
julia> @btime SpeedyWeather.vertical_advection!($diagn, $model)
  281.434 μs (24 allocations: 2.81 KiB)
```

40x faster! 🎉 The remaining allocations come from

```julia
k_stencil = max.(min.(nlayers, k-B:k+B), 1)
```

which allocates a vector, and then …
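The broadcasted `max.(min.(...), 1)` materializes a small `Vector` on every call; one possible non-allocating variant (a sketch, not necessarily what was adopted) clamps each stencil index where it is used:

```julia
nlayers, k, B = 8, 1, 1  # illustrative values near the top boundary

# allocating: broadcasting over the range creates a temporary Vector
k_stencil = max.(min.(nlayers, k-B:k+B), 1)

# non-allocating: same index logic, clamped element by element inside the loop
for j in k-B:k+B
    kj = clamp(j, 1, nlayers)
    # ... index the field at layer kj here, no temporary array needed
end
```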
The full benchmark suite. Main (https://gist.github.com/maximilian-gelbrecht/97ad1e55c044a1df3ec8d26fb9e8a7d8):
Model | T | L | Physics | Δt | SYPD | Memory
---|---|---|---|---|---|---
BarotropicModel | 31 | 1 | false | 1800 | 63138 | 1.17 MB
ShallowWaterModel | 31 | 1 | false | 1800 | 38006 | 1.19 MB
PrimitiveDryModel | 31 | 8 | true | 1800 | 1984 | 4.11 MB
PrimitiveWetModel | 31 | 8 | true | 1800 | 1431 | 4.47 MB
So maybe we are even slightly faster now 🎉
While we're at it, a little performance overview of the calls in … (Doesn't add up; it's faster to benchmark all together...) Translated to a pie chart we have …
Yes, I can add them to the benchmark suite.
A draft superseding #493 to restructure the prognostic variables.

Includes
- device and ArrayType in SpectralGrid
- the variables 3D/4D are now directly in prognostic variables, not nested in .layers[1].timesteps[1] or .surface
- the type signature for PrognosticVariables however got a bit more complicated; maybe there are ways to simplify that
however got a bit more complicated, maybe there's ways to simplify that