
Restructured dynamical core, variables array-agnostic and 3-dimensional, modular netCDF output #525

Merged: 120 commits, Sep 25, 2024

Conversation

@milankl (Member) commented Apr 25, 2024

A draft superseding #493 to restructure the prognostic variables.

Includes device and ArrayType in SpectralGrid

julia> spectral_grid
SpectralGrid:
├ Spectral:   T63 LowerTriangularMatrix{Complex{Float32}}, radius = 6.371e6 m
├ Grid:       96-ring OctahedralGaussianGrid{Float32}, 10944 grid points
├ Resolution: 216km (average)
├ Vertical:   2-level SigmaCoordinates
└ Device:     CPU using Array

The 3D/4D variables now live directly in the prognostic variables, no longer nested in .layers[1].timesteps[1] or .surface

julia> tree(progn)
PrognosticVariables{Float32, Vector{Float32}, OctahedralGaussianGrid{Float32}, Matrix{ComplexF32}, Array{ComplexF32, 3}, ShallowWater}
├ trunc
...
├ vor
├ div
├ temp
├ humid
├ pres
...

The type signature for PrognosticVariables, however, got a bit more complicated; maybe there are ways to simplify that

Base.@kwdef struct PrognosticVariables{
    NF<:AbstractFloat,
    ArrayType2D<:AbstractArray{NF, 1},                  # 2D real type (land + ocean)
    Grid<:AbstractGridArray{NF, 1, ArrayType2D},        # Grid for those
    ArrayTypeComplex3D<:AbstractArray{Complex{NF}, 2},  # 3D complex type (surface pressure)
    ArrayTypeComplex4D<:AbstractArray{Complex{NF}, 3},  # 4D complex type (layered variables)
    M<:ModelSetup,
} <: AbstractPrognosticVariables

@milankl added the labels structure 🏠 Internal structure of the code, array types 🔢 LowerTriangularMatrices and RingGrids, and dynamics 〰️ Affects the dynamical core on Apr 25, 2024
@milankl (Member Author) commented Apr 25, 2024

Problem: while this works (LowerTriangularMatrix on layer 1, time step 1)

julia> progn.vor[:,1,1]
65×64 LowerTriangularMatrix{ComplexF32}:
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  

this doesn't

julia> progn.vor[:,end,1]
ERROR: BoundsError: attempt to access 2144×2×2 Array{ComplexF32, 3} at index [1:2144, 64, 1]

the end gets translated to 64, which is size(progn.vor)[2] ...
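For context, a minimal illustration with a plain Array of why `end` resolves this way (the shapes are taken from the error above):

```julia
# Julia lowers A[:, end, 1] to A[:, lastindex(A, 2), 1], and for any AbstractArray
# lastindex(A, d) falls back to size(A, d). A LowerTriangularArray reports its size
# in matrix dimensions (64 columns here) while storing flat data (2144×2×2), so the
# lowered index 64 is out of bounds in the flat second dimension.
A = zeros(ComplexF32, 65, 64, 2)        # plain array with the matrix-style shape
lastindex(A, 2) == size(A, 2) == 64     # true: this is what `end` becomes
```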

@maximilian-gelbrecht (Member) commented May 2, 2024

This is a consequence of the LTA/LTM being a subtype of AbstractArray{T,4} in this case.

What we did with the two ways of indexing is a bit of a hack, to be fair; the LTA can't easily be a 3-dim and a 4-dim array at the same time. As far as I know we could redefine the behaviour of end by defining lastindex(::LowerTriangularArray), but then it wouldn't work properly anymore for the 4-dim case. I guess we have to decide whether we want end to work as expected for flat indexing (lm) or for matrix indexing (l, m).

@maximilian-gelbrecht (Member) commented May 2, 2024

Does this actually appear in the dynamical core? Because the 3d indexing with a leading colon does cause allocations, so it should be avoided anyway, or we can introduce a non-allocating view variant. That's possible as well.

In the end, I think we can just avoid using end there. We have the constant N_STEPS defined, which we can use in its place. Similarly, we'll always have easy access to the number of levels as well.

@maximilian-gelbrecht (Member) commented

Another alternative would be to do something a bit similar to EndpointRanges.jl and define a custom iend/lmend that works for the flat lm-indexing (and only there).
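A sketch of that idea (the names `LMEnd`/`lmend` and the toy `FlatLT` wrapper are hypothetical illustrations, not existing SpeedyWeather API):

```julia
# A custom end-marker type, resolved against the flat storage at indexing time,
# in the spirit of EndpointRanges.jl:
struct LMEnd end
const lmend = LMEnd()

struct FlatLT{T}                    # toy stand-in for flat lm-storage
    data::Vector{T}
end
Base.getindex(L::FlatLT, i::Integer) = L.data[i]
Base.getindex(L::FlatLT, ::LMEnd) = L.data[end]   # lmend → last flat index

L = FlatLT([1.0, 2.0, 3.0])
L[lmend]        # resolves against the flat length, not a matrix dimension
```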

@milankl (Member Author) commented May 2, 2024

> Another alternative would be to do something a bit similar to EndpointRanges.jl and define a custom iend/lmend that works for the flat lm-indexing (and only there).

Okay, that's interesting, thanks for looking into it. No, we wouldn't really need to use end inside the dynamical core, or we could easily replace it; it's more generally about using LowerTriangularArray intuitively. I guess if we've clearly defined what one can and cannot do with those then all good, but if I stumble across something that I expect to work and it doesn't, then I'd rather flag it for discussion than silently rewrite the code to just make it work.

I am generally not super sure about the whole two-ways-of-indexing approach for LowerTriangularArray. While it seems possible to make this convenient, I'm wondering whether we just keep getting hung up on cases where it doesn't work as expected, or on small edge-case issues, such that it's not worth it overall. E.g. I see several possibilities:

  1. support flat and matrix indexing throughout, but get hung up somewhat whenever this causes issues (what we have now)
  2. support only flat indexing and use ij2k manually inside the dynamical core whenever necessary
  3. support only matrix indexing by storing the upper triangle explicitly, but disallow setindex! there
  4. support matrix indexing only for LowerTriangularMatrix, in addition to flat indexing for ...Array; that way the more user-facing LowerTriangularMatrix stays intuitive but we index consistently within the dynamical core
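For illustration, the flat-index conversion option 2 refers to could look like this (a standard column-major lower-triangular packing formula, assumed here rather than copied from SpeedyWeather's actual ij2k):

```julia
# Map matrix indices (l, m), l ≥ m, of an lmax-row lower-triangular matrix to a
# flat column-major index k that skips the zero upper triangle:
ij2k(l, m, lmax) = (m - 1) * lmax - (m - 1) * (m - 2) ÷ 2 + (l - m + 1)

# e.g. for lmax = 3: column 1 → k = 1, 2, 3; column 2 → k = 4, 5; column 3 → k = 6
```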

I obviously don't want to keep changing things, but I see some appeal in 2-4 to make our life easier. I'm wondering, though, whether we should first reimplement the gradients and transforms for LowerTriangularArray, to make a better-informed decision on what's important for performance and GPU.

@maximilian-gelbrecht (Member) commented May 2, 2024

Let's not overthink it, I would say.

We defined LowerTriangularArray{T,N} to be a subtype of AbstractArray{T,N} but store an AbstractArray{T,N-1} underneath. All indexing with N indices (matrix indexing) works flawlessly thanks to the abstract array interface.

Additionally, we defined several functions so that flat indexing with N-1 indices works in almost all cases. I think that's fine, and in some way it's the same as if we only supported matrix indexing and used ij2k everywhere in the dynamical core. I think it's totally fine to keep it as it is and just be a bit aware that flat indexing is a bit limited (the docs should mention that!).

The alternative would be to swap the behaviour by making it a subtype of AbstractArray{T,N-1} and have the matrix indexing be somewhat limited.

And so far, the only limitations I can think of are those connected with size and axes. That's because, in the end, it can't be a 3-d and a 4-d array at the same time. Which is not that bad.

Edit: Another way to look at it: the way we currently have it implemented gives us matrix-style indexing plus some support for flat indexing. And when in doubt about the flat indexing, we can always index the .data object directly.
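The design described above can be sketched with a simplified toy type (illustration only, heavily reduced compared to the real LowerTriangularMatrix):

```julia
# Declared as an AbstractMatrix (2-dim) while storing only the lower triangle as
# a flat Vector (1-dim). Matrix indexing comes from the AbstractArray interface;
# flat indexing is the extra "hack" dispatch on a single integer.
struct ToyLT{T} <: AbstractMatrix{T}
    data::Vector{T}   # flat storage, column-major, upper triangle skipped
    n::Int            # matrix is n×n
end
Base.size(L::ToyLT) = (L.n, L.n)
flatindex(l, m, n) = (m - 1) * n - (m - 1) * (m - 2) ÷ 2 + (l - m + 1)
Base.getindex(L::ToyLT, l::Int, m::Int) =       # matrix indexing (l, m)
    l >= m ? L.data[flatindex(l, m, L.n)] : zero(eltype(L))
Base.getindex(L::ToyLT, k::Int) = L.data[k]     # flat lm-indexing

L = ToyLT([1, 2, 3], 2)
L[2, 1] == 2        # matrix indexing
L[3] == 3           # flat indexing
lastindex(L) == 4   # but `end` follows the declared 2×2 shape, not length(L.data)
```

The last line shows exactly the limitation discussed in this thread: `end` resolves against the declared (matrix) shape, so `L[end]` is out of bounds for the flat storage.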

@maximilian-gelbrecht (Member) commented

> Another alternative would be to do something a bit similar to EndpointRanges.jl and define a custom iend/lmend that works for the flat lm-indexing (and only there).

> Okay that's interesting, thanks for looking into it.

As far as I can see, EndpointRanges even directly allows defining those as subtypes of its types. It's a pretty small package, so it wouldn't be too bad to depend on it. I could have a look at it some day.

@maximilian-gelbrecht (Member) commented Sep 19, 2024

With closures there are also easy ways to do stuff like this in a really compact way:

Like:

set!(simulation, sea_surface_temperature=(λ, ϕ) -> interpolate(ϕ, λ, model.land_sea_mask.mask) < 0.1 ? 273.15 + 30*cosd(ϕ) + 5*sind(λ-30) : NaN)

or

set!(simulation, snow_depth=(λ, ϕ) -> interpolate(ϕ, λ, model.orography.orography) > 1000 ? 100 : 0)

Btw, we should be more consistent with the order of longitude and latitude for function calls (see above)

@milankl (Member Author) commented Sep 19, 2024

> we should be more consistent with the order of longitude and latitude for function calls

It's the interpolate functions that do lat-lon instead of lon-lat (what we should do, as that also matches Julia matrix ordering with how our grids are organised)?

test/set.jl (outdated)
u_grid = transform(u, model.spectral_transform)
v_grid = transform(v, model.spectral_transform)

set!(simulation, u=u_grid, v=v_grid, coslat_scaling_included=false)
@milankl (Member Author):

as lf is not set here, this uses lf=1 (default)

U = similar(u)
V = similar(v)

SpeedyTransforms.UV_from_vordiv!(U, V, prog_new.vor[lf], prog_new.div[lf], model.spectral_transform)
@milankl (Member Author):

however, this calculates U, V from lf which in this testset is set to lf=2 above

Comment on lines +86 to +87
u = randn(spectral_grid.SpectralVariable3D, trunc+2, trunc+1, nlayers)
v = randn(spectral_grid.SpectralVariable3D, trunc+2, trunc+1, nlayers)
@milankl (Member Author):

For any transform tests I usually start in spectral space, because grid space has more degrees of freedom: rand on some grid contains a lot of information (high-resolution noise) that is lost in the first transform, as not even the highest wavenumbers can hold it. You could probably circumvent this with a linear truncation (instead of the default quadratic), but because we always use quadratic, I'd just make sure to start any spec-grid uv-vor/div roundtrip with information that is surely representable in either space.
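A back-of-the-envelope version of this argument, using the T63 truncation and the 10944 grid points from the SpectralGrid printout earlier in this thread (the degree-of-freedom count is approximate):

```julia
# A triangular truncation TX has (X+1)(X+2)/2 complex spectral coefficients,
# i.e. roughly (X+1)(X+2) real degrees of freedom; the grid has far more points,
# so random grid-space noise cannot survive a transform roundtrip.
T = 63
spectral_dof = (T + 1) * (T + 2)     # ≈ real dof of the T63 truncation
grid_points = 10944                  # OctahedralGaussianGrid at this resolution
spectral_dof < grid_points           # true: grid space holds more information
```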

Comment on lines 398 to 402
curl!(vor, u_new, v_new, S; add)
divergence!(div, u_new, v_new, S; add)
else
curl!(vor, u_, v_, S; add)
divergence!(div, u_, v_, S; add)
@milankl (Member Author):

there's a division by the radius missing here. As discussed somewhere in the docs (sufficiently, I hope), all spectral gradient operators omit the radius scaling, as it's not needed in the dynamical core (hence we use vor, div scaled by the radius). I've therefore added a radius keyword (default 1) to a bunch of curl and divergence functions, which one needs to pass on. The SpectralTransform doesn't know about the radius (because it's invariant to the radius)

@milankl (Member Author) commented Sep 23, 2024

All spectral gradients, as of c3f7398, can now take an optional radius keyword argument. Otherwise radius=1 is assumed.

@milankl (Member Author) commented Sep 20, 2024

I still can't fully close the uv-vordiv roundtrip; there's something off, maybe with the coslat scaling, because the error largely projects onto the even zonal modes

[figure: roundtrip error, largely projecting onto the even zonal modes]

working on it

@milankl (Member Author) commented Sep 23, 2024

Regarding performance, a quick comparison between v0.10 and main on my MacBook Air:

| Model (default config), resolution | v0.10 | main | change |
|---|---|---|---|
| Barotropic T31 | 20k SYPD | 17k SYPD | -18% |
| Barotropic T85 | 880 SYPD | 700 SYPD | -25% |
| ShallowWater T31 | 2800 SYPD | 2300 SYPD | -22% |
| ShallowWater T85 | 125 SYPD | 100 SYPD | -25% |

@maximilian-gelbrecht (Member) commented Sep 24, 2024

I did some benchmarks of the transforms and gradients between main and mk/newvariables. Performance between the two branches is pretty much identical, so the slowdown has to come from somewhere else.

import Pkg
Pkg.activate(".")

using SpeedyWeather, BenchmarkTools

NF = Float32 
trunc = 63
Grid = FullGaussianGrid
#nlayers = 8 

SG = SpectralGrid(; NF, trunc, Grid)
S = SpectralTransform(SG, recompute_legendre=false)

grid = randn(SG.Grid{NF}, SG.nlat_half)
L1 = zeros(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)
L2 = rand(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)
L3 = zeros(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)
L4 = zeros(LowerTriangularMatrix{Complex{NF}}, trunc+2, trunc+1)

@benchmark spectral!(L1, grid, S)
@benchmark gridded!(grid, L1, S)

#@benchmark transform!(L1, grid, S)
#@benchmark transform!(grid, L1, S)

@benchmark curl!(L3, L1, L2, S)
@benchmark divergence!(L4, L1, L2, S)

@benchmark SpeedyWeather.UV_from_vordiv!(L1, L2, L3, L4, S)

@benchmark ∇²!(L1, L4, S, inverse=false)
@benchmark ∇²!(L1, L4, S, inverse=true)

mk/newvariables branch:

@benchmark transform!(L1, grid, S)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  49.417 μs  101.292 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     50.417 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   50.637 μs ±   1.018 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▂▆█▇▅▂                                                 
  ▁▂▂▂▂▂▂██████▇▇▄▅▅▄▄▃▃▃▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  49.4 μs         Histogram: frequency by time         54.8 μs <

 Memory estimate: 12.38 KiB, allocs estimate: 250.

@benchmark transform!(grid, L1, S)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  47.500 μs  105.000 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     48.667 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   48.801 μs ±   1.289 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅█▇▃      ▁▃▆▆▆▃▁▁                                          
  ▂▅██████▆▆▆▇████████▅▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▂▂▂▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  47.5 μs         Histogram: frequency by time         52.6 μs <

 Memory estimate: 12.41 KiB, allocs estimate: 251.

@benchmark curl!(L3, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min  max):  2.532 μs   3.722 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.727 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.733 μs ± 49.194 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       ▂▇█▆▆▄▂ ▁                             ▂
  ▆▅▃▃▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃██████████▆▆▅▃▅▆▅▅▅▄▄▅▅▄▃▅▄▃▄▄▃▃▁▃▄▃▄ █
  2.53 μs      Histogram: log(frequency) by time        3 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

 @benchmark divergence!(L4, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min  max):  2.505 μs   3.782 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.694 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.703 μs ± 57.143 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▂▆▆██▇▅▂▁▁                              ▂
  ▄▄▆▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██████████▇▇▅▆▆▅▄▄▆▆▅▅▅▄▃▅▅▃▅▅▄▃▅▁▄▄▄▄▄ █
  2.5 μs       Histogram: log(frequency) by time     2.99 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

@benchmark SpeedyWeather.UV_from_vordiv!(L1, L2, L3, L4, S)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min  max):  3.146 μs   16.948 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.391 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.431 μs ± 307.718 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▂            ▂█▆▁▂▃▁   ▃▁                                  ▁
  ██▅▃▃▅▅▃▂▂▃▄▃▃███████▇▆███▇▆▆▆▅▆▇▆▅▅▅▅▇▇▇▆▆▅▅▆▆▅▅▅▆▆▆▅▅▆▅▇▅ █
  3.15 μs      Histogram: log(frequency) by time      4.06 μs <

 Memory estimate: 64 bytes, allocs estimate: 1.

@benchmark ∇²!(L1, L4, S, inverse=false)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min  max):  1.446 μs   5.304 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.621 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.623 μs ± 70.167 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▂   ▂▂▃▁▂▁  ▂▆█▇▆▄▃▂                              ▂
  ▇▇▄▆▄▄▄▃▆██▆▃████████████████▇▇▇█▇▆▅▅▁▆▃▆▄▅▅▆▄▄▅▅▄▅▃▅▁▅▅▄▅ █
  1.45 μs      Histogram: log(frequency) by time     1.87 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

@benchmark ∇²!(L1, L4, S, inverse=true)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min  max):  1.446 μs   2.358 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.558 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.562 μs ± 22.520 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                      ▁ █▇                    
  ▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃█▁██▄▆▃▂▁▂▂▁▂▂▂▁▂▂▁▂▂▂ ▂
  1.45 μs        Histogram: frequency by time        1.62 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

Main branch:

@benchmark spectral!(L1, grid, S)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  48.125 μs  111.625 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     51.958 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   52.343 μs ±   1.526 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                      ▆██▆▆▆▆▆▅▅▄▄▃▂▂     ▁                    ▂
  ▄▃▃▃▁▁▃▃▁▁▁▁▁▁▁▁▁▁▁▆██████████████████▇█████▇██▇▆▇▇▆▆▇▇▇▆▆▆▆ █
  48.1 μs       Histogram: log(frequency) by time      57.9 μs <

 Memory estimate: 10.88 KiB, allocs estimate: 202.

@benchmark gridded!(grid, L1, S)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min  max):  46.792 μs  110.000 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     49.209 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.510 μs ±   1.329 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                 ▃▅█▂▁                                          
  ▂▁▁▁▁▁▁▁▂▁▂▃▅▅██████▇█▆▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  46.8 μs         Histogram: frequency by time         54.9 μs <

 Memory estimate: 10.91 KiB, allocs estimate: 203.

@benchmark curl!(L3, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min  max):  2.477 μs   4.042 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.671 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.679 μs ± 49.191 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                           ▆▃█                                
  ▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▄███▆▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂ ▂
  2.48 μs        Histogram: frequency by time         2.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

 @benchmark divergence!(L4, L1, L2, S)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min  max):  2.486 μs    5.880 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.685 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.707 μs ± 101.890 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▂▆█▇▅▄▂▂                                      ▂
  ▇▆▃▁▁▆▃▁▃▁▁▁▁▁████████▇▇▇▇▆▆▇▆▆▅▆▇▅▇▅▆▆▇▆▆█▇▇▆▆▆▆▆▇▇▆▇▆▇▆▇▇ █
  2.49 μs      Histogram: log(frequency) by time      3.19 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

@benchmark SpeedyWeather.UV_from_vordiv!(L1, L2, L3, L4, S)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min  max):  1.979 μs   3.183 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.125 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.132 μs ± 45.261 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▁█▆▅▁ ▁▁                                ▁
  ▅▅▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▁▁█████▇██▆▆▆▄▇▅▄▆▅▄▄▅▄▄▅▄▄▄▁▄▅▃▄▄▄▄▅▃▅▃▄ █
  1.98 μs      Histogram: log(frequency) by time     2.38 μs <

 Memory estimate: 16 bytes, allocs estimate: 1.

@benchmark ∇²!(L1, L4, S, inverse=false)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min  max):  1.400 μs    7.075 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.508 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.522 μs ± 111.858 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  ▅▇█                                          
  ▂▂▁▂▁▁▁▁▁▁▁▁▂▁▂▄███▄▃▃▂▃▂▃▂▂▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  1.4 μs          Histogram: frequency by time        1.75 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

@benchmark ∇²!(L1, L4, S, inverse=true)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min  max):  1.413 μs    8.100 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.517 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.528 μs ± 130.292 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                 █                                             
  ▂▂▂▂▁▁▁▁▁▁▁▁▁▂▇█▆█▇▇▇▃▂▂▂▄▅▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▂▂▂ ▃
  1.41 μs         Histogram: frequency by time        1.75 μs <

 Memory estimate: 32 bytes, allocs estimate: 1.

@milankl (Member Author) commented Sep 24, 2024

Okay, that's very good, because it means that all our new indexing doesn't cause any problems in practice. Yay! I bet it's some type instability; maybe it needs a function barrier or similar. Given the size of this pull request I don't recall all the changes we made in that direction 🙈

@maximilian-gelbrecht (Member) commented Sep 24, 2024

I was continuing the benchmarking a bit by dissecting dynamics_tendencies! for the PrimitiveWetModel.

There are differences in speed between multiple of the individual functions, but by far the biggest offender both in terms of absolute time and relative difference is vertical_advection!.

There's also e.g. a 4x difference in vertical_integration!, but the absolute difference is much smaller than for vertical_advection!.

Here are detailed results (benchmarked in blocks that are directly comparable):
mk/newvariables https://gist.github.com/maximilian-gelbrecht/d71181d39536336de7347c54f4515fd9
main https://gist.github.com/maximilian-gelbrecht/8f33757a00e3d7b5bcfb3071994339bd

@milankl (Member Author) commented Sep 24, 2024

Awesome! Yeah, the vertical integration and advection did change a bit in terms of structure without the vertical layering. Maybe I wasn't careful enough there.

@milankl (Member Author) commented Sep 24, 2024

I had removed all @inbounds in the dynamics tendencies; they are back in now. Let's see how much of a difference that makes.

@milankl (Member Author) commented Sep 24, 2024

Was (T31L8, with CenteredVerticalAdvection{NF, 1})

julia> @btime SpeedyWeather.vertical_advection!($diagn, $model)
  10.062 ms (202808 allocations: 15.47 MiB)

now

julia> @btime SpeedyWeather.vertical_advection!($diagn, $model)
  281.434 μs (24 allocations: 2.81 KiB)

40x faster! 🎉 The remaining allocations are Symbol creations, which could be replaced, but I don't think that's necessary. The reason was that to create the vertical stencils (1, 1, 2), (1, 2, 3), (2, 3, 4) we had the following line

k_stencil = max.(min.(nlayers, k-B:k+B), 1)

which allocates a vector, and the [2:end] then allocated another one... Previously this wasn't a problem, but with the reordering of the vertical dimension this line moved two loops down and got really expensive!
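The allocation pattern can be illustrated in isolation (a sketch of the issue, not the actual fix committed here):

```julia
# Broadcasting over a range returns a heap-allocated Vector, and slicing it with
# [2:end] allocates a second one. An NTuple-based stencil avoids both allocations.
nlayers, k, B = 8, 1, 1
stencil_vec = max.(min.(nlayers, k-B:k+B), 1)                        # allocates
stencil_tup = ntuple(i -> clamp(k - B + i - 1, 1, nlayers), 2B + 1)  # no Vector allocated
```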

@maximilian-gelbrecht (Member) commented Sep 25, 2024

The full benchmark suite:

Main (https://gist.github.com/maximilian-gelbrecht/97ad1e55c044a1df3ec8d26fb9e8a7d8):

| Model | T | L | Physics | Δt | SYPD | Memory |
|---|---|---|---|---|---|---|
| BarotropicModel | 31 | 1 | false | 1800 | 61467 | 1.26 MB |
| ShallowWaterModel | 31 | 1 | false | 1800 | 37730 | 1.28 MB |
| PrimitiveDryModel | 31 | 8 | true | 1800 | 1611 | 4.24 MB |
| PrimitiveWetModel | 31 | 8 | true | 1800 | 1239 | 4.59 MB |

mk/newvariables (https://gist.github.com/maximilian-gelbrecht/e712878dc3e8ff9b0a31c566cb605def):

| Model | T | L | Physics | Δt | SYPD | Memory |
|---|---|---|---|---|---|---|
| BarotropicModel | 31 | 1 | false | 1800 | 63138 | 1.17 MB |
| ShallowWaterModel | 31 | 1 | false | 1800 | 38006 | 1.19 MB |
| PrimitiveDryModel | 31 | 8 | true | 1800 | 1984 | 4.11 MB |
| PrimitiveWetModel | 31 | 8 | true | 1800 | 1431 | 4.47 MB |

So maybe we are even slightly faster now 🎉

@milankl (Member Author) commented Sep 25, 2024

While we're at it, a little performance overview of the calls in dynamics_tendencies! (T31L8 resolution, PrimitiveWetModel):

| Function | # transforms | time | share of total | memory allocations |
|---|---|---|---|---|
| pressure_gradient_flux! | 2 (2D only) | 93.034 μs | 3% | 260 allocations: 12.88 KiB |
| linear_virtual_temperature! | 0 | 1.762 μs | <0.1% | 0 allocations: 0 bytes |
| temperature_anomaly! | 0 | 3.897 μs | 0.1% | 0 allocations: 0 bytes |
| geopotential! | 0 | 8.988 μs | 0.3% | 0 allocations: 0 bytes |
| vertical_integration! | 0 | 25.179 μs | 1% | 0 allocations: 0 bytes |
| surface_pressure_tendency! | 1 (2D only) | 29.679 μs | 1% | 130 allocations: 6.44 KiB |
| vertical_velocity! | 0 | 7.813 μs | 0.2% | 0 allocations: 0 bytes |
| linear_pressure_gradient! | 0 | 1.454 μs | <0.1% | 0 allocations: 0 bytes |
| vertical_advection! | 0 | 294.082 μs | 10% | 24 allocations: 2.81 KiB |
| vordiv_tendencies! | 2 | 656.258 μs | 23% | 1940 allocations: 98.38 KiB |
| temperature_tendency! | 3 | 1.175 ms | 42% | 2910 allocations: 147.56 KiB |
| humidity_tendency! | 3 | 1.092 ms | 39% | 2910 allocations: 147.56 KiB |
| bernoulli_potential! | 1 | 272.455 μs | 10% | 970 allocations: 49.19 KiB |
| all | 9x3D + 2x2D | 2.8 ms | 100% | 9144 allocations: 464.81 KiB |

(The shares don't add up to 100% because it's faster to benchmark everything together...)
So the transforms very much dominate the right-hand side calculations; most other kernels (except vertical advection, at 10% now) are negligible, whether in grid or spectral space. @jackleland that also shows you how important the transforms are 🙈.
@maximilian-gelbrecht this looks like another benchmark we could run regularly.

Translated into a pie chart:

[pie chart: time share of the dynamics_tendencies! calls]

@maximilian-gelbrecht (Member) commented

Yes, I can add them to the benchmark suite.

@milankl merged commit 798f9bb into main on Sep 25, 2024; 6 checks passed.
@milankl deleted the mk/newvariables branch on October 7, 2024.