Offload matrix based interpolation #239

l90lpa · 2024-11-16T00:12:22Z

I've opened this as a draft PR to get early feedback and to recognise that we might want to iterate on the design.

This PR:

- adds a SparseMatrix class into Atlas that wraps eckit's SparseMatrix class to combine it with GPU memory management
- adds a hicsparse backend to Atlas's sparse linear algebra interface
- adds "multiply-add" support into Atlas's sparse linear algebra interface
- updates interpolation to support use of the hicsparse backend of Atlas's sparse linear algebra interface


wdeconinck · 2024-11-18T21:14:26Z

Thanks @l90lpa, so far I had a first look at the atlas::linalg::SparseMatrix class only.

I am making a note that in principle it could also just inherit from eckit::linalg::SparseMatrix, and add the device capability on top. On the other hand this way it's possible to implement it differently, and construct an eckit::linalg::SparseMatrix only when needed.
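To make the two options concrete, here is a minimal sketch of inheritance versus wrapping. Everything in it is hypothetical and for illustration only: `HostCsr` stands in for the host-only matrix, `FakeDeviceBuffer` simulates device memory with a plain `std::vector`, and `updateDevice()` / `deviceAllocated()` are made-up names, not the interface in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a host-only CSR matrix (not eckit's class).
struct HostCsr {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> outer;  // row pointers, size rows + 1
    std::vector<std::size_t> inner;  // column indices, size nnz
    std::vector<double> values;      // non-zero values, size nnz
};

// Simulated device storage; a real implementation would hold device pointers.
struct FakeDeviceBuffer {
    std::vector<double> data;
    bool allocated = false;
};

// Option A: inherit from the host matrix and add device capability on top.
class InheritingMatrix : public HostCsr {
public:
    explicit InheritingMatrix(HostCsr m) : HostCsr(std::move(m)) {}
    void updateDevice() {
        device_.data      = values;  // copy host values to the "device"
        device_.allocated = true;
    }
    bool deviceAllocated() const { return device_.allocated; }

private:
    FakeDeviceBuffer device_;
};

// Option B: wrap the host matrix. The internal storage is private, so it
// could be changed later; a host view is handed out only on request.
class WrappingMatrix {
public:
    explicit WrappingMatrix(HostCsr m) : host_(std::move(m)) {}
    void updateDevice() {
        device_.data      = host_.values;
        device_.allocated = true;
    }
    bool deviceAllocated() const { return device_.allocated; }
    std::size_t rows() const { return host_.rows; }
    const HostCsr& host() const { return host_; }

private:
    HostCsr host_;
    FakeDeviceBuffer device_;
};
```

With Option A every host member is part of the public interface, whereas Option B only exposes what the wrapper chooses to, which is the flexibility mentioned above.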

src/atlas/linalg/SparseMatrix.cc

wdeconinck · 2024-11-18T21:20:18Z

It should be mentioned explicitly that this PR depends on first merging #237 and should then be rebased on develop.

wdeconinck · 2024-11-19T11:20:50Z

Perhaps this PR is adding too many ingredients at once.
Maybe we could just focus on the gpu-offloading first and add the 'multiply-add' later?

In that respect of multiply-add (for a new PR), should matrix_multiply_add not immediately also implement the following formula including $$\alpha$$ and $$\beta$$?

$$y = \alpha A x + \beta y$$

I guess here you have $$\alpha=1$$ and $$\beta=1$$, so that:

$$y = A x + y$$

That could still be a simpler overloaded API for sure.
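As a rough illustration of what I mean, here is a host-side sketch of the general form with the simpler version as an overload. The `CsrMatrix` struct and the name `multiply_add_sketch` are hypothetical and only for illustration; they are not the Atlas or eckit interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container, for illustration only.
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> outer;  // row pointers, size rows + 1
    std::vector<std::size_t> inner;  // column indices, size nnz
    std::vector<double> values;      // non-zero values, size nnz
};

// General form: y = alpha * A * x + beta * y
void multiply_add_sketch(const CsrMatrix& A, const std::vector<double>& x,
                         std::vector<double>& y, double alpha, double beta) {
    assert(x.size() == A.cols);
    assert(y.size() == A.rows);
    assert(A.outer.size() == A.rows + 1);
    for (std::size_t r = 0; r < A.rows; ++r) {
        double sum = 0.0;
        for (std::size_t k = A.outer[r]; k < A.outer[r + 1]; ++k) {
            sum += A.values[k] * x[A.inner[k]];
        }
        y[r] = alpha * sum + beta * y[r];
    }
}

// Simpler overload with alpha = 1 and beta = 1: y = A * x + y
void multiply_add_sketch(const CsrMatrix& A, const std::vector<double>& x,
                         std::vector<double>& y) {
    multiply_add_sketch(A, x, y, 1.0, 1.0);
}
```

The three-argument overload keeps call sites short for the accumulate case, while the five-argument form covers scaling as well.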

l90lpa · 2024-11-19T12:48:42Z

> Perhaps this PR is adding too many ingredients at once. Maybe we could just focus on the gpu-offloading first and add the 'multiply-add' later?
>
> In that respect of multiply-add (for a new PR), should matrix_multiply_add not immediately also implement the following formula including $$\alpha$$ and $$\beta$$?
>
> $$y = \alpha A x + \beta y$$
>
> I guess here you have $$\alpha=1$$ and $$\beta=1$$, so that:
>
> $$y = A x + y$$
>
> That could still be a simpler overloaded API for sure.

Regarding PR content: I'm happy to break up the PR however you would like. I only included the complete work so that you would be able to see an overview in advance. That said, I just want to mention that multiply-add is part of this work because it facilitates the GPU offload of interpolation's execute_adjoint.

Regarding multiply-add: whether matrix_multiply_add should immediately implement the broader interface is up to you (I'm happy either way). I chose not to, because I wasn't aware of a need for the additional behaviour, and so I didn't want to introduce an interface that wouldn't get used.

wdeconinck · 2024-11-19T14:08:15Z

I appreciate very much this "draft" PR to show the complete work! Thanks! 🙏
I had a deeper look now and I definitely like your approach taken so far.

Ideally, with the overall design in mind, this can now be split into several different PRs, in this order:

1. Implementing multiply-add, possibly with the extended capability using $$\alpha$$ and $$\beta$$...
2. ...and update the adjoint interpolation methods to use this. Because this routine did not exist before, @MarekWlasak implemented the adjoint as first a matrix-multiply followed by a += . Perhaps this approach should still be used for the eckit-backend codepaths, and could possibly live inside the matrix_multiply_add implementation for the eckit-backend (within atlas) itself.
3. Implement GPU-offloading capability for the new sparse matrix
4. Create hicsparse backend
5. Adapt interpolation methods to enable the GPU offloading.
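The "matrix-multiply followed by a +=" fallback mentioned in the second item could look roughly like the sketch below. It assumes a backend that only offers a plain product; `CsrMatrix`, `multiply_sketch` and `multiply_add_fallback_sketch` are hypothetical names for illustration, not existing Atlas or eckit functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container, for illustration only.
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> outer;  // row pointers, size rows + 1
    std::vector<std::size_t> inner;  // column indices, size nnz
    std::vector<double> values;      // non-zero values, size nnz
};

// Stand-in for a backend that only provides y = A * x (y is overwritten).
void multiply_sketch(const CsrMatrix& A, const std::vector<double>& x,
                     std::vector<double>& y) {
    assert(x.size() == A.cols);
    assert(y.size() == A.rows);
    for (std::size_t r = 0; r < A.rows; ++r) {
        double sum = 0.0;
        for (std::size_t k = A.outer[r]; k < A.outer[r + 1]; ++k) {
            sum += A.values[k] * x[A.inner[k]];
        }
        y[r] = sum;
    }
}

// Multiply-add emulated in two steps: tmp = A * x, then y += tmp.
void multiply_add_fallback_sketch(const CsrMatrix& A, const std::vector<double>& x,
                                  std::vector<double>& y) {
    assert(y.size() == A.rows);
    std::vector<double> tmp(A.rows, 0.0);  // temporary the fused form avoids
    multiply_sketch(A, x, tmp);
    for (std::size_t r = 0; r < A.rows; ++r) {
        y[r] += tmp[r];
    }
}
```

Hiding the temporary inside the backend's multiply-add keeps the interpolation code identical for every backend, at the cost of one extra vector on this path.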

l90lpa · 2024-11-19T14:20:16Z

That sounds like a great plan, and thanks for taking a look so far! I'll work on submitting those PRs.

l90lpa · 2024-11-21T21:44:42Z

I've submitted PRs for the first 2 items (#240 and #241), and I'll submit the remaining 2 PRs as their dependencies pass review and get merged in.

l90lpa added 9 commits November 15, 2024 23:48

Extract dummyShouldNotBeCalled function and hic namespace macro.

317c499

Find and link to hipsparse and cusparse.

ccccf70

Add hicsparse wrapper to parts of hipsparse and cusparse.

47c246a

Add SparseMatrix class into Atlas with host-device memory management.

17aa0e4

Use atlas::linalg::SparseMatrix class instead of eckit::linalg::SparseMatrix.

082faaa

Add hicSparse backend to sparse matrix multiply.

a4a423e

Add multiply_add function for sparse matrix linear algebra.

a20766e

Update interpolation to work with hicsparse sparse matrix multiply backend.

8ed5e97

Cache hicSparse handle to avoid library re-initialization.

487ff07

github-actions bot added the contributor label Nov 16, 2024

wdeconinck reviewed Nov 18, 2024

src/atlas/linalg/SparseMatrix.cc

wdeconinck added the approved-for-ci label Nov 18, 2024

github-actions bot removed the approved-for-ci label Nov 21, 2024

l90lpa added 2 commits November 21, 2024 19:50

Enable on-device halo exchange in interpolation.

a79d8eb

Update add interpolation offload test to exercise passing in updated device memory.

9a33983

l90lpa force-pushed the feature/hicsparse-sparse-linalg-backend branch from 761d459 to 9a33983 on November 21, 2024 19:51

l90lpa mentioned this pull request Nov 21, 2024

Interpolation GPU Offload JCSDA-internal/atlas#97

Open
