Replies: 3 comments 1 reply
-
@nicoleavans, you might find this discussion interesting.
-
I think this is conceptually analogous to what I was working towards here: #64. I'm quite unsatisfied with the interface I came up with there. A problem I ran into is that point-to-point vs. the various collectives really each wanted their own interface. The interface I propose currently supports splitting a message into multiple messages (I was thinking ahead to partitioned communication).
-
I have a revised attempt (no PR yet), but the conclusion is something like this (from the 2D halo perf test). It's kind of analogous to the Group interface in NCCL (@dssgabriel):

```diff
diff --git a/perf_tests/test_2dhalo.cpp b/perf_tests/test_2dhalo.cpp
index 8b34381..638f267 100644
--- a/perf_tests/test_2dhalo.cpp
+++ b/perf_tests/test_2dhalo.cpp
@@ -46,12 +46,12 @@ void send_recv(benchmark::State &, MPI_Comm comm, const Space &space, int nx, in
   auto ym1_s = Kokkos::subview(v, make_pair(1, nx + 1), 1, Kokkos::ALL);
   auto ym1_r = Kokkos::subview(v, make_pair(1, nx + 1), 0, Kokkos::ALL);

-  std::vector<KokkosComm::Req> reqs;
-  // std::cerr << get_rank(rx, ry) << " -> " << get_rank(xp1, ry) << "\n";
-  reqs.push_back(KokkosComm::isend(space, xp1_s, get_rank(xp1, ry), 0, comm));
-  reqs.push_back(KokkosComm::isend(space, xm1_s, get_rank(xm1, ry), 1, comm));
-  reqs.push_back(KokkosComm::isend(space, yp1_s, get_rank(rx, yp1), 2, comm));
-  reqs.push_back(KokkosComm::isend(space, ym1_s, get_rank(rx, ym1), 3, comm));
+  KokkosComm::Handle<Space> h = KokkosComm::plan(space, comm, [=](KokkosComm::Handle<Space> &handle) {
+    KokkosComm::isend(handle, xp1_s, get_rank(xp1, ry), 0);
+    KokkosComm::isend(handle, xm1_s, get_rank(xm1, ry), 1);
+    KokkosComm::isend(handle, yp1_s, get_rank(rx, yp1), 2);
+    KokkosComm::isend(handle, ym1_s, get_rank(rx, ym1), 3);
+  });

   KokkosComm::recv(space, xm1_r, get_rank(xm1, ry), 0, comm);
   KokkosComm::recv(space, xp1_r, get_rank(xp1, ry), 1, comm);
@@ -59,9 +59,7 @@ void send_recv(benchmark::State &, MPI_Comm comm, const Space &space, int nx, in
   KokkosComm::recv(space, yp1_r, get_rank(rx, yp1), 3, comm);

   // wait for comm
-  for (KokkosComm::Req &req : reqs) {
-    req.wait();
-  }
+  h.wait();
 }
```

I think the names of the types are a work in progress, but this fixes the problem with the current implementation, where every operation produces its own Req that must be tracked and waited on individually. With this construction, the communicator and space are associated with the handle rather than with each operation, so those arguments no longer need to be repeated in every call.
-
For non-contiguous Kokkos views, either the data has to be serialized into a temporary buffer, or an MPI datatype has to be created and committed. Currently, these resources are managed on the fly, which may come at a cost. In #50 (comment) I suggested a wrapper that holds whatever Kokkos deems necessary to perform the communication of a view.
A slightly refined version of this could be:
Alternatively, KokkosComm could cache the datatype in the session object that will be introduced later. That will not help with the buffer problem though.
A downside of the resource handle is that it is not per se thread-safe if a buffer is used instead of a datatype, unless buffers are allocated on a per-thread basis. This adds some complexity, either to the application or to the implementation.