Replies: 3 comments 16 replies
-
Option 3: Define req.wait to be ordered within the execution space, and require the application to fence if it needs to. That way, asynchronous data movement (e.g., the unpack of a message) on a CUDA stream, for example, doesn't have to be fenced if the next kernel is submitted on that same stream. This would be true for NCCL (if you want to use that as a backend) and may one day be true for MPI as well. In what situations would the user have to fence the output of the communication anyway? Isn't everything ordered inside an execution context? (This may be my ignorance of Kokkos semantics, apologies.)
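To make the stream-ordered idea concrete, here is a minimal sketch that uses a mock in-order queue instead of a real CUDA stream or Kokkos execution space. `MockStream`, this `Req`, and `run_stream_ordered_example` are hypothetical names invented for illustration; nothing here is the proposed API. The assumption is that `wait` only enqueues the completion work (the unpack) on the stream, so a kernel submitted afterwards on the same stream runs after it without any host-side fence.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an in-order execution space instance
// (e.g. a CUDA stream). Work is queued and runs in submission order.
struct MockStream {
  std::deque<std::function<void()>> queue;
  int fence_count = 0;

  void enqueue(std::function<void()> task) { queue.push_back(std::move(task)); }

  // Host-side synchronization: run everything queued so far, in order.
  void fence() {
    ++fence_count;
    while (!queue.empty()) {
      auto task = std::move(queue.front());
      queue.pop_front();
      task();
    }
  }
};

// Hypothetical request whose wait() is ordered within the stream
// instead of blocking the host.
struct Req {
  MockStream* stream;
  std::function<void()> unpack;

  void wait() { stream->enqueue(unpack); }  // no fence here
};

// Returns the order in which the operations actually ran.
std::vector<std::string> run_stream_ordered_example() {
  MockStream stream;
  std::vector<std::string> log;

  Req req{&stream, [&log] { log.push_back("unpack"); }};
  req.wait();                                           // ordered on the stream
  stream.enqueue([&log] { log.push_back("kernel"); });  // same stream: after unpack
  assert(stream.fence_count == 0);  // no host synchronization was needed so far

  stream.fence();  // e.g. before the host itself reads the results
  return log;
}
```

In this sketch the only fence is the one the application issues itself, for example when the host needs to read the received data; work that stays on the same stream never pays for one.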
-
Option 2 is very error-prone IMO, because the behavior of |
-
I'm kind of inclined to template |
-
Option 1
    req = isend(space, ...);
    req.wait();
    // or, without spaces
    req = isend(...);
    req.wait();
For the above, to ensure communication is actually done, `req.wait()` may in general need to fence `space`, which means `Req` needs to have an execution space member in it. Since `Req` is not a template struct (as currently proposed), that member execution space would need to be type-erased.
We could sidestep this by requiring that the user fence any space instances they explicitly use. This is kind of a bummer because now `wait` is not sufficient to finish the communication.
Option 2
Discussion
The challenge with option 1 is that it makes the following pattern hard to implement:
For `wait_all` to know that `space.fence` only needs to be called once, `wait_all` needs to see the actual execution space instances held in the `Req`, but it can't because they've been type-erased. We'd have to use RTTI to tag the `Req` with the type of the held space, and then do some dynamic casting and call the space's `operator==` to issue only one fence for each space instance present across all requests.
Or `Req` could be templated on the Kokkos execution space. Maybe this is fine - we could do something like a HostReq and a Req for requests against host and device spaces.
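As a rough illustration of the type-erased route, the sketch below shows one way `wait_all` could fence each distinct space instance only once. It does not use Kokkos: `MockSpace`, `SpaceHolderBase`, `SpaceHolder`, `make_req`, and this shape of `Req` and `wait_all` are all hypothetical stand-ins, and the only things assumed of a space are a `fence()` member and an `operator==`.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical execution space instance; `id` stands in for a stream handle.
struct MockSpace {
  int id;
  int* fence_counter;  // counts fence() calls, for demonstration only
  void fence() const { ++*fence_counter; }
  bool operator==(const MockSpace& other) const { return id == other.id; }
};

// Type-erased interface so that Req itself does not need to be a template.
struct SpaceHolderBase {
  virtual ~SpaceHolderBase() = default;
  virtual void fence() const = 0;
  virtual bool same_instance(const SpaceHolderBase& other) const = 0;
};

template <typename Space>
struct SpaceHolder final : SpaceHolderBase {
  Space space;
  explicit SpaceHolder(Space s) : space(std::move(s)) {}

  void fence() const override { space.fence(); }

  bool same_instance(const SpaceHolderBase& other) const override {
    // RTTI: only holders of the same space type can be compared.
    const auto* o = dynamic_cast<const SpaceHolder*>(&other);
    return o != nullptr && space == o->space;
  }
};

// Non-template request that owns a type-erased space.
struct Req {
  std::shared_ptr<SpaceHolderBase> holder;
  void wait() const { holder->fence(); }
};

template <typename Space>
Req make_req(Space s) {
  return Req{std::make_shared<SpaceHolder<Space>>(std::move(s))};
}

// Fence every distinct space instance exactly once across all requests.
inline void wait_all(const std::vector<Req>& reqs) {
  std::vector<const SpaceHolderBase*> fenced;
  for (const Req& r : reqs) {
    assert(r.holder != nullptr);
    bool seen = false;
    for (const SpaceHolderBase* f : fenced) {
      if (f->same_instance(*r.holder)) { seen = true; break; }
    }
    if (!seen) {
      r.holder->fence();
      fenced.push_back(r.holder.get());
    }
  }
}
```

With three requests of which two share a space instance, this `wait_all` issues two fences rather than three. The price is a virtual call plus a `dynamic_cast` per comparison and a quadratic scan over the requests, which is the kind of machinery that templating `Req` on the execution space would avoid.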