Multi-GPU and CUDA Stream Support #60
base: main
This all looks good to me. Is there a way we can test if this is working (perhaps from torch?). I remember that it wasn't that easy with sphericart. Perhaps we can launch 10 small mops operations on 10 different CUDA streams and see if we get a speed-up. Would that make any sense @nickjbrowning?
#ifndef MOPS_CUDA_ENABLED
C10_THROW_ERROR(ValueError, "MOPS was not compiled with CUDA support " + A.device().str());
#else
c10::cuda::CUDAGuard deviceGuard{A.device()};
What does this deviceGuard do? I see that it's not being used explicitly.
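It sets the current CUDA device to be the same one as A.device(). It's an RAII guard, so the previous device is restored when it goes out of scope, which is why it never needs to be referenced explicitly.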
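To illustrate why a guard object that is never referenced still has an effect, here is a minimal sketch of the RAII pattern in plain C++. It is a toy model, not the c10 implementation: the `toy` namespace, `current_device()`, `DeviceGuard`, and `device_seen_inside_guard()` are all hypothetical names, and a thread-local integer stands in for the CUDA runtime's notion of the current device.

```cpp
#include <cassert>

namespace toy {

// Hypothetical stand-in for the runtime's per-thread "current device".
inline int& current_device() {
    static thread_local int device = 0;
    return device;
}

// RAII guard: switches the current device on construction and
// restores the previous one on destruction.
class DeviceGuard {
public:
    explicit DeviceGuard(int device) : previous_(current_device()) {
        current_device() = device;
    }
    ~DeviceGuard() { current_device() = previous_; }

    // Non-copyable, so the restore happens exactly once.
    DeviceGuard(const DeviceGuard&) = delete;
    DeviceGuard& operator=(const DeviceGuard&) = delete;

private:
    int previous_;
};

// Returns the device that work issued inside the guarded scope would target.
inline int device_seen_inside_guard(int tensor_device) {
    DeviceGuard guard{tensor_device};  // never referenced again, by design
    return current_device();
}  // guard is destroyed here and the previous device is restored

}  // namespace toy
```

Calling `toy::device_seen_inside_guard(1)` while the current device is 0 reports device 1 inside the scope, and `toy::current_device()` is 0 again afterwards. The constructor and destructor do all the work, so the variable only needs to stay alive for the duration of the kernel launch.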
There's no easy way that I can see for us to test whether a kernel has launched on a specific stream from PyTorch. We can probably do this with the CUDA API but that seems a bit overkill.
#ifdef MOPS_CUDA_ENABLED
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#endif
I see that mops-torch/src/sap.cpp has not been modified past the headers (i.e. the stream is not actually taken into account), and the same is true for opsaw and sasaw. Is that correct?
I've fixed this for SAP. OPSAW and SASAW aren't implemented yet (SASAW is in a different branch) so when I get back to that I'll make it consistent.
OPSA supported.