UCC/CTX: passing cuda-check from tl ucp to mlx5 #1013
base: master
@Sergei-Lebedev can you please take a look at this small patch?
@MamziB please fix the commit title to pass the code style check.
Just a general question regarding backward compatibility.
Also, since we're introducing this capability for a specific TL, do we have to update the other TLs as well (including private ones)?
In my opinion, it's more logical to return the supported memory types as part of the attributes. Since you need to query the service team, you can add an additional field to the attributes and check it in TL MLX5.
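A minimal sketch of this suggestion, in C. All names here (`sketch_tl_attr_t`, `SKETCH_TL_ATTR_FIELD_MEM_TYPES`, the `sketch_*` functions and memory-type enum) are hypothetical and do not reflect UCC's actual attribute structs. The service TL reports a bitmask of supported memory types, and TL MLX5 checks it. The sketch also assumes a field-mask convention so that TLs which never fill the new field (older or private ones) are treated conservatively, which is one possible answer to the backward-compatibility question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-type indices; UCC's real enum may differ. */
typedef enum {
    SKETCH_MEM_TYPE_HOST = 0,
    SKETCH_MEM_TYPE_CUDA = 1
} sketch_mem_type_t;

#define SKETCH_BIT(_i) (UINT64_C(1) << (_i))

/* Hypothetical flag: set only by TLs that fill in supported_mem_types. */
enum {
    SKETCH_TL_ATTR_FIELD_MEM_TYPES = 1 << 0
};

/* Hypothetical TL attributes carrying the proposed extra field. */
typedef struct {
    uint64_t mask;                /* which optional fields below are valid */
    uint64_t supported_mem_types; /* bitmask of SKETCH_BIT(sketch_mem_type_t) */
} sketch_tl_attr_t;

/* Service TL side: always reports host memory, plus CUDA if available. */
static void sketch_service_tl_get_attr(bool backend_has_cuda,
                                       sketch_tl_attr_t *attr)
{
    assert(attr != NULL);
    attr->mask                = SKETCH_TL_ATTR_FIELD_MEM_TYPES;
    attr->supported_mem_types = SKETCH_BIT(SKETCH_MEM_TYPE_HOST);
    if (backend_has_cuda) {
        attr->supported_mem_types |= SKETCH_BIT(SKETCH_MEM_TYPE_CUDA);
    }
}

/* TL MLX5 side: trust the field only if the service TL marked it valid. */
static bool sketch_mlx5_service_supports_cuda(const sketch_tl_attr_t *attr)
{
    assert(attr != NULL);
    if (!(attr->mask & SKETCH_TL_ATTR_FIELD_MEM_TYPES)) {
        return false; /* field not provided: assume no CUDA support */
    }
    return (attr->supported_mem_types & SKETCH_BIT(SKETCH_MEM_TYPE_CUDA)) != 0;
}
```

With this layout, TLs that are not updated simply leave the mask bit clear, so TL MLX5 falls back to "no CUDA" instead of reading an uninitialized field.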
After conducting a thorough code review of UCC, I found that implementing the desired feature without modifying the existing API seems infeasible. Specifically, we would need to make changes to the attributes section. Currently, the … When the function …
@manjugv Hey Manjo, do you have any thoughts on this PR? I think you mentioned you would take a look at it.
@manjugv ping
@manjugv At this stage, I'm focused on exploring all potential solutions for this issue, which is why Tommy, Sergey, and I haven't reached a conclusion yet. We're aiming to be thorough and consider various angles before deciding on a path forward. If you have any additional insights or alternative approaches in mind for this PR, we'd greatly appreciate your guidance.
Can one of the admins verify this patch?
Inside TL MLX5, we need to know whether the service TL (TL UCP) has CUDA support. Since we cannot call ucp_context_query() directly from TL MLX5, we use a shared variable and pass it to TL MLX5. This approach can be extended to other capabilities in the future as well.
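A minimal sketch of the shared-variable approach described above, in C. The struct and function names (`sketch_shared_caps_t`, `sketch_ucp_publish_caps`, `sketch_mlx5_ctx_init`, `sketch_mlx5_service_has_cuda`) are hypothetical and are not the patch's actual code. The service TL writes its CUDA capability into a record owned by the context, and TL MLX5 keeps a pointer to that record, so it never calls into UCP itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability record shared by the context with its TLs. */
typedef struct {
    bool service_cuda_supported; /* written by the service TL (TL UCP) */
} sketch_shared_caps_t;

/* Hypothetical TL MLX5 context holding a pointer to the shared record. */
typedef struct {
    const sketch_shared_caps_t *shared_caps;
} sketch_mlx5_ctx_t;

/* Service TL side. In a real build the flag would come from its UCX
 * context, e.g. ucp_context_query() with UCP_ATTR_FIELD_MEMORY_TYPES
 * (assuming a UCX version that provides it); here it is a parameter so
 * the sketch has no UCX dependency. */
static void sketch_ucp_publish_caps(sketch_shared_caps_t *caps,
                                    bool ucp_reports_cuda)
{
    assert(caps != NULL);
    caps->service_cuda_supported = ucp_reports_cuda;
}

/* TL MLX5 side: remember the shared record at context creation. */
static void sketch_mlx5_ctx_init(sketch_mlx5_ctx_t *ctx,
                                 const sketch_shared_caps_t *caps)
{
    assert(ctx != NULL);
    ctx->shared_caps = caps;
}

/* TL MLX5 side: a NULL record means "not provided", treated as no CUDA. */
static bool sketch_mlx5_service_has_cuda(const sketch_mlx5_ctx_t *ctx)
{
    assert(ctx != NULL);
    return ctx->shared_caps != NULL &&
           ctx->shared_caps->service_cuda_supported;
}
```

This relies on the service TL publishing its capabilities before TL MLX5 reads them, so the context creation order matters; the attribute-based alternative suggested in the review avoids that implicit ordering by making the capability part of the queried attributes.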