Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: add explicit memory mapping #1037

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

wfaderhold21
Copy link
Collaborator

What

Adds explicit mapping of user-provided memory for various TLs.

@wfaderhold21
Copy link
Collaborator Author

@nsarka @janjust @Sergei-Lebedev @manjugv May be easier to discuss / track updates to API here.

@swx-jenkins3
Copy link

Can one of the admins verify this patch?

@wfaderhold21
Copy link
Collaborator Author

@manjugv @janjust Any comments?

*
* @return Error code as defined by @ref ucc_status_t.
*/
ucc_status_t ucc_mem_exchange_post(ucc_mem_map_mem_h *memh);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfaderhold21 Is there a need to have an explicit call for exchanging memory handles? Why can't one use allgather or other collectives?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manjugv Fundamentally, this is an allgather collective. This explicitly has the UCC library perform the operation rather than the User. Is it better to have the User exchange the opaque structure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the value add, and need for explicit collective call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manjugv removed the explicit collective.

@manjugv
Copy link
Contributor

manjugv commented Nov 27, 2024

Ping @Sergei-Lebedev @janjust

*
* This routine maps a user-specified memory segment with a ucc_context_h. The
* segment is considered "mapped" with the context until either the user calls
* ucc_mem_unmap or ucc_context_destroy(). A handle to the mapped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to let user control map and unmap i.e. it's user responsibility to unmap all segments before context destroy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem relates to whether mapped memory is associated with a context or its own object. From my understanding, mapped memory would be used by collectives for communication purposes, meaning it would need to be associated with a context as a context contains network resources. In that case, when a user decides to destroy a context, the memory must be unmapped in order to prevent erroneous usage afterwards. @manjugv please correct me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but my question is why UCC should track it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the text to make it the user's responsibility

* @return Error code as defined by ucc_status_t.
*/

ucc_status_t ucc_mem_map(ucc_context_h context, ucc_mem_map_flags_t flags,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it global function that does allgather or it just creates local handle?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global function that does allgather

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, in this case how that should work if context is not global and we don't have any OOB function at this point?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help me a little here. If we create a local context, can we perform communication on that context between other processes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfaderhold21 add size parameter

*
* @return Error code as defined by ucc_status_t.
*/
ucc_status_t ucc_mem_unmap(ucc_mem_map_mem_h *memh);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need some collective operation to unmap?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this can be performed locally. is there a reason to make it collective?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for cuda ipc registration do we need to notify remote rank that memory is unmapped?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point. i think in order to be safe, we should make this a collective operation

Copy link
Collaborator

@nsarka nsarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API change looks good. I'm interested in seeing the implementation, specifically how it will keep track of which TLs have associated memhandles. For example, will this map also create a shmem region for TL/SHM? Can we somehow use it to avoid extra memcpys in/out of shmem?

@janjust
Copy link
Collaborator

janjust commented Dec 19, 2024

@manjugv we have an agreement, changes are reflected - if you approve, we can merge

@manjugv
Copy link
Contributor

manjugv commented Dec 19, 2024

@janjust We will wait for Ferrol to put together an implementation, so we have API + implementation to go with it. Also, I have requested @wfaderhold21 to add an FAQ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants