Add Umpire memory manager for GPU pool memory allocation #943
base: vlasiator_gpu
This reverts commit 3a14535.
I also now built Umpire on Mahti so I can trial this - is this sufficient for building or do you think we need additional flags?
Ah, ok, I think I see at least one reason why this might be causing errors. In regular CUDA/HIP code, one can use the same gpuFree macro for both unified memory (UM) and device memory, but here we need a specific call for freeing UM memory. In Vlasiator_gpu, those haven't been distinguished yet. Also, I guess Hashinator will need to be updated to support Umpire to really benefit from it.
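To illustrate why a single free call stops working once device and unified memory come from separate pools, here is a minimal host-only sketch. Everything in it (`MemKind`, `MockPool`, `poolMalloc`, `poolFree`) is a hypothetical stand-in built on `malloc`, not Umpire or Vlasiator code; the only point is that each pool recognizes just its own pointers, so the caller has to say which kind of memory is being released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical memory kinds; a real build would map these to a device pool
// and a unified-memory pool.
enum class MemKind { Device, Unified };

// Host-only stand-in for a pool. Each pool tracks only the pointers it handed out.
class MockPool {
public:
    void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p != nullptr) live_[p] = bytes;
        return p;
    }
    // Reports failure for a pointer this pool does not own instead of
    // silently accepting it.
    bool deallocate(void* p) {
        auto it = live_.find(p);
        if (it == live_.end()) return false;
        live_.erase(it);
        std::free(p);
        return true;
    }
    std::size_t liveCount() const { return live_.size(); }

private:
    std::unordered_map<void*, std::size_t> live_;
};

// One pool per memory kind.
inline MockPool& poolFor(MemKind kind) {
    static MockPool device;
    static MockPool unified;
    return kind == MemKind::Device ? device : unified;
}

// Allocation and release both name the memory kind explicitly.
inline void* poolMalloc(MemKind kind, std::size_t bytes) {
    return poolFor(kind).allocate(bytes);
}
inline bool poolFree(MemKind kind, void* p) {
    return poolFor(kind).deallocate(p);
}
```

In this sketch, calling `poolFree(MemKind::Device, ptr)` on a pointer that came from the unified pool returns `false`, which is the kind of mismatch a shared free macro would hide.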
For Hashinator we would "just" need to add a new split allocator that uses Umpire.
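As a rough idea of what such an allocator could look like, here is a minimal sketch of a standard-style C++ allocator that forwards to a pluggable backend. It assumes the container accepts a std-style allocator template; `PoolBackedAllocator` and `HostBackend` are hypothetical names, and the backend below uses plain `malloc` where a real one would forward to a memory pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical backend. A real implementation would forward to a pool here.
struct HostBackend {
    // Counts blocks that have been allocated but not yet released.
    static std::size_t& outstanding() {
        static std::size_t count = 0;
        return count;
    }
    static void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p == nullptr) throw std::bad_alloc();
        ++outstanding();
        return p;
    }
    static void deallocate(void* p) noexcept {
        --outstanding();
        std::free(p);
    }
};

// Minimal allocator satisfying the standard Allocator requirements.
template <typename T, typename Backend = HostBackend>
class PoolBackedAllocator {
public:
    using value_type = T;

    PoolBackedAllocator() noexcept = default;
    template <typename U>
    PoolBackedAllocator(const PoolBackedAllocator<U, Backend>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(Backend::allocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { Backend::deallocate(p); }
};

// The allocator is stateless, so all instances with the same backend compare equal.
template <typename T, typename U, typename B>
bool operator==(const PoolBackedAllocator<T, B>&,
                const PoolBackedAllocator<U, B>&) noexcept {
    return true;
}
template <typename T, typename U, typename B>
bool operator!=(const PoolBackedAllocator<T, B>&,
                const PoolBackedAllocator<U, B>&) noexcept {
    return false;
}
```

A container would then be declared as, for example, `std::vector<int, PoolBackedAllocator<int>>`; swapping the backend type changes where the memory comes from without touching the container code.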
I think that should probably be ok. I didn't specify the CUDA architecture, but if it works, then you shouldn't need other stuff.
Myep, even after fixing those two calls it still complains on exit:
The address
Interestingly, as I was unable to debug this on Mahti, I switched to my own desktop computer with a GTX 1060. I built Umpire, compiled, ran, and... no error. :)
I notice now that the allocators constructed here do not use the syntax for Umpire thread-safe allocators:
This PR adds the Umpire memory manager for GPU pool memory allocation. However, the implementation crashes because of a silent error in the base version; see the Zulip discussion attached below. I am marking this as a draft, as it probably makes sense to fix the base version error first.
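For readers unfamiliar with this failure mode: GPU runtime calls such as `cudaFree` and `hipFree` report problems through a return code, so a failing free goes unnoticed unless that code is checked. The host-only sketch below shows the general pattern of a checking wrapper. All names (`MockError`, `mockMalloc`, `mockFree`, `CHECK_MOCK`) are hypothetical stand-ins and are not taken from gpu_base.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical status codes standing in for a GPU runtime's error enum.
enum MockError { mockSuccess = 0, mockErrorInvalidValue = 1 };

// Registry of pointers handed out by mockMalloc.
inline std::set<void*>& liveAllocations() {
    static std::set<void*> live;
    return live;
}

inline MockError mockMalloc(void** out, std::size_t bytes) {
    *out = std::malloc(bytes);
    if (*out == nullptr) return mockErrorInvalidValue;
    liveAllocations().insert(*out);
    return mockSuccess;
}

// Returns an error code for an unknown pointer; nothing happens unless the
// caller inspects the result.
inline MockError mockFree(void* p) {
    if (liveAllocations().erase(p) == 0) return mockErrorInvalidValue;
    std::free(p);
    return mockSuccess;
}

// Turns a non-success code into an exception that names the failing call.
inline void checkMock(MockError err, const char* call, const char* file, int line) {
    if (err != mockSuccess) {
        throw std::runtime_error(std::string(call) + " failed with code " +
                                 std::to_string(static_cast<int>(err)) + " at " +
                                 file + ":" + std::to_string(line));
    }
}

// Wrap every runtime call so that failures cannot pass unnoticed.
#define CHECK_MOCK(call) checkMock((call), #call, __FILE__, __LINE__)
```

Calling `mockFree` directly on a pointer it never allocated simply returns `mockErrorInvalidValue` and execution continues; the same call wrapped in `CHECK_MOCK(...)` throws at the call site instead.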
Zulip:
"Ok, I tried to figure out what is wrong, and it looks like the problem is not the Umpire implementation, but an already existing issue in the vlasiator_gpu branch, at least since
The reason the bug only shows up with the Umpire implementation is that the Managed class in gpu_base.hpp does not have error handling (i.e., the gpuFree() just errors silently and execution continues):
If I add error handling, then the program fails at exactly the same location where the Umpire implementation fails:
with the following output (on Mahti):
"