cuIpcGetMemHandle triggered CUDA out of memory when I use flexflow on one gpu #75

Open
sjtu-zwh opened this issue Sep 11, 2024 · 4 comments

Comments

sjtu-zwh commented Sep 11, 2024

I used the Docker image "flexflow/flexflow-cuda-12.1:latest" to run FlexFlow on a 24 GB RTX 3090, but it raised an out-of-memory error:

import flexflow.serve as ff 
ff.init(num_gpus=1, memory_per_gpu=11000, zero_copy_memory_per_node=11000)
[0 - 7fe1339374c0]    0.000000 {5}{gpu}: /usr/FlexFlow/deps/legion/runtime/realm/cuda/cuda_module.cc(4745):CUDA_DRIVER_FNPTR(cuIpcGetMemHandle)(&alloc.ipc_handle, alloc.dev_ptr) = 2(CUDA_ERROR_OUT_OF_MEMORY): out of memory
Aborted

Is this caused by something wrong in my code? How can I fix it?

lhr-30 commented Sep 12, 2024

Maybe you can set memory_per_gpu to a larger value, such as 20000.

@sjtu-zwh
Author

I tried setting memory_per_gpu to 21000, but I still got the out-of-memory error:

import flexflow.serve as ff
ff.init(num_gpus=1, memory_per_gpu=21000, zero_copy_memory_per_node=21000)
[0 - 7f474c55f4c0]    0.000000 {5}{gpu}: /usr/FlexFlow/deps/legion/runtime/realm/cuda/cuda_module.cc(4745):CUDA_DRIVER_FNPTR(cuIpcGetMemHandle)(&alloc.ipc_handle, alloc.dev_ptr) = 2(CUDA_ERROR_OUT_OF_MEMORY): out of memory
Aborted

lhr-30 commented Sep 12, 2024

Which model did you use? It looks like the OOM happens as soon as you launch it.

@sjtu-zwh
Author

I did not load any model; I only initialized the FlexFlow backend.

@sjtu-zwh sjtu-zwh changed the title CUDA out of memory when I use flexflow on one gpu cuIpcGetMemHandle triggled CUDA out of memory when I use flexflow on one gpu Sep 19, 2024
@sjtu-zwh sjtu-zwh changed the title cuIpcGetMemHandle triggled CUDA out of memory when I use flexflow on one gpu cuIpcGetMemHandle triggered CUDA out of memory when I use flexflow on one gpu Sep 19, 2024
@lockshaw lockshaw transferred this issue from flexflow/flexflow-train Dec 16, 2024