cuIpcGetMemHandle triggered CUDA out of memory when I use flexflow on one gpu #75

sjtu-zwh · 2024-09-11T13:23:20Z

I used the Docker image "flexflow/flexflow-cuda-12.1:latest" to run FlexFlow on a 24GB RTX 3090, but it produced an out-of-memory error:

import flexflow.serve as ff 
ff.init(num_gpus=1, memory_per_gpu=11000, zero_copy_memory_per_node=11000)
[0 - 7fe1339374c0]    0.000000 {5}{gpu}: /usr/FlexFlow/deps/legion/runtime/realm/cuda/cuda_module.cc(4745):CUDA_DRIVER_FNPTR(cuIpcGetMemHandle)(&alloc.ipc_handle, alloc.dev_ptr) = 2(CUDA_ERROR_OUT_OF_MEMORY): out of memory
Aborted

Was it because I used the wrong code? How can I fix it?


lhr-30 · 2024-09-12T03:03:42Z

Maybe you can set memory_per_gpu to a larger number, such as 20000.

sjtu-zwh · 2024-09-12T05:19:04Z

> maybe you can set the memory_per_gpu to a larger number like 20000, etc.

I tried setting memory_per_gpu to 21000, but still got the out-of-memory error:

import flexflow.serve as ff
ff.init(num_gpus=1, memory_per_gpu=21000, zero_copy_memory_per_node=21000)
[0 - 7f474c55f4c0]    0.000000 {5}{gpu}: /usr/FlexFlow/deps/legion/runtime/realm/cuda/cuda_module.cc(4745):CUDA_DRIVER_FNPTR(cuIpcGetMemHandle)(&alloc.ipc_handle, alloc.dev_ptr) = 2(CUDA_ERROR_OUT_OF_MEMORY): out of memory
Aborted

lhr-30 · 2024-09-12T06:55:28Z

Which model did you use? It seems the OOM happens as soon as you launch it.

sjtu-zwh · 2024-09-12T11:35:44Z

> which model did you use? It seems that when you launch it, the OOM happens

I did not load any model, just initialized the flexflow backend.
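Since the failure happens during ff.init itself, one thing worth ruling out is whether the requested memory_per_gpu actually fits in what the GPU has free inside the container. The following is a minimal diagnostic sketch, not part of FlexFlow: it shells out to nvidia-smi and compares the free memory with the requested value. The helper names (parse_gpu_memory, fits, query_gpu_memory) and the 1024 MiB headroom are assumptions made for illustration, and it treats memory_per_gpu as a value in MB that is roughly comparable to the MiB figures nvidia-smi reports. It does not explain why cuIpcGetMemHandle in particular returns the error.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output (total, free in MiB) into (total, free) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, free = (int(field.strip()) for field in line.split(","))
        rows.append((total, free))
    return rows


def fits(requested_mb, free_mib, headroom_mib=1024):
    """Return True if the request plus an assumed safety headroom fits in free memory."""
    return requested_mb + headroom_mib <= free_mib


def query_gpu_memory():
    """Ask nvidia-smi for total and free memory of every visible GPU, in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.total,memory.free",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling query_gpu_memory() inside the container before ff.init and passing the free value to fits(21000, free_mib) shows whether the request is plausible on this card. If it reports plenty of free memory and the abort still occurs, the cause is likely something other than the size of the request.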

sjtu-zwh changed the title ~~CUDA out of memory when I use flexflow on one gpu~~ cuIpcGetMemHandle triggled CUDA out of memory when I use flexflow on one gpu Sep 19, 2024

sjtu-zwh changed the title ~~cuIpcGetMemHandle triggled CUDA out of memory when I use flexflow on one gpu~~ cuIpcGetMemHandle triggered CUDA out of memory when I use flexflow on one gpu Sep 19, 2024

lockshaw transferred this issue from flexflow/flexflow-train Dec 16, 2024
