SETUP:
I have a server with 8x V100 GPUs and have successfully used the fsdp_qlora script to train a Llama-3.1-70B model on a custom dataset with hqq_dora. I am now attempting to convert the resulting model_state_dict.safetensors to adapter_model.safetensors using a Python script nearly identical to the one in #60 (a condensed sketch of its loading and extraction steps is below).
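For reference, here is a condensed sketch of what the script does. The base-model id, file paths, and the adapter key filter are placeholders; the full script follows #60:

```python
import torch
from transformers import AutoModelForCausalLM
from safetensors.torch import load_file, save_file

# Rebuild the base model so the trained weights can be mapped back onto it.
# As written, every layer lands on a single GPU.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B",  # placeholder base-model id
    torch_dtype=torch.float16,
).to("cuda")

# Load the trained state dict and keep only the DoRA adapter tensors.
state_dict = load_file("model_state_dict.safetensors")
adapter = {k: v for k, v in state_dict.items() if "dora" in k.lower()}
save_file(adapter, "adapter_model.safetensors")
```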
ERROR:
A CUDA out-of-memory error occurs roughly 70% of the way through loading the model:
```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacity of 31.73 GiB of which 10.44 MiB is free. Process 1081625 has 398.00 MiB memory in use. Process 1083739 has 886.00 MiB memory in use. Including non-PyTorch memory, this process has 30.46 GiB memory in use. Of the allocated memory 29.92 GiB is allocated by PyTorch, and 193.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
nvidia-smi confirms that the model is being loaded only onto GPU 0; the other seven GPUs remain idle.
How can I use multiple GPUs to perform the conversion and avoid the OOM error?
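For example, I was hoping something along these lines would work (a sketch, assuming transformers/accelerate device_map handling applies to this loading path; the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# device_map="auto" (backed by accelerate) should shard the layers across
# all visible GPUs instead of placing everything on GPU 0; device_map="cpu"
# would keep the model off the GPUs entirely, which may be enough since the
# conversion itself is only state-dict manipulation.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B",  # placeholder base-model id
    torch_dtype=torch.float16,
    device_map="auto",  # or "cpu" to avoid GPU memory altogether
    low_cpu_mem_usage=True,
)
```

I'm not sure, however, whether the HQQ/DoRA wrapping in the conversion script is compatible with a sharded device_map.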