Do you know why I get this error when running pretrain_gpt_single_node.sh?
I set N_GPUS=1
and got:
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 191, in _get_group_rank
raise RuntimeError("The given group does not exist")
RuntimeError: The given group does not exist
from
Megatron-DeepSpeed/megatron/training.py", line 400, in setup_model_and_optimizer
model = get_model(model_provider_func)
I'm using the NGC Docker image with PyTorch, plus Apex, DeepSpeed, and the other packages installed from your requirements.txt.
My setup is 2x 3090.
germanjke changed the title from "The given group does not exist" to "The given group does not exist pytorch" on Apr 25, 2023