
In DPO training, I got this ‘train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024}’ #89

Open
Alan-D-Chen opened this issue Sep 23, 2024 · 2 comments


Alan-D-Chen commented Sep 23, 2024

★---> train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024}
★---> train stats after 160800 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4887', 'grad_norm': 'nan', 'counters/examples': 160800, 'counters/updates': 5025}

I have already trained with SFT and got the XXX.pt file. What is wrong here?
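For reference, here is a minimal sketch of how these logged quantities relate under the standard DPO loss (β = 0.1, as in the command below); the function and variable names are illustrative, not the repo's exact code. Once the per-sequence log-probs are NaN, every derived reward, margin, and loss is NaN, and the accuracy comparison evaluates to 0, which matches the log lines above:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the standard DPO loss (beta = 0.1 as in the DPO command below).
# policy_*_logps / ref_*_logps stand for summed per-sequence log-probabilities.
def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    accuracy = (chosen_rewards > rejected_rewards).float().mean()  # NaN > NaN is False -> 0
    loss = -F.logsigmoid(margins).mean()
    return chosen_rewards, rejected_rewards, margins, accuracy, loss

# If the policy log-probs are already NaN, every derived stat comes out NaN.
nan = torch.tensor([float("nan")])
print(dpo_stats(nan, nan, torch.tensor([-10.0]), torch.tensor([-12.0])))
```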

Alan-D-Chen (Author) commented:

Also, my SFT run does not converge, and what is more, DPO does not work at all.

For SFT, I ran:
python -u train.py model=pythia69 datasets=[hh] loss=sft exp_name=anthropic_dpo_pythia69 gradient_accumulation_steps=2 batch_size=64 eval_batch_size=32 trainer=FSDPTrainer sample_during_eval=false

but the results are shown in the attached screenshot (image.png).

For DPO, I ran:
python -u train.py model=llama7b model.name_or_path=/workspace/sa/L20_TEST/LLM_models/llama2/hf_7B/ datasets=[hh] loss=dpo loss.beta=0.1 exp_name=anthropic_dpo_pythia69 gradient_accumulation_steps=2 batch_size=64 eval_batch_size=32 trainer=FSDPTrainer sample_during_eval=false model.fsdp_policy_mp=bfloat16 model.archive=.cache/root/anthropic_dpo_pythia69_2024-09-14_10-30-03_907191/step-159744/policy.pt

But the results are:
```
★---> train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024}

★---> train stats after 160800 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4887', 'grad_norm': 'nan', 'counters/examples': 160800, 'counters/updates': 5025}

Finished generating 1 epochs on train split
writing checkpoint to .cache/root/anthropic_dpo_pythia69_2024-09-22_09-38-57_157738/LATEST/policy.pt...
[rank0]:[2024-09-22 18:41:34,834] torch.distributed.fsdp._debug_utils: [WARNING] FSDP _optim_state_dict() profiling: defaultdict(<class 'float'>, {'preprocessing': 0.012136668432503939, 'preprocessing_with_comm': 0.042172754649072886, 'state_converting': 13.85691294586286, <Type.ALL: 'all'>: 13.912817124743015})
writing checkpoint to .cache/root/anthropic_dpo_pythia69_2024-09-22_09-38-57_157738/LATEST/optimizer.pt...
writing checkpoint to .cache/root/anthropic_dpo_pythia69_2024-09-22_09-38-57_157738/LATEST/scheduler.pt...
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync .cache/root/wandb/offline-run-20240922_094100-59mk2v5i
```
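A hypothetical first debugging step (not from the original report) would be to check whether the SFT checkpoint passed as model.archive already contains non-finite weights, since a non-converging SFT run can leave NaN parameters that make every DPO statistic NaN from the very first batch. The 'state' key below is an assumption about how policy.pt nests its weights; adjust if your checkpoint layout differs:

```python
import torch

# Hypothetical check: does the SFT checkpoint used as model.archive already
# contain NaN/Inf parameters? The path is the one from the DPO command above;
# the 'state' key is an assumed nesting inside policy.pt.
ckpt = torch.load(
    ".cache/root/anthropic_dpo_pythia69_2024-09-14_10-30-03_907191/step-159744/policy.pt",
    map_location="cpu",
)
state_dict = ckpt.get("state", ckpt) if isinstance(ckpt, dict) else ckpt

bad = [name for name, t in state_dict.items()
       if torch.is_tensor(t) and not torch.isfinite(t).all()]
print(f"{len(bad)} tensors with non-finite values")
for name in bad[:10]:
    print("  ", name)
```

If this prints any tensor names, the NaNs predate DPO and the SFT run would need to be fixed first.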


Alan-D-Chen commented Sep 23, 2024

@eric-mitchell Would you do me a favor and help me with this? Huge thanks.
