Qwen model issues & embedding and loss has nan #52
Comments
+1
Is this happening in the SFT stage or the DPO stage? Using the author's framework, I also get a NaN loss when fine-tuning chatglm3 with SFT.
DPO
Hi, any update on this? Were you able to fix this issue?
I've hit this problem too, using the Qwen2.5-7B model.
After one loss backward pass and optimizer step, the next forward pass produces inf hidden states at the embedding layer output, and the loss becomes NaN.
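One way to confirm where the inf/NaN first shows up is to attach forward hooks and record every submodule whose output is non-finite. The following is only a minimal sketch, assuming a PyTorch model; `add_nonfinite_checks` is a hypothetical helper (not part of this repo), and the tiny `nn.Sequential` is a toy stand-in for the real Qwen model.

```python
import torch
import torch.nn as nn


def add_nonfinite_checks(model: nn.Module) -> list:
    """Hypothetical helper: record submodules whose forward output has inf/NaN."""
    bad = []  # (module name, kind) pairs, appended in forward-execution order

    def make_hook(name):
        def hook(module, inputs, output):
            # Some modules return tuples; inspect the first element only.
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out) and out.is_floating_point():
                if torch.isnan(out).any():
                    bad.append((name, "nan"))
                elif torch.isinf(out).any():
                    bad.append((name, "inf"))
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            module.register_forward_hook(make_hook(name))
    return bad


# Toy stand-in for a language model: an embedding followed by a linear head.
model = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 10))
bad = add_nonfinite_checks(model)

# Simulate an embedding row that was corrupted by a bad optimizer step.
with torch.no_grad():
    model[0].weight[3] = float("inf")

model(torch.tensor([[1, 3]]))
print(bad)  # the first entry names the earliest module with a non-finite output
```

If the first entry is the embedding layer right after an optimizer step, the weights themselves were probably corrupted by that step, so it may be worth checking the gradients and learning rate before that update rather than the forward pass.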