Qs #230

I trained an MoE model on 8 GPUs with 8 experts. When I ran inference in parallel, each process produced similar but different results. What could be the cause of this?
Comments
Maybe you can consider whether drop-less MoE mode can solve your issue, which is achieved by setting
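(The exact setting is cut off above.) In capacity-based MoE routers, drop-less mode means no token is ever dropped for exceeding an expert's capacity. A minimal, illustrative PyTorch sketch of top-1 routing with and without a capacity limit — `top1_route` and its arguments are hypothetical, not this repository's API:

```python
import math
import torch

def top1_route(gate_logits, capacity_factor=None):
    """Top-1 token-to-expert assignment (illustrative only).

    With a finite capacity_factor, each expert accepts at most
    capacity = ceil(capacity_factor * num_tokens / num_experts) tokens;
    overflow tokens are dropped. With capacity_factor=None the routing is
    drop-less: every token is kept no matter how unevenly the gate routes.
    """
    num_tokens, num_experts = gate_logits.shape
    expert_idx = gate_logits.argmax(dim=-1)              # [num_tokens]
    if capacity_factor is None:
        keep = torch.ones(num_tokens, dtype=torch.bool)  # drop-less: keep all
    else:
        capacity = math.ceil(capacity_factor * num_tokens / num_experts)
        keep = torch.zeros(num_tokens, dtype=torch.bool)
        for e in range(num_experts):
            slots = (expert_idx == e).nonzero(as_tuple=True)[0]
            keep[slots[:capacity]] = True                # overflow is dropped
    return expert_idx, keep

# 16 tokens routed to 8 experts; capacity = ceil(1.25 * 16 / 8) = 3 per expert.
logits = torch.randn(16, 8)
idx, kept = top1_route(logits, capacity_factor=1.25)
print(idx, kept)
```

With a finite capacity factor, whether a given token is kept depends on which other tokens land in the same batch, so the same token can be treated differently on different processes; drop-less mode removes that batch dependence.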
The results still differ across processes, and they also differ from the results obtained with capacity_factor=1.25.
Do you have more information? I didn't get what you said.

Outputs from different GPUs:
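One generic way to quantify how far the per-GPU outputs diverge is to all-gather the same forward-pass output from every rank and diff against rank 0. This is a torch.distributed sketch; `report_cross_rank_divergence` is a hypothetical helper, and it assumes the process group is already initialized and every rank feeds the identical input through the model:

```python
import torch
import torch.distributed as dist

def report_cross_rank_divergence(output):
    """Gather one forward-pass output from every rank and print the max
    absolute difference from rank 0. Assumes dist.init_process_group()
    has already run and all ranks used the same input."""
    world_size = dist.get_world_size()
    gathered = [torch.empty_like(output) for _ in range(world_size)]
    dist.all_gather(gathered, output.contiguous())
    if dist.get_rank() == 0:
        for rank, tensor in enumerate(gathered):
            diff = (tensor - gathered[0]).abs().max().item()
            print(f"rank {rank}: max |out - out_rank0| = {diff:.3e}")
```

Differences on the order of floating-point epsilon usually point to non-deterministic kernels or reduction order; larger, token-level differences are more consistent with capacity-based token dropping during routing.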