**Training script** #23
Casting weights to bf16 is not recommended and has been removed for now.
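One common reason to keep weights out of bf16 is precision. bfloat16 has only 7 explicit mantissa bits, so near 1.0 the spacing between representable values is 2^-7 ≈ 0.0078, and small optimizer updates can be rounded away entirely. The sketch below simulates bf16 storage in plain Python (the `to_bf16` helper is a hypothetical illustration, not part of this project) and compares it with keeping the weight in full precision:

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (round-to-nearest-even).

    The value is first packed as float32; bfloat16 keeps its upper 16 bits
    (1 sign, 8 exponent, 7 mantissa bits). NaN/inf handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


update = 1e-3  # a small per-step update (lr * grad), chosen for illustration

bf16_weight = to_bf16(1.0)
fp32_weight = 1.0
for _ in range(100):
    bf16_weight = to_bf16(bf16_weight + update)  # weight stored in bf16
    fp32_weight = fp32_weight + update           # weight kept in full precision

print(bf16_weight)            # 1.0: every update was rounded away
print(round(fp32_weight, 6))  # 1.1
```

This is why mixed-precision setups typically compute in bf16 but keep a full-precision copy of the weights for the update.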
Here's the gradient accumulation from the … And here's a small example.
For gradient accumulation, I have opened a PR: #29
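For readers following along, here is a minimal, framework-free sketch of the idea. It is not the code from PR #29: it uses a toy one-parameter linear model with a mean-squared-error loss, and the names `grad_mse` and `train_step_accumulated` are hypothetical. Gradients from several equal-sized micro-batches are averaged and then applied in a single optimizer step, which matches the gradient of one large batch:

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_step_accumulated(w: float, batch: Batch, accum_steps: int, lr: float) -> float:
    """One SGD step whose gradient is accumulated over `accum_steps` micro-batches.

    Assumes len(batch) divides evenly by accum_steps, so every micro-batch has
    the same size and the averaged gradient equals the full-batch gradient.
    """
    assert len(batch) % accum_steps == 0, "batch must split evenly"
    size = len(batch) // accum_steps
    accumulated = 0.0
    for i in range(accum_steps):
        micro_batch = batch[i * size:(i + 1) * size]
        accumulated += grad_mse(w, micro_batch)  # w is NOT updated inside the loop
    grad = accumulated / accum_steps  # average to match the full batch's mean loss
    return w - lr * grad              # a single parameter update per cycle


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # y = 3x, 8 examples
w_full = train_step_accumulated(0.0, data, accum_steps=1, lr=0.01)
w_accum = train_step_accumulated(0.0, data, accum_steps=4, lr=0.01)
print(w_full, w_accum)  # identical here; in general equal up to rounding
```

The key design point is that parameters stay fixed while micro-batch gradients are summed; updating inside the loop would turn this into plain small-batch training.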
Hello,
Hi @celsofranssa, the hyperparameters in the HF model cards (for example here) are tuned for a TPU-v3-8, but you can run the script on GPU by adjusting the batch size accordingly and maybe switching
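One way to "adjust the batch size accordingly" is to keep the global (effective) batch size the same as on the TPU, where a TPU-v3-8 exposes 8 cores, and make up the difference with gradient accumulation. A small sketch of the arithmetic; the helper name and the per-device batch of 32 are hypothetical values, not taken from any model card:

```python
import math


def accumulation_steps(target_global_batch: int, per_device_batch: int, num_devices: int) -> int:
    """Accumulation steps so that per_device_batch * num_devices * steps
    reaches at least target_global_batch (rounds up if it does not divide evenly)."""
    per_step = per_device_batch * num_devices
    return math.ceil(target_global_batch / per_step)


tpu_global_batch = 32 * 8  # hypothetical: 32 examples per core on a TPU-v3-8 -> 256

print(accumulation_steps(tpu_global_batch, per_device_batch=16, num_devices=1))  # 16
print(accumulation_steps(tpu_global_batch, per_device_batch=16, num_devices=2))  # 8
```

Keeping the effective batch size constant means the learning rate and schedule from the model card are more likely to transfer, although each optimizer step now takes proportionally longer.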