Pre-training using GPUs is strange #21
Are you pre-training from scratch or initializing from the `.h5` file? I've been pre-training with initialization from the `.h5` file, and the loss appears unchanged between epochs.
I'm going to try from scratch to see if it makes a difference.
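A flat loss between epochs can also mean the weights are never actually being updated. Here is a minimal sanity-check sketch, assuming a TF2/Keras model and whatever training-step function the repo's loop uses (all names are placeholders, not taken from the repository):

```python
import numpy as np

# Snapshot the trainable weights before and after a few training steps.
# If nothing changes, the optimizer is not updating the model at all
# (e.g. zero learning rate, frozen layers, or weights that never loaded).
def weights_changed(model, train_step, batch, steps=5):
    before = [w.numpy().copy() for w in model.trainable_weights]
    for _ in range(steps):
        train_step(batch)  # the training step used by the repo's loop (assumed)
    after = [w.numpy() for w in model.trainable_weights]
    return any(not np.allclose(b, a) for b, a in zip(before, after))
```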
From scratch.
I trained from scratch and it made no difference. I reduced the dataset to only 10,000 sentences to make it easier to debug and perhaps let the model overfit, but the loss doesn't change from epoch to epoch. So I'm still unable to pre-train from scratch, though it appears we aren't dealing with the same problem. It would be good to know if anyone has succeeded in pre-training from scratch. A quick way to push the overfitting idea further is sketched below.
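A common way to debug a flat loss is to shrink the data even further and try to overfit a single batch. A rough sketch, assuming a compiled `tf.keras` model and one batch of inputs/labels (all names are placeholders):

```python
import tensorflow as tf

# Fit the same tiny batch repeatedly. A healthy setup should drive the loss
# close to zero within a few hundred steps; if it stays flat even here, the
# problem is more likely in the data pipeline, masking, or learning-rate
# schedule than in the amount of data.
def overfit_one_batch(model, inputs, labels, steps=300):
    ds = tf.data.Dataset.from_tensors((inputs, labels)).repeat(steps)
    history = model.fit(ds, epochs=1, verbose=0)
    return history.history["loss"][-1]
```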
I think most of the code was taken from the official Google TF 2.0 implementation. I also tried pre-training with that repository, but it failed, and I have posted an issue in the official Google repository.
I am trying to pre-train from scratch on Japanese data using GPUs, but the pre-training seems strange.
In the following log, `masked_lm_accuracy` and `sentence_order_accuracy` suddenly dropped. Has anyone succeeded in pre-training from scratch?
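One thing that may be worth ruling out when the accuracies collapse mid-run is the learning-rate schedule (the official code applies a warmup followed by decay). A minimal sketch that just evaluates such a schedule so it can be inspected around the step where the drop happens; the class and every parameter value here are assumptions, not the repo's actual settings:

```python
import tensorflow as tf

# Evaluate a warmup + linear-decay schedule at a few steps to see whether the
# learning rate spikes or collapses near the point where the metrics dropped.
class WarmupLinearDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, peak_lr=1e-4, warmup_steps=10_000, total_steps=125_000):
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup = self.peak_lr * step / self.warmup_steps
        decay = self.peak_lr * (self.total_steps - step) / (
            self.total_steps - self.warmup_steps)
        return tf.where(step < self.warmup_steps, warmup, tf.maximum(decay, 0.0))

schedule = WarmupLinearDecay()
for s in (0, 5_000, 10_000, 50_000, 125_000):
    print(s, float(schedule(s)))
```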