Threshold appears to be NaN during the training process #27
Thanks for your attention; I am sorry that I only just saw this issue. If lowering the initial lr doesn't work, I suggest changing the activation function in the last layer of the BN module. I changed the compressed sigmoid function to a linear function with the output limited to [0.25, 0.9], and the NaN no longer appears.
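A minimal sketch of the suggested change, written with NumPy for illustration (the function name, bounds as defaults, and the clipping-based form are assumptions; the actual repo code is not reproduced here):

```python
import numpy as np

def threshold_activation(x, lo=0.25, hi=0.9):
    """Linear activation clipped to [lo, hi].

    Replaces a compressed sigmoid on the last layer of the threshold
    (binarization) branch: inside [lo, hi] the mapping is identity,
    so gradients stay constant instead of saturating, while the hard
    clip keeps the learned threshold in a numerically safe range.
    """
    return np.clip(x, lo, hi)
```

In a PyTorch model the same idea would typically be expressed with `torch.clamp` on the branch's raw output; the point is that the threshold can no longer drift outside [0.25, 0.9], which is what appears to trigger the NaN.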
Hi Tao Han:
I am a graduate student at SEU, trying to replace the backbone of IIM (VGG16_FPN or HRNet) with my Transformer crowd counting model. However, even after lowering the initial lr from 1e-6 to 1e-7 on SHA, the threshold still becomes NaN around epoch 700. Also, the best MAE is only 126, far worse than what my model achieves with other losses (beyond MSE) on SHA.
I noticed that in #7 (comment) you mentioned we could also lower the initial threshold. I want to confirm whether this refers to the initial weight of 0.5 in the Binarized module. But even when I change the initial weight to 0.4, t_max still starts at 0.54. I am confused by the Binarized module. Looking forward to your reply; my email is [email protected]/ [email protected]
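A minimal sketch of the likely source of the confusion, assuming the Binarized module passes its learnable weight through a sigmoid-style activation before it becomes the threshold (an assumption, not confirmed in this thread): the raw initial weight is then not the starting threshold, which would explain why setting the weight to 0.4 still yields a t_max above 0.5.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# If an activation sits between the parameter and the threshold,
# the threshold at init differs from the raw parameter value
# (numbers below are illustrative only, not the repo's exact mapping).
w = 0.4
print(round(sigmoid(w), 3))  # -> 0.599, not 0.4
```

Under this assumption, lowering the *effective* initial threshold would mean choosing a weight whose activated value, not the weight itself, lands at the desired starting point.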