Describe the bug
I am training the segnext_l model as a standard teacher on my own data and using the resulting checkpoint for distillation (mmseg/CWD) from segnext_l -> segnext_tiny. After the first few iterations, every loss becomes NaN and stays NaN for all subsequent iterations.
I am using the latest versions of all packages.
The student model's results also remain at 0.
I faced the same problem. After running several experiments, I believe the learning rate is too large, which causes the gradients to explode. The mmrazor schedule modules apply learning-rate warm-up, which first ramps the lr up before decaying it over the rest of training, so the peak lr can be high enough to blow up the distillation losses. Lowering the learning rate fixed it for me (see the config sketch below). I hope this helps.
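For anyone hitting this, a minimal sketch of such an override in the MMEngine-style config used by recent mmrazor/mmseg releases might look like the following. The base config path, lr values, and iteration counts are illustrative assumptions, not taken from this issue, and `clip_grad` is an extra safeguard beyond the lr reduction suggested above:

```python
# Hypothetical override for a CWD distillation config: shrink the base lr
# and soften the linear warm-up so the KD losses stay finite.
_base_ = ['./cwd_logits_segnext_l_segnext_tiny.py']  # placeholder base config

optim_wrapper = dict(
    type='OptimWrapper',
    # Lowered base lr (illustrative value; tune for your data)
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.01),
    # Optional extra safeguard: clip gradients to limit explosions
    clip_grad=dict(max_norm=1.0, norm_type=2),
)

param_scheduler = [
    # Warm-up: ramp lr from 1% of its base value over the first 1k iters
    dict(type='LinearLR', start_factor=0.01, by_epoch=False, begin=0, end=1000),
    # Then decay with a polynomial schedule for the remaining iterations
    dict(type='PolyLR', power=1.0, eta_min=0.0, by_epoch=False,
         begin=1000, end=160000),
]
```

Starting with a smaller `start_factor` or a lower base lr keeps the post-warm-up peak below the point where the distillation loss diverges; if NaNs persist, gradient clipping gives an additional guard.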