
[Bug] Custom Distillation MMSeg CWD loss nan problem #638

Open
Priyanshu88 opened this issue Apr 12, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Priyanshu88

Describe the bug

I trained the segnext_l model as a standard teacher on my own data, then used the resulting checkpoint for distillation (mmseg/cwd) from segnext_l ---> segnext_tiny. When I do this, after the first few iterations all losses become nan and stay nan for every subsequent iteration.

I am using the latest versions of all packages.

[Screenshot: training log showing all losses as nan]

The student model's evaluation results also remain 0.
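For context, a setup like the one described is usually expressed as an MMRazor 1.x distillation config along the lines of the sketch below. The base config names, teacher checkpoint path, and the recorder `source` module are placeholders I am assuming, not values from this issue; verify the actual module path inside segnext's decode head before using it.

```python
# Minimal sketch of a CWD distillation config in MMRazor 1.x.
# All paths below are placeholders -- adjust them to your actual
# segnext student/teacher configs and teacher checkpoint.
_base_ = ['mmseg::_base_/default_runtime.py']  # plus your dataset/schedule bases

model = dict(
    _scope_='mmrazor',
    type='SingleTeacherDistill',
    architecture=dict(
        cfg_path='mmseg::segnext/your_segnext_tiny_config.py',  # placeholder
        pretrained=False),
    teacher=dict(
        cfg_path='mmseg::segnext/your_segnext_large_config.py',  # placeholder
        pretrained=False),
    teacher_ckpt='work_dirs/segnext_l_teacher/latest.pth',  # placeholder
    distiller=dict(
        type='ConfigurableDistiller',
        # Record segmentation logits from both models. The module path is an
        # assumption -- check it against segnext's decode head implementation.
        student_recorders=dict(
            logits=dict(type='ModuleOutputs', source='decode_head.conv_seg')),
        teacher_recorders=dict(
            logits=dict(type='ModuleOutputs', source='decode_head.conv_seg')),
        distill_losses=dict(
            loss_cwd=dict(type='ChannelWiseDivergence', tau=1, loss_weight=5)),
        loss_forward_mappings=dict(
            loss_cwd=dict(
                preds_S=dict(from_student=True, recorder='logits'),
                preds_T=dict(from_student=False, recorder='logits')))))
```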

Priyanshu88 added the bug label on Apr 12, 2024
@tori-hotaru

I faced the same problem. After running several experiments, I believe the cause is that the learning rate is too large, which leads to gradient explosion. In the mmrazor schedule modules, learning-rate warmup is applied, which first raises the lr before the scheduler decays it. After I reduced my learning rate, training worked. I hope this helps.
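Concretely, the fix described above amounts to shrinking the base lr in the optimizer config; the warmup schedule still ramps up to that (now smaller) value before decay. The sketch below follows MMSeg/MMEngine 1.x conventions with illustrative values that are assumptions, not taken from this thread. The `clip_grad` entry is an optional extra guard against exploding gradients, not part of the original suggestion.

```python
# Sketch: lower the base learning rate; all values here are illustrative.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01),
    # Optional extra guard against exploding gradients (an assumption,
    # not mentioned in the comment): clip the global gradient norm.
    clip_grad=dict(max_norm=1.0, norm_type=2))

# Warmup ramps up to the (smaller) base lr above; PolyLR then decays it
# over the remainder of training.
param_scheduler = [
    dict(type='LinearLR', start_factor=1e-6, by_epoch=False, begin=0, end=1500),
    dict(type='PolyLR', power=1.0, eta_min=0.0, by_epoch=False,
         begin=1500, end=160000),
]
```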
