
[Bug] Custom Distillation MMSeg CWD loss nan problem #638

Open
Priyanshu88 opened this issue Apr 12, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Priyanshu88

Describe the bug

I trained the segnext_l model as a standard teacher on my own data, then used the resulting checkpoint for distillation (mmseg/cwd) from segnext_l ---> segnext_tiny. When I do this, after the first few iterations all losses become nan and stay nan for every subsequent iteration.

I am using the latest versions of all packages.

[Screenshot: training log showing all losses as nan]

The student model's evaluation results also remain 0.
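For context, a setup like the one described is usually expressed as an MMRazor 1.x distillation config along the lines of the sketch below. The base config names, teacher checkpoint path, and the recorder `source` module are placeholders I am assuming, not values from this issue; verify the actual module path inside segnext's decode head before using it.

```python
# Minimal sketch of a CWD distillation config in MMRazor 1.x.
# All paths below are placeholders -- adjust them to your actual
# segnext student/teacher configs and teacher checkpoint.
_base_ = ['mmseg::_base_/default_runtime.py']  # plus your dataset/schedule bases

model = dict(
    _scope_='mmrazor',
    type='SingleTeacherDistill',
    architecture=dict(
        cfg_path='mmseg::segnext/your_segnext_tiny_config.py',  # placeholder
        pretrained=False),
    teacher=dict(
        cfg_path='mmseg::segnext/your_segnext_large_config.py',  # placeholder
        pretrained=False),
    teacher_ckpt='work_dirs/segnext_l_teacher/latest.pth',  # placeholder
    distiller=dict(
        type='ConfigurableDistiller',
        # Record segmentation logits from both models. The module path is an
        # assumption -- check it against segnext's decode head implementation.
        student_recorders=dict(
            logits=dict(type='ModuleOutputs', source='decode_head.conv_seg')),
        teacher_recorders=dict(
            logits=dict(type='ModuleOutputs', source='decode_head.conv_seg')),
        distill_losses=dict(
            loss_cwd=dict(type='ChannelWiseDivergence', tau=1, loss_weight=5)),
        loss_forward_mappings=dict(
            loss_cwd=dict(
                preds_S=dict(from_student=True, recorder='logits'),
                preds_T=dict(from_student=False, recorder='logits')))))
```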

Priyanshu88 added the bug label on Apr 12, 2024
@tori-hotaru

I faced the same problem. After running several experiments, I believe the cause is that the learning rate is too large, which leads to gradient explosion. In the mmrazor schedule modules, learning-rate warmup is applied, which first raises the lr before the scheduler decays it. After I reduced my learning rate, training worked. I hope this helps.
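Concretely, the fix described above amounts to shrinking the base lr in the optimizer config; the warmup schedule still ramps up to that (now smaller) value before decay. The sketch below follows MMSeg/MMEngine 1.x conventions with illustrative values that are assumptions, not taken from this thread. The `clip_grad` entry is an optional extra guard against exploding gradients, not part of the original suggestion.

```python
# Sketch: lower the base learning rate; all values here are illustrative.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01),
    # Optional extra guard against exploding gradients (an assumption,
    # not mentioned in the comment): clip the global gradient norm.
    clip_grad=dict(max_norm=1.0, norm_type=2))

# Warmup ramps up to the (smaller) base lr above; PolyLR then decays it
# over the remainder of training.
param_scheduler = [
    dict(type='LinearLR', start_factor=1e-6, by_epoch=False, begin=0, end=1500),
    dict(type='PolyLR', power=1.0, eta_min=0.0, by_epoch=False,
         begin=1500, end=160000),
]
```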
