
Fix QAT resume with BN models, checkpoint name #260

Merged 4 commits into analogdevicesinc:develop on Oct 18, 2023

Conversation

oguzhanbsolak (Contributor)

I noticed these bugs recently; this PR is independent of the bug fixes and updates I mentioned in last week's meeting.


When resuming from a checkpoint, a new optimizer is created from the model. Because batchnorm fusing reduces the model's parameter count, the new optimizer created when resuming from a QAT checkpoint has fewer parameters than the optimizer state_dict stored in the checkpoint.
Therefore, I added an update_optimizer function, called at the initiate_qat stage, to strip the batchnorm parameters from the optimizer.
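The filtering idea can be sketched as follows. This is a hypothetical illustration only, not the actual update_optimizer implementation from train.py: the function name, the parallel name/state lists, and the name-based batchnorm test are all assumptions (real PyTorch optimizer state is keyed by position, so a name mapping would have to be built first).

```python
def strip_batchnorm_entries(param_names, opt_states, bn_keywords=("bn", "batchnorm")):
    """Filter parallel lists of parameter names and per-parameter optimizer
    states, dropping entries whose names look like batchnorm parameters.
    Assumes a single params_group, matching the limitation noted below."""
    kept = [(name, state) for name, state in zip(param_names, opt_states)
            if not any(key in name.lower() for key in bn_keywords)]
    # Split the surviving pairs back into parallel lists.
    kept_names = [name for name, _ in kept]
    kept_states = [state for _, state in kept]
    return kept_names, kept_states
```

With batchnorm entries removed, the optimizer state matches the fused model's parameter list, so loading the checkpoint no longer fails on a size mismatch.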

Also, after resuming from a QAT checkpoint, checkpoint names were incorrect. This PR includes a simple fix for that as well.

Limitations:

  • This solution has been tested only for the single params_group case. Since our current train.py structure always produces a single params_group, this should be fine. However, if we ever add functionality to use different learning rates for different layers, this function will need to be revisited as well.

@ermanok ermanok (Contributor) left a comment:

Looks good.

@rotx-eva rotx-eva changed the title Bugfixes: Qat resume with BN models, ckpt name Fix QAT resume with BN models, checkpoint name Oct 18, 2023
@rotx-eva rotx-eva merged commit 87351b3 into analogdevicesinc:develop Oct 18, 2023
3 of 4 checks passed
rotx-eva pushed a commit that referenced this pull request Mar 8, 2024
* Add AutoEncoder Model and Evaluation Notebook (#260)
3 participants