KeyError: '_orig_mod.conv1.output_shift' #340
Comments
I am facing the exact same error on WSL and native Windows when running the sample training scripts.
I was having a similar issue. The fix was to run from the qatv2 branch instead of the default one. See my issue #335.
Can you provide some more details? I believe qatv2 has already been merged into the default branch.
Hmm, maybe it has. But when I had this same problem about 3 weeks ago, I had to pull 331 for the training repo and 354 for the synthesis repo myself. After that, the training scripts ran with no problem.
Hi, thank you for letting us know about this issue. This is a known issue and will be fixed in the next PR. As a temporary workaround, you can pass the "--compiler-mode none" argument to your training script. Also, if you have multiple GPUs, please add the "--gpus 0" argument and don't use distributed training. Please note that the failing step runs at the end of training to evaluate your best checkpoint, so you can safely ignore this error and continue with the other steps.
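For anyone wondering where the `_orig_mod.` prefix in the KeyError comes from: `torch.compile` wraps the model, and the wrapped model reports its parameters under names that start with `_orig_mod.`. My reading of the traceback (an assumption, not something confirmed in this thread) is that the masks in the checkpoint are stored under un-prefixed names, so the lookup by the compiled model's name fails. The sketch below shows that mismatch with plain dicts and one possible way to normalize the names. `strip_compile_prefix` and `lookup_mask` are hypothetical helpers made up for illustration; they are not part of distiller or ai8x-training, and this is not the actual fix.

```python
# Illustrative sketch only: a plain dict stands in for the checkpoint's mask table.
COMPILE_PREFIX = "_orig_mod."  # prefix that torch.compile adds to parameter names


def strip_compile_prefix(name: str) -> str:
    """Return the parameter name without a leading torch.compile prefix."""
    if name.startswith(COMPILE_PREFIX):
        return name[len(COMPILE_PREFIX):]
    return name


def lookup_mask(loaded_masks: dict, name: str):
    """Hypothetical helper: look up a mask regardless of which side is prefixed."""
    normalized = {strip_compile_prefix(k): v for k, v in loaded_masks.items()}
    return normalized[strip_compile_prefix(name)]


# Assumed situation: masks saved under plain names, model compiled at load time.
saved_masks = {"conv1.output_shift": "mask-for-conv1.output_shift"}
compiled_name = "_orig_mod.conv1.output_shift"

# A direct lookup reproduces the KeyError from the traceback.
try:
    saved_masks[compiled_name]
except KeyError as err:
    print("KeyError:", err)

# The normalized lookup finds the entry.
print(lookup_mask(saved_masks, compiled_name))
```

Passing "--compiler-mode none" sidesteps the whole problem, presumably because the model is then never wrapped and the names match as they are.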
I followed the installation and setup steps from the README. However, when I run a training script such as
scripts/train_mnist.sh
I am faced with the error:
Traceback (most recent call last):
File "/home/wsl/ai8x-training/train.py", line 1564, in <module>
main()
File "/home/wsl/ai8x-training/train.py", line 742, in main
test(test_loader, model, criterion, [pylogger], args=args, mode="best",
File "/home/wsl/ai8x-training/train.py", line 1097, in test
model = apputils.load_lean_checkpoint(model, best_ckpt_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsl/ai8x-training/distiller/build/editable.distiller-0.4.0rc0-py3-none-any/distiller/apputils/checkpoint.py", line 92, in load_lean_checkpoint
return load_checkpoint(model, chkpt_file, model_device=model_device,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsl/ai8x-training/distiller/build/editable.distiller-0.4.0rc0-py3-none-any/distiller/apputils/checkpoint.py", line 212, in load_checkpoint
normalize_dataparallel_keys = _load_compression_scheduler()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wsl/ai8x-training/distiller/build/editable.distiller-0.4.0rc0-py3-none-any/distiller/apputils/checkpoint.py", line 133, in _load_compression_scheduler
compression_scheduler.load_state_dict(checkpoint['compression_sched'], normalize_keys)
File "/home/wsl/ai8x-training/distiller/build/editable.distiller-0.4.0rc0-py3-none-any/distiller/scheduler.py", line 213, in load_state_dict
masker.mask = loaded_masks[name]
~~~~~~~~~~~~^^^^^^
KeyError: '_orig_mod.conv1.output_shift'
This happens at the end of training with every training script I tried. Please help me resolve it.