Error(s) in loading state_dict for TransformerDecoderModel #70

Open
imomayiz opened this issue Apr 9, 2020 · 0 comments

imomayiz commented Apr 9, 2020

When I run the command `python3 transfer.py --load transformer.pt --model 'transformer'`, I get the following error:
RuntimeError: Error(s) in loading state_dict for TransformerDecoderModel: Missing key(s) in state_dict: "encoder.embed_out", "encoder.embed_tokens.weight", "encoder.layers.0.self_attn.in_proj_weight", "encoder.layers.0.self_attn.in_proj_bias", "encoder.layers.0.self_attn.out_proj.weight", "encoder.layers.0.self_attn.out_proj.bias", "encoder.layers.0.encoder_attn.in_proj_weight", "encoder.layers.0.encoder_attn.in_proj_bias", "encoder.layers.0.encoder_attn.out_proj.weight", "encoder.layers.0.encoder_attn.out_proj.bias", "encoder.layers.0.fc1.weight", "encoder.layers.0.fc1.bias", "encoder.layers.0.fc2.weight", "encoder.layers.0.fc2.bias", "encoder.layers.0.layer_norms.0.weight", "encoder.layers.0.layer_norms.0.bias", "encoder.layers.0.layer_norms.1.weight", "encoder.layers.0.layer_norms.1.bias", "encoder.layers.1.self_attn.in_proj_weight", "encoder.layers.1.self_attn.in_proj_bias", "encoder.layers.1.self_attn.out_proj.weight", "encoder.layers.1.self_attn.out_proj.bias", "encoder.layers.1.encoder_attn.in_proj_weight", "encoder.layers.1.encoder_attn.in_proj_bias", "encoder.layers.1.encoder_attn.out_proj.weight", "encoder.layers.1.encoder_attn.out_proj.bias", "encoder.layers.1.fc1.weight", "encoder.layers.1.fc1.bias", "encoder.layers.1.fc2.weight", "encoder.layers.1.fc2.bias", "encoder.layers.1.layer_norms.0.weight", "encoder.layers.1.layer_norms.0.bias", "encoder.layers.1.layer_norms.1.weight", "encoder.layers.1.layer_norms.1.bias", "encoder.layers.2.self_attn.in_proj_weight", "encoder.layers.2.self_attn.in_proj_bias", "encoder.layers.2.self_attn.out_proj.weight", "encoder.layers.2.self_attn.out_proj.bias", "encoder.layers.2.encoder_attn.in_proj_weight", "encoder.layers.2.encoder_attn.in_proj_bias", "encoder.layers.2.encoder_attn.out_proj.weight", "encoder.layers.2.encoder_attn.out_proj.bias", "encoder.layers.2.fc1.weight", "encoder.layers.2.fc1.bias", "encoder.layers.2.fc2.weight", "encoder.layers.2.fc2.bias", "encoder.layers.2.layer_norms.0.weight", 
"encoder.layers.2.layer_norms.0.bias", "encoder.layers.2.layer_norms.1.weight", "encoder.layers.2.layer_norms.1.bias", "encoder.layers.3.self_attn.in_proj_weight", "encoder.layers.3.self_attn.in_proj_bias", "encoder.layers.3.self_attn.out_proj.weight", "encoder.layers.3.self_attn.out_proj.bias", "encoder.layers.3.encoder_attn.in_proj_weight", "encoder.layers.3.encoder_attn.in_proj_bias", "encoder.layers.3.encoder_attn.out_proj.weight", "encoder.layers.3.encoder_attn.out_proj.bias", "encoder.layers.3.fc1.weight", "encoder.layers.3.fc1.bias", "encoder.layers.3.fc2.weight", "encoder.layers.3.fc2.bias", "encoder.layers.3.layer_norms.0.weight", "encoder.layers.3.layer_norms.0.bias", "encoder.layers.3.layer_norms.1.weight", "encoder.layers.3.layer_norms.1.bias", "encoder.layers.4.self_attn.in_proj_weight", "encoder.layers.4.self_attn.in_proj_bias", "encoder.layers.4.self_attn.out_proj.weight", "encoder.layers.4.self_attn.out_proj.bias", "encoder.layers.4.encoder_attn.in_proj_weight", "encoder.layers.4.encoder_attn.in_proj_bias", "encoder.layers.4.encoder_attn.out_proj.weight", "encoder.layers.4.encoder_attn.out_proj.bias", "encoder.layers.4.fc1.weight", "encoder.layers.4.fc1.bias", "encoder.layers.4.fc2.weight", "encoder.layers.4.fc2.bias", "encoder.layers.4.layer_norms.0.weight", "encoder.layers.4.layer_norms.0.bias", "encoder.layers.4.layer_norms.1.weight", "encoder.layers.4.layer_norms.1.bias", "encoder.layers.5.self_attn.in_proj_weight", "encoder.layers.5.self_attn.in_proj_bias", "encoder.layers.5.self_attn.out_proj.weight", "encoder.layers.5.self_attn.out_proj.bias", "encoder.layers.5.encoder_attn.in_proj_weight", "encoder.layers.5.encoder_attn.in_proj_bias", "encoder.layers.5.encoder_attn.out_proj.weight", "encoder.layers.5.encoder_attn.out_proj.bias", "encoder.layers.5.fc1.weight", "encoder.layers.5.fc1.bias", "encoder.layers.5.fc2.weight", "encoder.layers.5.fc2.bias", "encoder.layers.5.layer_norms.0.weight", "encoder.layers.5.layer_norms.0.bias", 
"encoder.layers.5.layer_norms.1.weight", "encoder.layers.5.layer_norms.1.bias". Unexpected key(s) in state_dict: ".embed_out_g", ".embed_out_v", ".embed_tokens.weight", ".embed_positions.weight", ".layers.0.self_attn.in_proj_bias", ".layers.0.self_attn.in_proj_weight_g", ".layers.0.self_attn.in_proj_weight_v", ".layers.0.self_attn.out_proj.bias", ".layers.0.self_attn.out_proj.weight_g", ".layers.0.self_attn.out_proj.weight_v", ".layers.0.encoder_attn.in_proj_bias", ".layers.0.encoder_attn.in_proj_weight_g", ".layers.0.encoder_attn.in_proj_weight_v", ".layers.0.encoder_attn.out_proj.bias", ".layers.0.encoder_attn.out_proj.weight_g", ".layers.0.encoder_attn.out_proj.weight_v", ".layers.0.fc1.bias", ".layers.0.fc1.weight_g", ".layers.0.fc1.weight_v", ".layers.0.fc2.bias", ".layers.0.fc2.weight_g", ".layers.0.fc2.weight_v", ".layers.0.layer_norms.0.weight", ".layers.0.layer_norms.0.bias", ".layers.0.layer_norms.1.weight", ".layers.0.layer_norms.1.bias", ".layers.1.self_attn.in_proj_bias", ".layers.1.self_attn.in_proj_weight_g", ".layers.1.self_attn.in_proj_weight_v", ".layers.1.self_attn.out_proj.bias", ".layers.1.self_attn.out_proj.weight_g", ".layers.1.self_attn.out_proj.weight_v", ".layers.1.encoder_attn.in_proj_bias", ".layers.1.encoder_attn.in_proj_weight_g", ".layers.1.encoder_attn.in_proj_weight_v", ".layers.1.encoder_attn.out_proj.bias", ".layers.1.encoder_attn.out_proj.weight_g", ".layers.1.encoder_attn.out_proj.weight_v", ".layers.1.fc1.bias", ".layers.1.fc1.weight_g", ".layers.1.fc1.weight_v", ".layers.1.fc2.bias", ".layers.1.fc2.weight_g", ".layers.1.fc2.weight_v", ".layers.1.layer_norms.0.weight", ".layers.1.layer_norms.0.bias", ".layers.1.layer_norms.1.weight", ".layers.1.layer_norms.1.bias", ".layers.2.self_attn.in_proj_bias", ".layers.2.self_attn.in_proj_weight_g", ".layers.2.self_attn.in_proj_weight_v", ".layers.2.self_attn.out_proj.bias", ".layers.2.self_attn.out_proj.weight_g", ".layers.2.self_attn.out_proj.weight_v", 
".layers.2.encoder_attn.in_proj_bias", ".layers.2.encoder_attn.in_proj_weight_g", ".layers.2.encoder_attn.in_proj_weight_v", ".layers.2.encoder_attn.out_proj.bias", ".layers.2.encoder_attn.out_proj.weight_g", ".layers.2.encoder_attn.out_proj.weight_v", ".layers.2.fc1.bias", ".layers.2.fc1.weight_g", ".layers.2.fc1.weight_v", ".layers.2.fc2.bias", ".layers.2.fc2.weight_g", ".layers.2.fc2.weight_v", ".layers.2.layer_norms.0.weight", ".layers.2.layer_norms.0.bias", ".layers.2.layer_norms.1.weight", ".layers.2.layer_norms.1.bias", ".layers.3.self_attn.in_proj_bias", ".layers.3.self_attn.in_proj_weight_g", ".layers.3.self_attn.in_proj_weight_v", ".layers.3.self_attn.out_proj.bias", ".layers.3.self_attn.out_proj.weight_g", ".layers.3.self_attn.out_proj.weight_v", ".layers.3.encoder_attn.in_proj_bias", ".layers.3.encoder_attn.in_proj_weight_g", ".layers.3.encoder_attn.in_proj_weight_v", ".layers.3.encoder_attn.out_proj.bias", ".layers.3.encoder_attn.out_proj.weight_g", ".layers.3.encoder_attn.out_proj.weight_v", ".layers.3.fc1.bias", ".layers.3.fc1.weight_g", ".layers.3.fc1.weight_v", ".layers.3.fc2.bias", ".layers.3.fc2.weight_g", ".layers.3.fc2.weight_v", ".layers.3.layer_norms.0.weight", ".layers.3.layer_norms.0.bias", ".layers.3.layer_norms.1.weight", ".layers.3.layer_norms.1.bias", ".layers.4.self_attn.in_proj_bias", ".layers.4.self_attn.in_proj_weight_g", ".layers.4.self_attn.in_proj_weight_v", ".layers.4.self_attn.out_proj.bias", ".layers.4.self_attn.out_proj.weight_g", ".layers.4.self_attn.out_proj.weight_v", ".layers.4.encoder_attn.in_proj_bias", ".layers.4.encoder_attn.in_proj_weight_g", ".layers.4.encoder_attn.in_proj_weight_v", ".layers.4.encoder_attn.out_proj.bias", ".layers.4.encoder_attn.out_proj.weight_g", ".layers.4.encoder_attn.out_proj.weight_v", ".layers.4.fc1.bias", ".layers.4.fc1.weight_g", ".layers.4.fc1.weight_v", ".layers.4.fc2.bias", ".layers.4.fc2.weight_g", ".layers.4.fc2.weight_v", ".layers.4.layer_norms.0.weight", 
".layers.4.layer_norms.0.bias", ".layers.4.layer_norms.1.weight", ".layers.4.layer_norms.1.bias", ".layers.5.self_attn.in_proj_bias", ".layers.5.self_attn.in_proj_weight_g", ".layers.5.self_attn.in_proj_weight_v", ".layers.5.self_attn.out_proj.bias", ".layers.5.self_attn.out_proj.weight_g", ".layers.5.self_attn.out_proj.weight_v", ".layers.5.encoder_attn.in_proj_bias", ".layers.5.encoder_attn.in_proj_weight_g", ".layers.5.encoder_attn.in_proj_weight_v", ".layers.5.encoder_attn.out_proj.bias", ".layers.5.encoder_attn.out_proj.weight_g", ".layers.5.encoder_attn.out_proj.weight_v", ".layers.5.fc1.bias", ".layers.5.fc1.weight_g", ".layers.5.fc1.weight_v", ".layers.5.fc2.bias", ".layers.5.fc2.weight_g", ".layers.5.fc2.weight_v", ".layers.5.layer_norms.0.weight", ".layers.5.layer_norms.0.bias", ".layers.5.layer_norms.1.weight", ".layers.5.layer_norms.1.bias", ".layers.6.self_attn.in_proj_bias", ".layers.6.self_attn.in_proj_weight_g", ".layers.6.self_attn.in_proj_weight_v", ".layers.6.self_attn.out_proj.bias", ".layers.6.self_attn.out_proj.weight_g", ".layers.6.self_attn.out_proj.weight_v", ".layers.6.encoder_attn.in_proj_bias", ".layers.6.encoder_attn.in_proj_weight_g", ".layers.6.encoder_attn.in_proj_weight_v", ".layers.6.encoder_attn.out_proj.bias", ".layers.6.encoder_attn.out_proj.weight_g", ".layers.6.encoder_attn.out_proj.weight_v", ".layers.6.fc1.bias", ".layers.6.fc1.weight_g", ".layers.6.fc1.weight_v", ".layers.6.fc2.bias", ".layers.6.fc2.weight_g", ".layers.6.fc2.weight_v", ".layers.6.layer_norms.0.weight", ".layers.6.layer_norms.0.bias", ".layers.6.layer_norms.1.weight", ".layers.6.layer_norms.1.bias", ".layers.7.self_attn.in_proj_bias", ".layers.7.self_attn.in_proj_weight_g", ".layers.7.self_attn.in_proj_weight_v", ".layers.7.self_attn.out_proj.bias", ".layers.7.self_attn.out_proj.weight_g", ".layers.7.self_attn.out_proj.weight_v", ".layers.7.encoder_attn.in_proj_bias", ".layers.7.encoder_attn.in_proj_weight_g", ".layers.7.encoder_attn.in_proj_weight_v", 
".layers.7.encoder_attn.out_proj.bias", ".layers.7.encoder_attn.out_proj.weight_g", ".layers.7.encoder_attn.out_proj.weight_v", ".layers.7.fc1.bias", ".layers.7.fc1.weight_g", ".layers.7.fc1.weight_v", ".layers.7.fc2.bias", ".layers.7.fc2.weight_g", ".layers.7.fc2.weight_v", ".layers.7.layer_norms.0.weight", ".layers.7.layer_norms.0.bias", ".layers.7.layer_norms.1.weight", ".layers.7.layer_norms.1.bias", ".layers.8.self_attn.in_proj_bias", ".layers.8.self_attn.in_proj_weight_g", ".layers.8.self_attn.in_proj_weight_v", ".layers.8.self_attn.out_proj.bias", ".layers.8.self_attn.out_proj.weight_g", ".layers.8.self_attn.out_proj.weight_v", ".layers.8.encoder_attn.in_proj_bias", ".layers.8.encoder_attn.in_proj_weight_g", ".layers.8.encoder_attn.in_proj_weight_v", ".layers.8.encoder_attn.out_proj.bias", ".layers.8.encoder_attn.out_proj.weight_g", ".layers.8.encoder_attn.out_proj.weight_v", ".layers.8.fc1.bias", ".layers.8.fc1.weight_g", ".layers.8.fc1.weight_v", ".layers.8.fc2.bias", ".layers.8.fc2.weight_g", ".layers.8.fc2.weight_v", ".layers.8.layer_norms.0.weight", ".layers.8.layer_norms.0.bias", ".layers.8.layer_norms.1.weight", ".layers.8.layer_norms.1.bias", ".layers.9.self_attn.in_proj_bias", ".layers.9.self_attn.in_proj_weight_g", ".layers.9.self_attn.in_proj_weight_v", ".layers.9.self_attn.out_proj.bias", ".layers.9.self_attn.out_proj.weight_g", ".layers.9.self_attn.out_proj.weight_v", ".layers.9.encoder_attn.in_proj_bias", ".layers.9.encoder_attn.in_proj_weight_g", ".layers.9.encoder_attn.in_proj_weight_v", ".layers.9.encoder_attn.out_proj.bias", ".layers.9.encoder_attn.out_proj.weight_g", ".layers.9.encoder_attn.out_proj.weight_v", ".layers.9.fc1.bias", ".layers.9.fc1.weight_g", ".layers.9.fc1.weight_v", ".layers.9.fc2.bias", ".layers.9.fc2.weight_g", ".layers.9.fc2.weight_v", ".layers.9.layer_norms.0.weight", ".layers.9.layer_norms.0.bias", ".layers.9.layer_norms.1.weight", ".layers.9.layer_norms.1.bias", ".layers.10.self_attn.in_proj_bias", 
".layers.10.self_attn.in_proj_weight_g", ".layers.10.self_attn.in_proj_weight_v", ".layers.10.self_attn.out_proj.bias", ".layers.10.self_attn.out_proj.weight_g", ".layers.10.self_attn.out_proj.weight_v", ".layers.10.encoder_attn.in_proj_bias", ".layers.10.encoder_attn.in_proj_weight_g", ".layers.10.encoder_attn.in_proj_weight_v", ".layers.10.encoder_attn.out_proj.bias", ".layers.10.encoder_attn.out_proj.weight_g", ".layers.10.encoder_attn.out_proj.weight_v", ".layers.10.fc1.bias", ".layers.10.fc1.weight_g", ".layers.10.fc1.weight_v", ".layers.10.fc2.bias", ".layers.10.fc2.weight_g", ".layers.10.fc2.weight_v", ".layers.10.layer_norms.0.weight", ".layers.10.layer_norms.0.bias", ".layers.10.layer_norms.1.weight", ".layers.10.layer_norms.1.bias", ".layers.11.self_attn.in_proj_bias", ".layers.11.self_attn.in_proj_weight_g", ".layers.11.self_attn.in_proj_weight_v", ".layers.11.self_attn.out_proj.bias", ".layers.11.self_attn.out_proj.weight_g", ".layers.11.self_attn.out_proj.weight_v", ".layers.11.encoder_attn.in_proj_bias", ".layers.11.encoder_attn.in_proj_weight_g", ".layers.11.encoder_attn.in_proj_weight_v", ".layers.11.encoder_attn.out_proj.bias", ".layers.11.encoder_attn.out_proj.weight_g", ".layers.11.encoder_attn.out_proj.weight_v", ".layers.11.fc1.bias", ".layers.11.fc1.weight_g", ".layers.11.fc1.weight_v", ".layers.11.fc2.bias", ".layers.11.fc2.weight_g", ".layers.11.fc2.weight_v", ".layers.11.layer_norms.0.weight", ".layers.11.layer_norms.0.bias", ".layers.11.layer_norms.1.weight", ".layers.11.layer_norms.1.bias".

The error occurs while loading the model's state dict. I did not have this problem with the mlstm model.
Can you please tell me what's wrong?
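
One way to narrow down a mismatch like this is to compare the two sets of parameter names directly. The sketch below is not part of this repository: `summarize_key_mismatch` and the short sample lists are hypothetical, with the sample names copied from the error message above. With PyTorch, the real lists would come from `model.state_dict().keys()` and from the keys of the object returned by `torch.load("transformer.pt")`, assuming the file stores a plain state dict.

```python
import re


def summarize_key_mismatch(model_keys, checkpoint_keys):
    """Compare the names a model expects with the names found in a checkpoint."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)

    def layer_count(keys):
        # Highest index seen in "layers.<n>." plus one, or 0 if no layers appear.
        indices = [int(m.group(1)) for k in keys
                   for m in [re.search(r"layers\.(\d+)\.", k)] if m]
        return max(indices) + 1 if indices else 0

    def prefixes(keys):
        # Text before the first dot, e.g. "encoder" or "" for ".layers.0...".
        return sorted({k.split(".", 1)[0] for k in keys})

    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
        "model_prefixes": prefixes(model_keys),
        "checkpoint_prefixes": prefixes(checkpoint_keys),
        "model_layers": layer_count(model_keys),
        "checkpoint_layers": layer_count(checkpoint_keys),
        # Heuristic: torch.nn.utils.weight_norm stores <name>_g and <name>_v.
        "checkpoint_has_g_v_suffixes": any(
            k.endswith(("_g", "_v")) for k in checkpoint_keys
        ),
    }


# A few names taken from the error message, not the full lists.
model_keys = [
    "encoder.embed_out",
    "encoder.layers.0.fc1.weight",
    "encoder.layers.5.fc1.weight",
]
checkpoint_keys = [
    ".embed_out_g",
    ".embed_out_v",
    ".embed_positions.weight",
    ".layers.0.fc1.weight_g",
    ".layers.0.fc1.weight_v",
    ".layers.11.fc1.weight_g",
]

report = summarize_key_mismatch(model_keys, checkpoint_keys)
for name, value in report.items():
    print(f"{name}: {value}")
```

On these samples the report points at three separate differences: the model expects an `encoder.` prefix while the checkpoint names start with a bare `.`, the checkpoint has layers up to index 11 while the model only has layers up to index 5, and the checkpoint stores `_g`/`_v` pairs where the model expects a single weight. Whether these come from different hyperparameters or a different model class is something only the maintainers can confirm.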
