
How to solve dataset AssertionError? #4

b2220333 opened this issue Jul 29, 2019 · 5 comments

After deciding to use Python 2, I created a new Python 2 virtual environment with PyCharm and ran:

git clone https://github.com/tohinz/multiple-objects-gan
cd multiple-objects-gan
vim requirements.txt  # delete pkg-resources==0.0.0 to prevent install errors
pip install -r requirements.txt
cd models/
wget -c https://www2.informatik.uni-hamburg.de/wtm/software/multiple-objects-gan/model-ms-coco-attngan.zip
unzip model-ms-coco-attngan.zip
cd ../code/coco/attngan/
edit cfg/coco_eval.yml and set:
DATA_DIR: '/home/sam/code/python/pytorch/image_caption/dataset/coco2014'
IMG_DIR: "/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014"
mkdir -p DAMSMencoders/coco/
wget -c https://www.dropbox.com/s/zj3z0lvkfd8vaga/image_encoder100.pth?dl=0 -O DAMSMencoders/coco/image_encoder100.pth
wget -c https://www.dropbox.com/s/jo325z064a7x07k/text_encoder100.pth?dl=0 -O DAMSMencoders/coco/text_encoder100.pth
python2 main.py --cfg cfg/coco_eval.yml

After running the above commands, I got this error:

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': 'DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '../../../models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Save to:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
Traceback (most recent call last):
  File "main.py", line 134, in <module>
    assert dataset
AssertionError
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$

If I comment out line 134 of main.py in the code/coco/attngan directory and run the same command again, it shows:

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': 'DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '../../../models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Load from:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    algo.sample(split_dir, num_samples=25, draw_bbox=True)
  File "/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/trainer.py", line 489, in sample
    text_encoder.load_state_dict(state_dict)
  File "/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
        size mismatch for encoder.weight: copying a param of torch.Size([1, 300]) from checkpoint, where the shape is torch.Size([27297, 300]) in current model.
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ 

How could I solve these problems?
Thank you~

@tohinz (Owner) commented Jul 30, 2019

Hi, the first error means something is wrong with the created dataset. Check the __init__() method of the TextDataset class in code/coco/attngan/datasets.py and make sure it loads everything correctly (especially the captions file and the pre-processed metadata downloaded from the original AttnGAN GitHub), as sketched below.
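
A minimal probe along these lines could help (a sketch, not code from the repo; the TextDataset constructor arguments may differ slightly in your local datasets.py):

# Build the dataset by hand and inspect what it loaded ('test' is the
# evaluation split; base_size follows TREE.BASE_SIZE in the config).
from datasets import TextDataset

dataset = TextDataset('/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
                      'test', base_size=64)
print('number_example: %d' % dataset.number_example)  # expect 40470 for the val split
print('n_words: %d' % dataset.n_words)                # vocabulary size, expect 27297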

For the second error, I suspect your path to the NET_E state dict might be wrong; I suggest setting an absolute path to check whether that solves the issue.
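
A quick way to rule this out (a sketch; run it from code/coco/attngan, the same directory you start main.py from):

import os
print(os.path.isfile('DAMSMencoders/coco/text_encoder100.pth'))
# False means the relative path does not resolve from the current directory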

@b2220333 (Author) commented Aug 1, 2019

I spent some time checking as you hinted.
I found that it really is a dataset-loading problem.
However, I couldn't work out what is going wrong.
After setting the NET_E and NET_G paths to absolute paths, the problem persists...

Here is the output, first with the assert in place and then with it commented out:

(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Load from:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
Traceback (most recent call last):
  File "main.py", line 138, in <module>
    assert dataset
AssertionError
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$ python2 main.py --cfg cfg/coco_eval.yml
Using config:
{'B_VALIDATION': True,
 'CONFIG_NAME': 'attn2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0',
 'IMG_DIR': '/home/sam/code/python/pytorch/image_caption/dataset/coco2014/val2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 20},
 'TRAIN': {'BATCH_SIZE': 50,
           'B_NET_D': False,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': False,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/models/model-ms-coco-attngan-0100.pth',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 5.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 2000},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 1}
bboxes:  (40470, 3, 4)
labels:  (40470, 3, 1)
Load from:  /home/sam/code/python/pytorch/image_caption/dataset/coco2014/captions.pickle
/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "main.py", line 162, in <module>
    algo.sample(split_dir, num_samples=25, draw_bbox=True)
  File "/home/sam/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan/trainer.py", line 489, in sample
    text_encoder.load_state_dict(state_dict)
  File "/home/sam/anaconda3/envs/py2_t1/lib/python2.7/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
        size mismatch for encoder.weight: copying a param of torch.Size([1, 300]) from checkpoint, where the shape is torch.Size([27297, 300]) in current model.
(py2_t1) sam@sam-ub1804:~/PycharmProjects/py2_t1/multiple-objects-gan/code/coco/attngan$

What can I do next?
Thank you~~

@tohinz (Owner) commented Aug 1, 2019

Hi, I think the two errors are unrelated.
The first problem is with the dataset; something in its construction does not seem to work correctly. When initializing the dataset, can you check that lines 159-169 work correctly, i.e. check the shapes and/or contents of what they load to make sure everything is loaded correctly? E.g. self.number_example should be 40470 for the validation set. A sketch of such checks follows.
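
Something like the following, inserted after the loading code in TextDataset.__init__ (a sketch; the attribute names follow AttnGAN's TextDataset and may differ slightly in this repo):

# Inside TextDataset.__init__, right after filenames/captions are loaded:
print('filenames: %d' % len(self.filenames))  # expect 40470 for the val split
print('captions: %d' % len(self.captions))    # should be non-empty and consistent
print('n_words: %d' % self.n_words)           # expect 27297
assert self.number_example == len(self.filenames)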

For the second error, it looks like something might be wrong with the pretrained text encoder. Try downloading it again to make sure the file is not corrupted. Also, which PyTorch version are you using? I think they changed state_dict loading in one of the earlier versions, so that might be an issue here, too. The sketch below checks both.
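
For instance (a sketch; map_location just keeps the tensors on the CPU):

import torch
print(torch.__version__)  # version mismatches can change state_dict loading behavior

# A truncated or corrupted download usually fails to deserialize at all:
state_dict = torch.load('DAMSMencoders/coco/text_encoder100.pth',
                        map_location=lambda storage, loc: storage)
print('%d entries' % len(state_dict))  # the text encoder should have 9 entries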

@moegi161 commented May 30, 2020

Hi, I encountered the same problem as above, and I have already tried re-downloading the DAMSM text encoder. I am using Python 2.7.12 and PyTorch 0.4.1 in a Docker container.

Here is my output:

Starting training on the MS-COCO data set.
Using config:
{'B_VALIDATION': False,
 'CONFIG_NAME': 'glu-gan2',
 'CUDA': True,
 'DATASET_NAME': 'coco',
 'DATA_DIR': '/workspace/data/MS-COCO',
 'GAN': {'B_ATTENTION': True,
         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 96,
         'GF_DIM': 48,
         'R_NUM': 3,
         'Z_DIM': 100},
 'GPU_ID': '0,1,2',
 'IMG_DIR': '/workspace/data/MS-COCO/train/train2014',
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 5, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 12},
 'TRAIN': {'BATCH_SIZE': 14,
           'B_NET_D': True,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.0002,
           'FLAG': True,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 120,
           'NET_E': 'DAMSMencoders/coco/text_encoder100.pth',
           'NET_G': '',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 4.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 50.0},
           'SNAPSHOT_INTERVAL': 5},
 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3},
 'WORKERS': 20}
bboxes:  (82783, 3, 4)
labels:  (82783, 3, 1)
Load filenames from: /workspace/data/MS-COCO/train/filenames.pickle (82783)
Load from:  /workspace/data/MS-COCO/captions.pickle
num_exp:82783
('Load pretrained model from ', 'https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth')
473
Load image encoder from: DAMSMencoders/coco/image_encoder100.pth
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
9
Traceback (most recent call last):
  File "main.py", line 152, in <module>
    algo.train()
  File "/workspace/code/coco/attngan/trainer.py", line 252, in train
    text_encoder, image_encoder, netG, netsD, start_epoch = self.build_models()
  File "/workspace/code/coco/attngan/trainer.py", line 76, in build_models
    text_encoder.load_state_dict(state_dict)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RNN_ENCODER:
        size mismatch for encoder.weight: copying a param of torch.Size([1, 300]) from checkpoint, where the shape is torch.Size([27297, 300]) in current model.

num_exp is self.number_example.
473 and 9 are the lengths of the state_dicts of the image encoder and text encoder, respectively.

May I know how to solve this error?
Thank you.

@tohinz (Owner) commented Jun 2, 2020

Hi, to me this looks like a problem with the state_dict for the text encoder. The text encoder has an embedding layer of shape [27297, 300] (27297 words, each with a 300-dim embedding), but your state_dict only has an embedding of size [1, 300]. Length 9 for the text_encoder state_dict is correct; the entries should be: ['encoder.weight', 'rnn.weight_ih_l0', 'rnn.weight_hh_l0', 'rnn.bias_ih_l0', 'rnn.bias_hh_l0', 'rnn.weight_ih_l0_reverse', 'rnn.weight_hh_l0_reverse', 'rnn.bias_ih_l0_reverse', 'rnn.bias_hh_l0_reverse'].

Could you please check that

  • self.n_words = 27297 (this seems to be the case for you) and
  • state_dict["encoder.weight"].shape = (27297, 300) (this is the pre-trained model from AttnGAN)?

The pre-trained AttnGAN text encoder model (text_encoder100.pth) should be about 33 MB in size, the image encoder (image_encoder100.pth) about 86 MB. The sketch below performs both checks.
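
Something along these lines should confirm them (a sketch; the path is the one used earlier in this thread):

import os
import torch

path = 'DAMSMencoders/coco/text_encoder100.pth'
print('%.1f MB' % (os.path.getsize(path) / (1024.0 * 1024.0)))  # expect ~33 MB

state_dict = torch.load(path, map_location=lambda storage, loc: storage)
print(state_dict['encoder.weight'].shape)  # expect torch.Size([27297, 300])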
