Minor issue with avg pooling #12

hash2430 · 2019-09-18T06:23:18Z

I could be wrong, since I don't normally do speech verification.
But in my case, training fails with the following error at the affine transform that matches the embedding dim after avg pooling.
"x = self.model.fc(x)" gives
RuntimeError: size mismatch, m1: [512 x 1024], m2: [2048 x 512] at /opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generic/THCTensorMathBlas.cu:266

I think this is because avg pooling is supposed to run over the temporal dimension by design, but in the committed version of the code it runs over the frequency dimension.
The 2D avg pool is supposed to give [F, T] = [4, 2] => [4, 1], but instead it gives [4, 2] => [1, 2].
Thus the flattened dimension after torch.view is half of what the model.fc layer expects (1024 instead of 2048).
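
The shape arithmetic behind that error can be checked with a minimal sketch. It does not use the repository's code: pooled_shape is a hypothetical helper that mimics the output-shape rule of a 2D average pool with stride equal to the kernel size and no padding, and the 512 channels are an assumption inferred from the error message (512 x 1 x 2 = 1024, 512 x 4 x 1 = 2048).

```python
def pooled_shape(shape, kernel):
    """Output (F, T) of a 2D average pool with stride == kernel and no padding.

    Hypothetical helper for shape bookkeeping only; it does no pooling.
    """
    return tuple((dim - k) // k + 1 for dim, k in zip(shape, kernel))


CHANNELS = 512         # assumption: inferred from the sizes in the error message
FEATURE_MAP = (4, 2)   # [F, T] before pooling, as stated in the issue
FC_IN_FEATURES = 2048  # first dimension of m2 in the error message


def flattened_size(kernel):
    """Feature count per sample after pooling and flattening (the torch.view step)."""
    f, t = pooled_shape(FEATURE_MAP, kernel)
    return CHANNELS * f * t


# Pooling over frequency collapses F: [4, 2] -> [1, 2]
over_frequency = flattened_size((4, 1))
# Pooling over time collapses T: [4, 2] -> [4, 1]
over_time = flattened_size((1, 2))

print("pooled over frequency:", over_frequency, "matches fc:", over_frequency == FC_IN_FEATURES)
print("pooled over time:", over_time, "matches fc:", over_time == FC_IN_FEATURES)
```

Under these assumptions, only pooling over the temporal axis produces the 2048 features that model.fc expects, which is consistent with the m1/m2 sizes in the traceback.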

So I suggest

for myResNet.init()

Again, I'm no expert in speech verification.
If anybody has another idea on how to fix this bug, please let me know.

fangmq · 2022-03-10T09:14:19Z

@hash2430 Hello! I noticed the same problem when reading the code. Have you tried avg pooling over the temporal dimension, and did the performance get better?
