Minor issue with avg pooling #12

Open
hash2430 opened this issue Sep 18, 2019 · 1 comment

Comments


hash2430 commented Sep 18, 2019

I could be wrong, since I don't normally do speech verification.
But in my case, training fails with the following error at the affine transform that maps the average-pooled output to the embedding dimension:
"x = self.model.fc(x)" gives
RuntimeError: size mismatch, m1: [512 x 1024], m2: [2048 x 512] at /opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generic/THCTensorMathBlas.cu:266

I think this is because avgpool is supposed to operate on the temporal dimension by design, but in the committed version of the code the average pooling is done over the frequency dimension.
avg pool2d is supposed to give [F, T] = [4, 2] => [4, 1], but instead it gives [4, 2] => [1, 2].
Thus the dimension after torch.view is half of what the model.fc layer expects.
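
The shape arithmetic can be checked with a small pure-Python sketch. It is only an illustration, not the repository's code: the channel count of 512 is an assumption inferred from the error message (2048 / 4 = 512), the helper names are hypothetical, and the kernel sizes are chosen just to reproduce the [4, 2] => [1, 2] and [4, 2] => [4, 1] cases, assuming stride equal to kernel size and no padding.

```python
def avg_pool2d_out_shape(in_shape, kernel_size):
    """Output (F, T) of 2D average pooling, assuming stride == kernel_size and no padding."""
    return tuple((dim - k) // k + 1 for dim, k in zip(in_shape, kernel_size))


def flattened_size(channels, shape):
    """Feature count per sample after flattening a (channels, F, T) tensor."""
    return channels * shape[0] * shape[1]


channels = 512        # assumption: inferred from 2048 / 4 in the error message
feature_map = (4, 2)  # (F, T) before pooling, as given in the issue

# Pooling over frequency (what the issue reports the committed code does).
freq_pooled = avg_pool2d_out_shape(feature_map, kernel_size=(4, 1))
# Pooling over time (what the issue says is intended).
time_pooled = avg_pool2d_out_shape(feature_map, kernel_size=(1, 2))

fc_in_features = 2048  # first dimension of m2 in the error message

print(freq_pooled, flattened_size(channels, freq_pooled))  # (1, 2) 1024
print(time_pooled, flattened_size(channels, time_pooled))  # (4, 1) 2048
```

Under these assumptions, only pooling over time yields the 2048 features that model.fc expects, while pooling over frequency yields the 1024 seen in m1.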

image

So I suggest the following for myResNet.__init__():

[image: suggested change]

Again, I'm no expert in speech verification.
If anybody has another idea on how to fix this bug, please let me know.


fangmq commented Mar 10, 2022

@hash2430 Hello! I noticed the same problem when reading the code. Have you tried average pooling over the temporal dimension, and did the performance improve?
