An Improved Deep Embedding Learning Method for Short Duration Speaker Verification - PyTorch Implementation
This is a PyTorch implementation of the model (modified cross-conv. pooling) presented by Zhifu Gao et al. in "An Improved Deep Embedding Learning Method for Short Duration Speaker Verification".
I apologize that most of the code other than the model is old and messy, because I have only tried it on a private database. There is no problem with performance or operation, however, as long as your input matches the expected size: batch x 1 x feature dim. x frames.
The model in the original paper is very large. Its cross-conv. pooling layer outputs 512 x 512 = 262144 values, which forces a small batch size, a long training time, and so on. I recommend using a smaller size, about 128 x 128 (16384 values).
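To make the size difference concrete, here is a minimal sketch of the arithmetic. The 512 x 512 and 128 x 128 sizes come from the text above. The 512-dimensional layer that follows the pooling is a hypothetical value chosen only for illustration, and the helper names `pooled_dim` and `linear_weights` are not taken from this repository.

```python
def pooled_dim(channels_a: int, channels_b: int) -> int:
    """Flattened size of a cross-conv. pooling output (channels_a x channels_b)."""
    return channels_a * channels_b


def linear_weights(in_features: int, out_features: int) -> int:
    """Weight count of a fully connected layer, bias excluded."""
    return in_features * out_features


EMBED_DIM = 512  # hypothetical layer size, not taken from the paper or this repo

for channels in (512, 128):
    dim = pooled_dim(channels, channels)
    kib = dim * 4 / 1024  # float32 activation size per utterance, in KiB
    weights = linear_weights(dim, EMBED_DIM)
    print(f"{channels} x {channels}: dim={dim}, {kib:.0f} KiB per utterance, "
          f"{weights:,} weights into a {EMBED_DIM}-d layer")
```

Going from 512 x 512 to 128 x 128 shrinks the pooled vector, and any fully connected layer fed by it, by a factor of 16.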
I hope this code helps researchers reach higher scores.
- Input size: batch x 1 x feature dim. x frames
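As a rough illustration of the input size above, the sketch below rearranges a batch of frame-level features into batch x 1 x feature dim. x frames. It assumes the features start out as a (batch, frames, feature dim.) tensor, which may not match your data loader. The helper name `to_model_input` and the example sizes are hypothetical.

```python
import torch


def to_model_input(features: torch.Tensor) -> torch.Tensor:
    """Rearrange (batch, frames, feature_dim) into (batch, 1, feature_dim, frames)."""
    if features.dim() != 3:
        raise ValueError("expected a 3-D tensor: (batch, frames, feature_dim)")
    # Swap the time and feature axes, then add a singleton channel axis.
    return features.transpose(1, 2).unsqueeze(1).contiguous()


# Hypothetical sizes: 8 utterances, 200 frames, 64-dimensional features.
batch = torch.randn(8, 200, 64)
x = to_model_input(batch)
print(tuple(x.shape))
```

The singleton second axis plays the role of the image channel, so the features can be fed to 2-D convolution layers.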
Original paper:
- Gao's paper:
@inproceedings{,
author = {Zhifu Gao and Yan Song and Ian McLoughlin and Wu Guo and Lirong Dai},
title = {An Improved Deep Embedding Learning Method for Short Duration Speaker Verification},
booktitle = {Interspeech 2018},
year = {2018},
}
This repository also uses parts of the following code:
- my git repository
  - Baseline code: data loader and so on.
- liorshk's git repository
  - FaceNet PyTorch implementation
- hbredin's git repository
  - VoxCeleb database reader
- This repository contains only the model implementation. The data loader and the other code were recycled from this code.