This repository contains simple scripts for training an i-vector speaker recognition system on the VoxCeleb1 [1] dataset using Kaldi. It is adapted from swshon's work [2]. Note that, strictly speaking, this experiment is not conventional speaker verification: each trial is scored by computing the similarity between two test utterances rather than between an enrolled speaker and a test utterance.
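With the cosine (CDS) back-end, the similarity between two test utterances is simply the normalized dot product of their i-vectors. The helper below is a hypothetical sketch that only illustrates the arithmetic: the name `cosine_score` and the whitespace-separated text input are assumptions, not part of this repository, which scores i-vectors stored in Kaldi archives.

```shell
# cosine_score "V1" "V2" (hypothetical helper): prints the cosine similarity
# of two whitespace-separated vectors, rounded to 4 decimals.
cosine_score() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    # split on runs of whitespace (a single-space separator means default splitting)
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    dot = 0; na = 0; nb = 0
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # accumulate the dot product
      na += x[i] ^ 2       # squared norm of the first vector
      nb += y[i] ^ 2       # squared norm of the second vector
    }
    printf "%.4f\n", dot / sqrt(na * nb)
  }'
}
```

For example, `cosine_score "1 2 3" "2 4 6"` prints `1.0000`, since the two vectors point in the same direction.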
- Kaldi Toolkit
- Download and unzip the audio files from http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html
- Create a directory named voxceleb1 with two subdirectories, train and test. Move the dev data into train and the test data into test.
- Download the list of verification trial pairs (http://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt) and move it into the voxceleb1 directory.
- run cmd: ln -fsr "your path to kaldi-trunk/egs/sre08/v1/sid" sid
- run cmd: ln -fsr "your path to kaldi-trunk/egs/sre08/v1/steps" steps
- run cmd: ln -fsr "your path to kaldi-trunk/egs/sre08/v1/utils" utils
- Modify the dataset directories and parameters in run.sh to fit your machine.
- Run run.sh.
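The manual preparation steps above can be collected into one shell function. This is only a sketch: the name `setup_voxceleb1` and its four arguments (Kaldi root, unzipped dev directory, unzipped test directory, already-downloaded trial list) are hypothetical, the downloads themselves stay manual, and `ln -r` assumes GNU coreutils.

```shell
# setup_voxceleb1 KALDI_ROOT DEV_DIR TEST_DIR TRIALS (hypothetical helper)
# Run it from the directory that contains run.sh.
setup_voxceleb1() {
  local kaldi_root=$1 dev_dir=$2 test_dir=$3 trials=$4
  # voxceleb1/train receives the dev data, voxceleb1/test the test data
  mkdir -p voxceleb1/train voxceleb1/test
  mv "$dev_dir"/* voxceleb1/train/
  mv "$test_dir"/* voxceleb1/test/
  # the trial list (veri_test.txt) lives directly under voxceleb1
  mv "$trials" voxceleb1/
  # relative symlinks to the Kaldi helper scripts from the sre08 recipe
  ln -fsr "$kaldi_root/egs/sre08/v1/sid" sid
  ln -fsr "$kaldi_root/egs/sre08/v1/steps" steps
  ln -fsr "$kaldi_root/egs/sre08/v1/utils" utils
}
```

A call might look like `setup_voxceleb1 ~/kaldi-trunk ~/vox1/dev ~/vox1/test ~/Downloads/veri_test.txt`, where all four paths are placeholders for your own locations.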
The 2048-component GMM-UBM and the 600-dimensional i-vector extractor were trained on the VoxCeleb1 training data for the verification task. The training parameters are almost the same as those of the SRE10 baseline in the Kaldi egs.
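For reference, the standard total variability (i-vector) model behind this setup is sketched below, with C = 2048 UBM components and 600-dimensional i-vectors. The feature dimension F is not stated in this README; F = 60 (20 MFCCs with deltas and double-deltas, as in the SRE10-style recipe) is an assumption used only for the size arithmetic.

```latex
% M_u: mean supervector for utterance u, m: UBM mean supervector,
% T: total variability matrix, w_u: i-vector (posterior mean given u's statistics)
\mathbf{M}_u = \mathbf{m} + \mathbf{T}\,\mathbf{w}_u,
\qquad \mathbf{w}_u \sim \mathcal{N}(\mathbf{0}, \mathbf{I})

\mathbf{m},\, \mathbf{M}_u \in \mathbb{R}^{CF}, \quad
\mathbf{T} \in \mathbb{R}^{CF \times 600}, \quad
\mathbf{w}_u \in \mathbb{R}^{600}

% assuming F = 60:
C = 2048,\; F = 60 \;\Rightarrow\; CF = 2048 \times 60 = 122{,}880
```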
- GMM-2048, cosine distance scoring (CDS): EER 15.6%
- GMM-2048, LDA + CDS: EER 7.937%
- GMM-2048, PLDA: EER 5.652%
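The EER reported above is the operating point at which the false-acceptance and false-rejection rates are equal. Kaldi provides a `compute-eer` tool for this; the awk function below is only a hypothetical sketch of the same idea. The name `compute_eer` and its input format (one `score label` pair per line, label 1 for target and 0 for non-target trials, matching the first column of veri_test.txt) are assumptions. It sorts the scores, sweeps the decision threshold across them, and reports the mean of the two error rates where they are closest; it needs `sort -g` (GNU or BSD sort).

```shell
# compute_eer (hypothetical helper): reads "score label" lines on stdin,
# label 1 = target, 0 = non-target, and prints the EER in percent.
compute_eer() {
  sort -g -k1,1 | awk '
    { s[NR] = $1; l[NR] = $2; if ($2 == 1) nt++; else nn++ }
    END {
      # start with every trial accepted: no false rejections, all non-targets accepted
      fr = 0; fa = nn; best = 2; eer = 0
      for (i = 0; i <= NR; i++) {
        # raising the threshold past trial i rejects it
        if (i > 0) { if (l[i] == 1) fr++; else fa-- }
        frr = fr / nt; far = fa / nn
        d = frr - far; if (d < 0) d = -d
        # keep the point where FRR and FAR are closest
        if (d < best) { best = d; eer = (frr + far) / 2 }
      }
      printf "%.2f\n", 100 * eer
    }'
}
```

For example, `printf '0.9 1\n0.1 0\n' | compute_eer` prints `0.00`, because the single target trial scores above the single non-target trial.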
The VoxCeleb1 dataset, a large-scale speaker identification dataset, was published in 2017 together with a speaker embedding baseline [1], and the i-vector baseline reported there reaches 8.8% EER. That i-vector was extracted with a 1024-component GMM-UBM, so its EER is noticeably worse than the PLDA result above.
[1] A. Nagrani, J. S. Chung, and A. Zisserman, “VoxCeleb: A large-scale speaker identification dataset,” in Interspeech, 2017, pp. 2616–2620.