Speaker-Diarization-with-SNN

Final project for the Seminario en Aplicaciones de Redes Neuronales en la recuperación de información musical (Seminar on Applications of Neural Networks in Music Information Retrieval). The objective is to apply a Siamese Neural Network (SNN) architecture to the speaker diarization task. We use the LibriSpeech dataset for training and validation.
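
As a rough illustration of the Siamese idea, the sketch below passes two audio segments through the same encoder and scores them by the distance between their embeddings. It is a framework-free toy, not the code in this repository: the `embed` encoder, the `weights` values, the segment vectors, and the choice of a contrastive loss are all assumptions made for the example.

```python
import math

def embed(x, weights):
    """Toy shared encoder (hypothetical): one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def contrastive_loss(distance, same_speaker, margin=1.0):
    """Pull same-speaker pairs together; push different-speaker pairs
    apart until they are at least `margin` away from each other."""
    if same_speaker:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# Both branches use the SAME weights: that is what makes the network Siamese.
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # made-up values
seg_a = [0.9, 0.1, 0.4]   # placeholder features for a segment
seg_b = [0.9, 0.1, 0.4]   # identical segment, stands in for "same speaker"
seg_c = [-0.7, 0.6, 0.2]  # different segment, stands in for "other speaker"

d_same = euclidean(embed(seg_a, weights), embed(seg_b, weights))
d_diff = euclidean(embed(seg_a, weights), embed(seg_c, weights))

print("same-speaker distance:", d_same)
print("different-speaker distance:", d_diff)
print("loss if the different pair is labeled different:", contrastive_loss(d_diff, False))
```

For diarization, the learned distance can then be thresholded or clustered to decide which segments belong to the same speaker.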

First Implementation

The first implementation is in Keras and uses the SincNet architecture to reduce the dimensionality of the convolutional stage and work directly with raw audio. With this approach we obtained a low training error, but the model did not generalize well and the validation error remained high.
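
The core of SincNet is that each filter in the first convolutional layer is a band-pass filter described by only two values, a low and a high cutoff frequency, instead of one free weight per tap. The sketch below builds one such kernel in plain Python to show the idea. It is not the Keras layer from this project: the function names, the kernel size of 101, the 16 kHz sample rate, and the 300 to 3400 Hz band are assumptions chosen for illustration.

```python
import cmath
import math

def sinc(x):
    """Unnormalized sinc, sin(x) / x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=101, sample_rate=16000):
    """Band-pass FIR kernel defined only by its two cutoff frequencies.

    The kernel is the difference of two ideal low-pass filters,
    smoothed with a Hamming window.
    """
    f1 = f_low_hz / sample_rate   # cutoffs in cycles per sample
    f2 = f_high_hz / sample_rate
    half = kernel_size // 2
    kernel = []
    for i in range(kernel_size):
        n = i - half  # centered time index
        ideal = (2 * f2 * sinc(2 * math.pi * f2 * n)
                 - 2 * f1 * sinc(2 * math.pi * f1 * n))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel

def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    return abs(sum(h * cmath.exp(-1j * w * i) for i, h in enumerate(kernel)))

kernel = sinc_bandpass(300.0, 3400.0)
print("taps:", len(kernel))
print("gain inside the band (1 kHz):", round(gain_at(kernel, 1000.0), 3))
print("gain outside the band (6 kHz):", round(gain_at(kernel, 6000.0), 3))
```

In a trainable layer the two cutoffs would be the learned parameters, so a 101-tap filter costs two parameters rather than 101.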

Second Implementation

The second implementation is in PyTorch and uses the Wav2Vec model to extract acoustic features from raw audio; the resulting low-dimensional vector is then used for the analysis.
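
One possible way to work with such features is sketched below: frame-level vectors are averaged into a single fixed-length vector per segment, and two segments are compared by cosine similarity. This is an assumption about the downstream step, not a description of the repository code. The frame values are hand-written placeholders rather than real Wav2Vec output, and `mean_pool` and `cosine_similarity` are hypothetical helper names.

```python
import math

def mean_pool(frames):
    """Average a list of frame vectors into one fixed-length vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder frame-level features: segments may have different lengths,
# but pooling always yields a vector of the same dimension.
frames_a = [[0.2, 0.8, -0.1], [0.4, 0.6, 0.1]]
frames_b = [[0.3, 0.7, 0.0], [0.3, 0.7, 0.0], [0.3, 0.7, 0.0]]
frames_c = [[-0.6, 0.1, 0.9], [-0.4, -0.1, 0.7]]

vec_a, vec_b, vec_c = mean_pool(frames_a), mean_pool(frames_b), mean_pool(frames_c)

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print("similarity a vs b:", round(sim_ab, 3))
print("similarity a vs c:", round(sim_ac, 3))
```

In a Siamese setup the pooled vectors would be passed through the shared network before comparison, so that the similarity reflects speaker identity rather than generic acoustic content.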
