This repository contains the PyTorch implementation for the following paper:
B. Nortier, M. Sadeghi, and R. Serizel, Unsupervised Speech Enhancement with Diffusion-based Generative Models, ICASSP 2024.
- Installation
- Training
- Pretrained checkpoint
- Demo
- Audio samples
- Supplementary material
- Bibtex
- References
Create a virtual environment using Python 3.8 and install the package dependencies via:
pip install -r requirements.txt
We find that the line pypesq==1.2.4 may cause errors. In that case, we recommend installing pypesq with the following command instead:
pip install https://github.com/vBaiCai/python-pesq/archive/master.zip
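After installation, a quick check along the following lines may help confirm that the environment is usable. This is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the import names `torch` and `pypesq` are assumed to match the installed packages.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The instructions above ask for Python 3.8; report any other version.
    if sys.version_info[:2] != (3, 8):
        print(f"Note: running Python {sys.version_info.major}.{sys.version_info.minor}, not 3.8")
    # Import names are assumptions; adjust them to the entries in requirements.txt.
    missing = missing_packages(["torch", "pypesq"])
    print("Missing packages:", missing if missing else "none")
```

If `pypesq` is reported as missing, the alternative install command above is the suggested fallback.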
A diffusion-based clean speech generative model can be trained using train.py:
python train.py --transform_type exponent --format wsj0 --gpus 2 --batch_size 14 --resume_from_checkpoint file/to/last.ckpt
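For scripted or repeated runs, the same command can be assembled programmatically. The sketch below is an illustration under assumptions: `build_train_command` is a hypothetical helper that is not part of the repository, it uses only the flags shown in the command above, and it assumes that `--resume_from_checkpoint` can be left out when no checkpoint is being resumed.

```python
import shlex
import sys


def build_train_command(transform_type="exponent", data_format="wsj0",
                        gpus=2, batch_size=14, checkpoint=None):
    """Assemble the train.py invocation as an argument list."""
    cmd = [
        sys.executable, "train.py",
        "--transform_type", transform_type,
        "--format", data_format,
        "--gpus", str(gpus),
        "--batch_size", str(batch_size),
    ]
    # Assumption: the resume flag is only needed when continuing from a checkpoint.
    if checkpoint is not None:
        cmd += ["--resume_from_checkpoint", checkpoint]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command(checkpoint="file/to/last.ckpt")
    # Print the command for inspection; nothing is launched here.
    print(shlex.join(cmd))
    # To start training, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting issues if a checkpoint path contains spaces.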
A pretrained checkpoint for a clean speech generative model trained on the WSJ0 dataset can be downloaded via this Google Drive link.
A demo of the UDiffSE framework is provided in demo.ipynb. The notebook demonstrates sampling from the clean speech prior learned by a diffusion-based generative model, followed by speech enhancement of a noisy test speech signal.
A collection of audio samples comparing the speech enhancement performance of UDiffSE, RVAE [1], and SGMSE+ [2] on the WSJ0-QUT and TCD-TIMIT datasets is available on UDiffSE's webpage.
Supplementary material, including additional details, discussions, and parameter studies that expand on our work, is provided in the docs directory (direct link).
@inproceedings{nortier2023unsupervised,
title={Unsupervised speech enhancement with diffusion-based generative models},
author={Nortier, Bern{\'e} and Sadeghi, Mostafa and Serizel, Romain},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024},
organization={IEEE}
}
[1] S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, “A recurrent variational autoencoder for speech enhancement,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
[2] J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, “Speech enhancement and dereverberation with diffusion-based generative models,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.