This is the repository of DSEGAN, ISEGAN, and the baseline SEGAN in our paper:
H. Phan, I. V. McLoughlin, L. Pham, O. Y. Chén, P. Koch, M. De Vos, and A. Mertins, "Improving GANs for Speech Enhancement," IEEE Signal Processing Letters, 2020. (accepted)
ISEGAN (Iterated SEGAN) and DSEGAN (Deep SEGAN) were built upon the SEGAN proposed by Pascual et al. and the SEGAN repository from santi-pdp. Different from SEGAN, which has a single generator, ISEGAN and DSEGAN use multiple generators chained to perform a multi-stage enhancement mapping.
The enhancement result of one generator is expected to be further enhanced/corrected by the next generator in the chain. DSEGAN's generators are independent, while ISEGAN's generators share their parameters. Similar to SEGAN, the generators are based on a fully convolutional architecture and operate directly on raw speech waveforms.
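As a rough illustration of the chaining, a minimal TensorFlow 1.x sketch is given below. The generator_fn here is only a placeholder for the actual fully convolutional generator, and the scope names and the chain_generators helper are illustrative assumptions, not the repository's API:

```python
import tensorflow as tf

def generator_fn(noisy_wav):
    # placeholder for the real fully convolutional encoder-decoder generator;
    # it maps a raw waveform chunk [batch, samples, 1] to an enhanced chunk
    return tf.layers.conv1d(noisy_wav, filters=1, kernel_size=31, padding='same')

def chain_generators(noisy_wav, depth, share_params):
    """Chain `depth` generators for multi-stage enhancement.
    ISEGAN: share_params=True  -> all stages reuse one set of parameters.
    DSEGAN: share_params=False -> each stage has its own parameters."""
    x = noisy_wav
    outputs = []
    for k in range(depth):
        scope = 'G' if share_params else 'G_%d' % k
        reuse = share_params and k > 0  # reuse variables from the second stage onwards
        with tf.variable_scope(scope, reuse=reuse):
            x = generator_fn(x)  # the output of one stage is the input of the next
        outputs.append(x)
    return outputs

# ISEGAN-style chain with two iterations sharing parameters:
# enhanced = chain_generators(noisy_batch, depth=2, share_params=True)[-1]
```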
The project is developed with TensorFlow. The required packages are:
- tensorflow_gpu 1.9
- numpy==1.1.3
- scipy==1.0.0
The speech enhancement dataset used in this work can be found on Edinburgh DataShare. The following scripts download the data and prepare them in TensorFlow format:
./download_audio.sh
./create_training_tfrecord.sh
Alternatively, download the dataset yourself, convert the wav files to 16 kHz sampling (a minimal resampling sketch is given after the command below), and set the paths to the noisy and clean training files in the config file cfg/e2e_maker.cfg. Then run the script:
python make_tfrecords.py --force-gen --cfg cfg/e2e_maker.cfg
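If you prefer to do the 16 kHz conversion yourself, a minimal sketch using scipy (already among the requirements) could look like the following; the directory names are placeholders, not the paths expected by the config file:

```python
import os
from math import gcd
from scipy.io import wavfile
from scipy.signal import resample_poly

def resample_wav_to_16k(in_path, out_path, target_sr=16000):
    # read the wav, resample with a rational up/down factor, write it back
    sr, data = wavfile.read(in_path)
    if sr != target_sr:
        g = gcd(target_sr, sr)
        data = resample_poly(data, target_sr // g, sr // g).astype(data.dtype)
    wavfile.write(out_path, target_sr, data)

# placeholder directories; point them at your downloaded wav folders
src_dir, dst_dir = 'noisy_trainset_wav', 'noisy_trainset_wav_16k'
if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)
for name in os.listdir(src_dir):
    if name.endswith('.wav'):
        resample_wav_to_16k(os.path.join(src_dir, name), os.path.join(dst_dir, name))
```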
Once you have the TFRecords file created in data/segan.tfrecords, you can optionally sanity-check it with the sketch below and then simply run one of the following scripts.
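A minimal sanity check, assuming the records follow the original SEGAN format with 'wav_raw' and 'noisy_raw' byte features holding int16 waveform chunks (an assumption; check make_tfrecords.py for the actual feature names):

```python
import numpy as np
import tensorflow as tf

record_path = 'data/segan.tfrecords'
count = 0
for i, serialized in enumerate(tf.python_io.tf_record_iterator(record_path)):
    example = tf.train.Example()
    example.ParseFromString(serialized)
    feats = example.features.feature
    if i == 0:
        print('feature keys:', list(feats.keys()))
        # assumed feature name; adjust if make_tfrecords.py uses different ones
        if 'wav_raw' in feats:
            clean = np.frombuffer(feats['wav_raw'].bytes_list.value[0], dtype=np.int16)
            print('clean chunk length:', clean.shape[0])
    count += 1
print('number of records:', count)
```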
# ISEGAN: run inside isegan directory
./run_isegan.sh
# DSEGAN: run inside dsegan directory
./run_dsegan.sh
# SEGAN baseline: run inside segan directory
./run_segan.sh
Each script contains the commands for training and for testing 5 different checkpoints of the trained model on the test audio files. You can modify the bash scripts to customize parameters (e.g., which GPUs to use) and which parts you want to run.
Enhancement results on two different test files:
For comparison purposes, enhanced wave files produced by DSEGAN with a depth of two are available here.
@article{phan2019idsegan,
title={Improving GANs for Speech Enhancement},
author={Huy Phan and Ian V. McLoughlin and Lam Pham and Oliver Y. Ch\'en and Philipp Koch and Maarten De Vos and Alfred Mertins},
journal={arXiv preprint arXiv:2001.05532},
year={2020}
}
e-mail: [email protected]
- When I have some time, I will try to improve comments on the source code.
- The pretrained models will be uploaded separately.
- Some audio examples will be added for demonstration.
- If using this code, parts of it, or developments from it, please cite the above reference.
- We do not provide any support or assistance for the supplied code, nor do we offer any other compilation/variant of it.
- We assume no responsibility regarding the provided code.