This is a fork of NVIDIA's tacotron2 repository, modified to run on NVIDIA K80 GPUs instead of the V100 GPUs used originally.
Tacotron 2: a PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.
This implementation includes distributed and FP16 support and uses the LJSpeech dataset.
Distributed and FP16 support relies on work by Christian Sarofeen and NVIDIA's Apex Library.
(Image: results from Tensorboard while training)

Pre-requisites:
- NVIDIA GPU + CUDA cuDNN
- Download and extract the LJ Speech dataset
Setup:
- Clone the repo:
git clone https://github.com/RiccardoGrin/NVIDIA-tacotron2.git
- cd into the repo:
cd NVIDIA-tacotron2
- Update the .wav paths in the filelists (an example line is shown after this list):
sed -i -- 's,DUMMY,/home/ubuntu/LJSpeech-1.1/wavs,g' filelists/*.txt
- Alternatively, set load_mel_from_disk=True in hparams.py and update the filelist paths to point at precomputed mel-spectrograms (see the examples after this list)
- Install PyTorch 0.4
- Install python requirements:
pip install -r requirements.txt
- Change dist_url in hparams.py to a file:// URL under the repo directory, pointing at a test.dpt file that does not yet exist (see the examples after this list)
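Each line in filelists/*.txt pairs an audio path with its transcript, separated by a pipe character, so the sed command above only rewrites the DUMMY prefix. An illustrative line before and after (the transcript is abbreviated):

```
DUMMY/LJ001-0001.wav|Printing, in the only sense with which we are at present concerned
/home/ubuntu/LJSpeech-1.1/wavs/LJ001-0001.wav|Printing, in the only sense with which we are at present concerned
```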
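For the load_mel_from_disk alternative, you can either edit hparams.py directly or pass an override on the command line, since train.py accepts comma-separated hparams overrides through its --hparams flag (the same mechanism used by the multi-GPU command further below):

```
python train.py --output_directory=outdir --log_directory=logdir --hparams=load_mel_from_disk=True
```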
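For the dist_url change, a sketch assuming the repo was cloned to /home/ubuntu/NVIDIA-tacotron2 (the path is illustrative; keep whatever form hparams.py uses to define dist_url). PyTorch's distributed init accepts a file:// URL, and the test.dpt file it names must not already exist:

```python
# In hparams.py -- point dist_url at a not-yet-existing file inside the repo.
dist_url = "file:///home/ubuntu/NVIDIA-tacotron2/test.dpt"
```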
Train the model:
python train.py --output_directory=outdir --log_directory=logdir
- (OPTIONAL) Monitor training progress with Tensorboard:
tensorboard --logdir=outdir/logdir
For multi-GPU (distributed) and FP16 training, run:
python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True
This trains much faster and better than normal single-GPU training; however, it may start by overflowing for a few steps, with messages like the following, before training proceeds correctly (the dynamic loss scaler starts at a large scale and halves it after each overflow until it finds a stable value):
'OVERFLOW! Skipping step. Attempted loss scale: 4294967296'
For inference:
- Start a Jupyter Notebook server and open it in the browser
- Open inference.ipynb
- Follow the instructions in the notebook and run it (a sketch of what it does follows below)
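For reference, here is a minimal sketch of what the notebook does, assuming this repo's standard module layout (create_hparams in hparams.py, load_model in train.py, text_to_sequence in the text package); the checkpoint path is illustrative:

```python
import numpy as np
import torch

from hparams import create_hparams
from train import load_model
from text import text_to_sequence

hparams = create_hparams()

# Load a trained Tacotron 2 checkpoint (path is illustrative).
checkpoint_path = "outdir/checkpoint_15750"
model = load_model(hparams)
model.load_state_dict(torch.load(checkpoint_path)['state_dict'])
model.eval()

# Encode the input text as a sequence of symbol IDs and run inference.
text = "You stay in Wonderland and I show you how deep the rabbit hole goes."
sequence = np.array(text_to_sequence(text, ['english_cleaners']))[None, :]
sequence = torch.from_numpy(sequence).cuda().long()

mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
# mel_outputs_postnet is the predicted mel spectrogram; a separate vocoder
# (e.g. nv-wavenet) is needed to turn it into an audible waveform.
```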
Below are the inference results after 15750 and 4750 steps respectively, for the input text: "You stay in Wonderland and I show you how deep the rabbit hole goes." - Morpheus, The Matrix
You can download 'inference_test_15750.wav' and 'inference_test_4750.wav' to listen to the audio generated at the respective steps. Around step 4750 is when the network started to construct a proper alignment graph and produce understandable sounds.
Related repos:
- nv-wavenet: Faster than real-time WaveNet inference
Acknowledgements:
This implementation uses code from the following repos: Keith Ito and Prem Seetharaman, as described in our code.
We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.
We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang and Zongheng Yang.