A novice's PyTorch implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, based on the Deepest-Project FastSpeech implementation.
The quality of the voice samples generated by this repo is not up to the mark, mainly because of the use of batch_size = 8,
forced by limited GPU memory and processing power. With batch_size > 8
my CUDA memory ran out.
I would be glad if anyone reading this repo could take up training with the batch size
given in the paper and/or suggest ways of improving the results. 😇
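One common workaround for tight GPU memory is gradient accumulation: run several small micro-batches, sum their gradients, and step the optimizer once, approximating the paper's larger batch size. A minimal sketch of the pattern (the toy model and data below are placeholders, not this repo's actual training loop):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model standing in for FastSpeech 2; only the accumulation pattern matters.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
w_before = model.weight.detach().clone()

# Four micro-batches of 8 samples ~ an effective batch size of 32.
micro_batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(4)]
accum_steps = 4

optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches):
    loss = criterion(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                              # gradients add up in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # one update per 32 samples
        optimizer.zero_grad()
```

Note that this only approximates a large batch: layers with batch-dependent statistics (e.g. batch norm) still see the micro-batch size.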
Download the checkpoint, trained on the LJSpeech dataset, from here. Place it in the training_log
folder and run inference.ipynb. For mel-to-audio generation I have used MelGAN from 🔦 torch hub.
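For reference, a minimal sketch of the checkpoint-loading pattern an inference script like this typically uses; the model class and checkpoint key names here are stand-ins, not this repo's actual ones, and the torch hub MelGAN entry point shown in the comment is one commonly used option, not necessarily the one this repo uses:

```python
import torch
import torch.nn as nn

# Stand-in for the FastSpeech 2 model defined in this repo.
model = nn.Linear(4, 4)

# Typical PyTorch checkpoint round trip; the dict keys are illustrative.
torch.save({"model": model.state_dict(), "step": 1000}, "checkpoint.pt")

ckpt = torch.load("checkpoint.pt", map_location="cpu")  # safe on CPU-only machines
model.load_state_dict(ckpt["model"])

# Mel -> audio with a MelGAN vocoder from torch hub (needs network, so not run here):
# vocoder = torch.hub.load("seungwonpark/melgan", "melgan")
# audio = vocoder.inference(mel)  # mel: FloatTensor of shape (1, 80, T)
```

`map_location="cpu"` matters when a checkpoint saved on GPU is loaded on a machine without CUDA.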
All code is written in Python 3.6.10.
requirements.txt contains the list of all packages required to run this repo.
pip install -r requirements.txt
For smooth working, download the latest torch and a suitable CUDA version from here. This repo works with PyTorch >= 1.4. I am not sure about lower versions; let me know if they work.
Before moving to the next step, update the hparams.py file as per your requirements.
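For orientation, a hedged sketch of the kind of fields such an hparams.py carries; the names and values below are illustrative and the actual names in this repo may differ:

```python
# Illustrative hparams.py fields -- actual names/values in this repo may differ.
dataset_path = "/root_path/to/wavs/"  # path later passed to preprocess.py
batch_size = 8                        # what fit on a 4 GB GPU; the paper uses more
n_mel_channels = 80                   # mel bins expected by the MelGAN vocoder
sampling_rate = 22050                 # LJSpeech's native sampling rate
hop_length = 256                      # frame hop shared by mels and durations

# Filled in after running compute_statistics.py (placeholder values):
f0_min, f0_max = 71.0, 795.8
energy_min, energy_max = 0.0, 315.0
```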
The folder MFA_filelist contains pre-extracted alignments produced with the Montreal Forced Aligner on the LJSpeech dataset. For more information on using MFA, visit here.
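To see what those alignments are used for: FastSpeech 2 needs a frame count per phoneme, which comes from converting MFA's phoneme intervals (in seconds) to mel-frame indices. A small sketch, with made-up interval values and the common 22050 Hz / hop 256 setup assumed:

```python
# Convert MFA phoneme intervals (seconds) into per-phoneme frame counts,
# the duration targets FastSpeech 2 trains on. Interval values are made up.
sampling_rate = 22050
hop_length = 256

intervals = [("HH", 0.00, 0.08), ("AH0", 0.08, 0.15), ("L", 0.15, 0.27)]

def seconds_to_frame(t):
    return int(round(t * sampling_rate / hop_length))

durations = [seconds_to_frame(end) - seconds_to_frame(start)
             for _, start, end in intervals]
# Rounding the boundaries (rather than each length) keeps the total frame
# count consistent with the mel spectrogram length.
```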
python preprocess.py -d /root_path/to/wavs/
python compute_statistics.py
Update the hparams.py file with the appropriate information about pitch and energy.
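A sketch of the kind of statistics such a pass produces, so it is clear what goes back into hparams.py; the per-utterance arrays below are made up, and whether this repo's compute_statistics.py skips unvoiced frames the same way is an assumption:

```python
import numpy as np

# Global pitch statistics over all per-utterance pitch arrays (made-up data).
pitch_files = [np.array([110.0, 220.0, 0.0]), np.array([130.0, 180.0])]

all_pitch = np.concatenate(pitch_files)
voiced = all_pitch[all_pitch > 0]  # drop unvoiced (zero-f0) frames

stats = {
    "pitch_min": float(voiced.min()),
    "pitch_max": float(voiced.max()),
    "pitch_mean": float(voiced.mean()),
    "pitch_std": float(voiced.std()),
}
```

The same reduction applies to energy; min/max bound the variance-adaptor bins, and mean/std enable normalization.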
Make sure the training_log folder exists in the repo before running the command below.
python train.py
- The output of the present checkpoint is not good because of a lack of training. I will update with a better checkpoint as soon as I can.
- There are outliers in the dataset that need to be taken care of; handling them could make training leaner.
- Using a lower batch size does not work well with this model.
- Normalizing pitch and energy may also help with faster training or better convergence.
- This model was trained on an NVIDIA GeForce GTX 960M (4 GB), which is well below the requirements of this model.
- Feel free to share interesting insights.
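On the normalization point above: the usual approach is z-score normalizing pitch and energy with the corpus statistics before training and inverting at inference. A minimal sketch, with placeholder mean/std values (the real ones would come from the statistics step):

```python
import numpy as np

# Placeholder corpus statistics; in practice these come from the statistics pass.
pitch_mean, pitch_std = 160.0, 43.0

def normalize(pitch):
    """Z-score normalize a pitch contour for training targets."""
    return (pitch - pitch_mean) / pitch_std

def denormalize(pitch_norm):
    """Invert the normalization on predicted pitch at inference time."""
    return pitch_norm * pitch_std + pitch_mean

p = np.array([110.0, 220.0])
```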