Release v2.2.1: Important notice and minor patch release · openvpi/DiffSinger

Vocoder fine-tuning is available

Everything about vocoder training, fine-tuning and research now has its own place: https://github.com/openvpi/SingingVocoders

Users can now fine-tune the shared NSF-HiFiGAN vocoder model on their own datasets without needing much computing power. In most cases, vocoder fine-tuning reduces the noise caused by mismatches between predicted mel-spectrograms and the ground truth on unseen datasets, which improves the final audio quality. See the documentation in this repository for how to use custom vocoder models and deploy them to ONNX format.

Mutual influence between variance modules

Recent research by the developer team found some mutual influence between the duration predictor, the pitch predictor and the variance predictor of a variance model. The findings have been written into the documentation as formal suggestions. Following these suggestions when training your variance models can improve accuracy and avoid unstable loudness.

Changes and bug fixes

This patch release contains the following changes:

The pitch expressiveness factor is now exposed by default but can be disabled with --freeze_expr
The note glide type can now be frozen with --freeze_glide for compatibility with OpenUTAU
Shallow diffusion and FP16 AMP are now enabled by default
The default f0_max configuration value has changed from 800 to 1100
The model path can be specified with --ckpt when exporting a custom vocoder model to ONNX
Documentation about preparing and deploying custom vocoders has been added and reorganized
The melody encoder has been added to the new variance model architecture graph
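
To give a sense of what the new f0_max default means in musical terms, the sketch below converts both values to MIDI note numbers. It assumes f0_max is expressed in Hz and uses the standard A4 = 440 Hz reference; the helper name hz_to_midi is made up for this illustration and is not part of DiffSinger. Under those assumptions, 800 Hz sits a little above G5 (about MIDI 79.3) and 1100 Hz sits just below C#6 (about MIDI 84.9), so the default pitch ceiling rises by roughly five and a half semitones.

```python
import math


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = MIDI 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


# Assumption: f0_max is given in Hz, as in the release note (800 -> 1100).
old_f0_max = 800.0
new_f0_max = 1100.0

old_midi = hz_to_midi(old_f0_max)
new_midi = hz_to_midi(new_f0_max)

print(f"old f0_max: {old_f0_max:.0f} Hz = MIDI {old_midi:.2f}")
print(f"new f0_max: {new_f0_max:.0f} Hz = MIDI {new_midi:.2f}")
print(f"extra range: {new_midi - old_midi:.2f} semitones")
```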

The following bugs are fixed:

A relative path bug caused by a custom checkpoint saving directory
An interpolation error raised during variance model inference when all notes are rests
Breathiness unexpectedly becoming NaN in some rare edge cases

Known issues

When training with DDP, TensorBoard sometimes raises an error and stops updating after a validation. A temporary workaround is to add the option --reload_multifile=true when launching TensorBoard.
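
As a minimal sketch of this workaround, the snippet below builds a TensorBoard launch command that includes the option. The log directory is a placeholder and the helper name tensorboard_command is hypothetical; only the --logdir and --reload_multifile options come from TensorBoard itself.

```python
import shlex


def tensorboard_command(logdir: str) -> list[str]:
    """Build a TensorBoard launch command with multi-file reloading enabled."""
    return [
        "tensorboard",
        "--logdir", logdir,
        "--reload_multifile=true",  # the option suggested in the known issue
    ]


# Placeholder path: point this at your own experiment's log directory.
cmd = tensorboard_command("checkpoints/my_experiment")

# Print the command so it can be copied into a terminal.
print(shlex.join(cmd))
```

To launch TensorBoard directly from Python instead of printing the command, the same list can be passed to subprocess.run(cmd).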

Full change log: v2.2.0...v2.2.1
