
v2.4.0: Rectified Flow algorithm and new feature extractor based on harmonic-noise separation model

@yqzhishen yqzhishen released this 13 Jul 15:14
· 6 commits to main since this release

New generative model algorithm: Rectified Flow (#184)

Rectified Flow is a new ODE-based generative model algorithm that was introduced in this paper and is used in Stable Diffusion 3. Experimental results have shown that Rectified Flow outperforms the former DDPM in all modules of DiffSinger. To our knowledge, this is the first publicly known use of Rectified Flow in singing voice synthesis (SVS) systems.

Rectified Flow is now the default algorithm for training new DiffSinger models. No action is required if you are using the template configuration file. Though not recommended, you can switch back to DDPM with the following line in your configuration:

diffusion_type: 'ddpm'  # default value is 'reflow'
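
To illustrate why an ODE-based model allows a freely chosen number of sampling steps, here is a minimal sketch of Euler sampling along a rectified-flow path. It is not DiffSinger's implementation: the names `euler_sample` and `ideal_velocity` are hypothetical, the convention t=0 for noise and t=1 for data is an assumption, and the velocity function is a toy stand-in for a trained network.

```python
import numpy as np


def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps.

    Assumption for this sketch: t=0 is pure noise and t=1 is the data sample.
    """
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy setup: one noise sample and the data point it should be mapped to.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4)
target = np.array([1.0, -2.0, 0.5, 3.0])


def ideal_velocity(x, t):
    # Hypothetical stand-in for a trained network. On a perfectly straight
    # path x_t = (1 - t) * noise + t * target, the velocity is constant.
    return target - noise


few = euler_sample(ideal_velocity, noise, steps=2)
many = euler_sample(ideal_velocity, noise, steps=50)
```

With this idealized straight path, `few` and `many` both land on `target`, so the step count is a free trade-off rather than a fixed divisor of a training schedule. A real network only approximates such a path, so in practice more steps generally reduce the integration error.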

Feature extractor based on harmonic-noise separation model (#196)

Harmonic-noise separation is a fundamental step in extracting breathiness, voicing and tension from a singing voice. The old WORLD-based method cannot cleanly separate the harmonic and noise components, which makes the extracted features less accurate than expected. We have introduced a new NN-based algorithm (Vocal Remover) for this separation process. With the new method, the performance of most variance parameters (especially tension) should improve.

The new harmonic-noise separator is now the default choice for preprocessing new datasets. Please read the guidance in GettingStarted.md and download the model file. Though not recommended, you can still use WORLD with the following line in your configuration:

hnsep: world  # default value is 'vr'

Other improvements, changes and bug fixes

  • The --speedup option in infer.py is replaced by --steps for continuous acceleration of Rectified Flow
  • All exported models are adapted to the new continuous acceleration API
  • Mel log base migration: the log10 setting is no longer allowed in preprocessing
  • Mel log base migration: all exported models are converted to accept log e mel spectrograms
  • The trainer now shows an error message when the user sets all predict_* options to false in variance model training
  • The binarizer now shows an error message when negative values are found in ph_dur or note_dur
  • Package versions in requirements.txt are updated; ONNX exporting requirements are written in requirements-onnx.txt
  • Bugfix: the extracted tension could be incorrect when the recording and label were not aligned
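
For readers who hold mel spectrograms in log10 scale, the mel log base migration listed above amounts to a constant rescaling, since ln(x) = log10(x) · ln(10). The sketch below shows that identity only. The helper name `log10_mel_to_ln` is hypothetical and this is not the migration code used by DiffSinger.

```python
import numpy as np


def log10_mel_to_ln(mel_log10):
    """Convert log10-scaled mel values to natural-log scale.

    Uses the identity ln(x) = log10(x) * ln(10).
    """
    return np.asarray(mel_log10, dtype=np.float64) * np.log(10.0)


# Example linear-scale mel amplitudes (illustrative values only).
amplitudes = np.array([1e-5, 0.1, 1.0, 2.5])
converted = log10_mel_to_ln(np.log10(amplitudes))
```

Here `converted` matches `np.log(amplitudes)` up to floating-point error.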

Some changes may not be listed above. See full change log: v2.3.0...v2.4.0