GPT‐SoVITS‐v2‐features (New Features)

RVC-Boss edited this page Aug 6, 2024 · 4 revisions

1----v2 compared with v1

| Version | Language support (cross-language synthesis among all listed) | GPT training set duration | SoVITS training set duration | Inference speed | Parameter count | Text frontend | Features |
|---|---|---|---|---|---|---|---|
| v1 (released in January) | Chinese, Japanese, English | 2k hours | 2k hours | baseline | 200M | baseline | baseline |
| v2 | Chinese, Japanese, English, Korean, Cantonese | 2.5k hours | VQ encoder: 2k hours; remaining parameters: 5k hours | doubled | unchanged | Enhanced logic for Chinese, Japanese, and English | Added speed control, a mode that needs no reference text, and better mixed-language text segmentation |
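"Mixed-language segmentation" in the comparison above means splitting input text into per-language runs so that each run can go to the matching text frontend. The sketch below is only an illustration of that idea using plain Unicode range checks. It is not the GPT-SoVITS implementation, and the names `script_of` and `split_by_script` are hypothetical. Script ranges alone cannot tell Cantonese from Mandarin, or Japanese kanji from Chinese characters, so a real frontend would need a proper language identifier or a user-supplied language tag.

```python
def script_of(ch: str) -> str:
    """Classify one character into a coarse script label (illustrative only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x30FF:      # Hiragana and Katakana
        return "ja"
    if 0xAC00 <= code <= 0xD7A3:      # Hangul syllables
        return "ko"
    if 0x4E00 <= code <= 0x9FFF:      # CJK Unified Ideographs (kanji also land here)
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return "other"                    # digits, punctuation, whitespace


def split_by_script(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters of the same script into (label, chunk) runs."""
    segments: list[list[str]] = []    # each entry is [label, chunk]
    for ch in text:
        label = script_of(ch)
        if segments and (label == "other" or segments[-1][0] in (label, "other")):
            # Neutral characters stay with the current run; a run that so far
            # holds only neutral characters adopts the first real label it meets.
            if segments[-1][0] == "other":
                segments[-1][0] = label
            segments[-1][1] += ch
        else:
            segments.append([label, ch])
    return [(label, chunk) for label, chunk in segments]


print(split_by_script("我喜欢Python, 안녕"))
# [('zh', '我喜欢'), ('en', 'Python, '), ('ko', '안녕')]
```

Attaching punctuation and spaces to the preceding run keeps pauses with the words they follow, which is one plausible choice among several.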

2----v2 Model New Features

(1) SoVITS: Better synthesis quality when the reference audio is low quality, especially audio sourced from the internet that has severe high-frequency loss and sounds muffled.

(2) Larger training dataset: Expanded to 5k hours, which improves zero-shot performance and makes the synthesized timbre more similar to the reference.

(3) Two added languages: Cross-language synthesis is now supported among all five languages. Cross-language synthesis means that the languages of the training dataset, the reference audio, and the text to be synthesized may differ from one another.

(4) Improved text frontend: Continuously iterated and updated. In v2, handling of polyphonic characters and words (those with more than one pronunciation) has been optimized for Chinese and English.