GPT-SoVITS v2 Features
1----Comparison of v1 and v2
Version | Supported Languages (mutually cross-lingual) | GPT Training Data | SoVITS Training Data | Inference Speed | Parameters | Text Frontend | Features
---|---|---|---|---|---|---|---
v1 (released in January) | Chinese, Japanese, English | 2k hours | 2k hours | baseline | 200M | baseline | baseline
v2 | Chinese, Japanese, English, Korean, Cantonese | 2.5k hours | VQ encoder: 2k hours; remaining parameters: 5k hours | doubled | unchanged | Improved logic for Chinese, Japanese, and English | Added speech-rate control, a no-reference-text mode, and better mixed-language segmentation
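The table lists better mixed-language segmentation as a v2 feature. The sketch below is not GPT-SoVITS's text frontend; it is a hypothetical illustration, assuming a simple Unicode-block heuristic, of the first step any mixed-language frontend must handle: cutting a string into runs of the same script so each run can be sent to a language-specific processor. The names `script_of` and `split_by_script` are invented for this example.

```python
from typing import List, Optional, Tuple


def script_of(ch: str) -> Optional[str]:
    """Return a coarse script label for one character (hypothetical heuristic)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana blocks
        return "kana"
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
        return "hangul"  # Hangul syllables and Jamo
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "han"  # CJK ideographs, shared by Chinese, Japanese, and Cantonese
    if ("a" <= ch <= "z") or ("A" <= ch <= "Z"):
        return "latin"
    return None  # neutral: spaces, digits, punctuation


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs; neutral chars join the preceding run."""
    segments: List[List[str]] = []  # each entry is [script, accumulated text]
    pending = ""  # neutral characters not yet assigned to a run
    for ch in text:
        script = script_of(ch)
        if script is None:
            pending += ch
        elif segments and segments[-1][0] == script:
            # Same script as the current run: extend it.
            segments[-1][1] += pending + ch
            pending = ""
        else:
            if segments:
                # Script change: neutral chars stay with the run they followed.
                segments[-1][1] += pending
                segments.append([script, ch])
            else:
                # Leading neutral chars go to the first real run.
                segments.append([script, pending + ch])
            pending = ""
    if pending:
        if segments:
            segments[-1][1] += pending
        else:
            segments.append(["neutral", pending])
    return [(script, part) for script, part in segments]


if __name__ == "__main__":
    for s in ["我爱GPT-SoVITS。", "Hello 世界", "안녕 hello", "こんにちは世界"]:
        print(split_by_script(s))
```

Note the last sample: the kanji in "こんにちは世界" come out as a separate `han` run, because Han characters alone cannot tell Chinese, Japanese, or Cantonese apart. Deciding which language a script run belongs to is the harder part that a real frontend has to solve on top of a split like this.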
2----New Features of the v2 Model
(1) SoVITS: Better synthesis quality from low-quality reference audio (especially audio taken from the internet that has severe high-frequency loss and sounds muffled).
(2) Larger training set, expanded to 5k hours: better zero-shot performance, with timbre closer to the reference voice.
(3) Two new languages: cross-lingual synthesis now works among all five languages (cross-lingual synthesis means the language of the training set or the reference audio can differ from the language being synthesized).
(4) Better text frontend: updated continuously. In v2, polyphone (heteronym) handling has been optimized for Chinese and English.