Query regarding SoundStorm USLM implementation #1
Comments
By the way, thanks for the training code implementation.
Thanks for your attention!
Yes, SoundStorm yields better quality due to its use of the Conformer; I don't expect any speed advantage either. When I get the time and resources, I will train SpeechTokenizer and USLM (SoundStorm) on the larger LibriLight, MLS, and GigaSpeech datasets; I think that will yield production-level quality. Meanwhile, please share the SpeechTokenizer training code if possible.
We will soon release a SpeechTokenizer trained on a larger dataset, but the open-sourcing of the training code might face some delays. The semantic distillation process used during training required modifications to the relevant model code within fairseq, and organizing this code and deciding on the most suitable way to release it may take a significant amount of time. Given our other ongoing projects, we cannot currently estimate a timeline for the release of the training code.
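
For context, here is a minimal sketch of what such a semantic distillation loss could look like: the output of the first RVQ layer is pushed toward frame-level features from a semantic teacher such as HuBERT via a cosine-similarity objective. The function name and tensor layout below are assumptions for illustration, not the repo's actual implementation, which lives in the authors' modified fairseq code.

```python
import torch
import torch.nn.functional as F

def semantic_distillation_loss(rvq1_output: torch.Tensor,
                               teacher_features: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a semantic distillation loss.

    rvq1_output:      (batch, time, dim) output of the first RVQ layer
    teacher_features: (batch, time, dim) features from a semantic teacher
                      such as HuBERT (assumed here; the real teacher and any
                      projection live in the authors' modified fairseq code)

    Maximizes per-frame cosine similarity between student and teacher.
    """
    cos = F.cosine_similarity(rvq1_output, teacher_features, dim=-1)
    return (1.0 - cos).mean()
```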
When will the model weights be released?
When you replaced VALL-E's NAR with SoundStorm, did you adopt SoundStorm's masking strategy, or did you leave your masking strategy unchanged?
@0417keito We adopted SoundStorm's masking strategy.
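
For readers unfamiliar with that strategy, here is a minimal sketch of SoundStorm/MaskGIT-style masking at training time, assuming a (batch, time) token grid for a single RVQ level and a reserved mask id. The function name and shapes are illustrative, not taken from this repo.

```python
import math

import torch

def soundstorm_style_mask(tokens: torch.Tensor, mask_id: int):
    """Sketch of SoundStorm/MaskGIT-style masking for one RVQ level.

    A masking ratio is drawn from the cosine schedule and that fraction of
    positions is replaced by `mask_id`; the model is trained to predict the
    original ids at the masked positions. At inference the schedule runs in
    reverse, unmasking the most confident predictions over a small number
    of iterations.
    """
    batch, time = tokens.shape
    # Cosine schedule: u ~ U(0, 1) gives ratio = cos(u * pi / 2) in (0, 1].
    u = torch.rand(batch, 1, device=tokens.device)
    ratio = torch.cos(u * math.pi / 2)
    mask = torch.rand(batch, time, device=tokens.device) < ratio
    return tokens.masked_fill(mask, mask_id), mask
```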
@ZhangXInFD Did you simply replace the NAR stage of USLM with a SoundStorm model trained on SpeechTokenizer tokens for the zero-shot TTS task?
Although the quality of SoundStorm is much better, have you noticed any speed advantage when using SoundStorm compared to the original USLM?
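
To make the question concrete, a zero-shot TTS pipeline with that swap could be wired roughly as below. Every name here (`ar_model`, `soundstorm`, `codec`, and their methods) is hypothetical, sketched only to show where SoundStorm would sit relative to VALL-E's NAR stage.

```python
import torch

def zero_shot_tts(text: str, prompt_tokens: torch.Tensor,
                  ar_model, soundstorm, codec) -> torch.Tensor:
    """Hypothetical wiring of USLM with a SoundStorm stage.

    prompt_tokens: (time, n_levels) RVQ tokens of the enrollment prompt.
    All module names and method signatures are illustrative only.
    """
    # 1. Autoregressive stage: predict the first RVQ level
    #    (the semantic tokens in SpeechTokenizer).
    semantic = ar_model.generate(text, prompt_tokens[:, 0])
    # 2. SoundStorm stage: fill in the remaining acoustic levels with
    #    confidence-based iterative parallel decoding, conditioned on the
    #    semantic tokens and the prompt.
    acoustic = soundstorm.generate(semantic, prompt_tokens)
    # 3. Decode the full RVQ token stack back to a waveform.
    return codec.decode(acoustic)
```

On speed, any difference would come mainly from step 2: VALL-E's NAR makes one pass per remaining RVQ level, while SoundStorm refines tokens over a configurable number of iterations, so whether it is faster depends on the iteration count chosen.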