descript/encodec is too slow in dataloader #14
Yes, it is supposed to be done in the collate function or inside the model itself, so we can take advantage of batching. In this implementation, I think I am doing it one utterance at a time (?)
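The collate-function idea above can be sketched as follows. This is a minimal illustration, not the repo's actual code: `make_collate_fn` is a hypothetical helper, and a dummy strided convolution stands in for the real DAC/encodec encoder, which would be dropped in at the same spot.

```python
import torch
import torch.nn.functional as F

def make_collate_fn(codec_encoder, device="cpu"):
    """Pad waveforms to a common length, then run the codec ONCE per batch
    instead of once per utterance inside __getitem__."""
    def collate(batch):
        # batch: list of 1-D waveform tensors of varying length
        max_len = max(w.shape[-1] for w in batch)
        padded = torch.stack([F.pad(w, (0, max_len - w.shape[-1])) for w in batch])
        with torch.no_grad():
            # codec expects (B, 1, T); one forward pass for the whole batch
            latents = codec_encoder(padded.unsqueeze(1).to(device))
        return padded, latents
    return collate

# Demo with a dummy strided conv standing in for the codec encoder.
dummy_encoder = torch.nn.Conv1d(1, 8, kernel_size=320, stride=320)
collate = make_collate_fn(dummy_encoder)
padded, latents = collate([torch.randn(16000), torch.randn(12000)])
print(padded.shape, latents.shape)  # (2, 16000) and (2, 8, 50)
```

Batching amortizes the encoder's per-call overhead; the padded frames can later be dropped with a length mask.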
Running self.encodec on the GPU can speed up the process, but I got a CUDA error.
You can load a DAC encoder on each of your GPUs, then send the data to the corresponding GPU and compute it there. Another temporary option is to encode all your audio files during preprocessing, save the results to disk, and then load them for training.
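The second option (precompute and cache) can be sketched like this. It is a minimal example under assumptions: `cache_codec_latents` is a hypothetical helper, and a dummy strided conv stands in for the real DAC encoder.

```python
import tempfile
import torch
from pathlib import Path

def cache_codec_latents(wav_tensors, names, encoder, out_dir):
    """One-time preprocessing: encode every utterance once and save the
    latents, so __getitem__ only does a cheap torch.load at train time."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with torch.no_grad():
        for name, wav in zip(names, wav_tensors):
            z = encoder(wav.view(1, 1, -1))           # (1, C, T')
            torch.save(z.squeeze(0), out_dir / f"{name}.pt")

# Demo with a dummy strided conv standing in for the DAC encoder.
dummy_encoder = torch.nn.Conv1d(1, 4, kernel_size=160, stride=160)
tmp = tempfile.mkdtemp()
cache_codec_latents([torch.randn(1600), torch.randn(3200)],
                    ["utt_a", "utt_b"], dummy_encoder, tmp)
# In the Dataset, __getitem__ then reduces to:
#   torch.load(latent_dir / f"{utt_id}.pt")
z_a = torch.load(Path(tmp) / "utt_a.pt")
print(z_a.shape)  # (4, 10)
```

The trade-off is disk space for latents versus paying the encoder cost every epoch; for a fixed training set the cache only needs to be built once.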
So, I loaded the DAC model into pflowTTS, but I haven't had success in training the entire pflowTTS with DAC:
So, I managed to train pflow with the DAC codec. I used the following code to decode the audio:
The audio output is not as good as the mel-spectrogram representation. Can I ask for your comments on this @p0p4k?
@rafaelvalle
@vuong-ts The latest Meta paper, AudioBox, uses OT-CFM on encodec latents. But the twist is pre-training the TTS model on lots of encodec data (~60k). pflow TTS without the loss mask, MAS, and text conditioning is almost equivalent to pre-training it; then we can fine-tune with text conditioning. That might be the solution. Essentially: take a masked wav -> masked latent -> train OT-CFM to predict the entire latent (like BERT), and then downstream it for TTS.
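The masked-latent setup described above can be sketched as below. This is only an illustration of the data side of the idea, not AudioBox's actual recipe; `masked_latent_pair` and the span-masking scheme are assumptions for the sake of the example.

```python
import torch

def masked_latent_pair(z, mask_ratio=0.3):
    """BERT-style setup on codec latents: zero out a contiguous span of
    frames. The model is fed the masked latent, and the training target is
    the full latent (loss typically restricted to the masked frames)."""
    B, C, T = z.shape
    span = max(1, int(mask_ratio * T))
    start = torch.randint(0, T - span + 1, (1,)).item()
    mask = torch.zeros(B, 1, T, dtype=torch.bool)
    mask[..., start:start + span] = True
    z_masked = z.masked_fill(mask, 0.0)
    return z_masked, z, mask

# Demo on a random "latent" of shape (batch, channels, frames).
z = torch.randn(2, 8, 100)
z_masked, target, mask = masked_latent_pair(z)
```

For pre-training, `z_masked` would condition the OT-CFM vector field while `target` supplies the flow endpoint; text conditioning is then added at fine-tuning time.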
Hi @p0p4k,
I see that the processing time of the DAC encode in the
dev/descript_codec
branch is too slow on CPU in the DataLoader. How can we speed up this process?