
descript/encodec is too slow in dataloader #14

Open · vuong-ts opened this issue Dec 8, 2023 · 7 comments

@vuong-ts commented Dec 8, 2023

Hi @p0p4k,

I see that the DAC encode step on the dev/descript_codec branch is too slow on CPU inside the DataLoader. How can we speed up this process?

def batched_encodec(self, wav):
    with torch.no_grad():
        self.encodec.eval()
        wav = self.resampler(wav)  # resample to 24 kHz
        signal = AudioSignal(wav, 24000)
        x = self.encodec.preprocess(signal.audio_data, signal.sample_rate)
        _, _, latents, _, _ = self.encodec.encode(x)
    return latents

vuong-ts changed the title from "descript/encodec is too slow in dataworker" to "descript/encodec is too slow in dataloader" on Dec 8, 2023
@p0p4k (Owner) commented Dec 8, 2023

Yes, it is supposed to be done in the collate function or inside the model itself, so we can take advantage of batching. In this implementation, I think I am encoding one sample at a time (?)
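
As a rough sketch of that idea (not the repo's code), the encode could move into a collate function so one forward pass handles the whole batch; dac_model, resampler, and the padding convention here are assumptions:

import torch
from audiotools import AudioSignal

def dac_collate_fn(batch, dac_model, resampler):
    # Pad waveforms to the longest in the batch so they stack into one tensor.
    # Each item["wav"] is assumed to be 1 x T.
    wavs = torch.nn.utils.rnn.pad_sequence(
        [item["wav"].squeeze(0) for item in batch], batch_first=True
    ).unsqueeze(1)  # B x 1 x T
    with torch.no_grad():
        dac_model.eval()
        wavs = resampler(wavs)  # resample to 24 kHz
        signal = AudioSignal(wavs, 24000)
        x = dac_model.preprocess(signal.audio_data, signal.sample_rate)
        # One batched forward pass instead of one call per utterance.
        _, _, latents, _, _ = dac_model.encode(x)
    return latents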

@vuong-ts (Author) commented Dec 8, 2023

Running self.encodec on the GPU can speed up the process, but I got a CUDA error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
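
The error message itself points at one workaround: start the DataLoader workers with 'spawn' instead of 'fork' so CUDA can be initialized inside them. A minimal sketch, assuming a standard PyTorch DataLoader (MyDataset is a placeholder):

from torch.utils.data import DataLoader

loader = DataLoader(
    MyDataset(),                      # placeholder dataset
    batch_size=16,
    num_workers=4,
    multiprocessing_context="spawn",  # workers start via 'spawn', so CUDA init is allowed
)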

@p0p4k (Owner) commented Dec 8, 2023

You can load a DAC encoder on each of your GPUs, then send the data to the corresponding GPU and compute it there. Another temporary option is to precompute the latents for all your audio files in a preprocessing step, save them to disk, and then load them during training.
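
A minimal sketch of the second option, encoding every file once on the GPU and caching the latents to disk (the audio_files list, paths, and file naming are assumptions, not the repo's code):

import torch
from audiotools import AudioSignal

device = "cuda"
dac_model = dac_model.to(device).eval()

for path in audio_files:  # hypothetical list of pathlib.Path objects
    signal = AudioSignal(path).resample(24000).to(device)
    with torch.no_grad():
        x = dac_model.preprocess(signal.audio_data, signal.sample_rate)
        _, _, latents, _, _ = dac_model.encode(x)
    # Cache next to the audio; the training dataset then just torch.load()s these.
    torch.save(latents.cpu(), path.with_suffix(".latents.pt"))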

@vuong-ts (Author) commented Dec 8, 2023

So, I loaded the DAC model into pflowTTS, but I haven't had success training the entire pflowTTS with DAC:

  • random segmentation faults.
  • NaN loss after 1–2 epochs.

@vuong-ts (Author) commented:
So, I managed to train pflow with the DAC codec.

I used the following code to decode audio with DAC.

dataset: ljspeech
epoch: 200
text: On one occasion a disturbance was raised which was not quelled until windows had been broken and forms and tables burnt.

# Encode the reference audio to DAC latents.
with torch.no_grad():
    dac_encodec.eval()
    wav = resampler(wav)  # resample to 24 kHz
    signal = AudioSignal(wav, 24000)
    x = dac_encodec.preprocess(signal.audio_data, signal.sample_rate).to(device)
    _, _, latents, _, _ = dac_encodec.encode(x)

# Predict latents from text with the trained pflow model.
output = synthesise(text, latents)
pred_latents = output["decoder_outputs"]
pred_latents = pred_latents.reshape(1, 32, 8, -1)  # B x N x D x T
pred_latents = pred_latents.to(device)

# Re-quantize the predicted latents through each of the 32 RVQ codebooks.
z_q = 0
for i in range(32):
    z_q_i, indices = dac_encodec.quantizer.quantizers[i].decode_latents(pred_latents[:, i, :, :])
    z_q_i = dac_encodec.quantizer.quantizers[i].out_proj(z_q_i)
    z_q += z_q_i

# Decode to a waveform and play it.
ipd.Audio(dac_encodec.decode(z_q).squeeze(dim=0).detach().cpu().numpy(), rate=24_000)

The audio output is not as good as with the mel-spectrogram representation.
https://drive.google.com/file/d/18hDs-mL8mqwmuVTQd8ZMfFsfWFsfxMW9/view

Can I ask for your comments on this, @p0p4k?

@vuong-ts (Author) commented Dec 14, 2023

@rafaelvalle
Regarding the neural codec representation, can I ask you a couple of questions, since you are one of the authors 😊

  1. Have you tried training Pflow on audio codec codes, as in VALL-E, instead of mel-spectrograms?
  2. Is training on the neural codec representation significantly slower than on mel-spectrograms?

@p0p4k (Owner) commented Dec 14, 2023

@vuong-ts The latest Meta paper, Audiobox, uses OT-CFM on EnCodec latents. But the twist is pre-training the TTS model with lots of EnCodec data (~60k). PflowTTS without the loss mask, MAS, and text conditioning is almost equivalent to that pre-training; we can then fine-tune with text conditioning. That might be the solution: essentially take a masked wav -> masked latent -> train OT-CFM to predict the entire latent (like BERT), then adapt it downstream for TTS.
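
To make that concrete, a minimal sketch of such a masked-latent OT-CFM training step, not the Audiobox or pflow code; the model signature, latent shapes, and masking policy are all assumptions:

import torch

def masked_cfm_step(model, latents, sigma_min=1e-4):
    # latents: B x D x T, the full (unmasked) DAC/EnCodec latents.
    B, D, T = latents.shape
    # Mask a random contiguous span per example (BERT-style infilling condition).
    cond = latents.clone()
    for b in range(B):
        start = torch.randint(0, T // 2, (1,)).item()
        width = torch.randint(T // 4, T // 2, (1,)).item()
        cond[b, :, start:start + width] = 0.0
    # Standard OT-CFM pair: straight path from noise x0 to data x1.
    x1 = latents
    x0 = torch.randn_like(x1)
    t = torch.rand(B, 1, 1, device=latents.device)
    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    u = x1 - (1 - sigma_min) * x0       # target vector field
    v = model(x_t, t.squeeze(), cond)   # predict it, conditioned on the masked latent
    # The model must reconstruct the *entire* latent, masked spans included.
    return torch.nn.functional.mse_loss(v, u)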
