Make it multi-language? #2
I think it is possible to do it. I'll do it after I am sure this version of the model works at least for one language.
The LJSpeech sample sounds promising. Will you be able to reuse the weights for multi-speaker (VCTK?) training? If "yes", I'll start training on a single-speaker (non-English) dataset.
Yes, the weights can be reused.
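Warm-starting the multi-speaker run from a single-speaker checkpoint could look roughly like the sketch below. The module names, shapes, and checkpoint path are made up for illustration; the point is that only tensors whose shapes match carry over, and anything that changes shape (e.g. a new speaker-embedding table) has to be dropped and re-initialized.

```python
import torch
import torch.nn as nn

class TinyTTS(nn.Module):
    """Stand-in for the real model: a shared encoder plus a speaker table."""
    def __init__(self, n_speakers: int = 1):
        super().__init__()
        self.encoder = nn.Linear(80, 192)             # shared weights, reusable as-is
        self.spk_emb = nn.Embedding(n_speakers, 192)  # grows in the multi-speaker run

# Pretend this is the single-speaker LJSpeech checkpoint.
torch.save(TinyTTS(n_speakers=1).state_dict(), "ljspeech_single.pt")

# Warm-start a multi-speaker model (VCTK has ~110 speakers).
multi = TinyTTS(n_speakers=110)
state = torch.load("ljspeech_single.pt")
state.pop("spk_emb.weight")  # shape changed, so drop it and keep the fresh init
missing, unexpected = multi.load_state_dict(state, strict=False)
print("freshly initialized:", missing)  # ['spk_emb.weight']
```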
Going by RADMMM, the title of this issue/wish should be "Make it multi-accented".
True. I am doing a multi-speaker training run on my end as well; let's see if the generations are good enough without extra conditioning first. Good luck!
Does the multi-speaker (VCTK) training look good, @p0p4k?
VCTK should work, but it should be easier to fit LibriTTS. The main issue with VCTK is that there's a lot of silence at the beginning and end of some samples, and automatic trimming methods are normally not accurate and end up clipping phonemes.
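For anyone preprocessing VCTK anyway, a conservative trim is one way to reduce the phoneme-clipping risk. This is just a sketch with librosa; the top_db value, the 50 ms margin, the sample rate, and the file name are assumptions, not values from this repo.

```python
import librosa
import soundfile as sf

# Hypothetical VCTK utterance; the sample rate is an assumption.
y, sr = librosa.load("p225_001.wav", sr=22050)

# A relatively high top_db keeps quiet onsets/offsets (fricatives, breaths)
# that aggressive thresholds would cut.
y_trim, (start, end) = librosa.effects.trim(y, top_db=40)

# A small margin around the detected region further guards against
# clipping leading or trailing phonemes.
pad = int(0.05 * sr)  # 50 ms
sf.write("p225_001_trimmed.wav", y[max(0, start - pad):min(len(y), end + pad)], sr)
```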
LibriTTS sounds like this @ 200k steps with guided sampling - https://voca.ro/1e0tSbWgbyuu |
@p0p4k the sample sounds good; I think with more training it will get a lot better. I think multilinguality is easy to implement in this repo. The problem occurs when you use a prompt from a native speaker of one language and generate speech in another language.
On another note, can adding some noise to the prompt help the model extract the "voice" better? I tried a zero-shot voice clone and it didn't perform that well.
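One way to test the noise idea, sketched below with made-up shapes: perturb the prompt mel during training so the model can't copy fine spectral detail and has to rely on speaker identity instead. The noise scale is a tunable assumption, not a value from this repo.

```python
import torch

def corrupt_prompt(prompt_mel: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    """prompt_mel: (batch, n_mels, frames) log-mel segment used as the speech prompt."""
    return prompt_mel + noise_std * torch.randn_like(prompt_mel)

prompt = torch.randn(2, 80, 150)  # dummy ~3-second prompt
noisy = corrupt_prompt(prompt)    # fed to the prompt encoder instead of the clean mel
```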
I was wondering if "injecting" language info would be possible, similar to what xtts does by injecting a special language token, e.g. [en], into the GPT input.
Features from a 3-second speech prompt might not be enough (nor desired) to capture the language of the sample text (in order to do cross-language speaker cloning). However, concatenating the speech prompt with some kind of language ID (a precomputed language feature vector?) might enable ML (multi-language) in addition to MS (multi-speaker).
At inference, changing this part of the prompt might enable inline language switching.
There might be a better way, of course, e.g. passing the info directly to the encoder PreNet. Anyway, it would be great to see this feature; the VITS-based YourTTS does something similar.
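A rough sketch of the concatenation idea, with made-up module names and shapes: a learned language embedding is prepended as an extra token to the encoded speech prompt, so the prompt carries both speaker and language information. Nothing here is the repo's actual API.

```python
import torch
import torch.nn as nn

class LangConditionedPrompt(nn.Module):
    def __init__(self, n_langs: int = 8, d_model: int = 192):
        super().__init__()
        # One learned vector per language, e.g. 0 -> [en], 1 -> [de], ...
        self.lang_emb = nn.Embedding(n_langs, d_model)

    def forward(self, prompt_feats: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        """prompt_feats: (B, T, d_model) encoded 3-sec speech prompt;
        lang_id: (B,) integer language index."""
        lang_tok = self.lang_emb(lang_id).unsqueeze(1)     # (B, 1, d_model)
        return torch.cat([lang_tok, prompt_feats], dim=1)  # (B, 1 + T, d_model)

cond = LangConditionedPrompt()
out = cond(torch.randn(1, 150, 192), torch.tensor([0]))
print(out.shape)  # torch.Size([1, 151, 192])
```

Swapping lang_id per text segment at inference would be the inline language switching mentioned above; the PreNet variant would instead add the same embedding to the text-encoder input.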