
phoneme question #38

Open
yiwei0730 opened this issue Mar 20, 2024 · 7 comments

Comments

@yiwei0730

I would like to know why phonemizer was chosen as the method in the first place. Could the mixed-language (Chinese + English) processing be replaced by bopomofo_to_ipa?
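For concreteness, here is a minimal sketch of what a bopomofo-to-IPA path for mixed Chinese + English text could look like. It assumes the dragonmapper and phonemizer packages; the splitting regex and the mixed_to_ipa helper are hypothetical and not part of this repo:

```python
# Sketch: mixed Chinese+English text -> IPA, going through bopomofo (zhuyin)
# for the Chinese runs. Assumes the `dragonmapper` and `phonemizer` packages;
# the splitting regex and `mixed_to_ipa` helper are illustrative only.
import re
from dragonmapper import hanzi, transcriptions
from phonemizer import phonemize

HAN_RUN = re.compile(r"([\u4e00-\u9fff]+)")  # contiguous CJK ideograph runs

def mixed_to_ipa(text: str) -> str:
    parts = []
    for chunk in HAN_RUN.split(text):
        if not chunk.strip():
            continue
        if HAN_RUN.fullmatch(chunk):
            # Chinese: hanzi -> zhuyin (bopomofo) -> IPA
            zhuyin = hanzi.to_zhuyin(chunk)
            parts.append(transcriptions.zhuyin_to_ipa(zhuyin))
        else:
            # English (and everything else): espeak IPA via phonemizer
            parts.append(phonemize(chunk, language="en-us",
                                   backend="espeak", strip=True))
    return " ".join(parts)

print(mixed_to_ipa("我今天要去 school 上課"))
```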

@p0p4k
Owner

p0p4k commented Mar 20, 2024

It can be replaced with anything you like :)
I just implemented the easiest, most basic plug-and-play phonemizer. If there is something better that you use, please send a PR and add a flag to use/not use the newer phonemizer. Thanks!
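For the flag p0p4k mentions, one possible way to wire it up is a small dispatch in the text-cleaning step. This is only a sketch; the argument name use_zh_phonemizer and the helper mixed_to_ipa (from the sketch above) are hypothetical, not actual options in pflowtts_pytorch:

```python
# Sketch of a use/not-use flag for an alternative phonemizer. The flag name
# and cleaner functions are illustrative examples, not the repo's real config.
from phonemizer import phonemize

def text_to_phonemes(text: str, use_zh_phonemizer: bool = False) -> str:
    if use_zh_phonemizer:
        # e.g. the bopomofo -> IPA route sketched above
        return mixed_to_ipa(text)
    # default: the existing espeak-based phonemizer path
    return phonemize(text, language="en-us", backend="espeak", strip=True)
```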

@yiwei0730
Author

OK, let me think about it. I'm trying HierSpeech at the same time, but the preprocessing with YAAPT to extract F0 is annoying; it takes so much time.
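As a side note, a commonly used and much faster F0 extractor is pyworld's DIO with StoneMask refinement. This is just an illustrative sketch of an alternative, not something used by HierSpeech or this repo; it assumes the pyworld and soundfile packages and a hypothetical sample.wav:

```python
# Sketch: fast F0 extraction with pyworld (DIO + StoneMask refinement) as an
# alternative to YAAPT. Assumes `pyworld` and `soundfile`; illustrative only.
import numpy as np
import pyworld as pw
import soundfile as sf

wav, sr = sf.read("sample.wav")              # soundfile returns float64 by default
wav = np.ascontiguousarray(wav, dtype=np.float64)
f0, t = pw.dio(wav, sr, frame_period=10.0)   # coarse F0 track, 10 ms hop
f0 = pw.stonemask(wav, f0, t, sr)            # refine the estimate
```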

@yiwei0730
Author

I would like to ask whether there will be any problems if mixed-language data is used with this project.
My primary language is Chinese and my secondary language is English.

@p0p4k
Owner

p0p4k commented Mar 20, 2024

I don't think so; if the model gets a signal for which language to use, it should adapt.

@yiwei0730
Author

yiwei0730 commented Mar 20, 2024

I don't think so; if the model gets a signal for which language to use, it should adapt.

I'm thinking about whether I can use Bert-VITS2's phonemes, or else use the original Chinese initial + final + tone_number scheme.
However, there are some differences in the data_utils used, and they would need to be adapted: Bert-VITS2 uses tone_emb and language_embedding in its text encoder. In addition, I also think it may be more appropriate to move the data processing outside the training loop and run preprocess_text first, before training.
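The tone and language embedding idea can be summarized with a minimal sketch: the phoneme, tone, and language embeddings are simply summed before the text encoder. Dimensions and names below are illustrative, not the actual Bert-VITS2 code:

```python
# Sketch of the phoneme + tone + language embedding idea from Bert-VITS2's
# text encoder. Sizes and names are illustrative, not the real implementation.
import torch
import torch.nn as nn

class MultilingualTextEmbedding(nn.Module):
    def __init__(self, n_phonemes, n_tones, n_languages, hidden=192):
        super().__init__()
        self.phone_emb = nn.Embedding(n_phonemes, hidden)
        self.tone_emb = nn.Embedding(n_tones, hidden)
        self.lang_emb = nn.Embedding(n_languages, hidden)

    def forward(self, phonemes, tones, languages):
        # all inputs: LongTensor of shape [batch, seq_len]
        return (self.phone_emb(phonemes)
                + self.tone_emb(tones)
                + self.lang_emb(languages))

emb = MultilingualTextEmbedding(n_phonemes=200, n_tones=10, n_languages=3)
x = emb(torch.zeros(2, 5, dtype=torch.long),
        torch.zeros(2, 5, dtype=torch.long),
        torch.zeros(2, 5, dtype=torch.long))
print(x.shape)  # torch.Size([2, 5, 192])
```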

@yiwei0730
Author

@p0p4k
I'm a little confused now about the data processing for multilingual, multi-speaker training. The way this repo handles it is unusual and different from the other repos I have dealt with in the past.
It also seems that there is no text pre-processing step (done ahead of time on the text using cleaners) and no speaker map settings.
Are there any suggestions or scripts for dealing with multilingual, multi-speaker data?

My idea:
1. Put the data folder (my multilingual data) into pflowtts_pytorch.
2. Use preprocess_text.py, the script Bert-VITS2 uses, to handle the text processing up front.
3. When entering the text module, only read the preprocessed files (a spk_map has to be created to convert speaker names into numbers; see the sketch after this comment).
4. Finally, adapt the Python files train, speech-prompt, and pflowtts.

I don't know if there is anything I have overlooked, or anything else I should ask for advice on.
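For the spk_map step mentioned above, a minimal sketch of building a speaker-name-to-index map from a filelist follows. The file paths and the pipe-separated "wav_path|speaker|language|text" column layout are assumptions borrowed from Bert-VITS2-style filelists, not this repo's actual format:

```python
# Sketch: build a speaker map (name -> integer id) from a pipe-separated
# filelist. File names and column order are assumptions, not the repo's format.
import json

spk_map = {}
with open("filelists/train.list", encoding="utf-8") as f:
    for line in f:
        speaker = line.strip().split("|")[1]
        if speaker not in spk_map:
            spk_map[speaker] = len(spk_map)

with open("filelists/spk_map.json", "w", encoding="utf-8") as f:
    json.dump(spk_map, f, ensure_ascii=False, indent=2)
```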

@p0p4k
Owner

p0p4k commented Mar 25, 2024

@yiwei0730 you can use the Bert-VITS2 dataloader and modify some parts of pflow to use it directly; throw away the raw audio, since we do not do end-to-end training and just need the spectrograms.
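A minimal sketch of the "keep the spectrogram, drop the waveform" idea in a dataset item is shown below. It uses torchaudio purely for illustration; the actual repo computes mel spectrograms in its own data_utils, and the parameter values here are assumptions:

```python
# Sketch: a dataset item that keeps only what pflow needs (text ids, mel,
# speaker id) and discards the raw waveform. torchaudio is used only for
# illustration; parameters are example values, not the repo's config.
import torch
import torchaudio

mel_fn = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80)

def load_item(wav_path: str, text_ids: torch.LongTensor, spk_id: int):
    wav, sr = torchaudio.load(wav_path)   # [channels, samples]
    assert sr == 22050, "resample upstream if needed"
    mel = mel_fn(wav).squeeze(0)          # [n_mels, frames]; waveform is dropped
    return {"text": text_ids, "mel": mel, "spk": torch.tensor(spk_id)}
```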
