
phoneme question #38

Open
yiwei0730 opened this issue Mar 20, 2024 · 7 comments

Comments

@yiwei0730

I would like to know why phonemizer was chosen as the method in the first place. Could the mixed-language (Chinese + English) processing be replaced by bopomofo_to_ipa?
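For concreteness, here is a minimal sketch of what a bopomofo-to-IPA path for mixed Chinese + English text could look like. It assumes the dragonmapper and phonemizer packages; the splitting regex and the mixed_to_ipa helper are hypothetical and not part of this repo:

```python
# Sketch: mixed Chinese+English text -> IPA, going through bopomofo (zhuyin)
# for the Chinese runs. Assumes the `dragonmapper` and `phonemizer` packages;
# the splitting regex and `mixed_to_ipa` helper are illustrative only.
import re
from dragonmapper import hanzi, transcriptions
from phonemizer import phonemize

HAN_RUN = re.compile(r"([\u4e00-\u9fff]+)")  # contiguous CJK ideograph runs

def mixed_to_ipa(text: str) -> str:
    parts = []
    for chunk in HAN_RUN.split(text):
        if not chunk.strip():
            continue
        if HAN_RUN.fullmatch(chunk):
            # Chinese: hanzi -> zhuyin (bopomofo) -> IPA
            zhuyin = hanzi.to_zhuyin(chunk)
            parts.append(transcriptions.zhuyin_to_ipa(zhuyin))
        else:
            # English (and everything else): espeak IPA via phonemizer
            parts.append(phonemize(chunk, language="en-us",
                                   backend="espeak", strip=True))
    return " ".join(parts)

print(mixed_to_ipa("我今天要去 school 上課"))
```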

@p0p4k
Owner

p0p4k commented Mar 20, 2024

It can be replaced with anything you like :)
I just implemented the easiest, most basic plug-and-play phonemizer. If there is something better that you use, please send a PR and add a flag to use/not use the newer phonemizer. Thanks!
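For the flag p0p4k mentions, one possible way to wire it up is a small dispatch in the text-cleaning step. This is only a sketch; the argument name use_zh_phonemizer and the helper mixed_to_ipa (from the sketch above) are hypothetical, not actual options in pflowtts_pytorch:

```python
# Sketch of a use/not-use flag for an alternative phonemizer. The flag name
# and cleaner functions are illustrative examples, not the repo's real config.
from phonemizer import phonemize

def text_to_phonemes(text: str, use_zh_phonemizer: bool = False) -> str:
    if use_zh_phonemizer:
        # e.g. the bopomofo -> IPA route sketched above
        return mixed_to_ipa(text)
    # default: the existing espeak-based phonemizer path
    return phonemize(text, language="en-us", backend="espeak", strip=True)
```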

@yiwei0730
Author

OK, let me think about it. I'm trying HierSpeech at the same time, but the preprocessing with YAAPT to extract F0 is annoying; it takes so much time.
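As a side note, a commonly used and much faster F0 extractor is pyworld's DIO with StoneMask refinement. This is just an illustrative sketch of an alternative, not something used by HierSpeech or this repo; it assumes the pyworld and soundfile packages and a hypothetical sample.wav:

```python
# Sketch: fast F0 extraction with pyworld (DIO + StoneMask refinement) as an
# alternative to YAAPT. Assumes `pyworld` and `soundfile`; illustrative only.
import numpy as np
import pyworld as pw
import soundfile as sf

wav, sr = sf.read("sample.wav")              # soundfile returns float64 by default
wav = np.ascontiguousarray(wav, dtype=np.float64)
f0, t = pw.dio(wav, sr, frame_period=10.0)   # coarse F0 track, 10 ms hop
f0 = pw.stonemask(wav, f0, t, sr)            # refine the estimate
```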

@yiwei0730
Author

I would like to ask whether there will be any problems if mixed-language data is used with this project.
My primary language is Chinese and my secondary language is English.

@p0p4k
Owner

p0p4k commented Mar 20, 2024

I don't think so; if the model gets a signal for which language to use, it should adapt.

@yiwei0730
Author

yiwei0730 commented Mar 20, 2024

I don't think so; if the model gets a signal for which language to use, it should adapt.

I'm thinking about whether I can use Bert-VITS2's phonemes, or else use the original Chinese initial + final + tone_number scheme.
However, there are some differences in the data_utils used, and they would need to be adapted: Bert-VITS2 uses tone_emb and language_embedding in its text encoder. In addition, I also think it may be more appropriate to move the data processing outside the training loop and run preprocess_text first, before training.
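The tone and language embedding idea can be summarized with a minimal sketch: the phoneme, tone, and language embeddings are simply summed before the text encoder. Dimensions and names below are illustrative, not the actual Bert-VITS2 code:

```python
# Sketch of the phoneme + tone + language embedding idea from Bert-VITS2's
# text encoder. Sizes and names are illustrative, not the real implementation.
import torch
import torch.nn as nn

class MultilingualTextEmbedding(nn.Module):
    def __init__(self, n_phonemes, n_tones, n_languages, hidden=192):
        super().__init__()
        self.phone_emb = nn.Embedding(n_phonemes, hidden)
        self.tone_emb = nn.Embedding(n_tones, hidden)
        self.lang_emb = nn.Embedding(n_languages, hidden)

    def forward(self, phonemes, tones, languages):
        # all inputs: LongTensor of shape [batch, seq_len]
        return (self.phone_emb(phonemes)
                + self.tone_emb(tones)
                + self.lang_emb(languages))

emb = MultilingualTextEmbedding(n_phonemes=200, n_tones=10, n_languages=3)
x = emb(torch.zeros(2, 5, dtype=torch.long),
        torch.zeros(2, 5, dtype=torch.long),
        torch.zeros(2, 5, dtype=torch.long))
print(x.shape)  # torch.Size([2, 5, 192])
```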

@yiwei0730
Author

@p0p4k
I'm a little confused now about the data processing for multilingual, multi-speaker training. The way this repo handles it is unusual and different from the other repos I have dealt with in the past.
It also seems that there is no text pre-processing step (done ahead of time on the text using cleaners) and no speaker map settings.
Are there any suggestions or scripts for dealing with multilingual, multi-speaker data?

My idea:
1. Put the data folder (my multilingual data) into pflowtts_pytorch.
2. Use preprocess_text.py, the script Bert-VITS2 uses, to handle the text processing up front.
3. When entering the text module, only read the preprocessed files (a spk_map has to be created to convert speaker names into numbers; see the sketch after this comment).
4. Finally, adapt the Python files train, speech-prompt, and pflowtts.

I don't know if there is anything I have overlooked, or anything else I should ask for advice on.
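For the spk_map step mentioned above, a minimal sketch of building a speaker-name-to-index map from a filelist follows. The file paths and the pipe-separated "wav_path|speaker|language|text" column layout are assumptions borrowed from Bert-VITS2-style filelists, not this repo's actual format:

```python
# Sketch: build a speaker map (name -> integer id) from a pipe-separated
# filelist. File names and column order are assumptions, not the repo's format.
import json

spk_map = {}
with open("filelists/train.list", encoding="utf-8") as f:
    for line in f:
        speaker = line.strip().split("|")[1]
        if speaker not in spk_map:
            spk_map[speaker] = len(spk_map)

with open("filelists/spk_map.json", "w", encoding="utf-8") as f:
    json.dump(spk_map, f, ensure_ascii=False, indent=2)
```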

@p0p4k
Owner

p0p4k commented Mar 25, 2024

@yiwei0730 you can use the Bert-VITS2 dataloader and modify some parts of pflow to use it directly; throw away the raw audio, since we do not do end-to-end training and just need the spectrograms.
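A minimal sketch of the "keep the spectrogram, drop the waveform" idea in a dataset item is shown below. It uses torchaudio purely for illustration; the actual repo computes mel spectrograms in its own data_utils, and the parameter values here are assumptions:

```python
# Sketch: a dataset item that keeps only what pflow needs (text ids, mel,
# speaker id) and discards the raw waveform. torchaudio is used only for
# illustration; parameters are example values, not the repo's config.
import torch
import torchaudio

mel_fn = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80)

def load_item(wav_path: str, text_ids: torch.LongTensor, spk_id: int):
    wav, sr = torchaudio.load(wav_path)   # [channels, samples]
    assert sr == 22050, "resample upstream if needed"
    mel = mel_fn(wav).squeeze(0)          # [n_mels, frames]; waveform is dropped
    return {"text": text_ids, "mel": mel, "spk": torch.tensor(spk_id)}
```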
