FYI: Run models from piper with the Next-gen Kaldi subproject sherpa-onnx #251
Comments
@csukuangfj where to find the Android APKs? |
@csukuangfj Yes, it would be good to know about android tts as well. Could you please tell where to get it? |
I'm sorry for not getting back to you sooner. I have been working on converting more models from piper. Now all models of the following languages have been converted to sherpa-onnx:
- English (both US and GB)
- French
- German
- Spanish (both ES and MX)
You can find the Android APKs on the following page: https://k2-fsa.github.io/sherpa/onnx/tts/apk.html |
Are these using the standard Android text-to-speech API or not? |
No, it uses sherpa-onnx with VITS pre-trained models for TTS. Everything is open-sourced.
You can find the source code for the Android project at https://github.com/k2-fsa/sherpa-onnx/tree/master/android/SherpaOnnxTts
The underlying C++ code can be found at https://github.com/k2-fsa/sherpa-onnx
The JNI C++ binding code can be found at https://github.com/k2-fsa/sherpa-onnx/tree/master/sherpa-onnx/jni
You can find Kotlin API examples at https://github.com/k2-fsa/sherpa-onnx/tree/master/kotlin-api-examples |
Ah, OK, I meant standard TTS-engine API bindings. I may try to do that at some point, so that this TTS can be used as a standard Android TTS engine, for example with screen readers. |
Thanks for doing this @csukuangfj! I'd looked into sherpa-onnx at one point, but wasn't sure how to proceed. I'd like to link to your work when you think it's stable enough; I do want to make sure people understand that pronunciations may be slightly different due to the pre-computed lexicon. Speaking of the lexicon, could it be extended dynamically at runtime with your approach? |
We have detailed documentation at Could you tell us what you want to do? We can clarify the doc if you think it is not clear.
The lexicon.txt is generated by following the colab notebook from this repo The exact code can be found at Could you explain where the difference comes from?
No, it cannot. If there is an OOV at runtime, it is simply ignored, though a message is printed to tell the user
Thank you! I think the support for offline VITS models is stable now. (The APIs for the VITS model are quite simple and |
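As a rough illustration of the out-of-vocabulary behaviour described in the reply above (a word missing from lexicon.txt is skipped and a message is printed), here is a minimal Python sketch. It is not the sherpa-onnx implementation, which is written in C++. The function names `load_lexicon` and `text_to_tokens`, the assumed line layout (a word followed by whitespace-separated tokens), and the sample entries are all hypothetical.

```python
import sys


def load_lexicon(lines):
    """Parse lines of the assumed form 'word tok1 tok2 ...' into a dict."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        lexicon[parts[0].lower()] = parts[1:]
    return lexicon


def text_to_tokens(text, lexicon):
    """Map each word to its tokens; OOV words are reported and ignored."""
    tokens = []
    for word in text.lower().split():
        if word in lexicon:
            tokens.extend(lexicon[word])
        else:
            # Mirrors the described behaviour: warn, then carry on.
            print(f"OOV word ignored: {word}", file=sys.stderr)
    return tokens


# Hypothetical sample entries, for illustration only.
sample = ["hello h ə l oʊ", "world w ɜː l d"]
lexicon = load_lexicon(sample)
print(text_to_tokens("hello orrse world", lexicon))
```

Because the lookup table is built once from the file, a word that is not in it at load time has no pronunciation to fall back on, which is why it is dropped rather than guessed.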
I meant more "big picture" in how I should proceed. I wasn't sure if it was worth investigating porting Piper to sherpa-onnx. I'd be curious if you've noticed any speed difference. |
@synesthesiam Could you have a look at the following two PRs? |
https://huggingface.co/csukuangfj/vits-piper-pt_PT-tugao-medium/tree/main I have converted all of the models from piper to sherpa-onnx. (Note that you can run all of the models on Android, iOS, Raspberry Pi, etc.) |
@csukuangfj does this apply to piper models only? Is a lexicon required for coqui tts models? I'm following up on [#257]. I couldn't use my coqui tts model converted to sherpa-onnx because I had to manually add words to the lexicon, and the pronunciation of single words was poor. |
No, it is also not required for coqui tts models. None of the VITS models from coqui use lexicon.txt in sherpa-onnx.
Please look at just one coqui model at For instance, you can look at Download it, unzip it, and you will find the code for exporting models from coqui to sherpa-onnx. |
@csukuangfj meaning your notebook doesn't work anymore ? https://colab.research.google.com/drive/1cI9VzlimS51uAw4uCR-OBeSXRPBc4KoK?usp=sharing |
I just updated the colab notebook. Please reload it. The updated colab notebook is much much simpler than before. |
Your colab notebook works for default vits models, but when I use my fine tuned vits model which contains words like "orrse", "atua" (not in the English dictionary) I get the error The first colab which used lexicons worked, but this does not work with a fine tuned model containing your own words. How can we solve this issue? |
please show your meta data and add |
adding --debug=1, I have the output:
and this is the generated token.txt file content:
|
Could you share your config.json? The English VITS models from coqui use phonemes. All other non-English models from coqui use Characters. |
From your
Unfortunately, we don't support models using You can find all supported models at You can find the script for converting the model by unzipping the downloaded file. |
@csukuangfj how can I fine-tune my model to support this ? I shared the colab notebook I used in my previous message. Can you take a look ? Is it possible to change the configuration and re-fine tune my model? In case that’s not possible and I decide to train/fine tune using piper , do you have a similar colab notebook for converting piper model to onnx ? |
Please download a model and unzip it, you will find the converting script. |
@csukuangfj I have fine-tuned a model with characters_class="TTS.tts.models.vits.VitsCharacters" and I'm now able to synthesize using your colab notebook. It is working :) Thanks a lot. Now I want to try it on Android and iOS, but I can see that Android uses the old code below. Will it ignore the lexicon file?
|
please see where and how this function is called. |
// Example 1:
// modelDir = "vits-vctk"
// modelName = "vits-vctk.onnx"
// lexicon = "lexicon.txt"
// Example 2:
// https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
// https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-amy-low.tar.bz2
// modelDir = "vits-piper-en_US-amy-low"
// modelName = "en_US-amy-low.onnx"
// dataDir = "vits-piper-en_US-amy-low/espeak-ng-data"
// Example 3:
// modelDir = "vits-zh-aishell3"
// modelName = "vits-aishell3.onnx"
// ruleFsts = "vits-zh-aishell3/rule.fst"
// lexicon = "lexicon.txt"
In your case, please use Example 2. |
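The three examples can be read as three ways of telling the engine how to turn text into tokens: a lexicon file, an espeak-ng data directory, or a lexicon plus rule FSTs. The Python sketch below only mirrors the field names from the comments above. The `TtsModelFiles` class and the `describe_frontend` helper are hypothetical and are not part of any sherpa-onnx API, and the precedence given to `data_dir` when both fields are set is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TtsModelFiles:
    """Hypothetical container mirroring the fields in the examples above."""
    model_dir: str
    model_name: str
    lexicon: str = ""    # used in Examples 1 and 3
    data_dir: str = ""   # espeak-ng data, used in Example 2
    rule_fsts: str = ""  # used in Example 3


def describe_frontend(files):
    """Report which text frontend a configuration implies (illustrative)."""
    if files.data_dir:  # assumed to take precedence over a lexicon
        return "espeak-ng data dir"
    if files.lexicon:
        return "lexicon with rule FSTs" if files.rule_fsts else "lexicon"
    raise ValueError("neither lexicon nor data_dir is set")


# Example 2 from the comments above: a piper model.
piper = TtsModelFiles(
    model_dir="vits-piper-en_US-amy-low",
    model_name="en_US-amy-low.onnx",
    data_dir="vits-piper-en_US-amy-low/espeak-ng-data",
)
print(describe_frontend(piper))
```

Seen this way, Example 2 leaves the lexicon field empty, so the answer to "will it ignore the lexicon file?" depends on which of these fields the calling code actually fills in.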
@csukuangfj Thanks a lot for your patience. I'm learning a lot as a beginner. I have run the android app with version 1.9.3 .so files and it worked but I had to make some changes to the
I had to change the original to the version below, which worked, but I'm not sure whether it has any implications:
|
Thanks! Would you mind making a PR to fix it? |
The working code is from ChatGPT. I don't know why it works. I asked it why the app crashed, and it explained the cause and offered a solution. I think you need to check and confirm that it does not cause any other issues before a PR is made. For example, in your recent video on Twitter (X), synthesis is very fast, but mine is a bit slow, so I'm not sure whether that is due to the code. Thanks |
I just fixed it in the master branch. I am using a small model in the video. How large is your model? |
Okay that's great. Hope you will soon fix the single word pronunciation issue too. My model size is 145MB |
Sorry, it is not in the plan. The major difficulty is that the phonemizer used by IPAPhonemes is hard to port to C++. As you know, you are training your model in Python, but if you want to deploy it, every part must be converted to C++, including the phonemizer. All the VITS models from coqui-ai/tts are listed below.
You can see that only 3 of them are using I suggest that you switch to
or
|
You can also use espeak-ng in coqui-ai/tts, though I find that only English VITS models from coqui-ai/tts are using espeak-ng. |
Thank you @csukuangfj, I honestly don't think I came across all of these instructions during the hours I spent trying to do the conversion. It was much easier with the instructions you created. I was able to use the sherpa-onnx-offline-tts example to create a wav with my custom voice trained from scratch. However, the quality was not very good at all: lots of words had strange pronunciations. The words were pronounced much more accurately by piper. Also, the JSON file that piper preprocess created for me needed some changes before your script would run. The language key and espeak key didn't look the same as in the en_US-amy-medium.onnx.json file I compared it to. In en_US-amy-medium.onnx.json there is:
and
The json for my custom voice, trained from scratch only had this for language:
and also just "en" for espeak voice. This caused your example python script to error, so I adjusted the JSON manually. The JSON file for my onnx was created by piper preprocess, so maybe I used it wrong, which would explain why those fields are wrong/missing. I'll look into it some more. |
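For anyone repeating the synthesis step mentioned in this comment, the command line can be assembled as below. This is a sketch only: the binary name `sherpa-onnx-offline-tts` and `--debug=1` come from this thread, while the other flag names (`--vits-model`, `--vits-tokens`, `--vits-data-dir`, `--output-filename`) are written from memory of the sherpa-onnx documentation and should be checked against the binary's `--help` output. The helper `build_tts_command` and the file names are hypothetical.

```python
import shlex


def build_tts_command(model, tokens, data_dir, text, output="out.wav", debug=False):
    """Build (but do not run) an argv list for sherpa-onnx-offline-tts.

    Flag names other than --debug are assumptions; verify with --help.
    """
    cmd = [
        "sherpa-onnx-offline-tts",
        f"--vits-model={model}",
        f"--vits-tokens={tokens}",
        f"--vits-data-dir={data_dir}",
        f"--output-filename={output}",
    ]
    if debug:
        cmd.append("--debug=1")
    cmd.append(text)  # the text to synthesize is passed last
    return cmd


cmd = build_tts_command(
    model="my-voice.onnx",        # hypothetical file names
    tokens="tokens.txt",
    data_dir="espeak-ng-data",
    text="Hello from my custom voice.",
    debug=True,
)
print(shlex.join(cmd))  # a copy-pasteable shell command
```

Once the binary is on the PATH, the same list could be handed to `subprocess.run(cmd, check=True)`. Keeping it as a list avoids shell-quoting problems with the text argument.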
@csukuangfj Please check if my configuration for fine tuning a Vits model using coqui is okay. I am not getting intelligible sound after fine tuning using VitsCharacter, even for English words/phrases. Seems I am doing something wrong:
I read this and seems he fixed the issue by setting |
Sorry that I am not familiar with coqui-ai/tts. I suggest that you ask in the repo of coqui-ai/tts. |
okay no problem. I am switching from coqui to piper since I'm facing some issues. |
I am currently training using "use_phonemes=False" (coqui tts) and seems to be working so far. If it still doesn't work I will switch completely to piper. Piper has very good documentation |
So I managed to get both coqui tts and piper working but I have decided to stick to piper because the model size is smaller than coqui tts therefore reducing latency. Piper seems to have better pronunciations too. @csukuangfj I am not sure if you need to update script in model zip file.
changing the version to 1.16.1 doesn't work either. so I changed to Also, I had to manually change the json file to include:
because the original export from piper only had
Without this change, the Python script for exporting to sherpa-onnx fails at:
since there is no "name_english" |
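A small pre-flight check can surface this kind of failure before the export script runs. The sketch below reports which dotted key paths are absent from a piper-style JSON config. Only `name_english`, the language key, and the espeak voice field are mentioned in this thread. The exact paths `language.name_english` and `espeak.voice`, the helper `missing_keys`, and the sample config are assumptions for illustration.

```python
import json


def missing_keys(config, required_paths):
    """Return the dotted paths from required_paths that are absent in config."""
    missing = []
    for path in required_paths:
        node = config
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


# Hypothetical config, standing in for one produced by a from-scratch run.
config = json.loads('{"language": {"code": "en"}, "espeak": {"voice": "en"}}')

# Assumed paths; adjust to whatever the export script actually reads.
required = ["language.code", "language.name_english", "espeak.voice"]
print(missing_keys(config, required))
```

Running a check like this first turns a mid-export crash into a short list of fields to add by hand, which is the manual fix described in the comment above.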
I just supported replacing the system TTS engine in k2-fsa/sherpa-onnx#508 You can find a YouTube video at |
@csukuangfj when will Sherpa support coqui XTTS-v2 models? |
The model is larger than 1 GB, which I think means it requires a GPU. We won't support it in k2-fsa/sherpa-onnx, which mainly targets embedded environments. We may support it in k2-fsa/sherpa, though we cannot say when. |
@csukuangfj what about StyleTTS2 models which has elevenlabs human sounding quality and pytorch support https://github.com/yl4579/StyleTTS2 |
Does it have onnx export support? |
Not at the moment |
@csukuangfj currently, which model sounds closest to human quality on sherpa-onnx? Coqui or piper tts models? And are these two the only ones sherpa-onnx supports? |
Please visit There are more than 100 tts models and the best way to find out which model sounds best to you is to try it by yourself. |
No. sherpa-onnx currently supports VITS tts models and it is not limited to coqui or piper. |
I tried a couple of them in the past actually. I was hoping you'd have a "top 3" model list. What I noticed with sherpa onnx is there's a trade off between quality & on-device processing compared to cloud solutions out there. |
Could you describe which model you are using? @nanaghartey |
I'm using my own fine tuned coqui and piper tts vits models. Both sound good before converting to sherpa onnx...but this is the case for the various other English models I tried out |
@csukuangfj Please take a look at this issue on StyleTTS2 - #117 |
We have already supported Piper. Is there anything special with Style TTS2 @nanaghartey |
@csukuangfj Piper can't be compared to StyleTTS2. StyleTTS2 is currently the only open source solution close to proprietary solutions like elevenlabs, open ai's tts, recent gemini voices.. |
What is the model size of StyleTTS2? Does it require GPU? Could you post the link to the inference script with onnx for StyleTTS2? |
@csukuangfj The author who converted to onnx has not shared the script yet. I was thinking you'd take a look at the repo and see if it's something you can work on. As you can see from the thread, others are trying to export to onnx |
Sorry, I don't have extra time to do that. If there are existing ONNX inference scripts, I can take a look. |
@csukuangfj No problem. I will share the scripts once it's available. Thanks |
Is this discussion still related to Rhasspy/Piper or has it drifted to another (impressive) project? What's the status here and are there any ambitions to work in this direction? I don't have time but I would help funding this. |
sherpa-onnx provides runtime support for models from various frameworks, including those from piper. sherpa-onnx does not provide support for training your models, but piper does. Unlike piper, sherpa-onnx supports a range of platforms and programming languages. For instance, you can run piper models with sherpa-onnx on iOS, Android, Linux, Windows, macOS, etc. Also, sherpa-onnx supports not only text-to-speech but also speech-to-text, speaker diarization, etc. |
Thanks, yes, I understood these differences. The portability to other OSes such as Android is especially valuable, and my question is basically whether there are any plans to bring the sherpa-onnx functionality into the HA companion Android app. |
FYI: We have supported piper models in
https://github.com/k2-fsa/sherpa-onnx
Note that it does not depend on https://github.com/rhasspy/piper-phonemize
sherpa-onnx supports a variety of platforms, such as
It also provides various programming language APIs, e.g., C/C++/Python/Kotlin/Swift/C#/Go. We also have android APKs for TTS.
You can find the installation doc at https://k2-fsa.github.io/sherpa/onnx/install/index.html
You can find the usage of piper models with sherpa-onnx at
https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/vits.html#lessac-blizzard2013-medium-english-single-speaker
We also have a huggingface space for you to try piper models with sherpa-onnx.
Please visit
https://huggingface.co/spaces/k2-fsa/text-to-speech
You can find the PR supporting piper in sherpa-onnx at k2-fsa/sherpa-onnx#390