Update on getting styletts2 piper-tts and xtts to all work in one install #33

DrewThomasson · 2024-10-14T16:52:36Z

I don't know if you'll find this helpful or not, but I've managed to get Coqui TTS, piper-tts, and StyleTTS2 all working from a single requirements file.

Google Colab notebook showing it working:
Testing_all_tts_services.ipynb.zip

Hugging Face Space showing them all working together:
https://huggingface.co/spaces/drewThomasson/testing-all-tts-models-together

ROBERT-MCDOWELL · 2024-10-14T16:59:13Z

@DrewThomasson
The more choices we have the better, indeed, since AI projects are like the startup era...
I'm still working on refactoring, cleaning up the code, and optimizing. A lot of work! :)

DrewThomasson · 2024-10-14T18:04:02Z

Here's what I've found while looking for ways to get Calibre's ebook-convert function and ffmpeg built into the pip install.

For Calibre

I was also looking at getting Calibre to work through a pip install instead and found this:

https://github.com/gutenbergtools/ebookconverter

but it doesn't work on Windows :(

I think on Windows we might be able to locate the ebook-convert.exe binary inside a Calibre install and use that instead. (Info on ebook-convert for Windows)
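
A minimal sketch of that lookup idea, not what the project actually does. It checks PATH first and then a couple of install locations. The `Calibre2` paths are assumed defaults and `find_ebook_convert` is a hypothetical helper name, so adjust both for your setup.

```python
import shutil
import sys
from pathlib import Path
from typing import Iterable, Optional

# Assumed default Calibre install locations on Windows (not confirmed in this thread).
WINDOWS_CANDIDATES = [
    Path(r"C:\Program Files\Calibre2\ebook-convert.exe"),
    Path(r"C:\Program Files (x86)\Calibre2\ebook-convert.exe"),
]


def find_ebook_convert(extra_candidates: Iterable = ()) -> Optional[str]:
    """Return a path to ebook-convert, or None if it cannot be found."""
    # 1. Prefer whatever is already on PATH so a native install keeps working.
    on_path = shutil.which("ebook-convert")
    if on_path:
        return on_path
    # 2. Fall back to caller-supplied paths, then the assumed Windows defaults.
    candidates = [Path(p) for p in extra_candidates]
    if sys.platform == "win32":
        candidates += WINDOWS_CANDIDATES
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    return None
```

Calling `find_ebook_convert()` with no arguments returns `None` when Calibre is not installed, which lets the caller print an install hint instead of crashing.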

FFmpeg

Also, this looks like a potential option for including ffmpeg as a static binary 👀
https://github.com/eugeneware/ffmpeg-static
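
Whatever static build ends up being shipped, the calling code still has to decide which ffmpeg to run. A rough sketch of one way to do that, assuming the system ffmpeg is preferred and a bundled binary is only a fallback; `resolve_ffmpeg`, `ffmpeg_works`, and the `bundled` path are hypothetical names, not part of this project.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional, Union


def resolve_ffmpeg(bundled: Optional[Union[str, Path]] = None) -> Optional[str]:
    """Prefer the ffmpeg on PATH; fall back to a bundled static binary."""
    found = shutil.which("ffmpeg")
    if found:
        return found
    if bundled is not None and Path(bundled).is_file():
        return str(bundled)
    return None


def ffmpeg_works(ffmpeg_path: str) -> bool:
    """Return True if `<ffmpeg_path> -version` runs and exits cleanly."""
    try:
        result = subprocess.run(
            [ffmpeg_path, "-version"], capture_output=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, wrong architecture, or a hung process.
        return False
    return result.returncode == 0
```

Checking `ffmpeg_works(...)` once at startup catches the common case where a binary exists but was built for a different OS or CPU.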

ROBERT-MCDOWELL · 2024-10-14T19:46:21Z

I already explored what you just found, and I've settled on the one solution I'm working on. Don't worry about Calibre and FFmpeg, I found a way to not break native use. FYI, if you use ChatGPT or another AI to help you code, be aware that copy/paste can sometimes end up generating a big mess :o)

btw faster-whisper and whisperX are good engines for now.

DrewThomasson · 2024-10-14T19:52:13Z

oh dang! kk👍

lol yeah I honestly never expected this to blow up, so my code was a pretty rushed job using ChatGPT to cut corners ngl

ROBERT-MCDOWELL · 2024-10-14T19:53:14Z

oops, btw faster-whisper and whisperX are more STT than TTS :o\

DrewThomasson · 2024-10-14T19:53:24Z

Oh, faster-whisper/whisperX, like in the... XTTS fine-tuning script?

DrewThomasson · 2024-10-14T19:53:51Z

lol yeah was confused when you mentioned it

ROBERT-MCDOWELL · 2024-10-14T19:54:06Z

With Python, expect things to blow up at any time with a little glitch in the matrix :D
Anyhow, I'm happy with the tests I'm doing, but you may be shocked by the refactoring ;)

ROBERT-MCDOWELL · 2024-10-14T19:57:45Z

piper and StyleTTS2 are nice indeed: great community, active repos.
Maybe we should create a new option, --model_engine... later...
OK, I'm going back to work, see ya
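
For illustration only, here is roughly what a `--model_engine` flag could look like with argparse. The flag is just a proposal at this point in the thread; the engine names come from this discussion, and the default of `xtts` is an assumption.

```python
import argparse

# Engine names mentioned in this thread; the final list is undecided.
ENGINES = ("xtts", "piper", "styletts2", "bark")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert an ebook to an audiobook."
    )
    parser.add_argument(
        "--model_engine",
        choices=ENGINES,
        default="xtts",  # assumed default, since XTTS is the current engine
        help="TTS engine to synthesize with (default: %(default)s)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Using engine: {args.model_engine}")
```

Using `choices` means a typo such as `--model_engine pipr` fails at argument parsing with a clear message rather than deep inside model loading.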

DrewThomasson · 2024-10-14T20:23:50Z

kk👍

lol exactly already in my upcoming plans :) ---->
#32 (comment)

ROBERT-MCDOWELL · 2024-10-15T01:49:03Z

bark is also a nice, fun engine
https://github.com/suno-ai/bark

DrewThomasson · 2024-10-15T01:52:18Z

True

It's supposed to be built into Coqui TTS, but I run into issues trying to run it through their API.

I'll look further into it because I do quite like the model.

What's unique about it is that it not only clones the voice, it also changes the speaking style.

So, for instance, you might have

"Once upon a time"

And if you're using a voice whose sample uses the word "like" a lot, it might come out as

"So like once upon like a time"

Very cool

https://docs.coqui.ai/en/latest/models/bark.html

DrewThomasson · 2024-10-15T01:58:25Z

I'll try it with the new updated repo??? 👀

https://github.com/idiap/coqui-ai-TTS

IDK HOW I'M ONLY JUST FINDING OUT ABOUT THIS

FINALLY A FORK WHERE UPDATES ARE BEING APPLIED

DrewThomasson · 2024-10-15T01:58:48Z

I'll update you when I get a result from it lol

ROBERT-MCDOWELL · 2024-10-15T02:04:53Z

I didn't know either!

DrewThomasson · 2024-10-15T02:54:57Z

Looks like it works on my end!

At the moment I've gotten the random speaker example from the docs to work:

text = "Hello, my name is Manmay , how are you?"

from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

config = BarkConfig()
model = Bark.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True)

# with random speaker
output_dict = model.synthesize(text, config, speaker_id="random", voice_dirs=None)

# cloning a speaker.
# It assumes that you have a speaker file in `bark_voices/speaker_n/speaker.wav` or `bark_voices/speaker_n/speaker.npz`
output_dict = model.synthesize(text, config, speaker_id="ljspeech", voice_dirs="bark_voices/")

And to get the bark model files I just lazily ran the line below and then referred to its download location, like this.
This downloads the model:

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/bark", gpu=False)

This loads the downloaded model without going through the API wrapper from the docs:

model.load_checkpoint(config, checkpoint_dir="""/Users/drew/Library/Application Support/tts/tts_models--multilingual--multi-dataset--bark""", eval=True)
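
Judging only from the two lines above, the download folder name looks like the model name with `/` replaced by `--`. A small sketch that builds the checkpoint path that way instead of hardcoding it; `model_cache_dir` is a hypothetical helper, and the cache root still has to be supplied by you since it differs per OS.

```python
from pathlib import Path


def model_cache_dir(cache_root: Path, model_name: str) -> Path:
    """Build the download folder path using the naming seen in this thread."""
    # "tts_models/multilingual/multi-dataset/bark"
    #   -> "tts_models--multilingual--multi-dataset--bark"
    return Path(cache_root) / model_name.replace("/", "--")


if __name__ == "__main__":
    # Cache root as it appears on the macOS machine used in this thread.
    root = Path.home() / "Library" / "Application Support" / "tts"
    print(model_cache_dir(root, "tts_models/multilingual/multi-dataset/bark"))
```

The result can be passed as `checkpoint_dir=str(...)` to `load_checkpoint`.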

Here is the test output file

output.wav.zip

These tests were run on my M1 Pro 16 GB Mac laptop in a Python 3.10 env lol

DrewThomasson · 2024-10-15T03:23:57Z

lol I once got this working in a beta version of VoxNovel in Google Colab, so I know I'll be able to get it working here later

adding it to the list tho lol

DrewThomasson · 2024-10-15T04:40:34Z

ooo I'm also gonna add to the plans a way to use DeepFilterNet2 to denoise any reference input audio files

deepfilternet2

Gradio space I made as a demo of it lol

Confirmed to run even on ARM Mac
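
A rough sketch of what denoising a reference clip could look like, based on the usage shown in the DeepFilterNet README. It assumes the `deepfilternet` pip package is installed; `denoise_reference` is a hypothetical wrapper name, and `init_df()` loads whichever default model the installed package ships, which may not be DeepFilterNet2.

```python
def denoise_reference(in_path: str, out_path: str) -> str:
    """Denoise one reference clip with DeepFilterNet and write the result."""
    # Imported lazily so this module still loads when the package is missing.
    from df.enhance import enhance, init_df, load_audio, save_audio

    model, df_state, _ = init_df()  # loads the package's default model
    # Load the clip resampled to the sample rate the model expects.
    audio, _ = load_audio(in_path, sr=df_state.sr())
    enhanced = enhance(model, df_state, audio)
    save_audio(out_path, enhanced, df_state.sr())
    return out_path
```

The cleaned file could then be used as the voice cloning reference in place of the original.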

ROBERT-MCDOWELL · 2024-10-15T13:03:26Z

Excellent, Drew! The denoiser is amazing too!

DrewThomasson · 2024-10-17T19:26:25Z

@ROBERT-MCDOWELL
I got it running locally on Windows btw

and I had to change the install instructions.

I had to pin the gradio version so that the model link works (idk why I had to do that tho lol) and use coqui-tts instead of tts.

Also, I added punkt_tab to the nltk downloads.

You can see it here:

pip install coqui-tts==0.24.2 pydub nltk beautifulsoup4 ebooklib tqdm gradio==4.44.0

python -m nltk.downloader punkt
python -m nltk.downloader punkt_tab
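
Since the pinned versions are what made this work, a quick check that the environment actually has them might save some debugging. A minimal sketch using only the standard library; the pins are the two from the command above, and `check_pins` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# The two pins from the install command in this comment.
PINS = {"coqui-tts": "0.24.2", "gradio": "4.44.0"}


def check_pins(pins: dict) -> dict:
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (installed, wanted)
    return problems


if __name__ == "__main__":
    for name, (installed, wanted) in check_pins(PINS).items():
        print(f"{name}: installed={installed}, wanted={wanted}")
```

An empty result means both pinned packages are present at the expected versions.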

DrewThomasson · 2024-10-17T19:27:30Z

@ROBERT-MCDOWELL

Hope that helps you out with your pip creation

DrewThomasson · 2024-10-20T18:01:05Z

@ROBERT-MCDOWELL

Any updates?

ROBERT-MCDOWELL · 2024-10-20T19:02:42Z

It's coming :)... maybe tonight or tomorrow. First the cleanup, other things later...

ROBERT-MCDOWELL · 2024-10-22T03:27:13Z

@DrewThomasson
Check my repo to test it before I make a huge PR with a lot of changes.
Just run ./install.sh on Linux/Mac or ./install.bat on Windows.
So far I've tested on Linux, partially on Windows (Docker problem on my laptop), and not on macOS, but it should be OK.
brb tomorrow

DrewThomasson · 2024-10-22T03:39:38Z

@ROBERT-MCDOWELL
KK!
I'll check it out when I get a chance on both my Intel and ARM Macs :)

Intel Mac
ARM Mac
Windows
Linux

DrewThomasson · 2024-10-22T03:55:25Z

@ROBERT-MCDOWELL

Interesting... I'm seeing a Docker auto-install for all of them?

Is that needed for all of them to run, or is it just a preinstall for anyone who wants to use the Docker version?

ROBERT-MCDOWELL · 2024-10-22T14:10:31Z

DockerfileUtils currently covers Calibre and ffmpeg, and it can easily be extended to other apps if needed just by adding the app to the list. This avoids many, many version-conflict issues between Python and the OS. The install script installs a Python 3.11 environment (python_env) into the repo folder, so we don't have to waste our time on all these conflicts.

btw, this PR keeps backward compatibility with your current version, meaning a user who wants to run app.py directly can still do so, as long as their OS Python is ready to run it.

Yes, please check on Mac and Windows, as I don't have a VM or a native install of either, and frankly I've already spent a lot of time on Linux so ;0)

"Is that needed for all of them to run? or just a preinstall for anyone to use the docker version?"
There is the Dockerfile (which I didn't touch) to run the entire project in Docker (no changes).
There is DockerfileUtils, used to run Calibre and ffmpeg (and more if needed) under the Python 3.11 virtual env, which is activated by ebook2audiobook.sh and ebook2audiobook.cmd after being installed by install.sh or install.bat.
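
To make the repo-local environment idea concrete, here is a sketch of creating and locating such an environment with Python's standard `venv` module. This is not the actual install script, which is a shell/batch file. The `python_env` folder name is taken from the comment above, the function names are hypothetical, and this version uses whichever Python runs it rather than enforcing 3.11.

```python
import sys
import venv
from pathlib import Path

ENV_DIR_NAME = "python_env"  # folder name mentioned in this thread


def env_python(repo_dir: Path) -> Path:
    """Path to the interpreter inside the repo-local environment."""
    env_dir = Path(repo_dir) / ENV_DIR_NAME
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_env(repo_dir: Path) -> Path:
    """Create the environment on first use and return its interpreter."""
    python = env_python(repo_dir)
    if not python.exists():
        # with_pip=True so packages can be installed into the new env.
        venv.EnvBuilder(with_pip=True).create(Path(repo_dir) / ENV_DIR_NAME)
    return python
```

A launcher can then run the app with the interpreter returned by `ensure_env(...)`, which keeps the OS-level Python untouched.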

Anyhow, I'll create a PR and it's up to you whether to merge it...

DrewThomasson · 2024-10-22T14:31:36Z

Weird, I don't see any pull requests on my end?

I'll push it manually once I've finished testing and reviewing it on my end then, I guess lol

ROBERT-MCDOWELL · 2024-10-22T14:54:13Z

I'm working on it :)
What is the samples folder for?

DrewThomasson · 2024-10-22T14:56:38Z

Oh the samples folder is just examples of what xtts sounds like speaking those different languages.

It's not needed lol

I was gonna use the folder of sample texts in an automated test-run script too.

ROBERT-MCDOWELL · 2024-10-22T14:57:51Z

Ha, OK, so I'm going to check each of them and move them into the right voices folder...
btw Gradio and the command line can run together without collisions, and it's multitasking...

DrewThomasson · 2024-10-22T14:59:18Z

Kk

They all used the same voice sample to generate all the sample audio files anyway

ROBERT-MCDOWELL · 2024-10-22T15:30:16Z

who "they"?

DrewThomasson · 2024-10-22T16:27:05Z

The samples folder lol

Anyway reviewing your PR rn

Might take a day or two lol, it's a lotta changes to review! 😅

DrewThomasson · 2024-10-22T19:05:49Z

Confirmed: works on ARM Mac! 😄

#33 (comment)

ROBERT-MCDOWELL · 2024-10-22T19:36:28Z

EXCELLENT!!! :D You can keep the root_dir check fix, it doesn't hurt anything.
Meanwhile I'm adding audiobooks_dir as a global in the required function to be sure it gets picked up.
I'm going to add a --version option too.
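
For reference, argparse has a built-in `version` action that covers a `--version` option. A minimal sketch; the version string and the `ebook2audiobook` program name here are placeholders, not the project's real values.

```python
import argparse

__version__ = "0.0.0"  # placeholder, not the project's real version


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ebook2audiobook")
    parser.add_argument(
        "--version",
        action="version",  # prints the string below and exits with status 0
        version=f"%(prog)s {__version__}",
    )
    return parser


if __name__ == "__main__":
    build_parser().parse_args()
```

Because the action exits right after printing, it can run before any heavy model imports.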

DrewThomasson added documentation Improvements or additions to documentation TODOs labels Oct 14, 2024

DrewThomasson assigned DrewThomasson and unassigned DrewThomasson Oct 14, 2024

DrewThomasson mentioned this issue Oct 15, 2024

📚 [Want to contribute?] ebook2audiobookxtts roadmap #32

Open

58 tasks
