AllTalk version 2 progress. #211
Replies: 12 comments 19 replies
-
Ohh that sounds really promising. I'm getting more and more excited. 😁
-
This week (so far):
-
I'm a novice, but I have an M3 Max MacBook and would volunteer to help test if you need it.
-
I realised that some of my updates on here show as hidden behind "Show 3 previous replies", so if you miss them, they are here.
-
DeepSpeed for Linux - Finally simple installs
https://github.com/erew123/alltalk_tts/releases/tag/DeepSpeed-14.2-Linux
Will be able to automate the standalone version of DeepSpeed on Linux now.
-
Voice 2 RVC pipeline
AKA record your own voice and make it sound like someone else (no idea how good this will be, but it's there)
-
Working on code documentation and tidy up
Spent quite a bit of time on more code cleanup. The below is an example of part of one of the 4x TTS engine scripts. They've been cleaned up to show examples of how to import a new TTS engine, so that in theory, other people can use them as examples to import other engines in future (see the sketch at the end of this comment). Left to do before I throw a beta out:
Later on, after the BETA is out, I'll be dealing with bugs (maybe) and looking to set up the modular loader aspect of AllTalk, aka do/don't load finetuning as part of the interface, or other such aspects. Then also hitting the other aspects of feature requests. Though a day off may be kind of nice!
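For anyone curious what "an engine script as an example" could mean in practice, here is a minimal, purely illustrative sketch; the class and method names below are hypothetical, not AllTalk's actual code:

```python
# Hypothetical sketch of a pluggable TTS engine module.
# Class/method names are illustrative only, not AllTalk's real code.

class ExampleTTSEngine:
    """Wraps one TTS engine behind a small common interface."""

    def __init__(self, settings: dict):
        # Per-engine defaults (voice, language, etc.) come from settings.
        self.settings = settings
        self.model = None

    def available_models(self) -> list[str]:
        # Report the models this engine can load, so a loader
        # could expose them through the API.
        return ["example_model_v1", "example_model_v2"]

    def load_model(self, model_name: str):
        # Load/initialise the chosen model here.
        self.model = f"loaded:{model_name}"

    def generate(self, text: str, voice: str, language: str, output_path: str):
        # Produce an audio file at output_path from the given text/voice.
        if self.model is None:
            raise RuntimeError("Call load_model() before generate().")
        # ... engine-specific synthesis would happen here ...
        with open(output_path, "wb") as f:
            f.write(b"")  # placeholder for real audio data
```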
-
XTTS Finetuning
Has been given some spit and polish (mostly spit). A couple of updated features: no more being restricted to just the 2x models to use for training; choose your own folder for output, with over-write protection; set the maximum duration of audio files that Whisper can generate when making training data. A few bits tidied around the interface etc.
-
Final steps...
-
Everyone..... I will probably not be commenting on here about the BETA any more...... :( Because I have finally managed to get the BETA up and available :)
-
Hi @erew123, this is a really fun and good new TTS engine; can it be added to AllTalk? https://github.com/SWivid/F5-TTS
-
For the past month or so, I've been working on version 2 of AllTalk. The plan with version 2 is to try to resolve the issues, annoyances and limitations that appear in version 1, as well as expand AllTalk's simplicity, feature set, etc.
Screenshots a few posts down
Please note, what you see below is NOT a finished version. It's currently a functional version just so I can work on code in the back end.
Things I have achieved so far in the code:
Introduced a Gradio management interface. Because of the complexity of all the settings you will have access to, Gradio gives a simple interface for managing them. However, if you want to turn off loading the Gradio interface, you will have the option to do that. I also aim to make the Gradio interface modular, so you can choose which sections of it load in. This will allow you to keep things as lightweight as possible where you need to.
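To show what "modular sections" could look like, here is a minimal Gradio sketch; the section names and settings dictionary are assumptions for illustration, not AllTalk's actual configuration:

```python
import gradio as gr

# Hypothetical settings controlling which sections of the management
# interface get built (illustrative only).
interface_settings = {
    "load_generate_tab": True,
    "load_finetune_tab": False,   # e.g. skip finetuning to stay lightweight
    "load_settings_tab": True,
}

with gr.Blocks(title="Management interface sketch") as demo:
    if interface_settings["load_generate_tab"]:
        with gr.Tab("Generate"):
            gr.Markdown("TTS generation controls would go here.")
    if interface_settings["load_finetune_tab"]:
        with gr.Tab("Finetune"):
            gr.Markdown("Finetuning controls would go here.")
    if interface_settings["load_settings_tab"]:
        with gr.Tab("Settings"):
            gr.Markdown("Global/engine settings would go here.")

if __name__ == "__main__":
    demo.launch()
```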
Split out the model loaders, not only to make model selection/loading easier: AllTalk now discovers all available models to load and passes that list down through the API. Because of how I've split this out, it is now possible (in theory) to add any TTS engine into AllTalk. You will also be able to set certain default values on a TTS-engine-by-TTS-engine basis. I'm 80% of the way through this section of code, at which point I can test adding a few new TTS engines.
I intend to make a well-documented template so that anyone with a little coding experience should be able to set up a new TTS engine within AllTalk.
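As a rough idea of how discovery and per-engine defaults could fit together, here is a hedged sketch; the folder layout, module names and function names are all assumptions, not AllTalk's real code:

```python
import importlib
from pathlib import Path

# Hypothetical layout: one sub-folder per engine under ./engines, each
# containing an engine module that exposes a small common interface
# (available_models(), load_model(), generate(), ...).
ENGINES_DIR = Path("engines")

def discover_engines() -> dict[str, list[str]]:
    """Return {engine_name: [model names]} for every engine folder found."""
    discovered = {}
    for folder in sorted(p for p in ENGINES_DIR.iterdir() if p.is_dir()):
        module = importlib.import_module(f"engines.{folder.name}.engine")
        engine = module.ExampleTTSEngine(settings={})  # hypothetical class name
        discovered[folder.name] = engine.available_models()
    return discovered

def merge_settings(api_defaults: dict, engine_defaults: dict, request: dict) -> dict:
    """Request values win, then engine defaults, then global API defaults."""
    merged = dict(api_defaults)
    merged.update(engine_defaults)
    merged.update({k: v for k, v in request.items() if v is not None})
    return merged
```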
curl -X POST "http://127.0.0.1:7851/api/tts-generate" -d "text_input=All of this is text spoken by the character. This is text not inside quotes, though that doesn't matter in the slightest" -d "text_filtering=standard" -d "character_voice_gen=female_01.wav" -d "narrator_enabled=false" -d "narrator_voice_gen=male_01.wav" -d "text_not_inside=character" -d "language=en" -d "output_file_name=myoutputfile" -d "output_file_timestamp=true" -d "autoplay=true" -d "autoplay_volume=0.8"
With the new API and model-settings breakout, you can just send over the settings you want to the API, and the rest, where not specified, will be pulled from the API default settings and the TTS engine settings. So you will now be able to send over a request as simple as the following to have TTS generated:
curl -X POST "http://127.0.0.1:7851/api/tts-generate" -d "text_input=All of this is text spoken by the character. This is text not inside quotes, though that doesn't matter in the slightest"
You will be able to add in any of the other available settings, in any combination you wish, with the missing settings always being pulled from the defaults where they aren't specified. This should simplify the development of other products/applications that want to work with AllTalk.
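For illustration, the same minimal-plus-overrides pattern from Python might look like the following; the endpoint and parameter names are taken from the curl examples above, everything else is assumed:

```python
import requests

# Endpoint and parameter names taken from the curl examples above;
# anything not specified falls back to API/engine defaults.
API_URL = "http://127.0.0.1:7851/api/tts-generate"

# Minimal request: only the text, everything else uses defaults.
minimal = {"text_input": "Hello from the minimal request."}

# Same request, overriding just a couple of settings.
with_overrides = {
    "text_input": "Hello again, this time with a chosen voice.",
    "character_voice_gen": "female_01.wav",
    "language": "en",
}

for payload in (minimal, with_overrides):
    response = requests.post(API_URL, data=payload)
    print(response.status_code, response.text[:200])
```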
Screenshot a few posts down
Variables now carry a prefix for the part of the code they belong to, e.g. tgwui_variablename, meaning that you don't confuse that variable name with another variable name within the script. I'm also intending to put debugging code/print-outs into AllTalk. The idea is that anyone wanting to update/work with AllTalk's code should be able to read and understand it. There are lots of other little changes/updates, too numerous to mention. I still have a decent way to go before there is a BETA version of AllTalk to play with. I would estimate anywhere between 100 and 200 hours of work, between coding, documenting, testing etc.
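On the debugging print-out idea, a minimal sketch (the flag dictionary and helper function are assumptions, not AllTalk's actual code) could be:

```python
# Hypothetical per-area debug flags; in a real setup these would come
# from a settings file rather than being hard-coded.
DEBUG_FLAGS = {"tts": True, "api": False}

def debug_print(area: str, message: str) -> None:
    """Print a tagged message only when that area's debug flag is on."""
    if DEBUG_FLAGS.get(area, False):
        print(f"[DEBUG {area}] {message}")

debug_print("tts", "Loading model")        # printed
debug_print("api", "Request received")     # suppressed
```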
Things I still have to work on:
I am not at this time asking for additional feature ideas OR TTS engines to integrate into V2. I'm trying to get the base code working, stable and clear to understand, with a few additional TTS engines added in.
I hope at some point to release a BETA version, and then I'm happy to take feedback and work on bugs, along with other possible features/integrations.
Thanks