add google tts for all voice families #73
Please add a section to the config doc on how to set up Google Cloud in order to run the Google TTS model. I would benefit from that too, in order to test the PR :)
https://github.com/souzatharsis/podcastfy/blob/main/usage/config.md
Configuring Google's TTS is a pain in the a**. Here, I've tried to capture all the required steps, but there is a >0 chance that I've missed something. When you replicate these steps, please take notes; they might be needed to write proper documentation. The steps fall into three parts: enable the Text-to-Speech API, configure a billing account, and configure the API client.
|
thank you so much for the detailed instructions; |
You can get around all this by just enabling the Cloud Text-to-Speech API on the API key you are already using for Gemini and passing it in when you instantiate the client.
|
Thank you so much!! Will try it out.
On Thu, Oct 24, 2024, 10:14 AM Evan Dempsey wrote:
Enable text to speech API
1. go to "https://console.cloud.google.com/apis/dashboard"
2. select your project (or create one by clicking on the project list and then on "new project")
3. click "+ ENABLE APIS AND SERVICES" at the top of the screen
4. enter "text-to-speech" into the search box
5. click on "Cloud Text-to-Speech API" and then on "ENABLE"
6. you should be here: "https://console.cloud.google.com/apis/library/texttospeech.googleapis.com?project=..."
Configure Billing account
1. click "..." on the left of the profile picture (top-right corner)
2. select "Payment method" and add payment method
3. again click on the "..." near the profile picture
4. select "Billing account management"
5. enable billing in your project
Configure API client
1. open the terminal
2. install the Google Cloud CLI tools; on Ubuntu it's: sudo snap install google-cloud-cli (add --classic if snap asks for classic confinement)
3. create the application credentials file by running: gcloud auth application-default login
4. the browser will open; log into your account
5. in the terminal you should see where your credentials were stored, e.g. "Credentials saved to file: /home/mobarski/.config/gcloud/application_default_credentials.json"
6. set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of that file
7. install the Python package: pip3 install google-cloud-texttospeech
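Steps 5 and 6 are the easiest ones to get wrong, so a quick check before calling the API can save some head-scratching. The following is only a minimal sketch using the standard library; the function name `find_credentials_file` is made up for illustration, and it only confirms that the variable points at a readable JSON file, not that the login itself is valid.

```python
import json
import os
from pathlib import Path


def find_credentials_file(env=None):
    """Return the path named by GOOGLE_APPLICATION_CREDENTIALS, or raise.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set (see step 6)")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    # The file written by gcloud is JSON; a parse error here means the
    # variable points at the wrong file. This does not validate the login.
    with path.open(encoding="utf-8") as handle:
        json.load(handle)
    return path
```

Calling `find_credentials_file()` with no arguments reads the real environment and returns the path if everything looks plausible.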
You can get around all this by just enabling the Cloud Text-to-Speech API on the API key you are already using for Gemini and passing it in when you instantiate the client.
import os
from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient(client_options={'api_key': os.environ['GOOGLE_API_KEY']})
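To make the API-key route a bit more concrete, here is a hedged sketch that wraps the one-liner above in two small functions. The helper names (`api_key_client_options`, `synthesize_to_mp3`) and the default voice name `en-US-Wavenet-D` are assumptions chosen for illustration, not something taken from this PR; the Google client is imported lazily so the first helper can be used even where the package is not installed.

```python
import os


def api_key_client_options(env=None):
    """Build the client_options dict from GOOGLE_API_KEY, failing early if unset."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return {"api_key": key}


def synthesize_to_mp3(text, out_path, voice_name="en-US-Wavenet-D"):
    """Synthesize `text` to an MP3 file using an API-key client.

    The voice name is an assumed example; substitute any voice your
    project has access to.
    """
    # Lazy import: requires `pip3 install google-cloud-texttospeech`.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient(
        client_options=api_key_client_options()
    )
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US", name=voice_name
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as handle:
        handle.write(response.audio_content)
    return out_path
```

If the call is rejected with a permission error, the key most likely does not yet allow the Cloud Text-to-Speech API.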
|
@evandempsey It didn't work...
Where did you import it from?
|
@souzatharsis Yes, that's the one. You probably need to add the API permission to the key you're using on the Google Cloud console. Go to https://console.cloud.google.com/apis/credentials, click on whatever key you're using for Gemini, then go down to API Restrictions and add the Cloud Text-to-Speech API. |
You are a genius! (And GCloud is a maze.)
It works! thanks!
|
I've managed to integrate podcastfy with Google's multispeaker model and I think we've found what NotebookLM is using... I am curious about your feedback before we merge into main. @brumar @mobarski @evandempsey @lfnovo Should we make this the default TTS model?
https://www.veed.io/view/eb65150f-ef2a-447c-8cb9-43674453ca8f?panel=share
https://www.veed.io/view/4c514532-9311-41a6-8af6-9053e14f7a5b?panel=share |
Ah, you think they're using this? https://cloud.google.com/text-to-speech/docs/create-dialogue-with-multispeakers It sounds great. It sounds a bit more natural than what I was able to achieve in my experiments by burning through Elevenlabs credits. My concern about setting it as the default is the rather irritating GCloud setup you are forcing on people then. But it should definitely be an option. Have you found out the maximum length of audio you can synthesize with this? It doesn't seem to be documented. |
It's short per turn. I had to update the prompt so that each speaker's text is at most about 333 characters per turn.
The setup was not that painful. Just get the API key, enable the TTS service, and do the last step you mentioned.
It takes multiple clicks, but once done you have a better-quality and cheaper alternative to ElevenLabs.
Thanks for your feedback.
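The comment above enforces the per-turn limit through the prompt. As a complementary safety net (a different technique, not what the PR does), long turns could also be split after generation. The sketch below is one way to do that under stated assumptions: the 333-character figure is the approximate value reported above, and the names `split_turn` and `split_dialogue` are hypothetical.

```python
import re

MAX_TURN_CHARS = 333  # approximate per-turn limit reported in this thread


def _pack(units, limit):
    """Greedily join units with single spaces into chunks of at most `limit` chars.

    Assumes every unit is itself no longer than `limit`.
    """
    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks


def split_turn(text, limit=MAX_TURN_CHARS):
    """Split one speaker's text into pieces of at most `limit` characters.

    Sentences stay whole where possible; an over-long sentence is broken at
    word boundaries, and a single over-long token is hard-cut as a last resort.
    """
    units = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        if len(sentence) <= limit:
            units.append(sentence)
            continue
        words = []
        for word in sentence.split():
            words.extend(word[i:i + limit] for i in range(0, len(word), limit))
        units.extend(_pack(words, limit))
    return _pack(units, limit)


def split_dialogue(turns, limit=MAX_TURN_CHARS):
    """Apply split_turn to (speaker, text) pairs, keeping the speaker labels."""
    return [
        (speaker, piece)
        for speaker, text in turns
        for piece in split_turn(text, limit)
    ]
```

Each resulting `(speaker, piece)` pair can then be sent as its own turn, at the cost of slightly more requests for long monologues.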
|
OK, done. |
Thank you so much for the feedback! Google's Multispeaker and Journey models have been released in v0.4.0. All sample audio in the README has been updated to use the new TTS model, and I added some longform podcasts too. I've also updated the Python notebook to describe the longform podcast and new Google TTS model work: https://github.com/souzatharsis/podcastfy/blob/main/podcastfy.ipynb Would love your feedback! |
I've added support for Google's TTS voices (tested: studio, journey, wavenet, neural, standard).
The new method also shows how to render each speaker as a separate audio track and how to combine both tracks into a single output. The tracks are also saved separately to facilitate workflows that animate AI avatars.
I haven't touched pyproject.toml (or requirements.txt) because Poetry had issues with the pyproject file.
Please feel free to adjust the code as you wish in order to merge it, as I might have less time for FOSS projects in the next few days.
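The separate-track idea described above can be illustrated without any audio library. This is not the PR's code, just a minimal sketch assuming each turn is already decoded to signed 16-bit PCM samples at a common sample rate; `build_tracks` and `mix_tracks` are hypothetical names.

```python
from array import array


def build_tracks(turns, speakers):
    """Return a dict of speaker name -> array of samples, all the same length.

    `turns` is a list of (speaker, samples) pairs in spoken order. Each
    speaker's track carries that speaker's audio at its position in the
    conversation and silence (zeros) while anyone else is talking.
    """
    tracks = {name: array("h") for name in speakers}
    for speaker, samples in turns:
        if speaker not in tracks:
            raise ValueError(f"unknown speaker: {speaker!r}")
        for name, track in tracks.items():
            if name == speaker:
                track.extend(samples)
            else:
                track.extend([0] * len(samples))
    return tracks


def mix_tracks(tracks):
    """Sum aligned tracks sample by sample, clipping to the 16-bit range."""
    mixed = array("h")
    for frame in zip(*tracks.values()):
        mixed.append(max(-32768, min(32767, sum(frame))))
    return mixed
```

Because every track has the same length, each one can be written out on its own (for example to drive one avatar per speaker) while the mixed array gives the combined output.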