Speech-based GPT-3 assistant, implemented using Deepgram for speech-to-text and ElevenLabs for text-to-speech.
Google Text-to-Speech (gTTS) is also supported as a faster, free alternative to ElevenLabs that needs no API key.
Python 3.7 or higher is required.
- Install dependencies:
    pip install openai deepgram-sdk python-dotenv pydub requests gTTS
- Create a .env file in the root directory with your API keys for OpenAI, ElevenLabs, and Deepgram, as well as the ElevenLabs voice ID you want to use for the assistant. Refer to .env.example for an example.
- Start the assistant:
    python assistant_driver.py
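
If the assistant fails at startup, a missing or empty entry in .env is a likely cause. The following is a minimal sketch of a pre-flight check, not code from this repository. The variable names in REQUIRED_KEYS are assumptions made for illustration; use the names listed in .env.example instead.

```python
import os

try:
    # python-dotenv is in the dependency list above; fall back gracefully without it.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# Hypothetical variable names: replace them with the ones in .env.example.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
    "DEEPGRAM_API_KEY",
)


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required names that are absent or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def check_environment():
    """Load .env (when python-dotenv is available) and stop if a value is missing."""
    if load_dotenv is not None:
        load_dotenv()  # reads .env from the current directory or a parent directory
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing values in .env: " + ", ".join(missing))
```

Calling check_environment() before starting the assistant reports every missing name at once instead of failing on the first API call.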
Google Text-to-Speech (gTTS) voices are lower quality than ElevenLabs voices, but gTTS is free, faster, and needs no API key. Note that gTTS still sends requests to Google's servers, so an internet connection is required.
You can modify assistant_driver.py to import the text_to_speech method from text_to_speech/google.py instead of text_to_speech/elevenlabs.py. For example:
# from text_to_speech.elevenlabs import text_to_speech
from text_to_speech.google import text_to_speech
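
As an alternative to editing the import by hand, the backend could be chosen at runtime. This is only a sketch under assumptions: the TTS_BACKEND environment variable and both helper functions are hypothetical and not part of the project. The module paths are the two mentioned above.

```python
import importlib
import os

# Module paths taken from the import lines above.
BACKENDS = {
    "elevenlabs": "text_to_speech.elevenlabs",
    "google": "text_to_speech.google",
}


def resolve_backend(name=None):
    """Map a backend name to its module path.

    Falls back to the hypothetical TTS_BACKEND variable, then to "elevenlabs".
    """
    name = (name or os.getenv("TTS_BACKEND", "elevenlabs")).lower()
    if name not in BACKENDS:
        raise ValueError(f"Unknown TTS backend: {name!r}")
    return BACKENDS[name]


def load_text_to_speech(name=None):
    """Import the selected module and return its text_to_speech callable."""
    module = importlib.import_module(resolve_backend(name))
    return module.text_to_speech
```

In assistant_driver.py, the import line would then become text_to_speech = load_text_to_speech(), and setting TTS_BACKEND=google in .env would select gTTS.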
The speech recognition implementation was adapted from @saharmor. I referred to Deepgram's documentation for help implementing speech-to-text and to ElevenLabs' documentation for help setting up text-to-speech. I also used ChatGPT to help me generate and refactor code.