Commit

Merge pull request #130 from aedocw/main
Trigger container build
aedocw authored Dec 25, 2023
2 parents 0581d40 + 38983ee commit 024ae19
Showing 12 changed files with 503 additions and 159 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -16,6 +16,7 @@ var
sdist
develop-eggs
.installed.cfg
+.vscode

# Installer logs
pip-log.txt
3 changes: 0 additions & 3 deletions .vscode/settings.json

This file was deleted.

9 changes: 7 additions & 2 deletions README.md
@@ -2,7 +2,12 @@ This script takes an epub (or text file) and reads it to an m4b audiobook file,

I recognize this is not very user friendly, but I wanted to share it in case folks find it useful. If a few more people besides me find it useful, I will keep working on turning it into something that could be used by someone without dev experience.

-**NOTE: BIG UPDATE for XTTS!** The Coqui team released v2 of their XTTS model and the quality is amazing! This latest release includes significant refactoring, and uses streaming inference for XTTS. Suggested usage is to include up to three wav file speaker samples, up to 30 seconds each. Check out the XTTS sample to get an idea of the quality you can expect.
+**NOTE:** The Coqui team released their curated XTTS voice models recently, and they sound great. A recent update here
+allows you to use these voices. You can generate samples of all the voices by running `python utils/generate-speaker-samples.py`. Check these voices out, they're almost all amazing sounding! (GPU required)
+
+Example usage: `epub2tts my-book.epub --engine xtts --speaker "Damien Black"`
+
+**NOTE:** The Coqui team released v2 of their XTTS model and the quality is amazing! This latest release includes significant refactoring, and uses streaming inference for XTTS. Suggested usage is to include up to three wav file speaker samples, up to 30 seconds each. Check out the XTTS sample to get an idea of the quality you can expect. Also take a look in the utils directory for notes on finetuning your model for exceptional results. (GPU required)

Example usage: `epub2tts my-book.epub --start 4 --end 20 --xtts shadow-1.wav,shadow-2.wav,shadow-3.wav`

@@ -17,7 +22,7 @@ Usage:

TEXT: `epub2tts my-book.txt`

-To use Coqui XTTS, add: `--xtts <sample-1.wav>,<sample-2.wav>,<sample-3.wav>` (GPU required, slow but sounds amazing!)
+To use Coqui XTTS, add: `--xtts <sample-1.wav>,<sample-2.wav>,<sample-3.wav> --language 'en' book.epub` (slow but sounds amazing!)

To use OpenAI TTS, add: `--openai <your API key>` (Use speaker option to specify voice other than onyx: `--speaker shimmer`)
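
For anyone scripting these invocations, here is a rough Python sketch of how the options shown in this README diff fit together. It is not part of epub2tts: `build_epub2tts_command` is a hypothetical helper, it only uses flags that appear in the usage lines above (`--start`, `--end`, `--engine`, `--speaker`, `--xtts`, `--language`), and treating more than three speaker samples as an error is an assumption based on the README's "up to three" suggestion.

```python
import shlex
from typing import List, Optional, Sequence


def build_epub2tts_command(
    book: str,
    *,
    engine: Optional[str] = None,
    speaker: Optional[str] = None,
    xtts_samples: Optional[Sequence[str]] = None,
    language: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: assemble an epub2tts argument list.

    Only flags that appear in the README usage lines are emitted.
    """
    cmd = ["epub2tts", book]
    if start is not None:
        cmd += ["--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    if engine:
        cmd += ["--engine", engine]
    if speaker:
        cmd += ["--speaker", speaker]
    if xtts_samples:
        # Assumption: enforce the README's suggestion of at most three samples.
        if len(xtts_samples) > 3:
            raise ValueError("README suggests at most three speaker samples")
        # The README passes the samples as one comma-separated value.
        cmd += ["--xtts", ",".join(xtts_samples)]
    if language:
        cmd += ["--language", language]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for the curated-voice example.
    args = build_epub2tts_command(
        "my-book.epub", engine="xtts", speaker="Damien Black"
    )
    print(shlex.join(args))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run(args, check=True)` without shell quoting issues, for example with a speaker name that contains a space.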
