Support writing generated audio samples to wave files #363

Merged (2 commits) on Oct 13, 2023

csukuangfj (Collaborator) commented on Oct 13, 2023

Usage:

$ ./build/bin/sherpa-onnx-offline-tts --help
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:PrintUsage:402

Offline text-to-speech with sherpa-onnx

./bin/sherpa-onnx-offline-tts \
 --vits-model /path/to/model.onnx \
 --vits-lexicon /path/to/lexicon.txt \
 --vits-tokens /path/to/tokens.txt \
 --output-filename ./generated.wav \
 'some text within single quotes'

It will generate a file ./generated.wav as specified by --output-filename.

You can download a test model from
https://huggingface.co/csukuangfj/vits-ljs

For instance, you can use:
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/vits-ljs.onnx
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/lexicon.txt
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/tokens.txt

./bin/sherpa-onnx-offline-tts \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'

Options:
  --provider                  : Specify a provider to use: cpu, cuda, coreml (string, default = "cpu")
  --debug                     : true to print model information while loading it. (bool, default = false)
  --vits-lexicon              : Path to lexicon.txt for VITS models (string, default = "")
  --output-filename           : Path to save the generated audio (string, default = "./generated.wav")
  --num-threads               : Number of threads to run the neural network (int, default = 1)
  --vits-tokens               : Path to tokens.txt for VITS models (string, default = "")
  --vits-model                : Path to VITS model (string, default = "")

Standard options:
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")

$ ./build/bin/sherpa-onnx-offline-tts \
  --vits-model=./vits-ljs.onnx   \
  --vits-lexicon=./lexicon.txt   \
  --vits-tokens=./tokens.txt   \
  --output-filename=./generated.wav   \
'liliana, the most beautiful and lovely assistant of our team!'
$ soxi ./generated.wav

Input File     : './generated.wav'
Channels       : 1
Sample Rate    : 22050
Precision      : 16-bit
Duration       : 00:00:04.52 = 99584 samples ~ 338.721 CDDA sectors
File Size      : 199k
Bit Rate       : 353k
Sample Encoding: 16-bit Signed Integer PCM
$ ls -lh ./generated.wav
-rw-r--r--  1 fangjun  staff   195K Oct 13 23:29 ./generated.wav
$ file ./generated.wav
./generated.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 22050 Hz
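
The numbers reported by soxi, ls and file above are consistent with each other, which is a quick way to sanity-check a generated wave file. The sketch below redoes the arithmetic in Python. The 44-byte header size is an assumption (the canonical PCM WAV header with no extra chunks); it is not stated in this PR.

```python
# Values copied from the soxi output above.
sample_rate = 22050      # Hz
num_samples = 99584      # samples per channel
bits_per_sample = 16
channels = 1

# Duration follows directly from the sample count and sample rate.
duration_s = num_samples / sample_rate

# soxi also reports CDDA sectors; CD audio has 75 sectors per second.
cdda_sectors = duration_s * 75

# Bit rate of uncompressed PCM.
bit_rate = sample_rate * bits_per_sample * channels

# File size: raw PCM data plus the header.
# Assumption: a canonical 44-byte RIFF/WAVE header with no extra chunks.
header_bytes = 44
data_bytes = num_samples * channels * bits_per_sample // 8
file_bytes = header_bytes + data_bytes

print(f"duration     : {duration_s:.2f} s")
print(f"CDDA sectors : {cdda_sectors:.3f}")
print(f"bit rate     : {bit_rate} bit/s")
print(f"file size    : {file_bytes} bytes")
```

Under that assumption the file is 199212 bytes, which matches the 199k reported by soxi (decimal units) and is roughly the 195K shown by ls -lh (binary units).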

generated.mov
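
For readers who want to reproduce a file with the same format (mono, 16-bit signed PCM, 22050 Hz) without the binary, here is a minimal sketch using Python's standard wave module. It is not the C++ wave writer added in this PR. The function name write_wave, the test tone, and the assumption that samples are floats in [-1, 1] are all illustrative.

```python
import math
import os
import struct
import tempfile
import wave


def write_wave(filename, samples, sample_rate):
    """Write mono samples as 16-bit signed little-endian PCM.

    Hypothetical helper: assumes `samples` are floats in [-1, 1].
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                 # clip out-of-range values
        pcm += struct.pack("<h", int(s * 32767))   # scale to the int16 range
    with wave.open(filename, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16 bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))


# One second of a 440 Hz tone at the same sample rate as the model output above.
sample_rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
out_path = os.path.join(tempfile.gettempdir(), "tone.wav")
write_wave(out_path, tone, sample_rate)
```

Running soxi or file on the resulting tone.wav should report the same channel count, sample rate and encoding as for ./generated.wav.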

@csukuangfj csukuangfj merged commit 1ac2232 into k2-fsa:master Oct 13, 2023
134 of 144 checks passed
@csukuangfj csukuangfj deleted the wave-writer branch October 13, 2023 15:36