Support writing generated audio samples to wave files #363

csukuangfj · 2023-10-13T15:29:52Z

Usage:

$ ./build/bin/sherpa-onnx-offline-tts --help
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:PrintUsage:402

Offline text-to-speech with sherpa-onnx

./bin/sherpa-onnx-offline-tts \
 --vits-model /path/to/model.onnx \
 --vits-lexicon /path/to/lexicon.txt \
 --vits-tokens /path/to/tokens.txt \
 --output-filename ./generated.wav \
 'some text within single quotes'

It will generate a file ./generated.wav as specified by --output-filename.

You can download a test model from
https://huggingface.co/csukuangfj/vits-ljs

For instance, you can use:
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/vits-ljs.onnx
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/lexicon.txt
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/tokens.txt

./bin/sherpa-onnx-offline-tts \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'

Options:
  --provider                  : Specify a provider to use: cpu, cuda, coreml (string, default = "cpu")
  --debug                     : true to print model information while loading it. (bool, default = false)
  --vits-lexicon              : Path to lexicon.txt for VITS models (string, default = "")
  --output-filename           : Path to save the generated audio (string, default = "./generated.wav")
  --num-threads               : Number of threads to run the neural network (int, default = 1)
  --vits-tokens               : Path to tokens.txt for VITS models (string, default = "")
  --vits-model                : Path to VITS model (string, default = "")

Standard options:
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")

./build/bin/sherpa-onnx-offline-tts \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'

$ soxi ./generated.wav

Input File     : './generated.wav'
Channels       : 1
Sample Rate    : 22050
Precision      : 16-bit
Duration       : 00:00:04.52 = 99584 samples ~ 338.721 CDDA sectors
File Size      : 199k
Bit Rate       : 353k
Sample Encoding: 16-bit Signed Integer PCM
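
The figures in the soxi report are consistent with one another, which is a quick sanity check that the header was written correctly. The sketch below recomputes them in Python from the sample count and sample rate shown above. The constant of 75 CDDA sectors per second is the standard CD audio frame rate; it is not stated in this PR.

```python
# Values copied from the soxi report above.
SAMPLE_RATE = 22050           # Hz
NUM_SAMPLES = 99584
BITS_PER_SAMPLE = 16
CHANNELS = 1
CDDA_SECTORS_PER_SECOND = 75  # standard CD audio frame rate (assumption, not from the PR)

duration_s = NUM_SAMPLES / SAMPLE_RATE
bit_rate = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
cdda_sectors = duration_s * CDDA_SECTORS_PER_SECOND
data_bytes = NUM_SAMPLES * CHANNELS * BITS_PER_SAMPLE // 8

print(f"duration     : {duration_s:.2f} s")      # soxi reports 00:00:04.52
print(f"bit rate     : {bit_rate} bit/s")        # soxi reports 353k
print(f"CDDA sectors : {cdda_sectors:.3f}")      # soxi reports 338.721
print(f"PCM payload  : {data_bytes} bytes")      # header bytes come on top of this
```

The PCM payload is 199168 bytes, so the 199k file size reported by soxi leaves room only for a small header.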

$ ls -lh ./generated.wav
-rw-r--r--  1 fangjun  staff   195K Oct 13 23:29 ./generated.wav

$ file ./generated.wav
./generated.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 22050 Hz
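
For readers who want to see what writing samples to a mono 16-bit PCM wave file involves, here is a minimal sketch using Python's standard wave module. It is not the C++ code added in this PR; write_wave is a hypothetical helper, and the 440 Hz test tone merely stands in for real TTS output. It assumes the samples are floats in [-1, 1].

```python
import math
import os
import struct
import tempfile
import wave


def write_wave(filename, samples, sample_rate):
    """Write float samples in [-1, 1] as mono 16-bit PCM (hypothetical helper)."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                # clip so the int16 conversion cannot overflow
        pcm += struct.pack("<h", int(s * 32767))  # little-endian signed 16-bit
    with wave.open(filename, "wb") as f:
        f.setnchannels(1)                         # mono
        f.setsampwidth(2)                         # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))


# Same sample rate and sample count as the file inspected above.
sample_rate = 22050
num_samples = 99584
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(num_samples)]

path = os.path.join(tempfile.mkdtemp(), "generated.wav")
write_wave(path, tone, sample_rate)

# Read the header back and check the file size.
with wave.open(path, "rb") as f:
    params = (f.getnchannels(), f.getsampwidth(), f.getframerate(), f.getnframes())
size = os.path.getsize(path)
print(params, size)
```

With the wave module's 44-byte PCM header, the resulting file is 199212 bytes, which agrees with the 199k reported by soxi and the 195K (1024-byte units) reported by ls -lh.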

generated.mov

csukuangfj added 2 commits October 13, 2023 23:26

Support writing generated audio samples to wave files

cc1b166

fix a typo

ece11ee

csukuangfj merged commit 1ac2232 into k2-fsa:master Oct 13, 2023
134 of 144 checks passed

csukuangfj deleted the wave-writer branch October 13, 2023 15:36
