This repository provides a C++ implementation of MeloTTS, a high-quality, multilingual Text-to-Speech (TTS) library released by MyShell.ai that supports English, Chinese (mixed with English), and various other languages. The implementation is fully integrated with OpenVINO and supports seamless deployment on CPU, GPU, and NPU devices. Currently, this repository only supports Chinese mixed with English; support for the English model is coming next.
The pipeline design is largely consistent with the original PyTorch version and comprises three models (BERT, TTS, and DeepFilterNet), with DeepFilterNet added as an extra post-processing component.
- tokenizer and BERT: the tokenizer and BERT model are `bert-base-multilingual-uncased` for Chinese and `bert-base-uncased` for English
- g2p: Grapheme-to-Phoneme conversion
- phones and tones: represented as pinyin with four tones for Chinese and phonemes with stress marks for English (see the snippet after this list)
- tone_sandhi: class used for handling Chinese-specific tone-sandhi cases, correcting tokenization and phones
- DeepFilterNet: used for denoising (background noise introduced by int8 quantization)
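To make the phones-and-tones representation concrete, below is a small standalone sketch that splits toned pinyin syllables (e.g. `ni3`) into a phone and a tone id. It is illustrative only and does not reflect the repository's actual g2p code or internal data layout.

```cpp
// Illustrative only: splits a toned pinyin syllable such as "ni3" into a
// phone ("ni") and a tone id (3). The real pipeline's g2p and internal
// representation are more involved; this merely demonstrates the concept.
#include <cctype>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

static std::pair<std::string, int> split_pinyin(const std::string& syllable) {
    if (!syllable.empty() && std::isdigit(static_cast<unsigned char>(syllable.back()))) {
        return {syllable.substr(0, syllable.size() - 1), syllable.back() - '0'};
    }
    return {syllable, 0};  // 0: neutral/unspecified tone
}

int main() {
    const std::vector<std::string> syllables = {"ni3", "hao3", "shi4", "jie4"};  // 你好世界
    for (const auto& s : syllables) {
        const auto [phone, tone] = split_pinyin(s);
        std::cout << phone << " -> tone " << tone << "\n";
    }
    return 0;
}
```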
The table below outlines the supported devices for each model:
Model Name | CPU Support | GPU Support | NPU Support |
---|---|---|---|
BERT (Preprocessing) | ✅ | ✅ | ✅ |
TTS (Inference) | ✅ | ✅ | ❌ |
DeepFilterNet (Post-processing) | ✅ | ✅ | ✅ |
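For reference, the snippet below shows one way the device choices in the table map onto the OpenVINO C++ runtime. It is a minimal sketch: the model file names under `ov_models` are assumptions for illustration, not necessarily the repository's exact file layout.

```cpp
// Minimal sketch of per-model device selection with the OpenVINO C++ API.
// The model file names are illustrative assumptions.
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // BERT (preprocessing): CPU, GPU, or NPU are all supported.
    ov::CompiledModel bert = core.compile_model("ov_models/bert.xml", "NPU");

    // TTS (inference): CPU or GPU only; NPU is not supported for this model.
    ov::CompiledModel tts = core.compile_model("ov_models/tts.xml", "GPU");

    // DeepFilterNet (post-processing): CPU, GPU, or NPU are all supported.
    ov::CompiledModel nf = core.compile_model("ov_models/deepfilternet.xml", "CPU");

    // Each compiled model is then driven through an infer request, e.g.:
    ov::InferRequest bert_req = bert.create_infer_request();
    (void)tts;
    (void)nf;
    (void)bert_req;
    return 0;
}
```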
To download the OpenVINO C++ package for Windows, please refer to the following link: Install OpenVINO for Windows. For OpenVINO 2024.5 on Windows, you can run the following commands in the Command Prompt (cmd):
curl -O https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.5/windows/w_openvino_toolkit_windows_2024.5.0.17288.7975fa5da0c_x86_64.zip --ssl-no-revoke
tar -xvf w_openvino_toolkit_windows_2024.5.0.17288.7975fa5da0c_x86_64.zip
For Linux, you can download the C++ package from this link: Install OpenVINO for Linux. For OpenVINO 2024.5 on Linux, simply download it from https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.5/linux and unzip the package.
For additional versions and more information about OpenVINO, visit the official OpenVINO Toolkit page: OpenVINO Toolkit Overview.
git lfs install
git clone https://github.com/apinge/MeloTTS.cpp.git
<OpenVINO_DIR>\setupvars.bat
cd MeloTTS.cpp
cmake -S . -B build && cmake --build build --config Release
.\build\Release\meloTTS_ov.exe --model_dir ov_models --input_file inputs.txt --output_file audio.wav
source <OpenVINO_DIR>/setupvars.sh
cd MeloTTS.cpp
cmake -S . -B build && cmake --build build --config Release
./build/meloTTS_ov --model_dir ov_models --input_file inputs.txt --output_file audio.wav
DeepFilterNet functionality is currently supported only on Windows and is used to filter out noise from int8 quantized models. It is enabled by default, but you can enable or disable it during the CMake stage using the `-DUSE_DEEPFILTERNET` option.
For example, to disable the feature, you can use the following line during the CMake generation process:
cmake -S . -B build -DUSE_DEEPFILTERNET=OFF
For more information, please refer to DeepFilterNet.cpp.
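As a rough sketch of how such a CMake switch is typically surfaced in the sources (the actual wiring in this repository may differ), the option can be forwarded as a compile definition and checked with the preprocessor; the `denoise_with_deepfilternet` helper below is a hypothetical placeholder.

```cpp
// Hedged sketch: how a USE_DEEPFILTERNET CMake option is commonly forwarded to
// C++ code. The actual mechanism in MeloTTS.cpp may differ.
//
// In CMakeLists.txt (sketch):
//   option(USE_DEEPFILTERNET "Enable DeepFilterNet post-processing" ON)
//   if(USE_DEEPFILTERNET)
//     target_compile_definitions(meloTTS_ov PRIVATE USE_DEEPFILTERNET)
//   endif()

#include <vector>

// Hypothetical denoising entry point, only referenced when the option is ON.
std::vector<float> denoise_with_deepfilternet(const std::vector<float>& wav);

std::vector<float> postprocess(const std::vector<float>& wav) {
#ifdef USE_DEEPFILTERNET
    return denoise_with_deepfilternet(wav);  // DeepFilterNet denoising enabled
#else
    return wav;                              // pass-through when disabled
#endif
}
```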
You can use `run_tts.bat` or `run_tts.sh` as sample scripts to run the models. Below are the meanings of all the arguments you can use with these scripts:
- `--model_dir`: Specifies the folder containing the model files, dictionary files, and third-party resource files, which is the `ov_models` folder within the repo. You may need to adjust the relative path based on your current working directory.
- `--tts_device`: Specifies the OpenVINO device to be used for the TTS model. Supported devices include CPU and GPU (default: CPU).
- `--bert_device`: Specifies the OpenVINO device to be used for the BERT model. Supported devices include CPU, GPU, and NPU (default: CPU).
- `--nf_device`: Specifies the OpenVINO device to be used for the DeepFilterNet model. Supported devices include CPU, GPU, and NPU (default: CPU).
- `--input_file`: Specifies the input text file to be processed. Make sure that the text is in UTF-8 format.
- `--output_file`: Specifies the output *.wav audio file to be generated.
- `--speed`: Specifies the speed of the output audio. The default is 1.0.
- `--quantize`: Indicates whether to use an int8 quantized model. The default is false, meaning an fp16 model is used by default.
- `--disable_bert`: Indicates whether to disable the BERT model inference. The default is false.
- `--disable_nf`: Indicates whether to disable the DeepFilterNet model inference (default: false).
- `--language`: Specifies the language for TTS. The default language is Chinese (`ZH`).
The BERT and DeepFilterNet models in the pipeline support NPU as the inference device, utilizing the integrated NPUs in Meteor Lake and Lunar Lake.
Below are the methods to enable this feature and the usage details:
- How to Build: To enable the BERT model on NPU, an additional CMake option `-DUSE_BERT_NPU=ON` is required during the CMake generation. For example:
cmake -DUSE_BERT_NPU=ON -B build -S .
- How to Set Arguments: To set arguments for models on NPU, use `--bert_device NPU` for the BERT model and `--nf_device NPU` for the DeepFilterNet model, respectively. For example:
build\Release\meloTTS_ov.exe --bert_device NPU --nf_device NPU --model_dir ov_models --input_file inputs.txt --output_file audio.wav
- Operating System: Windows, Linux
- CPU Architecture: Meteor Lake, Lunar Lake, and most Intel CPUs
- GPU Architecture: Intel® Arc™ Graphics (Intel Xe, including iGPU)
- NPU Architecture: NPU 4, NPU in Meteor Lake or Lunar Lake
- OpenVINO Version: >=2024.4
- C++ Version: >=C++20
If you're using an AI PC notebook with Windows, GPU and NPU drivers are typically pre-installed. However, Linux users or Windows users who prefer to update to the latest drivers should follow the guidelines below:
- For GPU: If using the GPU, please refer to Configurations for Intel® Processor Graphics (GPU) with OpenVINO™ to install the GPU driver.
- For NPU: If using the NPU, please refer to NPU Device to ensure the NPU driver is correctly installed.
Note that the drivers differ between Windows and Linux, so make sure to follow the instructions for your specific operating system.
Here are some features and improvements planned for future releases:
- Add English language TTS support:
  - Enable English text-to-speech (TTS) functionality; tokenization for English-language input is not yet implemented.
- Enhance quality of quantized TTS models:
  - The current int8 quantized model exhibits slight background noise. As a workaround, we integrated DeepFilterNet for post-processing. Moving forward, we aim to address the noise issue more effectively through improved quantization techniques.
The Python version of this repository (MeloTTS integrated with OpenVINO) is provided in MeloTTS-OV. The Python version includes methods to convert the model into OpenVINO IR.
This repository includes third-party code and libraries for Chinese word segmentation and pinyin processing.