Speech Requirements
General
- audio_common (ROS)
- portaudio19-dev
- pyaudio, pynput
RNNoise
The version being used is here. It is automatically downloaded, built, and added to the code by CMake.
- Linux
- autoconf
- ar
- make
DeepSpeech2
The implementation used is the PaddlePaddle one by Baidu, in a specific forked version (here) that was copied into the repository. Because of some compatibility problems, a specific version of PaddlePaddle is needed, and no later version of TensorFlow can be installed in the same environment. An example of installing the dependencies is here.
- python2
- paddlepaddle==1.2.1
- pkg-config, libflac-dev, libogg-dev, libvorbis-dev, libboost-dev, swig
- scipy, resampy, SoundFile, python_speech_features
- swig_decoders (a library built by DeepSpeech during installation; remember to clean the directory after finishing installing it)
- tensorflow==1.12 (not required, but note that this setup breaks with later TensorFlow versions)
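Since the setup above depends on exact dependency versions, a small pre-flight check can catch a broken environment early. The helper below is a hypothetical sketch (not part of the repository); the pins mirror the list above.

```python
# Sketch: verify pinned dependency versions before launching the
# DeepSpeech2 node. PINS mirrors the requirement list above; the
# helper itself is a hypothetical utility, not repository code.

PINS = {
    "paddlepaddle": "1.2.1",
    "tensorflow": "1.12",  # later versions are known to break this setup
}

def satisfies_pin(name, installed, pins=PINS):
    """Return True if `installed` matches the exact pin for `name`."""
    pin = pins.get(name)
    if pin is None:
        return True  # no pin recorded for this package, anything goes
    # Compare only as many version components as the pin specifies,
    # so the pin "1.12" accepts "1.12.0" but rejects "1.13.1".
    want = pin.split(".")
    have = installed.split(".")
    return have[:len(want)] == want
```

For example, `satisfies_pin("tensorflow", "1.12.0")` passes, while a later release such as `"1.13.1"` fails the check.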
DeepSpeech2
Two models are needed: the speech model and the language model, plus some warm-up data. The two models can be downloaded from the fork's releases and should be placed inside the DeepSpeech/models/ folder. Instructions for downloading the warm-up data, and where to put it and the models, can be found in ros_server.py.
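Because the node fails at runtime if the models are missing, a simple startup check helps. The exact model file names are not listed here, so this sketch only checks that the folder from the text above exists and is non-empty; the function name is hypothetical.

```python
import os

# Sketch: sanity-check that the DeepSpeech2 models were downloaded into
# DeepSpeech/models/ before starting the node. Since the model file
# names are not specified here, we only verify the folder is non-empty.

def models_present(models_dir="DeepSpeech/models"):
    """Return True if the models folder exists and contains any files."""
    return os.path.isdir(models_dir) and bool(os.listdir(models_dir))
```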
Azure SpeechToText API
The node uses a file containing the API key and the Azure region. The file should be at action_selectors/src/GLOBAL.txt; an example is at action_selectors/src/_GLOBAL_.txt.
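The real format of GLOBAL.txt is defined by the example file _GLOBAL_.txt in the repository; the sketch below assumes, hypothetically, one `NAME=value` pair per line, and the function name is not from the repository.

```python
# Sketch: load the Azure credential file described above, assuming a
# hypothetical `NAME=value` per-line format (check _GLOBAL_.txt for
# the actual layout used by the node).

def load_credentials(path):
    """Parse a GLOBAL.txt-style file into a dict of settings."""
    settings = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, _, value = line.partition("=")
            settings[name.strip()] = value.strip()
    return settings
```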