Speech Requirements


Installation Requirements

General

audio_common (ROS)
portaudio19-dev
pyaudio, pynput
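
A minimal Python 3 sketch for checking that the Python packages from the General list can be imported before starting the speech nodes. It assumes the import names `pyaudio` and `pynput`. audio_common (ROS) and portaudio19-dev (a system package) are not Python modules, so they are not checked here.

```python
import importlib.util

# Import names of the Python packages in the "General" list.
GENERAL_MODULES = ["pyaudio", "pynput"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(GENERAL_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All general Python modules are importable.")
```

`find_spec` only locates the module without importing it, so the check does not open any audio device.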

RNNoise

The version being used is here. CMake automatically downloads it, builds it, and adds it to the code.

Linux
autoconf
ar
make
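
Because CMake builds RNNoise automatically, a missing build tool only shows up as a failed build. The sketch below checks the PATH beforehand; it assumes the executable names `autoconf`, `ar` and `make` for the tools in the Linux list.

```python
import shutil

# Assumed executable names for the RNNoise build tools listed above.
RNNOISE_BUILD_TOOLS = ["autoconf", "ar", "make"]


def find_tools(tools):
    """Map each tool name to its full path on PATH, or to None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools(RNNOISE_BUILD_TOOLS).items():
        print("%-10s %s" % (tool, path or "NOT FOUND"))
```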

DeepSpeech2

The implementation used is Baidu's, written in PaddlePaddle. Specifically, it is the forked version linked here, which was copied into the repository. Because of compatibility problems, a specific PaddlePaddle version is required, and no TensorFlow version later than 1.12 can be installed alongside it. An example of installing the dependencies is here.

python2
paddlepaddle==1.2.1
pkg-config, libflac-dev, libogg-dev, libvorbis-dev, libboost-dev, swig
scipy, resampy, SoundFile, python_speech_features
swig_decoders (a library built by DeepSpeech during its installation; remember to clean the directory once the installation finishes)
tensorflow==1.12 (not required; listed because later TensorFlow versions break this setup)
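
The two version constraints are easy to violate by accident, for example when another package pulls in a newer TensorFlow. The following is a sketch of a check for them, written to run under both Python 2 and Python 3. It takes the installed versions as a plain dictionary so that it does not depend on any particular package manager; the function names are hypothetical.

```python
# Version constraints from the list above.
REQUIRED_PADDLE = (1, 2, 1)  # paddlepaddle==1.2.1
MAX_TF = (1, 12)             # TensorFlow later than 1.12 breaks this setup


def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0); non-numeric suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def _pad(version, length):
    """Right-pad a version tuple with zeros so that (1, 2) compares like (1, 2, 0)."""
    return version + (0,) * (length - len(version))


def check_deepspeech_versions(installed):
    """Return a list of problems with the DeepSpeech2 version constraints.

    `installed` maps a package name to its installed version string;
    a package that is absent from the dict is treated as not installed.
    """
    problems = []
    paddle = installed.get("paddlepaddle")
    if paddle is None:
        problems.append("paddlepaddle is not installed (need 1.2.1)")
    else:
        v = parse_version(paddle)
        n = max(len(v), len(REQUIRED_PADDLE))
        if _pad(v, n) != _pad(REQUIRED_PADDLE, n):
            problems.append("paddlepaddle %s installed, need 1.2.1" % paddle)
    tf = installed.get("tensorflow")
    # TensorFlow is optional; only a version newer than 1.12 is a problem.
    if tf is not None and parse_version(tf)[:2] > MAX_TF:
        problems.append("tensorflow %s is newer than 1.12" % tf)
    return problems
```

In the Python 2 environment, one way to collect the real versions is setuptools' `pkg_resources.get_distribution(name).version`, which raises `pkg_resources.DistributionNotFound` for a package that is not installed.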

Models and Data

DeepSpeech2

Two models are needed, the speech model and the language model, plus some warm-up data. Both models can be downloaded from the fork's releases and should be placed in the DeepSpeech/models/ folder. Instructions for downloading the warm-up data, and for where to put it and the models, are in ros_server.py.
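
A small sketch for confirming that the downloaded files ended up in DeepSpeech/models/. The file names below are HYPOTHETICAL placeholders: this page does not name the release files, so replace them with the actual names from the fork's releases and ros_server.py.

```python
import os

MODELS_DIR = os.path.join("DeepSpeech", "models")
# HYPOTHETICAL names: substitute the real speech-model and language-model files.
EXPECTED_FILES = ["speech_model_placeholder", "language_model_placeholder"]


def missing_files(base_dir, names):
    """Return the entries of `names` that do not exist inside `base_dir`."""
    return [n for n in names if not os.path.exists(os.path.join(base_dir, n))]


if __name__ == "__main__":
    if not os.path.isdir(MODELS_DIR):
        print("Folder %s does not exist" % MODELS_DIR)
    else:
        for name in missing_files(MODELS_DIR, EXPECTED_FILES):
            print("Missing: %s" % os.path.join(MODELS_DIR, name))
```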

Azure SpeechToText API

The node uses a file containing the API key and the Azure region. The file should be at action_selectors/src/GLOBAL.txt; an example is at action_selectors/src/_GLOBAL_.txt.
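
A sketch of reading the credentials file. ASSUMPTION: this page does not describe the file layout, so the code assumes the API key is on the first non-empty line and the region on the second; compare it with the example file before relying on it. The function names are hypothetical.

```python
import os

# Path given on this page, relative to the repository root.
GLOBAL_FILE = os.path.join("action_selectors", "src", "GLOBAL.txt")


def parse_azure_credentials(text):
    """Return (api_key, region) from the file contents.

    ASSUMED layout: key on the first non-empty line, region on the second.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected an API key and a region")
    return lines[0], lines[1]


def read_azure_credentials(path=GLOBAL_FILE):
    """Read and parse the credentials file at `path`."""
    with open(path) as f:
        return parse_azure_credentials(f.read())
```

If the node talks to Azure through the Speech SDK for Python (this page does not say), the two values correspond to `SpeechConfig(subscription=key, region=region)` from `azure.cognitiveservices.speech`.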
