Speech Requirements

diegocardozo97 edited this page Jun 1, 2020 · 6 revisions

Speech: Requirements

Installation Requirements

General

RNNoise The version in use is here. CMake downloads it, builds it, and adds it to the code automatically. Building it requires:

  • Linux
  • autoconf
  • ar
  • make

DeepSpeech2 The implementation used is Baidu's, built on PaddlePaddle. A specific forked version (here) was copied into the repository. Because of compatibility problems, a specific version of PaddlePaddle is required, and no TensorFlow version later than the one listed below may be installed alongside it. An example of installing the dependencies is here.

  • python2
  • paddlepaddle==1.2.1
  • pkg-config, libflac-dev, libogg-dev, libvorbis-dev, libboost-dev, swig
  • scipy, resampy, SoundFile, python_speech_features
  • swig_decoders (a library that DeepSpeech builds during its installation; remember to clean the directory once the installation finishes)
  • tensorflow==1.12 (not required, but note that this setup is incompatible with later TensorFlow versions)
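
Because the pins above are strict, it can help to check an environment before launching the node. The following is a minimal sketch, not part of the repository: the names `REQUIRED_PINS`, `OPTIONAL_PINS`, `normalize`, and `find_pin_problems` are hypothetical, and the comparison only handles plain dotted release numbers. It is written to run under both Python 2 and Python 3.

```python
# Pins copied from the list above. TensorFlow is optional, so it is only
# checked when it is actually present in the mapping.
REQUIRED_PINS = {"paddlepaddle": "1.2.1"}
OPTIONAL_PINS = {"tensorflow": "1.12"}


def normalize(version):
    """Drop trailing '.0' segments so that '1.12.0' compares equal to '1.12'."""
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


def find_pin_problems(installed_versions):
    """Return human-readable problems for a {package name: version} mapping."""
    problems = []
    for name, pinned in sorted(REQUIRED_PINS.items()):
        installed = installed_versions.get(name)
        if installed is None:
            problems.append("%s is missing (need %s)" % (name, pinned))
        elif normalize(installed) != normalize(pinned):
            problems.append("%s is %s (need %s)" % (name, installed, pinned))
    for name, pinned in sorted(OPTIONAL_PINS.items()):
        installed = installed_versions.get(name)
        # An absent optional package is fine; only a wrong version is reported.
        if installed is not None and normalize(installed) != normalize(pinned):
            problems.append("%s is %s (need %s)" % (name, installed, pinned))
    return problems
```

The mapping passed to `find_pin_problems` could be filled in by hand from the output of `pip freeze`; an empty result means the pinned packages look consistent with the list above.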

Models and Data

DeepSpeech2 Two models are needed, the speech model and the language model, plus some warm-up data. Both models can be downloaded from the fork's releases and should be placed in the DeepSpeech/models/ folder. Instructions for downloading the warm-up data, and for where to put it and the models, are in ros_server.py.
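
A quick way to confirm that the downloads ended up in the right place is to list which expected files are absent. This is a small sketch under assumptions: `missing_files` is a hypothetical helper, and the actual file names are not given on this page, so they have to be taken from the fork's releases and from ros_server.py.

```python
import os


def missing_files(folder, expected_names):
    """Return the names in `expected_names` that are not regular files in `folder`.

    A folder that does not exist simply yields every expected name.
    """
    return [name for name in expected_names
            if not os.path.isfile(os.path.join(folder, name))]
```

For example, calling `missing_files(os.path.join("DeepSpeech", "models"), names)` with the real model file names as `names` returns an empty list once both models are in place.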

Azure SpeechToText API The node reads the API key and the Azure region from a file. The file must be at action_selectors/src/GLOBAL.txt; an example is at action_selectors/src/_GLOBAL_.txt.
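
One possible way to set this up is to start from the example file and then edit in the real values. The sketch below is a convenience that the repository does not describe: `ensure_config` is a hypothetical helper that copies the example to GLOBAL.txt only when GLOBAL.txt does not exist yet. It makes no assumption about the layout inside the file, so the API key and region still have to be filled in by hand afterwards.

```python
import os
import shutil


def ensure_config(src_dir, config_name="GLOBAL.txt", template_name="_GLOBAL_.txt"):
    """Create `config_name` from the example file if it does not exist yet.

    Returns True if a new file was created and False if one was already there.
    Raises IOError if neither the config nor the example file can be found.
    """
    config_path = os.path.join(src_dir, config_name)
    template_path = os.path.join(src_dir, template_name)
    if os.path.isfile(config_path):
        return False  # never overwrite an existing key file
    if not os.path.isfile(template_path):
        raise IOError("example file not found: %s" % template_path)
    shutil.copyfile(template_path, config_path)
    return True
```

Called as `ensure_config(os.path.join("action_selectors", "src"))` from the repository root, it leaves an existing GLOBAL.txt untouched, which keeps a real key from being replaced by placeholder values.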