Real-Time Speech Recognition

Proofs of concept (PoCs) for real-time speech recognition and speaker diarization.

Working PoCs

  • rtsr_en.py: PoC using the AssemblyAI WebSocket API (English only)
  • rtsr_de.py: PoC using OpenAI Whisper (German; probably works for other languages as well)

Prototypes

Additionally, a handful of prototypes were created using various technologies:

  • librosa
  • NVIDIA NeMo
  • TensorFlow + Keras model
  • Mel Spectrogram CNN
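
As background for the mel spectrogram prototype, here is a minimal sketch of the mel scale that such spectrograms are built on. It is not taken from the prototypes: it assumes the common HTK-style formula (librosa, for example, defaults to the Slaney variant instead), and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Return n_mels center frequencies in Hz, evenly spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    # Bands are narrow at low frequencies and widen toward f_max.
    for center in mel_band_centers(8, 0.0, 8000.0):
        print(f"{center:8.1f} Hz")
```

Equal steps in mel map to progressively wider steps in Hz, which is why a mel spectrogram gives more resolution to the lower frequencies where most speech energy sits.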

Credits