thai-parliament-speech-dataset

Segmented speech files from Thai Parliament meetings. They can be used to train automatic speech recognition (ASR, speech-to-text) systems.

TODO

  • Exclude files that are too short or too long (keeping 0.7 s to 15.0 s should be fine), that are not clearly audible, or that have overlapping speakers or other conditions that could confuse an ASR system
  • Create a new Common Voice-style CSV file for training
  • Transcribe the segmented audio. Right now, this repository contains only audio files, with no transcriptions (text) yet.
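
As a rough starting point for the first two items, the sketch below filters files by duration and writes a manifest. It rests on several assumptions that this repository does not confirm: the segments are PCM WAV files readable by Python's standard `wave` module, and a "Common Voice-style" manifest needs only `path` and `sentence` columns. The function names and the `segments` and `train.csv` paths are hypothetical. Since no transcriptions exist yet, `sentence` is left empty. Checks for audibility or overlapping speakers are not covered here.

```python
import csv
import wave
from pathlib import Path

MIN_SECONDS = 0.7   # lower bound suggested in the TODO list
MAX_SECONDS = 15.0  # upper bound suggested in the TODO list


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def select_by_duration(audio_dir, min_s=MIN_SECONDS, max_s=MAX_SECONDS):
    """Return sorted WAV paths under audio_dir lasting between min_s and max_s."""
    kept = []
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        if min_s <= wav_duration(path) <= max_s:
            kept.append(path)
    return kept


def write_manifest(paths, csv_path, root):
    """Write a CSV with `path` and `sentence` columns (assumed layout).

    `sentence` stays empty until the audio has been transcribed.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "sentence"])
        for p in paths:
            writer.writerow([Path(p).relative_to(root).as_posix(), ""])


if __name__ == "__main__":
    root = Path("segments")  # hypothetical directory holding the audio files
    write_manifest(select_by_duration(root), "train.csv", root)
```

If the audio is stored in another format (for example MP3 or FLAC), `wav_duration` would need to be replaced with a reader that supports that format.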