thai-parliament-speech-dataset

Segmented speech files from Thai Parliament meetings. They could be used to train ASR (speech-to-text) systems.

TODO

Exclude files that are too short or too long (0.7-15.0 s should be a reasonable range), are not clearly audible, or contain overlapping speakers or other conditions that could confuse an ASR system
Create a new Common Voice-style CSV file for training
Transcribe the segmented audio. Right now, this repository contains only audio files; there are no transcriptions (text) yet.
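
The first two TODO items could be approached roughly as in the sketch below. It is only an illustration, not a script from this repository. It assumes the segments are WAV files (so the standard-library `wave` module can read them), and the column names `path`, `duration`, and `sentence` are a guess at a Common Voice-like layout. The function names and the output file name are hypothetical. The `sentence` column is left empty because no transcriptions exist yet, and the checks for unclear audio or overlapping speakers are not covered here.

```python
import csv
import wave
from pathlib import Path

# Duration bounds taken from the TODO item above.
MIN_SECONDS = 0.7
MAX_SECONDS = 15.0


def wav_duration(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def keep_segment(duration, min_s=MIN_SECONDS, max_s=MAX_SECONDS):
    """True if the duration lies within the inclusive [min_s, max_s] range."""
    return min_s <= duration <= max_s


def build_manifest(audio_dir, out_csv):
    """Write a CSV listing the WAV files whose duration is in range.

    Column names are an assumption, not a confirmed Common Voice schema.
    Returns the number of rows written.
    """
    audio_dir = Path(audio_dir)
    rows = []
    for path in sorted(audio_dir.rglob("*.wav")):
        duration = wav_duration(path)
        if keep_segment(duration):
            rows.append({
                "path": path.relative_to(audio_dir).as_posix(),
                "duration": f"{duration:.3f}",
                "sentence": "",  # to be filled in once transcripts exist
            })
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "duration", "sentence"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `build_manifest("segments", "train.csv")` would scan a hypothetical `segments` directory and write one row per segment that passes the duration check.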