Resources for DisfluencySpeech.
The DisfluencySpeech Dataset is a single-speaker studio-quality labeled English speech dataset with paralanguage. A single speaker recreates nearly 10 hours of expressive utterances from the Switchboard-1 Telephone Speech Corpus (Switchboard), simulating realistic informal conversations. To aid the development of a text-to-speech (TTS) model that is able to predictively synthesise paralanguage from text without such components, we provide three different transcripts at different levels of information removal (removal of non-speech events, removal of non-sentence elements, and removal of false starts).
Read the paper here.
Dataset can be found here.
Benchmark models for the three transcripts can be found here: transcript A, transcript B, transcript C.
All audio files are provided as 22,050 hz .wav files. In the metadata.csv file, 4 different transcripts for each file are provided, each at differing levels of information removal:
- transcript_annotated is a full transcript retaining all non-speech event and disfluency annotations;
- transcript_a contains all textual content recorded, including non-sentence elements and restarts.
- Only non-speech events such as laughter and sighs are removed from transcript;
- transcript_b is transcript_a but with filled pauses, explicit editing terms, and discourse markers removed.
- Coordinating conjunctions and asides are left in, as they are non-sentence elements as well, they are often used to convey meaning; and
- transcript_c is transcript_b but with false starts removed. This is the most minimal transcript.
The training set contains 90% of the data, the validation set contains 5% of the data, and the test set contains 5% of the data.
If you use this dataset, please cite the paper in which it is presented:
@misc{wang2024disfluencyspeechsinglespeakerconversational,
title={DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage},
author={Kyra Wang and Dorien Herremans},
year={2024},
eprint={2406.08820},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2406.08820},
}