DisfluencySpeech

Resources for DisfluencySpeech.

The DisfluencySpeech Dataset is a single-speaker studio-quality labeled English speech dataset with paralanguage. A single speaker recreates nearly 10 hours of expressive utterances from the Switchboard-1 Telephone Speech Corpus (Switchboard), simulating realistic informal conversations. To aid the development of a text-to-speech (TTS) model that is able to predictively synthesise paralanguage from text without such components, we provide three different transcripts at different levels of information removal (removal of non-speech events, removal of non-sentence elements, and removal of false starts).

Read the paper here.

Dataset can be found here.

Benchmark models for the three transcripts can be found here: transcript A, transcript B, transcript C.

Dataset Details

All audio files are provided as 22,050 hz .wav files. In the metadata.csv file, 4 different transcripts for each file are provided, each at differing levels of information removal:

transcript_annotated is a full transcript retaining all non-speech event and disfluency annotations;
transcript_a contains all textual content recorded, including non-sentence elements and restarts.
Only non-speech events such as laughter and sighs are removed from transcript;
transcript_b is transcript_a but with filled pauses, explicit editing terms, and discourse markers removed.
Coordinating conjunctions and asides are left in, as they are non-sentence elements as well, they are often used to convey meaning; and
transcript_c is transcript_b but with false starts removed. This is the most minimal transcript.

The training set contains 90% of the data, the validation set contains 5% of the data, and the test set contains 5% of the data.

Citation

If you use this dataset, please cite the paper in which it is presented:

@misc{wang2024disfluencyspeechsinglespeakerconversational,
      title={DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage}, 
      author={Kyra Wang and Dorien Herremans},
      year={2024},
      eprint={2406.08820},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2406.08820}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DisfluencySpeech

Dataset Details

Citation

About

Releases

Packages

License

AMAAI-Lab/DisfluencySpeech

Folders and files

Latest commit

History

Repository files navigation

DisfluencySpeech

Dataset Details

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages