Transcripts & Alignments for the (Lessac) Blizzard Challenge 2013 audiobooks

Recent work in text-to-speech focuses on generating expressive and interesting speech. However, the results of all the proposed models depend on high-quality audio data with transcripts. Many papers refer to the audiobooks from the Blizzard Challenge 2013:

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Hierarchical Generative Modeling for Controllable Speech Synthesis
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

The Blizzard Challenge 2013 introduced a data-set of high-quality audiobooks generously provided by The Voice Factory and Lessac Technologies. The data-set contains unsegmented single-speaker audiobooks with a total duration of almost two hundred hours, read in a highly expressive manner by Catherine Byers.

Hopefully, this repository does not violate the License for Blizzard 2013 Materials, as it does not contain any data provided by the Blizzard Challenge.

Transcripts of the audiobooks can be found here. We used these transcripts even though the audio and the transcripts do not match exactly.

What is in this repository?

This repository contains:

Normalized transcripts split at the sentence level in a way that I hope is suitable for the text-to-speech task.
Meta-data for segmenting the original unsegmented audiobooks provided by the Blizzard Challenge 2013.

The following audiobooks were processed: Emma, Mansfield Park, Persuasion, Pride and Prejudice, Sense and Sensibility, The Emerald City of Oz, The Patchwork Girl of Oz, A Little Princess, The Secret Garden, Through the Looking Glass, The Awakening, Silas Marner, A Room with a View, Far from the Madding Crowd, The Scarlet Letter, The Gift of the Magi, Daisy Miller, Washington Square, The Jungle Book, Carmilla, Black Beauty, Treasure Island, Ethan Frome, Madame de Treymes, Summer.

The total duration of these audiobooks should be 149.4 hours.

Getting the data

Visit this website to obtain a research license agreement. Access should be granted within a few days.
Download the original audiobooks from there. You should have a file with this name:
```
Lessac_Blizzard2013_CatherineByers_train.tar.bz2
```

Clone this repository:

git clone https://github.com/Tomiinek/Blizzard2013_Segmentation.git target/directory/ && cd target/directory/

Extract the downloaded audiobooks into the repository directory:

tar xjfv ~/Downloads/Lessac_Blizzard2013_CatherineByers_train.tar.bz2

Remove redundant audiobooks, adjust the folder structure, and convert .mp3 to .wav:
```
./extract.sh .
```
Run segmentation (this will take a long time):
```
./segmentation.sh .
```
You should now see the folders transcripts and segments, which contain all the transcripts and the segmented recordings. The file all.txt lists a subset of the utterances found in the transcripts directory (spanning 122.8 hours), excluding items with an anomalous ratio between segment duration and utterance length, together with their phonemized variants (IPA) and paths to the corresponding audio files. A sample row:
```
000011|segments/a_little_princess/01-000012.wav|What is it, darling?|wɒt ɪz ɪt, dɑːlɪŋ?
```
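
A minimal sketch of reading such rows in Python, assuming every row has exactly the four pipe-separated fields seen in the sample above. The field names (`utt_id`, `audio_path`, `text`, `phonemes`) and the helper names are illustrative labels, not identifiers defined by this repository:

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    """One row of all.txt; the field names are illustrative assumptions."""
    utt_id: str
    audio_path: str
    text: str
    phonemes: str


def parse_row(line: str) -> Utterance:
    # Assumes exactly four pipe-separated fields, as in the sample row,
    # and that "|" never occurs inside the text or phoneme fields.
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return Utterance(*fields)


def load_metadata(path: str = "all.txt") -> List[Utterance]:
    # UTF-8 is assumed because the phoneme column contains IPA symbols.
    with open(path, encoding="utf-8") as f:
        return [parse_row(line) for line in f if line.strip()]
```

Raising on an unexpected field count, rather than silently skipping the row, makes it easier to notice if the actual file deviates from the assumed layout.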

Distribution of the data

Utterance length distribution:

Utterance length vs segment duration:
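
The two quantities compared here can be recomputed from the segmented data. The sketch below assumes the segments are uncompressed PCM .wav files (readable by Python's standard `wave` module) and measures utterance length in characters, which may differ from the unit used for the original plots. The function names are hypothetical:

```python
import wave


def wav_duration_seconds(path: str) -> float:
    # Assumes an uncompressed PCM .wav file, which the standard `wave`
    # module can read; other encodings would need a different reader.
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def chars_per_second(text: str, duration_s: float) -> float:
    # Utterance length is measured in characters here; this is an
    # assumption, the original plots may use a different unit.
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(text) / duration_s
```

Rows whose characters-per-second value lies far from the bulk of the data are candidates for a misaligned segment.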

Pitfalls

The Aeneas forced aligner does not always produce a correct alignment. As a result, some short utterances (like "Yes?") may also be included in the audio segment of the following utterance. Please consider removing these very short utterances together with their immediate successors.
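
One possible way to apply this filtering is sketched below. It assumes the utterance texts are given in their original reading order within a single recording, so that list neighbours really are consecutive in the audio. The threshold `min_chars` is an arbitrary illustrative value, not one recommended by this repository:

```python
from typing import List, Sequence


def drop_short_and_successors(texts: Sequence[str], min_chars: int = 6) -> List[int]:
    """Return the indices of the utterances to keep.

    An utterance is dropped if it is shorter than `min_chars` characters,
    or if it directly follows such a short utterance.
    """
    keep = []
    prev_short = False
    for i, text in enumerate(texts):
        is_short = len(text.strip()) < min_chars
        if not is_short and not prev_short:
            keep.append(i)
        prev_short = is_short
    return keep
```

Because all.txt is already a filtered subset, consecutive rows there are not guaranteed to be neighbours in the recording, so it is safer to run this per chapter on the full transcripts and then intersect the result with all.txt.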

Acknowledgements

Alignments were obtained using Aeneas aligner and ffmpeg; phonemized variants were acquired using Phonemizer.
