SpeechAlign: a Framework for Speech Translation Alignment Evaluation

Get dataset

You will need to regenerate the dataset yourself. The instructions below show how to do this with containers.

Download the original dataset

Download the original dataset (Vilar et al., 2006) from here and unzip it in the dataset folder.

David Vilar, Jia Xu, Luis Fernando D’Haro, and Hermann Ney. 2006. Error Analysis of Statistical Machine Translation Output. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).

Generate the dataset

Use Docker or Apptainer (Singularity) to generate the dataset.

Docker

With GPU:

docker run -it --gpus all \
    -v $(pwd):/home/sga \
    --workdir /home/sga \
    --entrypoint /bin/bash \
    ghcr.io/coqui-ai/tts:e5fb0d96279af9dc620add6c2e69992c8abd7f24 \
    ./generate_dataset.sh

Without GPU (this will take a long time):

docker run -it \
    -v $(pwd):/home/sga \
    --workdir /home/sga \
    --entrypoint /bin/bash \
    ghcr.io/coqui-ai/tts-cpu:e5fb0d96279af9dc620add6c2e69992c8abd7f24 \
    ./generate_dataset.sh

Apptainer (aka Singularity)

With GPU:

apptainer exec --nv \
    docker://ghcr.io/coqui-ai/tts:e5fb0d96279af9dc620add6c2e69992c8abd7f24 \
    bash ./generate_dataset.sh

Without GPU (this will take a long time):

apptainer exec \
    docker://ghcr.io/coqui-ai/tts-cpu:e5fb0d96279af9dc620add6c2e69992c8abd7f24 \
    bash ./generate_dataset.sh

Compute AER

Before computing the word-AER (Alignment Error Rate) metric, generate a contribution map for each sentence in the Speech Gold Alignment dataset. Store each map in the .pt format, in a file named after the corresponding sample index in the dataset: {idx}.pt. The sample index must be a three-digit number, such as 001, 011, or 111. The map must not contain contributions for the end-of-sentence token.
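The naming convention above can be checked before running the evaluation. The following is a minimal sketch, not part of the repository: the helper names `contrib_filename` and `missing_contribs` are hypothetical, and which indices the dataset actually uses is left to the caller.

```python
from pathlib import Path


def contrib_filename(idx: int) -> str:
    """Return the file name for sample `idx`, zero-padded to three digits."""
    if not 0 <= idx <= 999:
        raise ValueError(f"sample index {idx} does not fit in three digits")
    return f"{idx:03d}.pt"


def missing_contribs(contrib_dir: str, indices) -> list:
    """List the expected file names that are absent from `contrib_dir`.

    `indices` is whatever range of sample indices the caller expects;
    this helper makes no assumption about where the numbering starts.
    """
    folder = Path(contrib_dir)
    return [
        contrib_filename(i)
        for i in indices
        if not (folder / contrib_filename(i)).is_file()
    ]
```

For example, `contrib_filename(11)` returns `011.pt`, and `missing_contribs("/path/to/folder/", range(1, 4))` lists whichever of `001.pt`, `002.pt`, and `003.pt` are not present in that folder.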

# --test_set_dir: path to the Speech Gold Alignment dataset folder
# --path_to_contribs: path to the folder with the token-to-token contributions
# --path_to_tokenized_targets: path to the txt file with the tokenized target sentences
# --save_alignment_hyp: path where the alignment hypotheses are saved
# --setting: s2s or s2t
# --translation_direction: en-de or de-en
python3 speech_aer/aer.py --test_set_dir /path/to/folder/ \
    --path_to_contribs /path/to/folder/ \
    --path_to_tokenized_targets /path/to/text/file \
    --save_alignment_hyp /path/to/text/file \
    --setting s2s \
    --translation_direction en-de
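For readers unfamiliar with the metric, the standard definition of AER compares a hypothesis alignment A against sure gold links S and possible gold links P (with S a subset of P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below illustrates that formula on sets of (source index, target index) pairs. It is an illustration only and may differ from what `speech_aer/aer.py` does internally; the function name `alignment_error_rate` is hypothetical.

```python
def alignment_error_rate(hypothesis, sure, possible=()):
    """Compute AER from sets of (source_idx, target_idx) links.

    `sure` links are always treated as possible too, so P = possible | sure.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s
    if not a and not s:
        raise ValueError("AER is undefined when both A and S are empty")
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# One of the two hypothesis links matches a sure link.
hyp = {(0, 0), (1, 2)}
gold_sure = {(0, 0), (1, 1)}
print(alignment_error_rate(hyp, gold_sure))  # 0.5
```

A lower value is better: a hypothesis that contains every sure link and nothing outside the possible links scores 0.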

Visualize contributions and hard-alignments

The notebook speech_aer/visualize_alignment.ipynb can be used to obtain heatmaps and visualize the word-word contributions and the hard alignments that are used to compute the AER.
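To make the relation between the two views concrete, one common way to turn a soft contribution map into hard alignments is to link each target word to the source word with the highest contribution. The sketch below assumes a map laid out as rows of target words and columns of source words, given as plain nested lists. Both the layout and the argmax rule are assumptions made for illustration; the notebook and `speech_aer/aer.py` may use a different procedure.

```python
def hard_alignments(contributions):
    """Link each target word to its highest-contributing source word.

    `contributions[t][s]` is assumed to hold the contribution of source
    word `s` to target word `t`. Returns a set of (source_idx, target_idx).
    """
    links = set()
    for t, row in enumerate(contributions):
        if not row:
            continue  # skip empty rows rather than fail
        # Index of the largest value; ties resolve to the first index.
        s = max(range(len(row)), key=row.__getitem__)
        links.add((s, t))
    return links


soft = [
    [0.7, 0.2, 0.1],  # target word 0
    [0.1, 0.3, 0.6],  # target word 1
]
print(sorted(hard_alignments(soft)))  # [(0, 0), (2, 1)]
```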
