Scripts for adapting large speech foundation models for Northern Sámi ASR
The pre-trained and fine-tuned models are available on the Hugging Face Hub.
More details on the models can be found in the paper.
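To run inference with one of the fine-tuned models via the Transformers library, the sketch below can be used. The model id and audio file name are placeholders (not actual repository or file names), and the audio is assumed to be a 16 kHz mono recording, matching the wav2vec 2.0 input format:

import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Placeholder Hub id -- replace with the actual repository name of a fine-tuned model
model_id = "<org>/<finetuned-northern-sami-w2v2>"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load a 16 kHz mono recording (path is a placeholder)
audio, sampling_rate = sf.read("sample_16khz.wav")

inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))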
The scripts shared in this repository are adapted to the AMD hardware of the LUMI supercomputer. To train a wav2vec 2.0 Base model with continued pre-training, run
sbatch scripts/pretraining/fairseq_train_multinode_w2v2_B_512gpus.sh
Note: you can simulate 512 GPUs by using k GPUs and adding the following command-line parameters (before --config-dir):

distributed_training.distributed_world_size=k
+optimization.update_freq='[x]'

where x = 512/k.
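For example, with k = 64 GPUs, x = 512/64 = 8, so the overrides become

distributed_training.distributed_world_size=64
+optimization.update_freq='[8]'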
To fine-tune a wav2vec 2.0 Base model using Hugging Face Transformers, run
sbatch scripts/finetuning/huggingface_finetune_multinode_w2v2_B_8gpus_full.sh
For extended fine-tuning (see Section 3.3 in the paper), set EXTENDED_FINETUNING to True in scripts/finetuning/huggingface_run_speech_recognition_ctc_multigpu.py (line 620).
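After the change, line 620 of the script would look roughly like this (a sketch assuming the flag is a plain module-level assignment):

EXTENDED_FINETUNING = True  # enables extended fine-tuning (Section 3.3 in the paper)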
If you use our models or scripts, please cite our article as:
@inproceedings{getman24b_interspeech,
  title     = {Exploring adaptation techniques of large speech foundation models for low-resource ASR: a case study on Northern Sámi},
  author    = {Yaroslav Getman and Tamas Grosz and Katri Hiovain-Asikainen and Mikko Kurimo},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {2539--2543},
  doi       = {10.21437/Interspeech.2024-479},
}