Automatic Speech Recognition (ASR) converts spoken language into text. Typically, an ASR model is trained for and used with a single language. However, more than 700 languages are spoken in Indonesia, and it is not practical to provide a separate speech recognition model for each of them.
Therefore, we want to develop a multilingual speech recognition model that supports at least the main Indonesian languages without sacrificing performance in any of them.
We want to develop and build a multilingual speech recognition model using Indonesian, Javanese, and Sundanese datasets. The model should perform well in all three languages. We also train monolingual models for comparison.
We used the following speech datasets for training and fine-tuning:
We used wav2vec 2.0, a framework for self-supervised learning of speech representations that is currently state of the art on the LibriSpeech benchmark for noisy speech, for the Indonesian, Javanese, and Sundanese languages.
We trained a multilingual wav2vec 2.0 model on the three languages combined for 200 epochs. For comparison, we also trained three monolingual wav2vec 2.0 models, one each for Indonesian, Javanese, and Sundanese, for 200 epochs each.
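For reference, below is a minimal sketch of how such a fine-tuning run could look with the Hugging Face `transformers` and `datasets` libraries. The base checkpoint, dataset IDs, vocabulary handling, and hyperparameters shown here are illustrative assumptions, not our exact training setup.

```python
# Minimal sketch: fine-tuning a pretrained wav2vec 2.0 checkpoint with CTC.
# Checkpoint name, dataset IDs, and hyperparameters are illustrative only.
import json
from dataclasses import dataclass
from datasets import load_dataset, Audio
from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2Processor, Wav2Vec2ForCTC,
                          TrainingArguments, Trainer)

# Illustrative: the multilingual training set concatenates the three languages;
# here only the Indonesian Common Voice split is loaded for brevity.
train = load_dataset("common_voice", "id", split="train")
train = train.cast_column("audio", Audio(sampling_rate=16_000))

# Build a character-level vocabulary from the transcripts (required for CTC).
# Real preprocessing would also strip punctuation and normalize the text.
chars = sorted(set("".join(train["sentence"]).lower()))
vocab = {c: i for i, c in enumerate(chars)}
vocab["|"] = vocab.pop(" ")                      # word delimiter token
vocab["[UNK]"], vocab["[PAD]"] = len(vocab), len(vocab) + 1
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16_000,
                                             padding_value=0.0, do_normalize=True,
                                             return_attention_mask=True)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

def prepare(batch):
    audio = batch["audio"]
    batch["input_values"] = processor(audio["array"],
                                      sampling_rate=audio["sampling_rate"]).input_values[0]
    batch["labels"] = processor.tokenizer(batch["sentence"].lower()).input_ids
    return batch

train = train.map(prepare, remove_columns=train.column_names)

@dataclass
class CTCCollator:
    """Pads audio and labels separately; padded label positions are ignored by the loss."""
    processor: Wav2Vec2Processor
    def __call__(self, features):
        inputs = [{"input_values": f["input_values"]} for f in features]
        labels = [{"input_ids": f["labels"]} for f in features]
        batch = self.processor.pad(inputs, padding=True, return_tensors="pt")
        label_batch = self.processor.tokenizer.pad(labels, padding=True, return_tensors="pt")
        batch["labels"] = label_batch["input_ids"].masked_fill(
            label_batch["attention_mask"].ne(1), -100)
        return batch

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",           # illustrative base checkpoint
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()                   # keep the convolutional encoder frozen

args = TrainingArguments(output_dir="wav2vec2-id-jv-su",
                         per_device_train_batch_size=8,
                         num_train_epochs=200, learning_rate=3e-4)
Trainer(model=model, args=args, train_dataset=train,
        data_collator=CTCCollator(processor),
        tokenizer=processor.feature_extractor).train()
```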
We built a multilingual speech recognition model and published it as an open-source model. We also provide a live demo for testing the model.
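Trying a published checkpoint yourself takes only a few lines. The model ID and audio file below are placeholders; see the demo and repository for the actual names.

```python
# Sketch: transcribing a 16 kHz mono recording with a published checkpoint.
# The model ID and audio file name are assumptions for illustration.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "indonesian-nlp/wav2vec2-indonesian-javanese-sundanese"  # assumed model ID
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech, _ = librosa.load("sample.wav", sr=16_000)   # any 16 kHz mono recording
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```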
The following is a comparison of the models and their performance evaluation:
The following figure compares the models by Word Error Rate (WER) on the test split of Indonesian Common Voice 6.1 (lower is better):
Lastly, we integrated a language model into our speech recognition pipeline, which reduces the WER from 11.57% to 4.27% on the test split of Indonesian Common Voice 6.1. We also evaluated Google Speech-to-Text, which achieves a WER of 9.22% on the same test split.
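Below is a sketch of how such language-model boosting can be wired into the pipeline with `pyctcdecode` and `Wav2Vec2ProcessorWithLM`; the model ID and the KenLM file name are assumptions for illustration.

```python
# Sketch: rescoring CTC output with a KenLM n-gram language model.
# Requires `pyctcdecode` and `kenlm`; model ID and LM file are assumed names.
import torch
import librosa
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2Processor, Wav2Vec2ProcessorWithLM, Wav2Vec2ForCTC

model_id = "indonesian-nlp/wav2vec2-indonesian-javanese-sundanese"  # assumed model ID
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Build a beam-search decoder whose hypotheses are rescored by the n-gram LM.
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(labels, kenlm_model_path="5gram.arpa")   # assumed LM file
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)

speech, _ = librosa.load("sample.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

print("greedy :", processor.batch_decode(torch.argmax(logits, dim=-1))[0])
print("with LM:", processor_with_lm.batch_decode(logits.numpy()).text[0])
```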
The performance evaluation can be found here.
- The experiment shows that the multilingual model can perform on par with a model trained on a single language; the Word Error Rate (WER) difference is at most 0.6 percentage points. When trained for more epochs, the multilingual model even outperforms the monolingual models.
- The monolingual model performs very well in the language it was trained on but poorly in the other languages.
- The multilingual speech recognition model removes the need for a separate model for each language in Indonesia, which significantly reduces hardware requirements and simplifies model deployment.
We plan the following for the future:
- Training the model with more data and more Indonesian languages.
- Integrating a language model to reduce the WER.
- Compressing the model to speed up inference and reduce hardware requirements.
- Developing real-time speech recognition based on this multilingual model.