This is a project to curate a speech dataset for Igbo using the Common Voice platform.
Igbo Radio was founded in 2009 as an online resource centre for text, audio, video, news, educational and cultural content in the Igbo language. This collaborative project focuses on preserving the Igbo language, which, according to UNESCO and endangered-language scholars, will be extinct in 50 years. The website is designed to be a Learning Management System and an online language centre where users from all parts of the world can find news, audio and video content for both entertainment and educational purposes.
- 17,778 cleaned sentences. They are saved under igbo_radio_sentences_clean.txt.
- They were generated by crawling the articles on the Igbo Radio website.
- The sentences were further filtered based on the validation rules for Igbo.
- Furthermore, the sentences that did not pass the validation are still documented here (discarded.txt) as additional Igbo data for other tasks.
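
The filtering step described in the list above can be sketched roughly as follows. This is a minimal illustration, not the script used for this dataset: the rules in `is_valid` (a word limit and a set of disallowed characters) and the names `MAX_WORDS`, `DISALLOWED` and `split_sentences` are assumptions made for the example, and the actual validation rules for Igbo may differ.

```python
import re

# Hypothetical rules for illustration only; the actual validation rules
# used for Igbo are not reproduced here.
MAX_WORDS = 14
DISALLOWED = re.compile(r"[0-9<>@#]")


def is_valid(sentence: str) -> bool:
    """Return True if the sentence passes the (assumed) validation rules."""
    words = sentence.split()
    if not words or len(words) > MAX_WORDS:
        return False
    return DISALLOWED.search(sentence) is None


def split_sentences(sentences):
    """Split sentences into (clean, discarded) lists, skipping blank lines."""
    clean, discarded = [], []
    for raw in sentences:
        sentence = raw.strip()
        if not sentence:
            continue
        if is_valid(sentence):
            clean.append(sentence)
        else:
            discarded.append(sentence)
    return clean, discarded


if __name__ == "__main__":
    sample = ["Nnọọ", "Afọ 2009", "   "]
    clean, discarded = split_sentences(sample)
    print(clean)      # sentences that would go to igbo_radio_sentences_clean.txt
    print(discarded)  # sentences that would go to discarded.txt
```

Keeping the rejected sentences in a second list, rather than dropping them, mirrors the approach above of preserving discarded.txt for other tasks.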
The owner of Igbo Radio, Chidi Nnamdi Igwe, agreed to let us use the site's content under CC0. The contribution agreement has been signed.