Igbo-Speech-Dataset (Source of Additional Sentences - Igbo Radio)

This project curates a speech dataset for Igbo using the Common Voice platform.

Igbo Radio

Igbo Radio was founded in 2009 as an online resource centre for text, audio, video, news, educational and cultural content in the Igbo language. This collaborative project focuses on preserving the Igbo language, which, according to UNESCO and endangered-language scholars, will be extinct in 50 years. The website is designed to be a Learning Management System and an online language centre where users from all parts of the world can find news, audio and video content for both entertainment and educational purposes.

Contents from Igbo Radio

  • 17,778 cleaned sentences, saved in igbo_radio_sentences_clean.txt.

  • The sentences were generated by crawling the articles on the Igbo Radio website.

  • They were then filtered according to the validation rules for Igbo.

  • Sentences that did not pass validation are kept in discarded.txt as additional Igbo data for other tasks.
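
The split into a clean file and a discarded file can be sketched roughly as follows. This is only an illustration, not the script used for this dataset: the `is_valid` check (a word limit and a ban on digits) is a hypothetical stand-in for the actual validation rules for Igbo, and the helper names `split_sentences` and `read_lines` are assumptions.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def is_valid(sentence: str, max_words: int = 14) -> bool:
    """Hypothetical check; NOT the actual validation rules for Igbo."""
    words = sentence.split()
    # Assumed rule: reject empty or overly long sentences.
    if not words or len(words) > max_words:
        return False
    # Assumed rule: reject sentences that contain digits.
    return not any(ch.isdigit() for ch in sentence)


def split_sentences(
    sentences: Iterable[str],
    validator: Callable[[str], bool] = is_valid,
) -> Tuple[List[str], List[str]]:
    """Return (kept, discarded), skipping blank lines and exact duplicates."""
    kept: List[str] = []
    discarded: List[str] = []
    seen = set()
    for raw in sentences:
        sentence = raw.strip()
        if not sentence or sentence in seen:
            continue
        seen.add(sentence)
        (kept if validator(sentence) else discarded).append(sentence)
    return kept, discarded


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file with one sentence per line."""
    return Path(path).read_text(encoding="utf-8").splitlines()
```

Under these assumptions, `split_sentences(read_lines("igbo_radio_sentences_clean.txt"))` would re-check the cleaned file, and any sentences returned in the second list would correspond to what this repository stores in discarded.txt.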

Licence

The owner of Igbo Radio, Chidi Nnamdi Igwe, agreed to let us use the site's content under CC0, and the contribution agreement has been signed.