Cross Lingual Word Embeddings for Turkic Languages

This repository contains the language resources reported in the LREC 2020 paper of the same title by Elmurod Kuriyozov, Yerai Doval and Carlos Gómez-Rodríguez. The paper itself is here.

If you use these resources in your research, please cite the paper as follows:

@inproceedings{kuriyozov2020cross,
  title={Cross-Lingual Word Embeddings for Turkic Languages},
  author={Kuriyozov, Elmurod and Doval, Yerai and G{\'o}mez-Rodr{\'\i}guez, Carlos},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={4054--4062},
  year={2020}
}

Bilingual dictionaries

1. From available sources

Dictionaries obtained from existing resources are provided for Turkish-English and Uzbek-English (the Kazakh-English dictionary reported in the paper cannot be shared due to licensing restrictions).

Turkish-English: obtained from MUSE.
Uzbek-English: obtained from The Uzbek Glossary.
Kazakh-English: the file cannot be shared directly, but it can be obtained from The Leneshmid Dictionary.
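The README does not describe the file layout of the dictionaries. Assuming they follow the MUSE convention of one whitespace-separated `source target` pair per line (an assumption, not something stated here), a minimal loader that groups all English translations of each source word might look like this; `load_bilingual_dictionary` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_bilingual_dictionary(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse 'source target' lines into a source -> set-of-targets mapping.

    Assumption: MUSE-style plain-text layout, one translation pair per line,
    source word first, target word second, separated by whitespace.
    """
    translations: Dict[str, Set[str]] = defaultdict(set)
    for raw in lines:
        parts = raw.split()
        if len(parts) != 2:
            # Skip blank or malformed lines instead of failing.
            continue
        source, target = parts
        # A source word may have several English translations.
        translations[source].add(target)
    return dict(translations)


if __name__ == "__main__":
    sample = ["kitap book", "ev house", "ev home", ""]
    print(load_bilingual_dictionary(sample))
```

Since iterating over a file yields its lines, a dictionary file can be read with `with open(path, encoding="utf-8") as f: load_bilingual_dictionary(f)`, where `path` points to one of the dictionary files.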

2. Dictionaries obtained using Google Translate

This set contains dictionaries from five Turkic languages (Turkish, Uzbek, Azeri, Kazakh and Kyrgyz) to English, built using the Google Translate API. Sizes (in words):
Turkish-English: 9350
Uzbek-English: 7958
Azeri-English: 7422
Kazakh-English: 8454
Kyrgyz-English: 7974
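The sizes are given "in words", which could mean translation pairs or distinct source words; the text does not say which. Under the same assumed one-pair-per-line layout, the hypothetical helper below counts both, so either figure can be compared against the numbers listed for each language pair.

```python
from typing import Iterable, Tuple


def dictionary_size(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (distinct translation pairs, distinct source words).

    Assumption: one whitespace-separated 'source target' pair per line.
    Duplicate pairs are counted once; malformed lines are ignored.
    """
    pairs = set()
    for raw in lines:
        parts = raw.split()
        if len(parts) == 2:
            pairs.add((parts[0], parts[1]))
    sources = {source for source, _ in pairs}
    return len(pairs), len(sources)


if __name__ == "__main__":
    sample = ["kitap book", "ev house", "ev home", "ev house"]
    print(dictionary_size(sample))  # (3, 2)
```

In the toy sample, `ev` has two translations and one pair is repeated, so the two counts differ; on a real file the gap shows how many source words have more than one English equivalent.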

Word embeddings

Pre-trained word embeddings for these five Turkic languages are already available; among them, we used the FastText embeddings in our experiments.
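A defining feature of FastText is that it represents each word through its character n-grams, which suits agglutinative languages such as the Turkic ones, where many inflected forms share a stem. The sketch below shows only the n-gram extraction, assuming fastText's documented defaults for unsupervised training (boundary markers `<` and `>`, n-gram lengths 3 to 6); the real library also hashes n-grams into buckets and keeps a separate vector for the whole word, both omitted here. `char_ngrams` is a hypothetical name.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams of a word in the style of fastText subword units.

    The word is wrapped in '<' and '>' boundary markers, and every substring
    of length minn..maxn is collected, shortest n first.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


if __name__ == "__main__":
    print(char_ngrams("evler", minn=3, maxn=3))
    # ['<ev', 'evl', 'vle', 'ler', 'er>']
```

For the Turkish plural `evler` ("houses"), the trigrams include `<ev`, shared with the stem `ev`, and `ler`, the plural suffix, so related forms end up sharing parameters.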

In addition, we trained our own word embeddings with the FastText skip-gram model on the Large Corpora of Turkic Languages (Baisa et al., 2012).
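The skip-gram objective trains each word to predict the words around it. The sketch below only shows how (center, context) training pairs are drawn from a tokenized sentence with a fixed window; it is a simplification, since fastText also samples the effective window size per position, subsamples frequent words, trains with negative sampling and represents the center word through its character n-grams. The window size used here is illustrative, not the setting used for these embeddings (which is not stated here), and `skipgram_pairs` is a hypothetical name.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) pairs for a skip-gram model.

    Each token is paired with every other token at most `window`
    positions away (fixed window; no sampling, unlike fastText).
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    print(skipgram_pairs(["ben", "kitap", "okuyorum"], window=1))
    # [('ben', 'kitap'), ('kitap', 'ben'), ('kitap', 'okuyorum'), ('okuyorum', 'kitap')]
```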

All pre-trained word embeddings can be downloaded from the links below.

Turkish FastText skip-gram 300d word-embeddings - Download

Uzbek FastText skip-gram 300d word-embeddings - Download

Azeri FastText skip-gram 300d word-embeddings - Download

Kazakh FastText skip-gram 300d word-embeddings - Download

Kyrgyz FastText skip-gram 300d word-embeddings - Download
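The file format of the downloads is not specified here. Assuming they use fastText's `.vec` text format (a header line with vocabulary size and dimension, then one word followed by its 300 values per line), the embeddings can be read and queried for nearest neighbours with the standard library alone, as sketched below; `load_vec`, `cosine` and `nearest` are hypothetical helpers, and the toy file stands in for a real download.

```python
import io
import math
from typing import Dict, List, TextIO, Tuple


def load_vec(handle: TextIO) -> Dict[str, List[float]]:
    """Read embeddings in fastText's .vec text format (assumed layout).

    First line: '<vocabulary size> <dimension>'; each following line holds
    a word and its vector components, separated by spaces.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors: Dict[str, List[float]], query: str,
            k: int = 5) -> List[Tuple[str, float]]:
    """Return the k words most similar to `query`, excluding the query itself."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional file standing in for a real 300d download.
    toy = io.StringIO("3 2\nev 1.0 0.0 \nkitap 0.0 1.0 \nkonut 0.9 0.1 \n")
    vectors = load_vec(toy)
    print(nearest(vectors, "ev", k=1))  # 'konut' ranks first
```

Pure Python is slow for a full 300-dimensional vocabulary; for real use, gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` reads the same text format into arrays.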