UzTransliterator | State-of-the-art machine transliteration tool for Uzbek language, Cyrillic<>Latin<>NewLatin
The main goal of this paper is to present a state-of-the-art machine transliteration tool for three scripts commonly used in the low-resource Uzbek language: the old Cyrillic, the currently official Latin, and the newly announced New Latin alphabets. The tool was created using a combination of rule-based and statistical approaches. It is available as an open-source Python package, as well as a web-based application with a public API.
Feel free to use the tools presented in this project. A paper with more details on their creation and usage is available on arXiv (arXiv:2205.09578).
If you find the tools useful, please make sure to cite the paper:
@article{salaev2022machine,
title={A machine transliteration tool between Uzbek alphabets},
author={Salaev, Ulugbek and Kuriyozov, Elmurod and G{\'o}mez-Rodr{\'\i}guez, Carlos},
journal={arXiv preprint arXiv:2205.09578},
year={2022}
}
A demo of the web-based transliteration tool is also available.
The paper presents a Python package, a web tool, and an API for the Uzbek language that perform machine transliteration between the two widely used Cyrillic and Latin alphabets, as well as a newly reformed version of the Latin alphabet, to which, according to a governmental decree, all legal texts are to be fully adapted by 2023.
pip install UzTransliterator
Source: https://pypi.org/project/UzTransliterator/
Using
from UzTransliterator import UzTransliterator
obj = UzTransliterator.UzTransliterator()
print(obj.transliterate("мактаб", from_="cyr", to="lat"))
Output: maktab
from_='cyr', to='lat'
from_='cyr', to='nlt'
from_='lat', to='cyr'
from_='lat', to='nlt'
from_='nlt', to='cyr'
from_='nlt', to='lat'
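The six combinations above are every ordered pair of the three script codes. The following is a minimal sketch, using only the standard library, that enumerates them and validates a requested direction before it is handed to the package. `SCRIPTS`, `DIRECTIONS`, and `check_direction` are illustrative names and are not part of UzTransliterator.

```python
from itertools import permutations

# Script codes taken from the list of supported directions above.
SCRIPTS = ("cyr", "lat", "nlt")

# Every ordered pair of distinct scripts: 3 * 2 = 6 directions.
DIRECTIONS = list(permutations(SCRIPTS, 2))


def check_direction(from_: str, to: str) -> None:
    """Raise ValueError unless (from_, to) is one of the six listed pairs.

    Hypothetical helper for illustration; the package may validate
    its arguments differently.
    """
    if (from_, to) not in DIRECTIONS:
        raise ValueError(f"unsupported direction: {from_!r} -> {to!r}")


for from_, to in DIRECTIONS:
    check_direction(from_, to)
    print(f"from_={from_!r}, to={to!r}")
```

With the package installed, each pair can be passed on as `obj.transliterate(text, from_=from_, to=to)`.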
https://nlp.urdu.uz/?menu=translit
URL: https://uz-translit.herokuapp.com/translit
Methods: GET, POST
Parameters: text:str, from_:str, to:str
Example Request: https://uz-translit.herokuapp.com/translit?text=мактаб&from_=cyr&to=lat
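The example request can also be built programmatically, which takes care of percent-encoding the Cyrillic text. This is a sketch using only the Python standard library; `build_translit_url` and `fetch_translit` are hypothetical helper names. The response format is not described here, so the sketch returns the raw body, and decoding it as UTF-8 is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as listed above.
API_URL = "https://uz-translit.herokuapp.com/translit"


def build_translit_url(text: str, from_: str, to: str) -> str:
    """Return the GET URL with percent-encoded query parameters."""
    query = urlencode({"text": text, "from_": from_, "to": to})
    return f"{API_URL}?{query}"


def fetch_translit(text: str, from_: str, to: str, timeout: float = 10.0) -> str:
    """Send the GET request and return the raw response body as text.

    Assumption: the body is UTF-8; its structure is not parsed here.
    """
    with urlopen(build_translit_url(text, from_, to), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made.
    print(build_translit_url("мактаб", "cyr", "lat"))
```

Calling `fetch_translit("мактаб", "cyr", "lat")` would send the same request as the example URL, provided the service is reachable.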
The New Latin alphabet differs from the current Latin alphabet in a few letters. The main changes are listed below in the format Latin - New Latin:
“G‘, g‘” — “Ḡ, ḡ”
“O‘, o‘” — “Ō, ō”
“Sh, sh” — “Ş, ş”
“Ch, ch” — “Ç, ç”
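The four substitutions above can be illustrated with a plain string-replacement table. This is a simplified sketch of the letter mapping only and not the package's implementation, which combines rule-based and statistical approaches. `LATIN_TO_NEW_LATIN` and `latin_to_new_latin` are illustrative names, and the sketch assumes the apostrophe is written as ‘ exactly as in the list above.

```python
# Letter mapping copied from the Latin - New Latin list above.
# Assumption: the modifier apostrophe is the character ‘ used in that list;
# real-world text may use other apostrophe variants, which are not handled.
LATIN_TO_NEW_LATIN = {
    "G‘": "Ḡ", "g‘": "ḡ",
    "O‘": "Ō", "o‘": "ō",
    "Sh": "Ş", "sh": "ş",
    "Ch": "Ç", "ch": "ç",
}


def latin_to_new_latin(text: str) -> str:
    """Apply the four letter substitutions by plain string replacement.

    Naive on purpose: all-caps digraphs (e.g. "SH") and words where
    "s" and "h" are separate letters are not treated specially.
    """
    for old, new in LATIN_TO_NEW_LATIN.items():
        text = text.replace(old, new)
    return text


print(latin_to_new_latin("o‘g‘il"))
print(latin_to_new_latin("choy"))
```

Handling the cases this sketch ignores is one reason a rule table alone is usually not enough for full transliteration.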
Programming language used: Python
These are the major libraries used inside Python:
Distributed under the MIT License. See LICENSE.txt for more information.