Issues with Caucasian Languages #54

Open
Plkmoi opened this issue Apr 23, 2021 · 4 comments

Comments

Plkmoi commented Apr 23, 2021

The models for Caucasian languages have very low accuracy. Abkhaz, Adyghe, and Chechen have quite a lot of data on the internet, as well as bilingual corpora. Chechen has a huge amount of Wikipedia data, and Abkhaz, Kabardian, Lezgi, Adyghe, Ingush, Lak, and Avar also have Wikipedia data, yet the models for these languages have very low accuracy.

Member

jorgtied commented Nov 4, 2021

Contributions of clean aligned bilingual data sets would be very helpful to push the performance. Would you like to contribute? I welcome contributions to OPUS and that would feed into the NMT models as well ... Thanks!

Member

jorgtied commented Feb 5, 2023

More data would be the most practical thing that could help to improve the quality. I added an issue here to add some more data: Helsinki-NLP/OPUS-ingest#14
If you happen to know of more sources of parallel data sets, please let me know. Thanks!

@AlexJonesNLP

@Plkmoi It's possible that the Google Translate team is working on adding Caucasian languages in the somewhat near future. In fact, I somehow feel pretty good about that hunch. But I can't say any more for legal reasons at the moment (wink wink).

@AlexJonesNLP

@dotsuzu I do have a contact! What's a good email so I can reach out and cc you?
