Issues with Caucasian Languages #54

Plkmoi · 2021-04-23T11:44:25Z

The models for Caucasian languages have very low accuracy. Abkhaz, Adyghe, and Chechen have quite a lot of data on the internet, including bilingual corpora. Chechen has a huge amount of Wikipedia data, and Abkhaz, Kabardian, Lezgi, Adyghe, Ingush, Lak, and Avar also have Wikipedia data, yet the models for these languages still have very low accuracy.

jorgtied · 2021-11-04T20:05:18Z

Contributions of clean, aligned bilingual data sets would be very helpful for improving performance. Would you like to contribute? I welcome contributions to OPUS, and they would feed into the NMT models as well ... Thanks!

jorgtied · 2023-02-05T20:11:24Z

Adding more data is the most practical way to improve quality. I opened an issue to add some more data: Helsinki-NLP/OPUS-ingest#14
If you happen to know of more sources of parallel data sets, please let me know. Thanks!

AlexJonesNLP · 2024-02-07T22:48:37Z

@Plkmoi It's possible that the Google Translate team is working on adding Caucasian languages in the somewhat near future. In fact, I somehow feel pretty good about that hunch. But I can't say any more for legal reasons at the moment (wink wink).

AlexJonesNLP · 2024-02-08T02:55:05Z

@dotsuzu I do have a contact! What's a good email so I can reach out and cc you?
