Issues with Caucasian Languages #54
Comments
Contributions of clean aligned bilingual data sets would be very helpful to push the performance. Would you like to contribute? I welcome contributions to OPUS and that would feed into the NMT models as well ... Thanks!
More data would be the most practical thing that could help to improve the quality. I added an issue here to add some more data: Helsinki-NLP/OPUS-ingest#14
@Plkmoi It's possible that the Google Translate team is working on adding Caucasian languages in the somewhat near future. In fact, I somehow feel pretty good about that hunch. But I can't say any more for legal reasons at the moment (wink wink).
@dotsuzu I do have a contact! What's a good email so I can reach out and cc you?
The models for Caucasian languages have very low accuracy. Abkhaz, Adyghe and Chechen have quite a lot of data on the internet, including bilingual corpora. Chechen has a huge amount of Wikipedia data, and Abkhaz, Kabardian, Lezgi, Adyghe, Ingush, Lak and Avar also have Wikipedia data, yet the models for these languages still have very low accuracy.