Add Multilingual corpus of Caucasian languages #14

Open
jorgtied opened this issue Feb 5, 2023 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@jorgtied
Member

jorgtied commented Feb 5, 2023

Add multilingual corpus available from https://github.com/danielinux7/Multilingual-Parallel-Corpus

@aterribletime

Would it be easier if the data were updated in Tatoeba?

@jorgtied
Member Author

I tried to import the data, but I have some issues with the TSV files: the column order is not consistent. https://github.com/danielinux7/Multilingual-Parallel-Corpus/blob/master/ab-en/libreoffice.tsv has English in the first column, while https://github.com/danielinux7/Multilingual-Parallel-Corpus/blob/master/ab-en/Ab-En-Syn.tsv has it in the second.

https://github.com/danielinux7/Multilingual-Parallel-Corpus/blob/master/ab-ru/100-text.tsv has only one language, and for the rest of the ab-ru files I can't tell which column is Russian and which is Abkhazian.

This makes an import quite difficult.
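
One possible workaround, sketched here as an assumption and not as something the import scripts currently do, is to guess each column's language from the characters it contains. Abkhaz uses Cyrillic letters that Russian does not (for example ә, ԥ, ҩ), so a per-column majority vote can separate the two, and Latin script marks the English column. The names `ABKHAZ_ONLY`, `classify`, `guess_columns` and `guess_columns_in_file` are hypothetical, the letter set may be incomplete, and the sample words are illustrative, not taken from the corpus.

```python
import csv
from collections import Counter

# Cyrillic letters used in Abkhaz but not in Russian.
# Assumption: this set is good enough as a marker; it is not a complete alphabet.
ABKHAZ_ONLY = set("ҧҭәӡҵҽҿџқҟҩҳҷӷҕԥ".lower())


def classify(text):
    """Guess 'ab', 'ru', 'en' or None for a single cell, from characters only."""
    lowered = text.lower()
    if any(ch in ABKHAZ_ONLY for ch in lowered):
        return "ab"
    if any("а" <= ch <= "я" or ch == "ё" for ch in lowered):
        return "ru"
    if any("a" <= ch <= "z" for ch in lowered):
        return "en"
    return None


def guess_columns(rows):
    """Return {column index: majority language guess} over all rows."""
    votes = {}
    for row in rows:
        for index, cell in enumerate(row):
            lang = classify(cell)
            if lang is not None:
                votes.setdefault(index, Counter())[lang] += 1
    return {index: counts.most_common(1)[0][0] for index, counts in votes.items()}


def guess_columns_in_file(path):
    """Apply guess_columns to a tab-separated file with no quoting."""
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return guess_columns(reader)


if __name__ == "__main__":
    # Illustrative rows only; the real files would be read with guess_columns_in_file.
    print(guess_columns([["Abkhazia", "Аԥсны"], ["house", "аҩны"]]))
    print(guess_columns([["Аԥсны", "Абхазия"], ["аҩны", "дом"]]))
```

The vote is taken per column because a single Abkhaz sentence may happen to contain none of the marker letters and would then look like Russian on its own. A file with only one populated column, like 100-text.tsv, would yield a single entry and could be skipped or handled separately.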

2 participants