First of all, thanks for this excellent tool!
I'm going to calculate metrics on some Chinese datasets, such as mMARCO. I downloaded the Chinese mMARCO data from this link: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/mmarco.zip. When I opened queries.jsonl, the content of the text field was malformed, as in the following examples:
// {\\fn华文楷体\\fs16\\1cHE0E0E0} and {\\fn华文楷体\\fs16\\1cHE0E0E0}"}
{"_id": "224811", "text": "{\\fn华文楷体\\fs16\\1cHE0E0E0}萤火虫怎么点亮的 {\\fn华文楷体\\fs16\\1cHE0E0E0}"}
// many repeated 每桶
{"_id": "473204", "text": "房客建房每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶每桶"}
// meaningless and ungrammatical
{"_id": "880877", "text": "何国籍何为姓甘?"}
// meaningless and repeated $
{"_id": "319885", "text": "$$$$$$$$$$$$$$$ $$$ $$$ $$$ $$$ $$ $$ $ $$ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $"}
// ????
{"_id": "1035441", "text": "???????"}
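For anyone who wants to check their own copy, here is a minimal sketch of how entries like these could be flagged automatically. The heuristics, thresholds, and the names `flag_query` and `scan` are my own assumptions for illustration and are not part of BEIR. It only assumes that each line of queries.jsonl is a JSON object with `_id` and `text` fields, as in the samples above.

```python
import json
import re

# Hypothetical heuristics for the kinds of corruption shown above.
# Subtitle-style override tags such as {\fn...\fs16...} (after JSON decoding).
SUBTITLE_TAG = re.compile(r"\{\\[^}]*\}")
# A short unit (1 to 4 characters) repeated at least 10 times in a row.
REPEATED_UNIT = re.compile(r"(.{1,4}?)\1{9,}")


def flag_query(text: str) -> list[str]:
    """Return the names of the heuristics that the query text trips."""
    reasons = []
    if SUBTITLE_TAG.search(text):
        reasons.append("subtitle_tag")
    if REPEATED_UNIT.search(text):
        reasons.append("repeated_unit")
    # No letters, digits, or CJK characters at all (only symbols/whitespace).
    if not any(ch.isalnum() for ch in text):
        reasons.append("no_alphanumeric")
    return reasons


def scan(path: str):
    """Yield (_id, reasons) for every suspicious line of a queries.jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            reasons = flag_query(record["text"])
            if reasons:
                yield record["_id"], reasons
```

Calling `list(scan("queries.jsonl"))` would list the flagged ids with the reasons. Surface checks like these cannot catch queries that are well-formed but meaningless, such as the "何国籍何为姓甘?" example, so this is at best a lower bound on how many queries are affected.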