Allow aggregated tasks within benchmarks #1231
Comments
I think an average result could be added for each subset of multilingual datasets.
Not entirely sure what is meant, @Samoed - should we add it for multilingual datasets? (Isn't that already there?)
Yes, the author of the CoIR benchmark wanted an average score for the task. I believe this can be done if all subsets of the task are included in the results. This could also be implemented in the results repository. Currently, the average is already calculated for some tasks.
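A rough sketch of what such a per-task average over subsets could look like; the subset names, scores, and result layout below are assumptions for illustration, not the actual mteb results schema:

```python
# Illustrative only: averaging per-subset scores into a single task score.
# The subset names and numbers are made up; the real mteb results format
# may differ.
from statistics import mean

subset_scores: dict[str, float] = {
    "python": 0.71,
    "java": 0.68,
    "go": 0.74,
}

task_average = mean(subset_scores.values())
print(f"average over {len(subset_scores)} subsets: {task_average:.4f}")
```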
This seems like a quick fix (which I am more than happy to add for now), but the benchmark specification within mteb does not specify how the scores should be aggregated.
We currently have only one aggregated task (CQADupstack); however, we can definitely imagine more in the future (e.g. for CoIR in embeddings-benchmark/leaderboard#27).

A proposed solution is to use the benchmark (benchmarks are already a group of tasks) and then allow a benchmark to be a `list[task | benchmark]`.
This will require updates to `MTEB.MTEB`, as well as `create_meta`, and potentially the CLI. This approach should also solve #1171.
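A minimal sketch of what the recursive structure could look like. The `Task`/`Benchmark` classes, method names, and the plain-mean aggregation here are hypothetical placeholders (how scores should be aggregated is exactly the open question above), not the actual mteb API:

```python
# Sketch of a benchmark whose task list may itself contain benchmarks.
# All names here are illustrative, not the real mteb classes.
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean
from typing import Union


@dataclass
class Task:
    name: str
    score: float  # placeholder for an evaluated main score


@dataclass
class Benchmark:
    name: str
    tasks: list[Union[Task, "Benchmark"]] = field(default_factory=list)

    def flatten(self) -> list[Task]:
        """Recursively collect leaf tasks so they can be run as usual."""
        leaves: list[Task] = []
        for item in self.tasks:
            leaves.extend(item.flatten() if isinstance(item, Benchmark) else [item])
        return leaves

    def aggregate(self) -> float:
        """Aggregate nested benchmarks first, then average with leaf tasks.

        A plain mean is used only as a placeholder aggregation.
        """
        scores = [
            item.aggregate() if isinstance(item, Benchmark) else item.score
            for item in self.tasks
        ]
        return mean(scores)


# Usage: an aggregated task (e.g. CQADupstack or CoIR) becomes a sub-benchmark.
coir = Benchmark("CoIR", [Task("CodeSearchNet", 0.70), Task("APPS", 0.55)])
full = Benchmark("MyBenchmark", [coir, Task("SomeOtherTask", 0.62)])
print([t.name for t in full.flatten()])
print(f"{full.aggregate():.4f}")
```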