feat: first batch of results for the MTEB(Medical) benchmark
#55
+98,825
−3,228
As a follow-up to embeddings-benchmark/mteb#1459, this PR contains the results for 15 open-source models on the new MTEB(Medical) benchmark.
The models included are:
name: "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
revision: "bf3bf13ab40c3157080a7ab344c831b9ad18b5eb"
name: "BAAI/bge-small-en-v1.5"
revision: "5c38ec7c405ec4b44b94cc5a9bb96e735b38267a"
name: "BAAI/bge-base-en-v1.5"
revision: "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a"
name: "BAAI/bge-large-en-v1.5"
revision: "d4aa6901d3a41ba39fb536a557fa166f842b0e09"
name: "intfloat/multilingual-e5-small"
revision: "fd1525a9fd15316a2d503bf26ab031a61d056e98"
name: "intfloat/multilingual-e5-base"
revision: "d13f1b27baf31030b7fd040960d60d909913633f"
name: "intfloat/multilingual-e5-large"
revision: "ab10c1a7f42e74530fe7ae5be82e6d4f11a719eb"
name: "Alibaba-NLP/gte-multilingual-base"
revision: "7fc06782350c1a83f88b15dd4b38ef853d3b8503"
name: "jinaai/jina-embeddings-v3"
revision: "215a6e121fa0183376388ac6b1ae230326bfeaed"
name: "Snowflake/snowflake-arctic-embed-m-v1.5"
revision: "97eab2e17fcb7ccb8bb94d6e547898fa1a6a0f47"
name: "mixedbread-ai/mxbai-embed-large-v1"
revision: "990580e27d329c7408b3741ecff85876e128e203"
name: "abhinand/MedEmbed-small-v0.1"
revision: "40a5850d046cfdb56154e332b4d7099b63e8d50e"
name: "abhinand/MedEmbed-base-v0.1"
revision: "7a90c50263f620dff743eb9794b89a42bfc5d765"
name: "abhinand/MedEmbed-large-v0.1"
revision: "e621837c7904456dc37d689f97e654424de62318"
name: "nvidia/NV-Embed-v2"  # Using the code in this PR
revision: "7604d305b621f14095a1aa23d351674c2859553a"
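For anyone reproducing these numbers, the sketch below shows one way to hold the name/revision pairs above in a small structure, check that each revision is a full 40-character commit hash, and derive a per-model output folder. The `ModelSpec` class, the `results_dir` helper, and the `results/<org>__<model>/<revision>` layout are illustrative assumptions, not something this PR defines.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# A full git commit hash (SHA-1) is 40 lowercase hex characters.
SHA_RE = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for one name/revision entry from the list above."""

    name: str
    revision: str

    def __post_init__(self):
        # Reject abbreviated hashes or branch names so runs stay reproducible.
        if not SHA_RE.fullmatch(self.revision):
            raise ValueError(f"{self.name}: revision is not a full commit hash")


def results_dir(spec: ModelSpec, root: str = "results") -> Path:
    # Assumed layout: <root>/<org>__<model>/<revision>
    return Path(root) / spec.name.replace("/", "__") / spec.revision


# Two entries copied from the list above; the others follow the same pattern.
MODELS = [
    ModelSpec("BAAI/bge-small-en-v1.5", "5c38ec7c405ec4b44b94cc5a9bb96e735b38267a"),
    ModelSpec("intfloat/multilingual-e5-small", "fd1525a9fd15316a2d503bf26ab031a61d056e98"),
]

for spec in MODELS:
    print(results_dir(spec))
```

With `mteb` installed, each pair could then be passed along the lines of `mteb.get_model(spec.name, revision=spec.revision)`; the exact signature should be checked against the installed `mteb` version.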
We also plan to add the following models once the inconsistencies are resolved, since we noticed strange results for them as well:
name: "Alibaba-NLP/gte-Qwen2-1.5B-instruct"
revision: "3276994ba02b26841920728d1adcf115473c88e9"
name: "Alibaba-NLP/gte-Qwen2-7B-instruct"
revision: "e26182b2122f4435e8b3ebecbf363990f409b45b"
Finally, we added a `bm25s` baseline for the retrieval tasks, although there is currently an issue with the clustering and reranking tasks.
My colleague @olivierr42 will take it from here since I will not be available next week.
Feel free to suggest other interesting models and we'll happily run them too 💪