
Vietnamese Massive Text Embedding Benchmark


Installation

V-MTEB is developed based on MTEB.

Clone this repo and install it as an editable package:

git clone https://github.com/Iambestfeed/V-MTEB.git
cd V-MTEB
pip install -e .

Evaluation

Evaluate reranker

python eval_cross_encoder.py --model_name_or_path BAAI/bge-reranker-base
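Conceptually, a cross-encoder reranker assigns a relevance score to each (query, passage) pair and sorts the passages by that score. The sketch below illustrates only this scoring-and-sorting flow with a toy token-overlap scorer; it is a stand-in, not the actual BAAI/bge-reranker-base model or the eval_cross_encoder.py script.

```python
def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query tokens found in the passage.
    A real cross-encoder replaces this with a jointly encoded model score."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def rerank(query: str, passages: list[str]) -> list[str]:
    """Sort passages by descending relevance to the query."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)

query = "thủ đô của Việt Nam"
passages = [
    "Phở là một món ăn truyền thống",
    "Hà Nội là thủ đô của Việt Nam",
]
ranked = rerank(query, passages)
```

After reranking, the passage about Hà Nội comes first, since it shares every query token.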

Evaluate embedding model

  • With scripts: scripts will be updated soon.

  • With sentence-transformers

You can use V-MTEB easily in the same way as MTEB.

from mteb import MTEB
from V_MTEB import *
from sentence_transformers import SentenceTransformer

# Define the sentence-transformers model name
model_name = "fill-your-model-name"

model = SentenceTransformer(model_name)
evaluation = MTEB(task_langs=['vie'])
results = evaluation.run(model, output_folder=f"vi_results/{model_name}")
  • Using a custom model
    To evaluate a new model, load it via sentence_transformers if it is supported there. Otherwise, implement a class like the one below, with an encode function that takes a list of sentences as input and returns a list of embeddings (np.array, torch.tensor, etc.):
class MyModel:
    def encode(self, sentences, batch_size=32, **kwargs):
        """ Returns a list of embeddings for the given sentences.
        Args:
            sentences (`List[str]`): List of sentences to encode
            batch_size (`int`): Batch size for the encoding

        Returns:
            `List[np.ndarray]` or `List[tensor]`: List of embeddings for the given sentences
        """
        pass

model = MyModel()
evaluation = MTEB(tasks=["Vietnamese_Student_Topic"])
evaluation.run(model)
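To make the required interface concrete, here is a minimal self-contained implementation of the encode contract. The embedder itself is a toy (it hashes character trigrams into a fixed-size normalized vector) chosen only so the example runs without downloading a model; a real custom model would call its own encoder here.

```python
import numpy as np

class ToyModel:
    """Toy embedder implementing the MTEB encode() interface.

    NOT a real model: it hashes character trigrams into a fixed-size
    vector, just to demonstrate the expected input/output shapes.
    """

    def __init__(self, dim: int = 64):
        self.dim = dim

    def encode(self, sentences, batch_size=32, **kwargs):
        """Return one embedding (np.ndarray of shape (dim,)) per sentence."""
        embeddings = []
        for text in sentences:
            vec = np.zeros(self.dim, dtype=np.float32)
            for i in range(len(text) - 2):
                vec[hash(text[i:i + 3]) % self.dim] += 1.0
            norm = np.linalg.norm(vec)
            if norm > 0:
                vec /= norm  # L2-normalize so cosine similarity is a dot product
            embeddings.append(vec)
        return embeddings

model = ToyModel()
embs = model.encode(["xin chào", "hello world"])
```

Any object with this encode signature can be passed to evaluation.run in place of a SentenceTransformer.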

Leaderboard

Will be updated soon.

Tasks

An overview of the tasks and datasets available in V-MTEB is provided in the following table:

Name | Hub URL | Description | Type | Category | Test #Samples

Acknowledgement

We thank the Massive Text Embedding Benchmark for the great tool and the Vietnam NLP community for the open-source datasets.

Citation

If you find this repository useful, please consider citing it.
