
Vietnamese Massive Text Embedding Benchmark


Installation

V-MTEB is developed based on MTEB.

Clone this repo and install it as an editable package:

git clone https://github.com/Iambestfeed/V-MTEB.git
cd V-MTEB
pip install -e .

Evaluation

Evaluate reranker

python eval_cross_encoder.py --model_name_or_path BAAI/bge-reranker-base
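Conceptually, a cross-encoder reranker assigns a relevance score to each (query, passage) pair and sorts the passages by that score. The sketch below illustrates only this scoring-and-sorting flow with a toy token-overlap scorer; it is a stand-in, not the actual BAAI/bge-reranker-base model or the eval_cross_encoder.py script.

```python
def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query tokens found in the passage.
    A real cross-encoder replaces this with a jointly encoded model score."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def rerank(query: str, passages: list[str]) -> list[str]:
    """Sort passages by descending relevance to the query."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)

query = "thủ đô của Việt Nam"
passages = [
    "Phở là một món ăn truyền thống",
    "Hà Nội là thủ đô của Việt Nam",
]
ranked = rerank(query, passages)
```

After reranking, the passage about Hà Nội comes first, since it shares every query token.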

Evaluate embedding model

  • With scripts: scripts will be updated soon.

  • With sentence-transformers

You can use V-MTEB easily in the same way as MTEB.

from mteb import MTEB
from V_MTEB import *
from sentence_transformers import SentenceTransformer

# Define the sentence-transformers model name
model_name = "fill-your-model-name"

model = SentenceTransformer(model_name)
evaluation = MTEB(task_langs=['vie'])
results = evaluation.run(model, output_folder=f"vi_results/{model_name}")
  • Using a custom model
    To evaluate a new model, load it via sentence_transformers if it is supported there. Otherwise, implement a class like the one below, with an encode function that takes a list of sentences as input and returns a list of embeddings (np.array, torch.tensor, etc.):
class MyModel:
    def encode(self, sentences, batch_size=32, **kwargs):
        """ Returns a list of embeddings for the given sentences.
        Args:
            sentences (`List[str]`): List of sentences to encode
            batch_size (`int`): Batch size for the encoding

        Returns:
            `List[np.ndarray]` or `List[tensor]`: List of embeddings for the given sentences
        """
        pass

model = MyModel()
evaluation = MTEB(tasks=["Vietnamese_Student_Topic"])
evaluation.run(model)
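To make the required interface concrete, here is a minimal self-contained implementation of the encode contract. The embedder itself is a toy (it hashes character trigrams into a fixed-size normalized vector) chosen only so the example runs without downloading a model; a real custom model would call its own encoder here.

```python
import numpy as np

class ToyModel:
    """Toy embedder implementing the MTEB encode() interface.

    NOT a real model: it hashes character trigrams into a fixed-size
    vector, just to demonstrate the expected input/output shapes.
    """

    def __init__(self, dim: int = 64):
        self.dim = dim

    def encode(self, sentences, batch_size=32, **kwargs):
        """Return one embedding (np.ndarray of shape (dim,)) per sentence."""
        embeddings = []
        for text in sentences:
            vec = np.zeros(self.dim, dtype=np.float32)
            for i in range(len(text) - 2):
                vec[hash(text[i:i + 3]) % self.dim] += 1.0
            norm = np.linalg.norm(vec)
            if norm > 0:
                vec /= norm  # L2-normalize so cosine similarity is a dot product
            embeddings.append(vec)
        return embeddings

model = ToyModel()
embs = model.encode(["xin chào", "hello world"])
```

Any object with this encode signature can be passed to evaluation.run in place of a SentenceTransformer.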

Leaderboard

Will be updated soon.

Tasks

An overview of the tasks and datasets available in V-MTEB is provided in the following table:

Name | Hub URL | Description | Type | Category | Test #Samples

Acknowledgement

We thank the Massive Text Embedding Benchmark for the great tool and the Vietnam NLP community for the open-source datasets.

Citation

If you find this repository useful, please consider citing it.
