
fix: add sentence trimming to OpenAIWrapper #1526

Open · wants to merge 8 commits into main
Conversation

@yjoonjang (Contributor Author)

The tiktoken library needs to be installed.
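Since the PR trims inputs by token count, here is a minimal sketch of what tiktoken-based trimming could look like (the function name, encoding choice, and limit are illustrative assumptions, not necessarily the PR's exact code):

import tiktoken

def truncate_to_token_limit(text: str, max_tokens: int = 8191) -> str:
    # the text-embedding-3 models tokenize with cl100k_base
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    # keep the first max_tokens tokens and decode back to text
    return encoding.decode(tokens[:max_tokens])

Trimming by tokens rather than characters is what the API limit requires: text-embedding-3-large rejects inputs longer than 8191 tokens.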

@Samoed (Collaborator) left a comment

Could you provide the results of the model run to ensure that the parameters were passed correctly?

Review threads:
- mteb/models/openai_models.py (resolved)
- mteb/models/openai_models.py (outdated, resolved)
- mteb/model_meta.py (outdated, resolved)
@yjoonjang (Contributor Author)

> Could you provide the results of the model run to ensure that the parameters were passed correctly?

Doing some tests right now.

@yjoonjang (Contributor Author)

My test code for evaluating Ko-StrategyQA (a Korean retrieval task) is:

"""Example script for benchmarking all datasets constituting the MTEB Korean leaderboard & average scores"""
from __future__ import annotations

import argparse
import logging
import os
import traceback
from multiprocessing import current_process

from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

import mteb
from mteb import MTEB, get_tasks

# from setproctitle import setproctitle

load_dotenv()

logging.basicConfig(level=logging.INFO)

logger = logging.getLogger("main")

TASK_LIST_CLASSIFICATION = []

TASK_LIST_CLUSTERING = []

TASK_LIST_PAIR_CLASSIFICATION = []

TASK_LIST_RERANKING = []

TASK_LIST_RETRIEVAL = [
    "Ko-StrategyQA",
    # "AutoRAGRetrieval",
    # "MIRACLRetrieval",
    # "PublicHealthQA",
    # "BelebeleRetrieval",
    # "MrTidyRetrieval",
    # "MultiLongDocRetrieval",
    # "XPQARetrieval"
]

TASK_LIST_STS = []

TASK_LIST = (
    TASK_LIST_CLASSIFICATION
    + TASK_LIST_CLUSTERING
    + TASK_LIST_PAIR_CLASSIFICATION
    + TASK_LIST_RERANKING
    + TASK_LIST_RETRIEVAL
    + TASK_LIST_STS
)

model_names = [
    "openai/text-embedding-3-large", # 8191
]

def evaluate_model(model_name, gpu_id):
    try:
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        
        model = None
        if not os.path.exists(model_name):
            # model_name is a hub ID rather than a local path
            if "m2v" in model_name:
                static_embedding = StaticEmbedding.from_model2vec(model_name)
                model = SentenceTransformer(modules=[static_embedding])
            else:
                model = mteb.get_model(model_name)
        else:
            # local checkpoint: load only if the weights file is present
            file_name = os.path.join(model_name, "model.safetensors")
            if os.path.exists(file_name):
                if "m2v" in model_name:
                    static_embedding = StaticEmbedding.from_model2vec(model_name)
                    model = SentenceTransformer(modules=[static_embedding])
                else:
                    model = mteb.get_model(model_name)

        if model:
            # setproctitle(f"{model_name}-{gpu_id}")
            print(f"Running task: {TASK_LIST} / {model_name} on GPU {gpu_id} in process {current_process().name}")
            evaluation = MTEB(
                tasks=get_tasks(tasks=TASK_LIST, languages=["kor-Kore", "kor-Hang", "kor_Hang"])
            )
            # batch sizes suitable for a 48GB-VRAM GPU
            if "multilingual-e5" in model_name:
                batch_size = 256
            elif "jina" in model_name:
                batch_size = 8
            elif "bge-m3" in model_name:
                batch_size = 32
            elif "gemma2" in model_name:
                batch_size = 256 
            else:
                batch_size = 64

            if args.quantize:  # for quantized models
                evaluation.run(
                    model,
                    output_folder=f"/data_x/EMBEDDING/RESULTS/{model_name}-quantized",
                    encode_kwargs={"batch_size": batch_size, "precision": "binary"},
                )
            else:
                evaluation.run(
                    model,
                    output_folder=f"results/{model_name}",
                    encode_kwargs={"batch_size": batch_size},
                )
    except Exception as ex:
        print(ex)
        traceback.print_exc()

if __name__ == "__main__":
    # --quantize switches evaluate_model to binary-precision encoding
    parser = argparse.ArgumentParser()
    parser.add_argument("--quantize", action="store_true")
    args = parser.parse_args()

    gpu_id = 0
    evaluate_model(model_names[0], gpu_id)
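
For reference, a typical invocation of this script (the file name is a placeholder; --quantize switches to binary-precision encoding):

python run_ko_retrieval.py
python run_ko_retrieval.py --quantize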

Running it on Ko-StrategyQA gave these results:

{
        "precision_at_1": 0.68243,
        "precision_at_10": 0.15,
        "precision_at_100": 0.01711,
        "ndcg_at_1": 0.68243,
        "ndcg_at_10": 0.73607,
        "ndcg_at_100": 0.76238,
        "recall_at_1": 0.44298,
        "recall_at_10": 0.80961,
        "recall_at_100": 0.91126,
}

(Other metrics omitted)

@yjoonjang (Contributor Author)

After removing the ModelMetadata changes, I tested on AutoRAGRetrieval, which is also a Korean retrieval task.
The results are:

{
        "precision_at_1": 0.58772,
        "precision_at_10": 0.09386,
        "precision_at_100": 0.00991,
        "ndcg_at_1": 0.58772,
        "ndcg_at_10": 0.76549,
        "ndcg_at_100": 0.77714,
        "recall_at_1": 0.58772,
        "recall_at_10": 0.9386,
        "recall_at_100": 0.99123,
}

(Other metrics omitted)

These results show that the parameters are passed correctly.

@Samoed (Collaborator) commented Nov 29, 2024

Great!
