fix: add sentence trimming to OpenAIWrapper #1526
base: main
Conversation
The tiktoken library needs to be installed.
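In spirit, the trimming this PR adds means truncating each input to the model's token limit before the embeddings API call (8191 tokens for text-embedding-3-large, per the comment in the test script below). Here is a minimal sketch of that idea; it uses a whitespace split as a stand-in for tiktoken's encoding, and the helper name is illustrative, not the PR's actual code:

```python
def trim_to_token_limit(text: str, max_tokens: int) -> str:
    """Truncate `text` so its token count does not exceed `max_tokens`.

    A whitespace split stands in for a real tokenizer here; with tiktoken
    you would encode the text, slice the token ids to `max_tokens`, and
    decode the slice back to a string instead.
    """
    tokens = text.split()  # stand-in for encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[:max_tokens])  # stand-in for encoding.decode(...)


print(trim_to_token_limit("one two three four five six", 4))  # -> one two three four
print(trim_to_token_limit("short input", 10))                 # -> short input
```

With tiktoken, the slice on token ids (rather than characters) is what guarantees the request stays under the model's context limit.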
Could you provide the results of the model run to ensure that the parameters were passed correctly?
Doing some tests right now.
My test code for evaluating Ko-StrategyQA (a Korean retrieval task) is:

```python
"""Example script for benchmarking all datasets constituting the MTEB Korean leaderboard & average scores."""
from __future__ import annotations

import os
import logging
from multiprocessing import Process, current_process

import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

import mteb
from mteb import MTEB, get_tasks
from mteb.encoder_interface import PromptType
from mteb.models.sentence_transformer_wrapper import SentenceTransformerWrapper
from mteb.models.instruct_wrapper import instruct_wrapper

import argparse
from dotenv import load_dotenv
# from setproctitle import setproctitle
import traceback

load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("main")

TASK_LIST_CLASSIFICATION = []
TASK_LIST_CLUSTERING = []
TASK_LIST_PAIR_CLASSIFICATION = []
TASK_LIST_RERANKING = []
TASK_LIST_RETRIEVAL = [
    "Ko-StrategyQA",
    # "AutoRAGRetrieval",
    # "MIRACLRetrieval",
    # "PublicHealthQA",
    # "BelebeleRetrieval",
    # "MrTidyRetrieval",
    # "MultiLongDocRetrieval",
    # "XPQARetrieval"
]
TASK_LIST_STS = []

TASK_LIST = (
    TASK_LIST_CLASSIFICATION
    + TASK_LIST_CLUSTERING
    + TASK_LIST_PAIR_CLASSIFICATION
    + TASK_LIST_RERANKING
    + TASK_LIST_RETRIEVAL
    + TASK_LIST_STS
)

model_names = [
    "openai/text-embedding-3-large",  # 8191
]


def evaluate_model(model_name, gpu_id):
    try:
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        model = None
        if not os.path.exists(model_name):
            if "m2v" in model_name:
                static_embedding = StaticEmbedding.from_model2vec(model_name)
                model = SentenceTransformer(modules=[static_embedding])
            else:
                model = mteb.get_model(model_name)
        else:
            file_name = os.path.join(model_name, "model.safetensors")
            if os.path.exists(file_name):
                if "m2v" in model_name:
                    static_embedding = StaticEmbedding.from_model2vec(model_name)
                    model = SentenceTransformer(modules=[static_embedding])
                else:
                    model = mteb.get_model(model_name)
        if model:
            # setproctitle(f"{model_name}-{gpu_id}")
            print(f"Running task: {TASK_LIST} / {model_name} on GPU {gpu_id} in process {current_process().name}")
            evaluation = MTEB(
                tasks=get_tasks(tasks=TASK_LIST, languages=["kor-Kore", "kor-Hang", "kor_Hang"])
            )
            # Batch sizes suitable for 48GB of VRAM
            if "multilingual-e5" in model_name:
                batch_size = 256
            elif "jina" in model_name:
                batch_size = 8
            elif "bge-m3" in model_name:
                batch_size = 32
            elif "gemma2" in model_name:
                batch_size = 256
            else:
                batch_size = 64
            if args.quantize:  # for quantized models
                evaluation.run(
                    model,
                    output_folder=f"/data_x/EMBEDDING/RESULTS/{model_name}-quantized",
                    encode_kwargs={"batch_size": batch_size, "precision": "binary"},
                )
            else:
                evaluation.run(
                    model,
                    output_folder=f"results/{model_name}",
                    encode_kwargs={"batch_size": batch_size},
                )
    except Exception as ex:
        print(ex)
        traceback.print_exc()


if __name__ == "__main__":
    # Parse --quantize so the flag referenced above is defined.
    parser = argparse.ArgumentParser()
    parser.add_argument("--quantize", action="store_true")
    args = parser.parse_args()

    gpu_id = 0
    evaluate_model(model_names[0], gpu_id)
```

and I got these results:

```json
{
    "precision_at_1": 0.68243,
    "precision_at_10": 0.15,
    "precision_at_100": 0.01711,
    "ndcg_at_1": 0.68243,
    "ndcg_at_10": 0.73607,
    "ndcg_at_100": 0.76238,
    "recall_at_1": 0.44298,
    "recall_at_10": 0.80961,
    "recall_at_100": 0.91126
}
```

(Others excluded.)
After deleting the changes to ModelMetadata, I tested on AutoRAGRetrieval, which is also a Korean retrieval task:

```json
{
    "precision_at_1": 0.58772,
    "precision_at_10": 0.09386,
    "precision_at_100": 0.00991,
    "ndcg_at_1": 0.58772,
    "ndcg_at_10": 0.76549,
    "ndcg_at_100": 0.77714,
    "recall_at_1": 0.58772,
    "recall_at_10": 0.9386,
    "recall_at_100": 0.99123
}
```

(Others excluded.) The results show that the parameters are passed correctly.
Great!
#1494
@Samoed