Thank you for sharing the LLM2Vec code and the training methods! I’m interested in using my own dataset for supervised contrastive training, but I noticed that there isn't a specific guide for adapting the training procedure to custom datasets.
Thank you in advance for your help!
My task is to retrieve sentences with similar styles. I am continuing the training based on McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised.
First, I merged the weights of LLaMA-3, LLM2Vec-Meta-Llama-3-8B-Instruct-mntp, and LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised to obtain a single model.
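The merge step could look roughly like the sketch below. It assumes the `peft` and `transformers` packages and follows the usual pattern of applying the MNTP adapter first and the supervised adapter second; the function name `merge_adapters` and the output directory are hypothetical, and it has not been verified that the saved checkpoint keeps the bidirectional model code.

```python
# Checkpoints named in this thread; adapters are applied in this order.
BASE_AND_MNTP = "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp"
SUPERVISED = "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised"
ADAPTERS = [BASE_AND_MNTP, SUPERVISED]


def merge_adapters(output_dir: str) -> None:
    """Fold both LoRA adapters into the base weights and save one model.

    Hypothetical helper: heavy dependencies are imported lazily so that the
    module can be imported on a machine without torch/peft installed.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoConfig, AutoModel, AutoTokenizer

    config = AutoConfig.from_pretrained(BASE_AND_MNTP, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        BASE_AND_MNTP,
        config=config,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )

    # Load each adapter on top of the current weights, then merge it in.
    for adapter in ADAPTERS:
        model = PeftModel.from_pretrained(model, adapter)
        model = model.merge_and_unload()

    # Save the merged weights and the tokenizer side by side.
    model.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(BASE_AND_MNTP).save_pretrained(output_dir)
```

Calling `merge_adapters("output/merged-llm2vec-llama3")` would then produce the directory used as model_name_or_path in the config.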
Key changes made:
train_configs/supervised/MetaLlama3.json
(1) The model_name_or_path was updated to the merged model, and the peft_model_name_or_path was removed.
(2) The dataset name and path were updated.
(3) Other parameters were adjusted as needed.
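As an illustration of changes (1) and (2), here is a minimal sketch that edits the config as a plain dictionary. The keys `model_name_or_path` and `peft_model_name_or_path` come from the description above; `dataset_name` and `dataset_file_path` are assumed key names, and all paths and values are placeholders.

```python
import copy


def adapt_config(config, merged_model_path, dataset_name, dataset_path):
    """Return a copy of a training config adjusted for a merged model."""
    new_config = copy.deepcopy(config)
    # (1) Point at the merged model; no separate PEFT adapter is loaded.
    new_config["model_name_or_path"] = merged_model_path
    new_config.pop("peft_model_name_or_path", None)
    # (2) Switch to the custom dataset (key names are assumptions).
    new_config["dataset_name"] = dataset_name
    new_config["dataset_file_path"] = dataset_path
    return new_config


# Placeholder values standing in for the original JSON file's contents.
original = {
    "model_name_or_path": "path/to/base-model",
    "peft_model_name_or_path": "path/to/mntp-adapter",
    "dataset_name": "E5",
    "dataset_file_path": "path/to/e5-data",
}

adapted = adapt_config(
    original,
    merged_model_path="output/merged-llm2vec-llama3",
    dataset_name="StyleData",
    dataset_path="cache/style-data",
)
```

In practice the dictionary would be read from and written back to the JSON file with `json.load` and `json.dump`.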
llm2vec/dataset: A new Python script was created for the dataset (using E5Data.py as a template).
(1) A dataset similar to E5Data was created, where each entry contains a query, positive, and negative sample.
(2) The prompt was updated.
(3) Parameters in the class E5Data were modified as necessary.
(4) The data organization method is based on the first option in E5Data.py ("allnli_split2"), as my task involves retrieving similar sentences.
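The data format from step (1) and the prompt from step (2) can be sketched without any repository code. This is only an illustration: the `Triplet` class, the `build_triplets` function, the JSON Lines layout, and the instruction text are all hypothetical, and the real script would wrap such samples in the repository's own dataset and sample classes. Prepending the instruction to the query only is also an assumption, and any separator handling in E5Data.py is not reproduced here.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Triplet:
    """One contrastive training example."""
    query: str
    positive: str
    negative: str


def build_triplets(lines: Iterable[str], instruction: str) -> List[Triplet]:
    """Parse JSON Lines records with 'query', 'positive', 'negative' keys."""
    triplets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        triplets.append(
            Triplet(
                # Assumption: only the query carries the task instruction.
                query=f"{instruction}; {record['query']}",
                positive=record["positive"],
                negative=record["negative"],
            )
        )
    return triplets


sample_lines = [
    '{"query": "Hark, the dawn!", "positive": "Lo, the dusk!", "negative": "It is 6 am."}',
    "",
]
triplets = build_triplets(sample_lines, "Retrieve sentences written in a similar style")
```

With a real file, `build_triplets(open(path, encoding="utf-8"), instruction)` would read one example per line.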