cannot reproduce the results reported in the Espresso paper #80
Comments
hi @Alex357853 thanks for following our work. Since ESE is under review, we didn't provide many details.
The evaluation script is as follows:
BiLLM_START_INDEX=0 CUDA_VISIBLE_DEVICES=0 python eval_ese_nli.py --pooling_strategy avg --model_name_or_path Qwen/Qwen1.5-0.5B --lora_weight WhereIsAI/ese-qwen-0.5b-nli --billm_model_class Qwen2ForCausalLM
BTW, you can try to increase the
BTW, you can have a try using the newly released
Hi @SeanLee97, thanks for your prompt reply! I am still struggling with the code. I noticed that your trainer cannot train using the "last pooling" strategy. The potential bug I found is in Lines 667 to 672 in 191ca1b.
For example, after Lines 661 to 666 in 191ca1b, we already get features['attention_mask'] = tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]). However, after lines L667-L692, it becomes tensor([ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643]). That is, the padded positions of the attention mask hold 151643 instead of 0. I think this may hurt the model's performance from the start of training, and it may affect the other pooling strategies as well. Could you please clarify whether this is an issue in your code? Thank you for your time and help!
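A minimal sketch of the padding behaviour the report expects, assuming right padding and a per-field fill value. The `pad_features` helper and the `fill_values` mapping are hypothetical names used for illustration; they are not taken from the repository. The pad ID 151643 is the value that appears in the tensor above.

```python
from typing import Dict, List

# Pad ID taken from the tensor in the report; assumed to be the tokenizer's pad token ID.
PAD_TOKEN_ID = 151643


def pad_features(features: Dict[str, List[int]], max_len: int,
                 pad_token_id: int = PAD_TOKEN_ID) -> Dict[str, List[int]]:
    """Right-pad every field to max_len, choosing the fill value per field."""
    # Hypothetical mapping: only input_ids are filled with the pad token ID.
    # The attention mask must be filled with 0 so padded positions are ignored.
    fill_values = {"input_ids": pad_token_id, "attention_mask": 0}
    padded = {}
    for name, values in features.items():
        fill = fill_values.get(name, 0)
        padded[name] = list(values) + [fill] * (max_len - len(values))
    return padded


example = {"input_ids": [101, 2023, 2003], "attention_mask": [1, 1, 1]}
padded = pad_features(example, max_len=5)
print(padded["input_ids"])       # [101, 2023, 2003, 151643, 151643]
print(padded["attention_mask"])  # [1, 1, 1, 0, 0]
```

Filling every field with the same pad value is what produces a mask like the second tensor above; keeping a separate fill value per field avoids it.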
@Alex357853 Thank you for reporting this issue! It is indeed a bug. It uses the pad token to pad the attention mask; however, the pad token is
I am fixing this issue in PR #89. Thank you again!
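To illustrate why a mask padded this way could break last pooling in particular, here is a small sketch. It assumes that last pooling finds the final real token by summing a right-padded 0/1 attention mask; that is an assumption about the approach, and the repository's actual implementation may differ. `last_token_index` is a hypothetical helper.

```python
from typing import List


def last_token_index(attention_mask: List[int]) -> int:
    """Index of the last real token, assuming right padding and a 0/1 mask."""
    return sum(attention_mask) - 1


seq_len = 31
# Masks shaped like the two tensors in the report: 11 real tokens, 20 padded positions.
correct_mask = [1] * 11 + [0] * 20
corrupted_mask = [1] * 11 + [151643] * 20

good = last_token_index(correct_mask)
bad = last_token_index(corrupted_mask)
print(good, good < seq_len)  # a valid position inside the sequence
print(bad, bad < seq_len)    # far beyond the sequence length
```

With the correct mask the index points at the eleventh token; with the corrupted mask it falls outside the sequence, so any gather or indexing based on it would fail or select the wrong position.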
Hi, this is a really good and useful codebase. I tried to reproduce the results reported in the paper but failed. I used the code in README_ESE.md:
However, it only gave the following results:
I also changed --cosine_w 0. to --cosine_w 1.0 and --ibn_w 10.0 to --ibn_w 35.0, but the results were even worse.
The results reported in your paper are:
If I purely evaluate the WhereIsAI/UAE-Large-V1 model, the results are:
This means fine-tuning gave me worse performance. In addition, I noticed that the more epochs I train, the worse the performance gets.
Besides, I also tried the code in examples/NLI/README.md to train Qwen1.5-0.5B:
It gave me an average score of 70.23, whereas the paper reports 82.82.
I wonder whether these scripts are the ones you used to train your model, especially regarding the parameter values. It would be really helpful if you could assist me in reproducing the results so I can use this codebase. I really appreciate your time and help! Thank you!