Hi Authors,

Thanks for the great work!

I tried to evaluate LongLoRA on LongBench (https://github.com/THUDM/LongBench) using the LongAlpaca-7B checkpoint (https://huggingface.co/Yukang/LongAlpaca-7B). I load the model directly in the LongBench evaluation harness following the same procedure as in your repository, set the test length in LongBench to 31,500, and use their default prompt template (since the model has a 32K context length). My results differ noticeably from the reported ones:
| Name | avg. | Single-Doc QA | Multi-Doc QA | Summarization | Few-shot | Synthetic | Code |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Report | 36.8 | 28.7 | 28.1 | 27.8 | 63.9 | 16.7 | 56.0 |
| Our Reprd. | 22.7 | 14.8 | 9.5 | 24.5 | 41.9 | 4.9 | 40.8 |
I suspect my preprocessing code (data loading, filtering, metric calculation, etc.) or instruction construction may differ from yours, but I cannot find any reference code for this configuration. Could you please share a copy of your LongBench evaluation code? My personal email is [email protected].
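For reference, this is roughly what my loading and truncation step looks like (a simplified sketch, not my exact script; the `truncate_middle` helper reflects my reading of LongBench's default middle-truncation in `pred.py`, and the generation settings are assumptions):

```python
# Sketch: load Yukang/LongAlpaca-7B with transformers and truncate LongBench
# prompts to ~31,500 tokens by keeping the head and tail of the context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Yukang/LongAlpaca-7B"
max_length = 31500  # test length used for the 32K-context model

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()


def truncate_middle(prompt: str) -> str:
    """Keep the first and last max_length/2 tokens and drop the middle,
    mirroring LongBench's default truncation strategy (as I understand it)."""
    ids = tokenizer(prompt, truncation=False, return_tensors="pt").input_ids[0]
    if len(ids) <= max_length:
        return prompt
    half = max_length // 2
    return (tokenizer.decode(ids[:half], skip_special_tokens=True)
            + tokenizer.decode(ids[-half:], skip_special_tokens=True))


@torch.no_grad()
def generate(prompt: str, max_new_tokens: int) -> str:
    """Greedy decoding on the (possibly truncated) prompt."""
    inputs = tokenizer(truncate_middle(prompt), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
```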
Thanks a lot!