How to evaluate SAIL? #8
Comments
Hi, thanks for asking! Details about evaluation can be found in https://aclanthology.org/2023.findings-emnlp.242.pdf
Thanks for the reply. In Section 3.1, the title is "Automatic Evaluation with GPT-4", but I didn't see the evaluation details. Did you evaluate the results on all of the test data? That would require a significant amount of time (and money).
I see. GPT-4 is only used for evaluation on Question-80. More details can be found here, but I believe they have upgraded many things since I used it.
Thanks for the reply. I wonder whether EM (exact match) may cause misjudgments during evaluation? This seems unavoidable.
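To make the EM concern concrete, here is a minimal sketch of how exact match can misjudge an answer. It assumes a SQuAD-style normalization (lowercasing, stripping punctuation and articles); the helper names `normalize`, `exact_match`, and `contains_match` are hypothetical and are not taken from the SAIL code or paper, whose actual scoring script may differ.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True only if the normalized prediction equals some normalized gold answer."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


def contains_match(prediction: str, gold_answers: list[str]) -> bool:
    """Looser check: True if any normalized gold answer appears inside the prediction."""
    pred = normalize(prediction)
    return any(normalize(gold) in pred for gold in gold_answers)


gold = ["Eiffel Tower"]

# A short answer matches after normalization.
short_ok = exact_match("The Eiffel Tower", gold)

# A correct but verbose answer is rejected by strict EM (false negative).
verbose_em = exact_match("It is the Eiffel Tower in Paris.", gold)

# The looser check accepts the verbose answer, but it also accepts
# a wrong answer that merely mentions the gold string (false positive).
verbose_contains = contains_match("It is the Eiffel Tower in Paris.", gold)
negated_contains = contains_match("It is not the Eiffel Tower.", gold)

print(short_ok, verbose_em, verbose_contains, negated_contains)
```

Strict EM tends to under-count long, generated answers, while substring matching tends to over-count them, so either choice can misjudge some outputs.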
Hello, I am curious about how SAIL was evaluated. Was it evaluated using GPT-4? Was all of the benchmark data used for evaluation?