Human evaluation results from Google Sheet not reproducible? #6

Open
nils-hde opened this issue Aug 11, 2023 · 0 comments

I am wondering how exactly the human evaluation scores in this sheet were computed: https://docs.google.com/spreadsheets/d/1THEh9MRPWQCC1v4DH5WTw0Gq8TyV9zncWWUL08drtUY/edit#gid=452616194

For reference, here is what we get (rightmost column) when using the results from the current master branch; note that Team 7 is missing entirely: https://docs.google.com/spreadsheets/d/1oEtzLyouTR-numPKS4WtMPSQTD6m9IzutXbtGwNNY5A/edit#gid=452616194

Both the absolute values and the rankings differ. We compute the average over all generated responses and then multiply it by the Detection F1-Score as provided in the paper.
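
To make the computation concrete, here is a minimal sketch of what we do. The function names and the input layout (a flat list of per-response ratings and one detection F1 value per team) are assumptions for illustration only, not the repository's actual code or data format.

```python
from statistics import mean


def human_eval_score(ratings, detection_f1):
    """Average the ratings over all generated responses of one team,
    then scale that average by the team's detection F1 score.

    `ratings` and `detection_f1` are hypothetical inputs; how they are
    read from the results files is not shown here.
    """
    if not ratings:
        raise ValueError("no ratings available for this team")
    return mean(ratings) * detection_f1


def rank_teams(scores):
    """Return team names ordered from highest to lowest final score."""
    return sorted(scores, key=scores.get, reverse=True)
```

If the sheet instead averages per annotator or per dialogue first, or applies the F1 score differently, that alone could explain differences in both the absolute values and the rankings.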
