I am wondering how exactly the human evaluation scores in this sheet were computed: https://docs.google.com/spreadsheets/d/1THEh9MRPWQCC1v4DH5WTw0Gq8TyV9zncWWUL08drtUY/edit#gid=452616194
For reference, here is what we end up with (rightmost column) when taking the results from the current master branch (note also that Team 7 is missing entirely): https://docs.google.com/spreadsheets/d/1oEtzLyouTR-numPKS4WtMPSQTD6m9IzutXbtGwNNY5A/edit#gid=452616194
Both the absolute values and the rankings differ from those in the first sheet. We compute the average over all generated responses and then multiply it by the Detection F1-Score as provided in the paper.
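To make our computation concrete, here is a minimal sketch of what we do for one team. The function name `final_score` and the example numbers are made up for illustration; they are not taken from the repository or the sheets.

```python
from statistics import mean


def final_score(ratings, detection_f1):
    """Average the human ratings over all generated responses,
    then scale the average by the Detection F1-Score.

    `ratings` and `detection_f1` are hypothetical inputs used only
    to illustrate the calculation described above.
    """
    if not ratings:
        raise ValueError("need at least one rated response")
    return mean(ratings) * detection_f1


# Illustrative values only, not real evaluation data.
example = final_score([4.0, 3.0, 5.0], detection_f1=0.5)
```

If the sheet uses a different aggregation order than this one, that alone could account for different numbers.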