Question about length bias mitigation in pairwise evaluation #20

liamjxu · 2024-11-13T22:25:54Z

Hi, thanks for open-sourcing the code, it looks great.

I have a question about the length bias mitigation part in the WB reward metric.

It seems that the code truncates both the reference output and the model output to a fixed word count.

I'm curious about how to implement

converting outcomes of “slightly better/worse” to “tie” if the winner’s response exceeds the loser’s by more than K characters.

as mentioned in the paper. Or am I missing something here?
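
To make sure I'm reading the rule correctly, here is a minimal Python sketch of what I would expect. The function name, the outcome label strings, and the way `k` is passed in are my own placeholders, not taken from the repository or the paper.

```python
def apply_length_penalty(outcome: str, model_output: str, ref_output: str, k: int) -> str:
    """Turn a 'slightly' outcome into a tie when the winner is much longer.

    `outcome` is judged from the model's point of view and uses hypothetical
    labels: "much_better", "slightly_better", "tie", "slightly_worse",
    "much_worse". `k` is the character-length threshold (K in the quote).
    """
    model_len = len(model_output)  # lengths measured in characters
    ref_len = len(ref_output)

    # Model wins narrowly but is more than k characters longer: call it a tie.
    if outcome == "slightly_better" and model_len - ref_len > k:
        return "tie"

    # Reference wins narrowly but is more than k characters longer: call it a tie.
    if outcome == "slightly_worse" and ref_len - model_len > k:
        return "tie"

    # "much_better", "much_worse", and "tie" are left untouched.
    return outcome
```

With this reading, only the "slightly" outcomes are affected, and only when the longer response is the winner.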

Thank you in advance!

liamjxu mentioned this issue Nov 28, 2024

Adding WildBench stanford-crfm/helm#3150