num_common is a count of unique overlapping tokens. When calculating precision, the denominator used is the length of prediction_tokens. prediction_tokens does not seem to consist of only unique tokens.
May I know:
Is there a reason for using unique tokens in the numerator but all tokens in the denominator in the calculation of precision?
Would using set(prediction_tokens) as the denominator affect the metric's correlation with human judgement? If so, would the correlation be lower or higher?
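To make the two options concrete, here is a minimal sketch of the calculation as I understand it. It is not the repository's code: the function name `precision_variants` and the example tokens are hypothetical, and the numerator is assumed to be the count of unique overlapping tokens, as described above.

```python
def precision_variants(prediction_tokens, reference_tokens):
    """Return (precision over all tokens, precision over unique tokens).

    Hypothetical helper for illustration only. The numerator is assumed
    to be the number of unique tokens shared by both sequences.
    """
    if not prediction_tokens:
        return 0.0, 0.0

    unique_prediction = set(prediction_tokens)
    # Unique overlapping tokens (the numerator in both variants).
    num_common = len(unique_prediction & set(reference_tokens))

    # Variant 1: denominator counts every predicted token, repeats included.
    precision_all = num_common / len(prediction_tokens)
    # Variant 2: denominator counts each predicted token once.
    precision_unique = num_common / len(unique_prediction)
    return precision_all, precision_unique


prediction = ["the", "cat", "the", "mat"]
reference = ["the", "cat", "sat"]
precision_all, precision_unique = precision_variants(prediction, reference)
print(precision_all, precision_unique)
```

With a repeated token in the prediction, the first variant gives 2/4 and the second gives 2/3, so the current denominator penalises repetition while the numerator does not reward it.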
This question refers to instruct-qa/instruct_qa/evaluation/faithfulness_metrics.py, line 528 in 89118ad.