We have addressed this concern and have updated the scoring logic in src/data/count.py. The current evaluation metric now takes into account both dimensions:
The timing of the model's response
The accuracy of the model's response content
You can find these changes in the latest version of the code. The updated implementation ensures a more comprehensive evaluation of the model's proactive output performance.
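For readers who want a concrete picture of what checking both dimensions might look like, here is a minimal sketch. It is not the code in src/data/count.py: the `ProactiveSample` fields, the helper names, and the case-insensitive exact-match rule for content are all assumptions made for illustration, and the real implementation may define the time window and the content comparison differently.

```python
from dataclasses import dataclass


@dataclass
class ProactiveSample:
    # All field names are hypothetical, chosen for this sketch only.
    response_time: float  # when the model produced its output, in seconds
    window_start: float   # earliest acceptable response time
    window_end: float     # latest acceptable response time
    response: str         # what the model answered
    ground_truth: str     # the expected answer


def is_on_time(sample: ProactiveSample) -> bool:
    """Timing check: the response falls inside the allowed window."""
    return sample.window_start <= sample.response_time <= sample.window_end


def is_content_correct(sample: ProactiveSample) -> bool:
    """Content check. Assumption: case-insensitive exact match after trimming."""
    return sample.response.strip().lower() == sample.ground_truth.strip().lower()


def proactive_accuracy(samples: list[ProactiveSample]) -> float:
    """Fraction of samples that pass BOTH the timing and the content check."""
    if not samples:
        return 0.0
    passed = sum(1 for s in samples if is_on_time(s) and is_content_correct(s))
    return passed / len(samples)
```

Under this sketch, a response that arrives inside the window but gives the wrong answer scores zero, and so does a correct answer given outside the window.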
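Hi,
While reading the official code you provided, I came across something I would like to ask about.
In StreamingBench/src/data/count.py, lines 38-55, you compute the score for the proactive output task. However, as far as I can see, you only check the time range in which the model responded and do not verify the content of its answer. Does the proactive output task only require checking the timing of the model's response, and not its content?
Looking forward to your reply.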