-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor benchmark backend code #540
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks!
Could you please check if this page works after this fix has been applied? https://dev.explainaboard.inspiredco.ai/benchmark?id=globalbench_ner
Hi @neubig , I got this error when loading this benchmark:
This is because this benchmark try to find all ner systems with One way to deal with undefined datasets is to ignore undefined datasets in benchmark. I think we cannot merge systems with undefined datasets because they may be for different tasks and have different metrics. WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I am not familiar with this code but the changes look good! I just noticed one small thing regarding the type annotation.
): | ||
sys_info = unwrap(sys_info_tmp) | ||
sys = unwrap(system_tmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the type annotation, it seems that this unwrap
is not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This is another issue not related to this refactor PR. I'll merge this PR and record this problem in another issue. |
This PR aims to fix issue #533
Previously,
benchmark_db_utils
needs to loadSystemInfo
which depends on SDK versioning. However, since current benchmark only needsSystem
properties such as overall results for the benchmark plots or tables, we can refactor the code to avoid computing and storing additionalSystemInfo
in cache which will expire due to SDK upgrading.