Berkeley Function Calling Leaderboard Updates (v1.2) · ShishirPatil/gorilla

Highlights

🏆 Berkeley Function Calling Leaderboard V3 with Multi-step and Multi-turn function call evaluation

What's Changed

[BFCL] Package the Codebase by @devanshamin in #565
Added python script named as raft_local.py to raft directory to run script completely locally using HF models by @himanshushukla12 in #605
RAFT Enhancements: Improved robustness, logging, checkpointing, threading, Llama support, Azure auth and eval by @cedricvidal in #604
Fix/merge commit #605 and #604 by @ShishirPatil in #609
Fix issue #614: [BFCL] ModuleNotFoundError after commit 70d6722 by @kobe0938 in #615
Fix some bugs in test case prompts/ground truths by @aw632 in #608
[BFCL] Dataset and Possible Answer Fix by @HuanzhiMao in #600
Add Salesforce xLAM model series by @zuxin666 in #616
Update gemini_handler.py to better handle NL+FC model output by @vandyxiaowei in #617
[BFCL] Fix Decoding Issue in Nvidia Handler by @HuanzhiMao in #623
[BFCL] Fix Llama Handler by @HuanzhiMao in #626
[BFCL] add MadeAgents/Hammer-7b handler by @linqq9 in #627
[BFCL] Refactor Model Handler into OSS and Proprietary Components by @devanshamin in #612
[BFCL] Hot Fix to Remove Extra Parameters for NoAPIKeyError by @HuanzhiMao in #636
fix: bug for glm prompt format by @zhangch-ss in #638
[BFCL] Add New Model o1-preview-2024-09-12 and o1-mini-2024-09-12 by @HuanzhiMao in #635
[BFCL] BFCL v3 by @HuanzhiMao in #644
removed unnecessary comments in raft/raft_local.py by @himanshushukla12 in #654
[BFCL] Chore: Separate Change Log. by @HuanzhiMao in #648
[BFCL] Bug Fix inference_single_turn_FC function for base_handler by @HuanzhiMao in #656
[BFCL] Bug Fix parse_nested_value function for model_handler utils by @VishnuSuresh27 in #660
added Phi-3 handlers by @AndyChenYH in #640
Update agent arena frontend and evals by @NithikYekollu in #666
[BFCL] Speed Up Locally-hosted Model Inference Process by @HuanzhiMao in #671
[BFCL] Fix Hanging Inference for OSS Models on GPU Platforms by @HuanzhiMao in #663
[BFCL] Add gemini-1.5-pro-002, gemini-1.5-pro-002-FC, gemini-1.5-pro-001, gemini-1.5-pro-001-FC, gemini-1.5-flash-002, gemini-1.5-flash-002-FC, gemini-1.0-pro-002, gemini-1.0-pro-002-FC by @HuanzhiMao in #658
[BFCL] Add Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct by @HuanzhiMao in #657
[BFCL] Add ToolACE handler for BFCL-v3 by @XuHwang in #653
Add Qwen handler and fix mean_latency calculation error for OSS models by @zhangch-ss in #642
update README.md by @leosun12 in #669
[BFCL] Chore: Various Improvements and Adjustments by @HuanzhiMao in #673
[BFCL] Chore: Refactor File Path Handling and Automate apply_function_credential_config.py by @HuanzhiMao in #675
docs: update README.md by @eltociear in #676
[BFCL-v3] Multi-Turn Possible Answer Order Change by @Fanjia-Yan in #679
update hammer handler and add Hammer2.0 model by @linqq9 in #667
[BFCL] Chore: Improve Multi Turn Error Logs by @HuanzhiMao in #689
Update google-cloud-aiplatform dependency by @jieru-hu in #677
add minicpm3 4b by @Cppowboy in #633
[BFCL-v2] Dataset and Possible Answer Fix by @HuanzhiMao in #661
[BFCL] Add Gemma-2 models by @jacovkim in #696
add a basic bfcl command-line interface by @mattf in #621
Fixing BFCL-v3 multi-turn apps by @virginie-do in #701
[BFCL v1] Update Executable Ground Truth for REST Category by @CharlieJCJ in #708
[BFCL v1] Rephrase Question for Better Clarity for Java & JavaScript Categories by @HuanzhiMao in #709
[BFCL] Add SGLang Backend Support for OSS Local Inference by @hnyls2002 in #587
(typo):I've made some corrections to your repository to improve clarity by @PrathameshSPawar in #713
docs: Centered the Image by @bhargavshirin in #680
[BFCL] Multi Turn Dataset and Possible Answer Fix by @HuanzhiMao in #683
[BFCL] Chore: Separate out Func Doc for Multi-Turn Categories by @HuanzhiMao in #717
[BFCL] Multi Turn Dataset and Possible Answer Fix (Base Category) by @HuanzhiMao in #719
[BFCL] Multi Turn Dataset Fix (Function Doc) by @HuanzhiMao in #722
[BFCL] Multi Turn Dataset Fix (Base Category) by @HuanzhiMao in #723
[BFCL] Multi Turn Pipeline Robustness Patch by @HuanzhiMao in #724
[BFCL] Small typo in variable name in travel_booking.py by @daanaea in #731
[BFCL] Patch #724 by @HuanzhiMao in #730
[BFCL] Multi Turn Dataset Fix (Miss Func & Long Context) by @HuanzhiMao in #728
[BFCL] Multi Turn Dataset Fix (Miss Param) by @HuanzhiMao in #732
[BFCL] Update Eval Metric for Multi Turn Irrelevance Scenarios by @HuanzhiMao in #725
[BFCL] Remove duplicate in eval_runner.py by @ThomasRochefortB in #735
[BFCL] Support Dynamic max_tokens for Locally-Hosted Models by @HuanzhiMao in #712
[BFCL] Refine Evaluation Metric for Multi Turn Categories by @HuanzhiMao in #733
[BFCL] Adding New Model GoGoAgent by @RogueTensor in #720
[BFCL] Chore: Improve Inference Log Readability by @HuanzhiMao in #746
[BFCL Dataset Revamp 1/n] Multi-Turn (Part 1) by @Fanjia-Yan in #740
[BFCL] Robustness Patch for _multi_threaded_inference by @HuanzhiMao in #754
[BFCL] Prompt Caching for Claude Models by @VishnuSuresh27 in #751
[BFCL Dataset Revamp 2/n] Live Dataset Fix (Simple, Parallel, Parallel Multiple) by @Fanjia-Yan in #737
[BFCL Dataset Revamp 3/n] Live Dataset Fix (Multiple) by @Fanjia-Yan in #739
Update google-cloud-aiplatform version to 1.72.0 by @gabrielibagon in #760
[BFCL] Minor Grammatical Corrections to DEFAULT_SYSTEM_PROMPT by @HuanzhiMao in #747
[BFCL] Remove Llama-3.2-3B-Instruct-FC and Llama-3.2-1B-Instruct-FC from Leaderboard by @HuanzhiMao in #749
[BFCL Chore] Supply data_multi_turn.csv for Multi-Turn Evaluation Results by @HuanzhiMao in #762
[BFCL] Remove Workaround Patch for Vertex AI Package by @HuanzhiMao in #761
Add exponential retry logic for gemini models by @gabrielibagon in #764
[BFCL] Remove Duplicate Line in record_cost_latency by @HuanzhiMao in #767
Fix handling of examples with no tools in Gemini by @gabrielibagon in #770
Remove stop condition in gemini retry logic by @gabrielibagon in #769
Skip adding empty content from gemini by @gabrielibagon in #768
[BFCL] Add the option to log to WandB during bfcl evaluate by @ThomasRochefortB in #736
[BFCL] Add claude-3-5-haiku-20241022, claude-3-5-haiku-20241022-FC, claude-3-5-sonnet-20241022, claude-3-5-sonnet-20241022-FC by @HuanzhiMao in #750
[BFCL Dataset Revamp 4/n] Live Irrelevance by @Fanjia-Yan in #763
[BFCL Dataset Revamp 5/n] Multi-Turn Base WrapUp by @Fanjia-Yan in #772
[BFCL] Add Unit Test to Check for Illegal Python Parameter Name by @HuanzhiMao in #777
[BFCL] Dataset and Possible Answer Fix (Live Categories) for Illegal Python Parameter Name by @HuanzhiMao in #778
[BFCL] Add Support for Regeneration, Specific Test Entry IDs, and Custom Directory Locations by @Raymond112514 in #743
[BFCL] some tiny fix in possible_answer by @zhangch-ss in #786
[RAFT] Add link to Azure RAFT Distillation Recipe by @cedricvidal in #758
[BFCL] Add New Model Qwen/Qwen2.5-72B-Instruct by @HuanzhiMao in #787
[BFCL] Add DeepSeek-V2.5, DeepSeek-Coder-V2-Instruct-0724, DeepSeek-Coder-V2-Lite-Instruct, DeepSeek-V2-Chat-0628, DeepSeek-V2-Lite-Chat by @moonlight1431 in #697
Add minicpm3 4b FC model handler by @Cppowboy in #718
[BFCL] Add support for Writer models and Palmyra X 004 by @samjulien in #755
[BFCL Chore] Add @final and @overrides Decorators to Class Methods in Model Handler by @VishnuSuresh27 in #790
[BFCL Chore] Support Multiple Models and Test Category Input for BFCL CLI by @vsvaidya27 in #795
[BFCL] Fix Irrelevance Category Performance for DeepSeek Coder Handler by @HuanzhiMao in #796
[BFCL Chore] Quick fix change of decorators from @overrides to @override by @VishnuSuresh27 in #797
[BFCL Chore] Add Retry Mechanism with Backoff for Rate Limit Handling Across Proprietary Models by @HuanzhiMao in #781
[BFCL] Bug Fix for Execution_Result_Message Construction for Prompt Caching Feature in Claude Handler by @HuanzhiMao in #805
[BFCL Dataset Revamp 7/n] Augmented Multi-turn Dataset Fix by @Fanjia-Yan in #804
[BFCL Dataset Revamp 6/n] Live Relevance Data Fix by @Fanjia-Yan in #789
Add Weaviate APIs to Gorilla API Zoo by @CShorten in #783
[BFCL] Improve Latency Measurement Accuracy and Enable Default State Logging by @HuanzhiMao in #808
[BFCL] Replace 'class' with '_class' to Avoid Function Calling Formatting Error by @Fanjia-Yan in #811
[BFCL] Added Grok Handler by @amitojsingh2022 in #810
[BFCL] Resolve Issue in Gemini Model When No Model Output by @HuanzhiMao in #809
[BFCL] Add Amazon Models nova-pro-v1.0, nova-lite-v1.0, and nova-micro-v1.0 by @HuanzhiMao in #815
[BFCL Chore] Revamp README.md for Clearer Instructions by @HuanzhiMao in #819
[BFCL] Update gpt-4o Snapshot Version from 2024-08-06 to 2024-11-20 by @HuanzhiMao in #822
fix some enum type errors in datasets by @zhangch-ss in #826
Fix Merge Conflict From #826 by @HuanzhiMao in #829
[BFCL Chore] Add Unit Test for Valid Func Doc Format by @HuanzhiMao in #828
update hammer handler and add Hammer2.1 model by @linqq9 in #832
[BFCL] Add New Model Llama-3.3-70B-Instruct, Llama-3.3-70B-Instruct-FC by @HuanzhiMao in #837
[BFCL] Add o1-2024-12-17 and o1-2024-12-17-FC by @HuanzhiMao in #840
Add Cohere Command R7B, replace older Command R+ handler by @harry-cohere in #835
[BFCL Dataset] Ground Truth Error Fix by @Fanjia-Yan in #846
[BFCL] Add Qwen2.5-0.5B-Instruct, Qwen2.5-3B-Instruct, Qwen2.5-14B-Instruct, Qwen2.5-32B-Instruct by @HuanzhiMao in #842
[BFCL] Add New Model watt-tool-8B and watt-tool-70B by @zhanghanduo in #847
[BFCL] Skip Executable Categories When API Keys Missing by @HuanzhiMao in #848
[BFCL] Add gemini-2.0-flash-exp-FC, gemini-2.0-flash-exp, gemini-exp-1206-FC, gemini-exp-1206 by @HuanzhiMao in #843
Check and fix some parameter type errors in possible answers by @zhangch-ss in #838
[BFCL] Use N/A in Score Report for Unevaluated Categories by @HuanzhiMao in #849
[BFCL] possible answer fix: - reigion ->region by @sghyan16 in #852
[BFCL] Add Mistral Local Serving Handler and Add New Model mistralai/Ministral-8B-Instruct-2410 by @HuanzhiMao in #855
[BFCL] Add New Model DeepSeek-V3 by @HuanzhiMao in #857
[BFCL] Rename Directories: proprietary_model->api_inference, oss_model->local_inference for Better Clarity by @HuanzhiMao in #859
[BFCL] Support for pre-existing completion endpoint by @ThomasRochefortB in #864
[BFCL Chore] Ensure Correct Input Format for Eval Checker by @HuanzhiMao in #860

New Contributors

@devanshamin made their first contribution in #565
@himanshushukla12 made their first contribution in #605
@kobe0938 made their first contribution in #615
@aw632 made their first contribution in #608
@linqq9 made their first contribution in #627
@zhangch-ss made their first contribution in #638
@VishnuSuresh27 made their first contribution in #660
@AndyChenYH made their first contribution in #640
@XuHwang made their first contribution in #653
@leosun12 made their first contribution in #669
@jieru-hu made their first contribution in #677
@Cppowboy made their first contribution in #633
@jacovkim made their first contribution in #696
@mattf made their first contribution in #621
@virginie-do made their first contribution in #701
@hnyls2002 made their first contribution in #587
@PrathameshSPawar made their first contribution in #713
@bhargavshirin made their first contribution in #680
@daanaea made their first contribution in #731
@ThomasRochefortB made their first contribution in #735
@RogueTensor made their first contribution in #720
@gabrielibagon made their first contribution in #760
@moonlight1431 made their first contribution in #697
@samjulien made their first contribution in #755
@vsvaidya27 made their first contribution in #795
@CShorten made their first contribution in #783
@amitojsingh2022 made their first contribution in #810
@zhanghanduo made their first contribution in #847
@sghyan16 made their first contribution in #852

Full Changelog: v1.1...v1.2
