Highlights
🏆 Berkeley Function Calling Leaderboard V3 with Multi-step and Multi-turn function call evaluation
What's Changed
- [BFCL] Package the Codebase by @devanshamin in #565
- Added python script named as raft_local.py to raft directory to run script completely locally using HF models by @himanshushukla12 in #605
- RAFT Enhancements: Improved robustness, logging, checkpointing, threading, Llama support, Azure auth and eval by @cedricvidal in #604
- Fix/merge commit #605 and #604 by @ShishirPatil in #609
- Fix issue #614: [BFCL] ModuleNotFoundError after commit 70d6722 by @kobe0938 in #615
- Fix some bugs in test case prompts/ground truths by @aw632 in #608
- [BFCL] Dataset and Possible Answer Fix by @HuanzhiMao in #600
- Add Salesforce xLAM model series by @zuxin666 in #616
- Update gemini_handler.py to better handle NL+FC model output by @vandyxiaowei in #617
- [BFCL] Fix Decoding Issue in Nvidia Handler by @HuanzhiMao in #623
- [BFCL] Fix Llama Handler by @HuanzhiMao in #626
- [BFCL] add MadeAgents/Hammer-7b handler by @linqq9 in #627
- [BFCL] Refactor Model Handler into OSS and Proprietary Components by @devanshamin in #612
- [BFCL] Hot Fix to Remove Extra Parameters for NoAPIKeyError by @HuanzhiMao in #636
- fix: bug for glm prompt format by @zhangch-ss in #638
- [BFCL] Add New Model o1-preview-2024-09-12 and o1-mini-2024-09-12 by @HuanzhiMao in #635
- [BFCL] BFCL v3 by @HuanzhiMao in #644
- removed unnecessary comments in raft/raft_local.py by @himanshushukla12 in #654
- [BFCL] Chore: Separate Change Log. by @HuanzhiMao in #648
- [BFCL] Bug Fix inference_single_turn_FC function for base_handler by @HuanzhiMao in #656
- [BFCL] Bug Fix parse_nested_value function for model_handler utils by @VishnuSuresh27 in #660
- added Phi-3 handlers by @AndyChenYH in #640
- Update agent arena frontend and evals by @NithikYekollu in #666
- [BFCL] Speed Up Locally-hosted Model Inference Process by @HuanzhiMao in #671
- [BFCL] Fix Hanging Inference for OSS Models on GPU Platforms by @HuanzhiMao in #663
- [BFCL] Add gemini-1.5-pro-002, gemini-1.5-pro-002-FC, gemini-1.5-pro-001, gemini-1.5-pro-001-FC, gemini-1.5-flash-002, gemini-1.5-flash-002-FC, gemini-1.0-pro-002, gemini-1.0-pro-002-FC by @HuanzhiMao in #658
- [BFCL] Add Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct by @HuanzhiMao in #657
- [BFCL] Add ToolACE handler for BFCL-v3 by @XuHwang in #653
- Add Qwen handler and fix mean_latency calculation error for OSS models by @zhangch-ss in #642
- update README.md by @leosun12 in #669
- [BFCL] Chore: Various Improvements and Adjustments by @HuanzhiMao in #673
- [BFCL] Chore: Refactor File Path Handling and Automate apply_function_credential_config.py by @HuanzhiMao in #675
- docs: update README.md by @eltociear in #676
- [BFCL-v3] Multi-Turn Possible Answer Order Change by @Fanjia-Yan in #679
- update hammer handler and add Hammer2.0 model by @linqq9 in #667
- [BFCL] Chore: Improve Multi Turn Error Logs by @HuanzhiMao in #689
- Update google-cloud-aiplatform dependency by @jieru-hu in #677
- add minicpm3 4b by @Cppowboy in #633
- [BFCL-v2] Dataset and Possible Answer Fix by @HuanzhiMao in #661
- [BFCL] Add Gemma-2 models by @jacovkim in #696
- add a basic bfcl command-line interface by @mattf in #621
- Fixing BFCL-v3 multi-turn apps by @virginie-do in #701
- [BFCL v1] Update Executable Ground Truth for REST Category by @CharlieJCJ in #708
- [BFCL v1] Rephrase Question for Better Clarity for Java & JavaScript Categories by @HuanzhiMao in #709
- [BFCL] Add SGLang Backend Support for OSS Local Inference by @hnyls2002 in #587
- (typo):I've made some corrections to your repository to improve clarity by @PrathameshSPawar in #713
- docs: Centered the Image by @bhargavshirin in #680
- [BFCL] Multi Turn Dataset and Possible Answer Fix by @HuanzhiMao in #683
- [BFCL] Chore: Separate out Func Doc for Multi-Turn Categories by @HuanzhiMao in #717
- [BFCL] Multi Turn Dataset and Possible Answer Fix (Base Category) by @HuanzhiMao in #719
- [BFCL] Multi Turn Dataset Fix (Function Doc) by @HuanzhiMao in #722
- [BFCL] Multi Turn Dataset Fix (Base Category) by @HuanzhiMao in #723
- [BFCL] Multi Turn Pipeline Robustness Patch by @HuanzhiMao in #724
- [BFCL] Small typo in variable name in travel_booking.py by @daanaea in #731
- [BFCL] Patch #724 by @HuanzhiMao in #730
- [BFCL] Multi Turn Dataset Fix (Miss Func & Long Context) by @HuanzhiMao in #728
- [BFCL] Multi Turn Dataset Fix (Miss Param) by @HuanzhiMao in #732
- [BFCL] Update Eval Metric for Multi Turn Irrelevance Scenarios by @HuanzhiMao in #725
- [BFCL] Remove duplicate in eval_runner.py by @ThomasRochefortB in #735
- [BFCL] Support Dynamic max_tokens for Locally-Hosted Models by @HuanzhiMao in #712
- [BFCL] Refine Evaluation Metric for Multi Turn Categories by @HuanzhiMao in #733
- [BFCL] Adding New Model GoGoAgent by @RogueTensor in #720
- [BFCL] Chore: Improve Inference Log Readability by @HuanzhiMao in #746
- [BFCL Dataset Revamp 1/n] Multi-Turn (Part 1) by @Fanjia-Yan in #740
- [BFCL] Robustness Patch for _multi_threaded_inference by @HuanzhiMao in #754
- [BFCL] Prompt Caching for Claude Models by @VishnuSuresh27 in #751
- [BFCL Dataset Revamp 2/n] Live Dataset Fix (Simple, Parallel, Parallel Multiple) by @Fanjia-Yan in #737
- [BFCL Dataset Revamp 3/n] Live Dataset Fix (Multiple) by @Fanjia-Yan in #739
- Update google-cloud-aiplatform version to 1.72.0 by @gabrielibagon in #760
- [BFCL] Minor Grammatical Corrections to DEFAULT_SYSTEM_PROMPT by @HuanzhiMao in #747
- [BFCL] Remove Llama-3.2-3B-Instruct-FC and Llama-3.2-1B-Instruct-FC from Leaderboard by @HuanzhiMao in #749
- [BFCL Chore] Supply data_multi_turn.csv for Multi-Turn Evaluation Results by @HuanzhiMao in #762
- [BFCL] Remove Workaround Patch for Vertex AI Package by @HuanzhiMao in #761
- Add exponential retry logic for gemini models by @gabrielibagon in #764
- [BFCL] Remove Duplicate Line in record_cost_latency by @HuanzhiMao in #767
- Fix handling of examples with no tools in Gemini by @gabrielibagon in #770
- Remove stop condition in gemini retry logic by @gabrielibagon in #769
- Skip adding empty content from gemini by @gabrielibagon in #768
- [BFCL] Add the option to log to WandB during bfcl evaluate by @ThomasRochefortB in #736
- [BFCL] Add claude-3-5-haiku-20241022, claude-3-5-haiku-20241022-FC, claude-3-5-sonnet-20241022, claude-3-5-sonnet-20241022-FC by @HuanzhiMao in #750
- [BFCL Dataset Revamp 4/n] Live Irrelevance by @Fanjia-Yan in #763
- [BFCL Dataset Revamp 5/n] Multi-Turn Base WrapUp by @Fanjia-Yan in #772
- [BFCL] Add Unit Test to Check for Illegal Python Parameter Name by @HuanzhiMao in #777
- [BFCL] Dataset and Possible Answer Fix (Live Categories) for Illegal Python Parameter Name by @HuanzhiMao in #778
- [BFCL] Add Support for Regeneration, Specific Test Entry IDs, and Custom Directory Locations by @Raymond112514 in #743
- [BFCL] some tiny fix in possible_answer by @zhangch-ss in #786
- [RAFT] Add link to Azure RAFT Distillation Recipe by @cedricvidal in #758
- [BFCL] Add New Model Qwen/Qwen2.5-72B-Instruct by @HuanzhiMao in #787
- [BFCL] Add DeepSeek-V2.5, DeepSeek-Coder-V2-Instruct-0724, DeepSeek-Coder-V2-Lite-Instruct, DeepSeek-V2-Chat-0628, DeepSeek-V2-Lite-Chat by @moonlight1431 in #697
- Add minicpm3 4b FC model handler by @Cppowboy in #718
- [BFCL] Add support for Writer models and Palmyra X 004 by @samjulien in #755
- [BFCL Chore] Add @final and @overrides Decorators to Class Methods in Model Handler by @VishnuSuresh27 in #790
- [BFCL Chore] Support Multiple Models and Test Category Input for BFCL CLI by @vsvaidya27 in #795
- [BFCL] Fix Irrelevance Category Performance for DeepSeek Coder Handler by @HuanzhiMao in #796
- [BFCL Chore] Quick fix change of decorators from @overrides to @override by @VishnuSuresh27 in #797
- [BFCL Chore] Add Retry Mechanism with Backoff for Rate Limit Handling Across Proprietary Models by @HuanzhiMao in #781
- [BFCL] Bug Fix for Execution_Result_Message Construction for Prompt Caching Feature in Claude Handler by @HuanzhiMao in #805
- [BFCL Dataset Revamp 7/n] Augmented Multi-turn Dataset Fix by @Fanjia-Yan in #804
- [BFCL Dataset Revamp 6/n] Live Relevance Data Fix by @Fanjia-Yan in #789
- Add Weaviate APIs to Gorilla API Zoo by @CShorten in #783
- [BFCL] Improve Latency Measurement Accuracy and Enable Default State Logging by @HuanzhiMao in #808
- [BFCL] Replace 'class' with '_class' to Avoid Function Calling Formatting Error by @Fanjia-Yan in #811
- [BFCL] Added Grok Handler by @amitojsingh2022 in #810
- [BFCL] Resolve Issue in Gemini Model When No Model Output by @HuanzhiMao in #809
- [BFCL] Add Amazon Models nova-pro-v1.0, nova-lite-v1.0, and nova-micro-v1.0 by @HuanzhiMao in #815
- [BFCL Chore] Revamp README.md for Clearer Instructions by @HuanzhiMao in #819
- [BFCL] Update gpt-4o Snapshot Version from 2024-08-06 to 2024-11-20 by @HuanzhiMao in #822
- fix some enum type errors in datasets by @zhangch-ss in #826
- Fix Merge Conflict From #826 by @HuanzhiMao in #829
- [BFCL Chore] Add Unit Test for Valid Func Doc Format by @HuanzhiMao in #828
- update hammer handler and add Hammer2.1 model by @linqq9 in #832
- [BFCL] Add New Model Llama-3.3-70B-Instruct, Llama-3.3-70B-Instruct-FC by @HuanzhiMao in #837
- [BFCL] Add o1-2024-12-17 and o1-2024-12-17-FC by @HuanzhiMao in #840
- Add Cohere Command R7B, replace older Command R+ handler by @harry-cohere in #835
- [BFCL Dataset] Ground Truth Error Fix by @Fanjia-Yan in #846
- [BFCL] Add Qwen2.5-0.5B-Instruct, Qwen2.5-3B-Instruct, Qwen2.5-14B-Instruct, Qwen2.5-32B-Instruct by @HuanzhiMao in #842
- [BFCL] Add New Model watt-tool-8B and watt-tool-70B by @zhanghanduo in #847
- [BFCL] Skip Executable Categories When API Keys Missing by @HuanzhiMao in #848
- [BFCL] Add gemini-2.0-flash-exp-FC, gemini-2.0-flash-exp, gemini-exp-1206-FC, gemini-exp-1206 by @HuanzhiMao in #843
- Check and fix some parameter type errors in possible answers by @zhangch-ss in #838
- [BFCL] Use N/A in Score Report for Unevaluated Categories by @HuanzhiMao in #849
- [BFCL] possible answer fix: - reigion ->region by @sghyan16 in #852
- [BFCL] Add Mistral Local Serving Handler and Add New Model mistralai/Ministral-8B-Instruct-2410 by @HuanzhiMao in #855
- [BFCL] Add New Model DeepSeek-V3 by @HuanzhiMao in #857
- [BFCL] Rename Directories: proprietary_model -> api_inference, oss_model -> local_inference for Better Clarity by @HuanzhiMao in #859
- [BFCL] Support for pre-existing completion endpoint by @ThomasRochefortB in #864
- [BFCL Chore] Ensure Correct Input Format for Eval Checker by @HuanzhiMao in #860
New Contributors
- @devanshamin made their first contribution in #565
- @himanshushukla12 made their first contribution in #605
- @kobe0938 made their first contribution in #615
- @aw632 made their first contribution in #608
- @linqq9 made their first contribution in #627
- @zhangch-ss made their first contribution in #638
- @VishnuSuresh27 made their first contribution in #660
- @AndyChenYH made their first contribution in #640
- @XuHwang made their first contribution in #653
- @leosun12 made their first contribution in #669
- @jieru-hu made their first contribution in #677
- @Cppowboy made their first contribution in #633
- @jacovkim made their first contribution in #696
- @mattf made their first contribution in #621
- @virginie-do made their first contribution in #701
- @hnyls2002 made their first contribution in #587
- @PrathameshSPawar made their first contribution in #713
- @bhargavshirin made their first contribution in #680
- @daanaea made their first contribution in #731
- @ThomasRochefortB made their first contribution in #735
- @RogueTensor made their first contribution in #720
- @gabrielibagon made their first contribution in #760
- @moonlight1431 made their first contribution in #697
- @samjulien made their first contribution in #755
- @vsvaidya27 made their first contribution in #795
- @CShorten made their first contribution in #783
- @amitojsingh2022 made their first contribution in #810
- @zhanghanduo made their first contribution in #847
- @sghyan16 made their first contribution in #852
Full Changelog: v1.1...v1.2