Commit
Automated leaderboard update
actions-user committed Dec 27, 2024
1 parent 8bb6e57 commit 11a74e1
Showing 1 changed file with 6 additions and 0 deletions.
@@ -7,20 +7,25 @@ gemma-2-9b-it-WPO-HB,76.72506842726064,77.82503168985093,2285,https://huggingfac
SelfMoA + gemma-2-9b-it-SimPO,75.04950944068965,71.9958856144492,1930,https://github.com/wenzhe-li/Self-MoA/,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/SelfMoA_gemma-2-9b-it-SimPO/model_outputs.json,community
Blendax.AI-gm-l3-v35,73.37270365010379,73.41035740244067,2186,https://www.blendax.ai/post/blendaxai-gm-l3-v35,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/blendaxai-gm-l3-v35/model_outputs.json,community
gemma-2-9b-it-SimPO,72.3508446939842,65.86422561532919,1833,https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gemma-2-9b-it-SimPO/model_outputs.json,community
FuseChat-Gemma-2-9B-Instruct,70.18106263911686,70.49713534560247,2155,https://huggingface.co/FuseAI/FuseChat-Gemma-2-9B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/FuseChat-Gemma-2-9B-Instruct/model_outputs.json,community
OpenPipe MoA GPT-4 Turbo,68.37866250336802,63.15493451236265,1856,https://openpipe.ai/blog/mixture-of-agents,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openpipe-moa-gpt-4-turbo-v1/model_outputs.json,community
gemma-2-9b-it-DPO,67.6620382198043,65.35922380122982,2016,https://huggingface.co/princeton-nlp/gemma-2-9b-it-DPO,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gemma-2-9b-it-DPO/model_outputs.json,community
FuseChat-Llama-3.1-8B-Instruct,65.38623116037492,63.33158292362734,2033,https://huggingface.co/FuseAI/FuseChat-Llama-3.1-8B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/FuseChat-Llama-3.1-8B-Instruct/model_outputs.json,community
Together MoA,65.37996976852163,59.8688062333292,1825,https://github.com/togethercomputer/moa,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Together-MoA/model_outputs.json,community
FuseChat-Qwen-2.5-7B-Instruct,63.58298649463735,64.64069997299381,2173,https://huggingface.co/FuseAI/FuseChat-Qwen-2.5-7B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/FuseChat-Qwen-2.5-7B-Instruct/model_outputs.json,community
Llama3 PBM Nova 70B,62.39078292806358,62.95129983494411,2207,https://huggingface.co/PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Llama3-PBM-Nova-70B/model_outputs.json,community
Storm-7B (best-of-64),61.63789557199839,63.04099075186919,2340,https://huggingface.co/jieliu/Storm-7B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Storm-7B-best-of-64/model_outputs.json,community
Together MoA-Lite,59.1415240989275,56.593045622273294,1968,https://github.com/togethercomputer/moa,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Together-MoA-Lite/model_outputs.json,community
Aligner 2B+GPT-4 Turbo (04/09),58.33130206276722,46.77089325668323,1370,https://github.com/AlignInc/aligner-replication,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/aligner-2b_gpt-4-turbo-2024-04-09/model_outputs.json,community
GPT-4 Omni (05/13),57.45682883335095,51.32757578249279,1873,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt-4o-2024-05-13/model_outputs.json,minimal
Higgs-Llama-3-70B V2,56.76317433000503,68.63519246435168,2657,https://boson.ai/,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/higgs-llama-3-70b-v2/model_outputs.json,community
GPT-4 Turbo (04/09),55.01530093647852,46.11526538763708,1802,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt-4-turbo-2024-04-09/model_outputs.json,minimal
FuseChat-Llama-3.2-3B-Instruct,53.99883748344241,51.29667710101864,1976,https://huggingface.co/FuseAI/FuseChat-Llama-3.2-3B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/FuseChat-Llama-3.2-3B-Instruct/model_outputs.json,community
SPPO-Gemma-2-9B-It-PairRM,53.96983730150777,48.23404468746583,1803,https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/SPPO-Gemma-2-9B-It-PairRM/model_outputs.json,community
Llama-3-Instruct-8B-WPO-HB-v2,53.37264268894168,57.33198613024009,2472,https://huggingface.co/wzhouad/Llama3-Instruct-8B-WPO-HB-v2,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Llama-3-Instruct-8B-WPO-HB-v2/model_outputs.json,community
Claude 3.5 Sonnet (06/20),52.36675427146999,40.56021409682828,1488,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-3-5-sonnet-20240620/model_outputs.json,community
Yi-Large Preview,51.894415134099546,57.46724251946292,2335,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/yi-large-preview/model_outputs.json,verified
Llama-3-Instruct-8B-RainbowPO,51.66066005580552,47.91794368953007,1878,https://huggingface.co/BraceZHY/Llama-3-8B-Instruct-RainbowPO,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Llama-3-Instruct-8B-RainbowPO/model_outputs.json,community
GPT-4o Mini (07/18),50.727144855901976,44.65413862507926,1861,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt-4o-mini-2024-07-18/model_outputs.json,minimal
Storm-7B,50.45110959343775,50.26886905528583,2045,https://huggingface.co/jieliu/Storm-7B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Storm-7B/model_outputs.json,community
GPT-4 Preview (11/06),50.0,50.0,2049,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_1106_preview/model_outputs.json,minimal
@@ -77,6 +82,7 @@ ExPO + Tulu-2-DPO-70B,25.72330817134933,22.98061970610497,1738,https://huggingfa
Claude Instant 1.2,25.61225902543337,16.12739962159006,1112,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-instant-1.2/model_outputs.json,community
Infinity-Instruct-3M-0613-Mistral-7B,25.501557794727287,15.747828130770788,1180,https://huggingface.co/BAAI/Infinity-Instruct-3M-0613-Mistral-7B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Infinity-Instruct-3M-0613-Mistral-7B/model_outputs.json,community
DBRX Instruct,25.37544974044448,18.44834898407453,1450,https://huggingface.co/databricks/dbrx-instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/dbrx-instruct/model_outputs.json,verified
FuseChat-Llama-3.2-1B-Instruct,25.27098247880791,29.9219322658882,2259,https://huggingface.co/FuseAI/FuseChat-Llama-3.2-1B-Instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/FuseChat-Llama-3.2-1B-Instruct/model_outputs.json,community
Claude 2.1,25.251943886133027,15.733506736409938,1096,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2.1/model_outputs.json,verified
Nanbeige2 8B Chat,25.24207090175315,39.35450207219922,2709,https://huggingface.co/Nanbeige/Nanbeige2-8B-Chat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Nanbeige2-8B-Chat/model_outputs.json,community
XwinLM 70b V0.1,24.649686057119272,21.812957073875776,1775,https://github.com/Xwin-LM/Xwin-LM,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/xwinlm-70b-v0.1/model_outputs.json,community