
Commit

Merge branch 'main' into main
YannDubs authored Aug 26, 2024
2 parents 86aaf94 + 9136c7f commit 5ed3b18
Showing 13 changed files with 142,837 additions and 1 deletion.
@@ -1,10 +1,12 @@
name,length_controlled_winrate,win_rate,avg_length,link,samples,filter
Blendax.AI-gm-l6-vo31,76.91981221023656,69.11033492869565,1809,https://www.blendax.ai/post/blendaxai-gm-l6-vo31,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/blendaxai-gm-l6-vo31/model_outputs.json,community
gemma-2-9b-it-WPO-HB,76.72506842726064,77.82503168985093,2285,https://huggingface.co/wzhouad/gemma-2-9b-it-WPO-HB,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gemma-2-9b-it-WPO-HB/model_outputs.json,community
Blendax.AI-gm-l3-v35,73.37270365010379,73.41035740244067,2186,https://www.blendax.ai/post/blendaxai-gm-l3-v35,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/blendaxai-gm-l3-v35/model_outputs.json,community
gemma-2-9b-it-SimPO,72.3508446939842,65.86422561532919,1833,https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gemma-2-9b-it-SimPO/model_outputs.json,community
OpenPipe MoA GPT-4 Turbo,68.37866250336802,63.15493451236265,1856,https://openpipe.ai/blog/mixture-of-agents,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openpipe-moa-gpt-4-turbo-v1/model_outputs.json,community
gemma-2-9b-it-DPO,67.6620382198043,65.35922380122982,2016,https://huggingface.co/princeton-nlp/gemma-2-9b-it-DPO,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gemma-2-9b-it-DPO/model_outputs.json,community
Together MoA,65.37996976852163,59.8688062333292,1825,https://github.com/togethercomputer/moa,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Together-MoA/model_outputs.json,community
Llama3 PBM Nova 70B,62.39078292806358,62.95129983494411,2207,https://huggingface.co/PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Llama3-PBM-Nova-70B/model_outputs.json,community
Storm-7B (best-of-64),61.63789557199839,63.04099075186919,2340,https://huggingface.co/jieliu/Storm-7B,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Storm-7B-best-of-64/model_outputs.json,community
Together MoA-Lite,59.1415240989275,56.593045622273294,1968,https://github.com/togethercomputer/moa,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Together-MoA-Lite/model_outputs.json,community
Aligner 2B+GPT-4 Turbo (04/09),58.33130206276722,46.77089325668323,1370,https://github.com/AlignInc/aligner-replication,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/aligner-2b_gpt-4-turbo-2024-04-09/model_outputs.json,community
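The leaderboard rows above are plain CSV; a minimal sketch of reading them with the standard library (`LEADERBOARD_CSV` and `top_models` are illustrative names, with two rows copied verbatim from the table above):

```python
import csv
import io

# Two of the rows added in this commit, copied from the leaderboard CSV above.
LEADERBOARD_CSV = """\
name,length_controlled_winrate,win_rate,avg_length,link,samples,filter
Blendax.AI-gm-l6-vo31,76.91981221023656,69.11033492869565,1809,https://www.blendax.ai/post/blendaxai-gm-l6-vo31,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/blendaxai-gm-l6-vo31/model_outputs.json,community
gemma-2-9b-it-SimPO,72.3508446939842,65.86422561532919,1833,https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gemma-2-9b-it-SimPO/model_outputs.json,community
"""


def top_models(csv_text, n=2):
    """Return the n entries with the highest length-controlled win rate."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r["length_controlled_winrate"]), reverse=True)
    return [(r["name"], round(float(r["length_controlled_winrate"]), 2)) for r in rows[:n]]
```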
4,832 changes: 4,832 additions & 0 deletions results/Llama3-PBM-Nova-70B/model_outputs.json

65,929 changes: 65,929 additions & 0 deletions results/Llama3-PBM-Nova-70B/weighted_alpaca_eval_gpt4_turbo/annotations.json

5,637 changes: 5,637 additions & 0 deletions results/blendaxai-gm-l6-vo31/model_outputs.json

66,383 changes: 66,383 additions & 0 deletions results/blendaxai-gm-l6-vo31/weighted_alpaca_eval_gpt4_turbo/annotations.json

6 changes: 5 additions & 1 deletion src/alpaca_eval/decoders/openai.py
@@ -303,7 +303,11 @@ def _get_price_per_token(model, price_per_token=None):
"""Returns the price per token for a given model"""
if price_per_token is not None:
return float(price_per_token)
if "gpt-4-turbo" in model:
if "gpt-4o-mini-2024-07-18" in model:
return (
0.15 / 1_000_000
) # that's not completely true because decoding is 0.03 but close enough given that most is context
elif "gpt-4-turbo" in model:
return 0.01 / 1000
elif "gpt-4-1106" in model:
return (
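The pricing logic this hunk adds can be sketched as a stand-alone helper. `get_price_per_token` and `estimate_cost` below are illustrative versions covering only the two branches shown above, not the repository's full function:

```python
def get_price_per_token(model: str, price_per_token=None) -> float:
    """Stand-alone sketch of the price lookup added in the diff above."""
    if price_per_token is not None:
        return float(price_per_token)
    if "gpt-4o-mini-2024-07-18" in model:
        # prompt-token price; decoded tokens cost more, but prompts dominate annotation traffic
        return 0.15 / 1_000_000
    elif "gpt-4-turbo" in model:
        return 0.01 / 1000
    raise ValueError(f"no price known for model: {model}")


def estimate_cost(model: str, n_tokens: int) -> float:
    """Rough cost in dollars for an annotation run of n_tokens total tokens."""
    return get_price_per_token(model) * n_tokens
```

Under these rates, a ~1M-token annotation run costs about $0.15 with gpt-4o-mini versus about $10 with gpt-4-turbo, which is the motivation for adding the cheaper judge.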
@@ -0,0 +1,16 @@
weighted_alpaca_eval_gpt-4o-mini-2024-07-18:
  prompt_template: "alpaca_eval_clf_gpt4_turbo/alpaca_eval_clf.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "gpt-4o-mini-2024-07-18"
    max_tokens: 1
    temperature: 1 # temperature only affects sampling, not the returned logprobs, so it should have no effect here
    logprobs: true
    top_logprobs: 5
  fn_completion_parser: "logprob_parser"
  completion_parser_kwargs:
    numerator_token: "m"
    denominator_tokens: ["m", "M"]
    is_binarize: false
  completion_key: "completions_all"
  batch_size: 1
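The config above asks the judge for a single token (`max_tokens: 1`) with its top-5 logprobs and hands them to `logprob_parser`, weighting the numerator token "m" against the denominator set {"m", "M"}. A sketch of the arithmetic this implies — `weighted_preference` is an illustrative name, not alpaca_eval's actual parser:

```python
import math


def weighted_preference(top_logprobs, numerator=("m",), denominator=("m", "M")):
    """Soft preference implied by the logprob_parser config above:
    P(numerator tokens) / P(denominator tokens), from the judge's top logprobs."""
    num = sum(math.exp(top_logprobs[t]) for t in numerator if t in top_logprobs)
    den = sum(math.exp(top_logprobs[t]) for t in denominator if t in top_logprobs)
    return num / den if den > 0 else 0.5  # fall back to a draw if neither token appears


# Hypothetical top-logprobs for the judge's single output token:
score = weighted_preference({"m": math.log(0.7), "M": math.log(0.2)})  # ≈ 0.78
```

Because the score is a continuous ratio rather than a hard "m"/"M" decision (`is_binarize: false`), each annotation contributes a weighted win rather than a 0/1 vote.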
@@ -1,11 +1,13 @@
,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,discrete_win_rate,mode,avg_length,length_controlled_winrate,lc_standard_error
Shopee-SlimMoA-v1,75.61428659805350,1.2706274059194700,621,184,0,805,77.14285714285720,community,1994,77.4515432873834,0.43017522149239600
blendaxai-gm-l6-vo31,69.11033492869565,1.3280735654354863,562,242,1,805,69.87577639751554,community,1809,76.91981221023656,0.5725365663132986
gemma-2-9b-it-WPO-HB,77.82503168985093,1.2355857177790277,640,163,2,805,79.62732919254658,community,2285,76.72506842726064,0.4242603928637889
blendaxai-gm-l3-v35,73.41035740244067,1.254951147343878,607,196,2,805,75.527950310559,community,2186,73.37270365010379,0.6163911450738288
gemma-2-9b-it-SimPO,65.86422561532919,1.423459922555078,540,264,1,805,67.14285714285714,community,1833,72.3508446939842,0.5167873784867067
openpipe-moa-gpt-4-turbo-v1,63.15493451236265,1.422980098799326,515,283,7,805,64.40993788819875,community,1856,68.37866250336802,0.7309418614587613
gemma-2-9b-it-DPO,65.35922380122982,1.402802336467638,536,268,1,805,66.64596273291924,community,2016,67.6620382198043,0.6605613085864308
Together-MoA,59.8688062333292,1.434305604543079,490,314,1,805,60.93167701863354,community,1825,65.37996976852163,0.7392392836781445
Llama3-PBM-Nova-70B,62.95129983494411,1.3965649883206293,512,293,0,805,63.60248447204969,community,2207,62.39078292806358,0.7630318008010619
Storm-7B-best-of-64,63.04099075186919,1.4253258915161846,519,286,0,805,64.472049689441,community,2340,61.63789557199839,
Together-MoA-Lite,56.593045622273294,1.4464848562244548,456,347,2,805,56.77018633540373,community,1968,59.1415240989275,0.7580510219326322
aligner-2b_gpt-4-turbo-2024-04-09,46.77089325668323,1.3378060774476594,371,417,17,805,40.18633540372671,community,1370,58.33130206276722,
@@ -11,6 +11,7 @@ gpt4_turbo_cot_logprob,67.86974910317902,5.397145061728395,1568.9484159171295,0.
gpt4_turbo_cot_clf,67.59689922480621,5.3972248062015495,1528.4046718706977,0.6666666666666667,0.6326057742256878,,,0.5936794582392777,0.5855855855855856,0.5255813953488373,645,verified
claude_ranking,67.5925925925926,4.954578395061729,218.4230414438272,0.9,0.90848221004591,,,0.7303370786516854,0.6576576576576577,0.4552469135802468,648,verified
alpaca_eval_llama3_70b_fn,67.53091913784353,0.41207197526091993,208.69685160402955,0.9,0.8577236113497642,32.25308641975309,8.204334365325078,0.7910112359550562,0.6576576576576577,0.47931967529957475,2587,minimal
weighted_alpaca_eval_gpt-4o-mini-2024-07-18,0.33674775667807266,12.90736111111111,93.54821923706267,0.9833333333333333,0.9389828560875118,32.24432530355238,14.380747136032564,0.7094594594594594,0.6306306306306306,0.5017959384038282,2592,minimal
gpt4,66.93672839506173,12.452592592592593,1036.788589334915,0.8833333333333333,0.8668599990267735,31.481481481481488,14.621913580246911,0.647191011235955,0.6666666666666666,0.5397376543209877,2592,minimal
alpaca_farm_greedy_gpt4,66.43518518518519,15.28163425925926,877.6250469425926,0.8499999999999999,0.7481465609199582,30.246913580246915,19.290123456790123,0.597752808988764,0.6486486486486487,0.5362654320987654,2592,minimal
weighted_alpaca_eval_gpt4_turbo,65.73198824263118,4.323981481481481,227.7462866895061,0.7833333333333333,0.7688872243700914,33.89896126543981,23.652705035108028,0.6058558558558559,0.5727272727272728,0.5282783420419752,2592,minimal
@@ -184,3 +184,4 @@ Qwen2-72B-Instruct,-1.6674930210615639,0.9244007518196494,-0.5299232192745307
gpt-4o-mini-2024-07-18,-1.4396243284854136,0.8239981543339437,0.1463734386267150
Mistral-7B-Instruct-v0.3,-1.5007159011881868,0.9845683091847074,-1.7652759895328634
Shopee-SlimMoA-v1,-0.6930943742294789,0.5778443790027642,1.4506276222723822
blendaxai-gm-l6-vo31,-1.4827230167114802,0.8256378421072179,1.5942312525409852
15 changes: 15 additions & 0 deletions src/alpaca_eval/models_configs/Llama3-PBM-Nova-70B/configs.yaml
@@ -0,0 +1,15 @@
Llama3-PBM-Nova-70B:
  prompt_template: "Llama3-PBM-Nova-70B/prompt.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B"
    requires_chatml: True
    temperature: 0.7
    num_procs: 8
    max_tokens: 4096
    top_p: 0.8
    price_per_token: 9e-7
    client_kwargs:
      base_url: "https://api.together.xyz/v1"
  pretty_name: "Llama3 PBM Nova 70B"
  link: "https://huggingface.co/PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B"
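The `client_kwargs` above point `openai_completions` at Together's OpenAI-compatible endpoint. A self-contained sketch of the equivalent raw HTTP call using only the standard library — `build_request`, `complete`, and the `TOGETHER_API_KEY` environment variable are assumptions for illustration, not code from this repository:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.together.xyz/v1"  # from the client_kwargs above


def build_request(instruction: str) -> dict:
    """Payload for an OpenAI-compatible /chat/completions call,
    using the sampling kwargs from the configs.yaml above."""
    return {
        "model": "PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B",
        "messages": [{"role": "user", "content": instruction}],
        "temperature": 0.7,
        "max_tokens": 4096,
        "top_p": 0.8,
    }


def complete(instruction: str) -> str:
    """Send the request; requires a TOGETHER_API_KEY in the environment."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_request(instruction)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```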
5 changes: 5 additions & 0 deletions src/alpaca_eval/models_configs/Llama3-PBM-Nova-70B/prompt.txt
@@ -0,0 +1,5 @@
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>


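The prompt file above is the Llama 3 chat format with a single `{instruction}` slot, so filling it is plain string formatting; `render_prompt` is an illustrative helper, not a function from this repository:

```python
# The template text from prompt.txt above, with the Llama 3 special tokens verbatim.
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)


def render_prompt(instruction: str) -> str:
    """Fill the single {instruction} slot, leaving the special tokens untouched."""
    return PROMPT_TEMPLATE.format(instruction=instruction)
```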
@@ -0,0 +1,9 @@
blendaxai-gm-l6-vo31:
  prompt_template: "gpt4_1106_preview/chatml_prompt.txt"
  fn_completions: null
  completions_kwargs:
    model_name: "blendaxai-gm-l6-vo31"
    max_tokens: 4096
    temperature: 1.0
  pretty_name: "Blendax.AI-gm-l6-vo31"
  link: "https://www.blendax.ai/post/blendaxai-gm-l6-vo31"
