Commit

minor test fix
YannDubs committed Mar 20, 2024
1 parent 241e1f6 commit 37cff76
Showing 4 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/data_AlpacaEval/alpaca_eval_gpt4_leaderboard.csv
@@ -1,5 +1,5 @@
 name,length_controlled_winrate,win_rate,avg_length,link,samples,filter
-GPT-4 Perview,89.85849210429464,97.69900497512438,2049,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_1106_preview/model_outputs.json,minimal
+GPT-4 Preview,89.85849210429464,97.69900497512438,2049,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_1106_preview/model_outputs.json,minimal
 XwinLM 70b V0.3,94.01522563893708,97.636815920398,2113,https://github.com/Xwin-LM/Xwin-LM,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/xwinlm-70b-v0.3/model_outputs.json,community
 Mistral Medium,91.54314285144824,96.83229813664596,1500,https://mistral.ai/news/la-plateforme/,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/mistral-medium/model_outputs.json,minimal
 XwinLM 70b V0.1,,95.56803995,1775,https://github.com/Xwin-LM/Xwin-LM,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/xwinlm-70b-v0.1/model_outputs.json,community
@@ -1,5 +1,5 @@
 name,length_controlled_winrate,win_rate,avg_length,link,samples,filter
-GPT-4 Perview,50.0,50.0,2049,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_1106_preview/model_outputs.json,minimal
+GPT-4 Preview,50.0,50.0,2049,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_1106_preview/model_outputs.json,minimal
 Claude 3 Opus (02/29),40.39177606350116,29.04176413403727,1388,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-3-opus-20240229/model_outputs.json,minimal
 GPT-4,38.12808974440021,23.576789314782605,1365,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json,minimal
 Qwen1.5 72B Chat,36.571754111987296,26.49828339562733,1549,https://huggingface.co/Qwen/Qwen1.5-72B-Chat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Qwen1.5-72B-Chat/model_outputs.json,community
4 changes: 2 additions & 2 deletions src/alpaca_eval/metrics/glm_winrate.py
@@ -73,8 +73,8 @@ def get_length_controlled_winrate(
     df = utils.convert_to_dataframe(annotations)
 
     if save_weights_dir == "auto":
-        assert len(annotations["annotator"].unique()) == 1
-        save_weights_dir = Path(__file__).parent / "weights" / annotations["annotator"].unique()[0]
+        assert len(df["annotator"].unique()) == 1
+        save_weights_dir = Path(__file__).parent / "weights" / df["annotator"].unique()[0]
 
     assert len(df["generator_2"].unique()) == 1
     model_name = list(df["generator_2"].unique())[0]
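The fix reads the annotator from `df`, the DataFrame produced one line earlier, instead of from the raw `annotations` argument. A minimal sketch of why that matters, assuming `annotations` may arrive as a list of records rather than a DataFrame (the diff does not show which types the function accepts). `to_dataframe` and `resolve_weights_dir` are hypothetical stand-ins, not alpaca_eval's API.

```python
from pathlib import Path
from typing import Dict, List, Union

import pandas as pd

# Hypothetical input type: either a DataFrame or a list of annotation records.
Annotations = Union[pd.DataFrame, List[Dict[str, str]]]


def to_dataframe(annotations: Annotations) -> pd.DataFrame:
    """Hypothetical stand-in for alpaca_eval's utils.convert_to_dataframe."""
    if isinstance(annotations, pd.DataFrame):
        return annotations.copy()
    return pd.DataFrame(annotations)


def resolve_weights_dir(annotations: Annotations, base: Path) -> Path:
    """Mirror the fixed logic: read the annotator from the converted DataFrame."""
    df = to_dataframe(annotations)
    annotators = df["annotator"].unique()
    assert len(annotators) == 1, "expected exactly one annotator"
    return base / "weights" / str(annotators[0])


records = [
    {"annotator": "annotator_a", "generator_2": "model_x"},
    {"annotator": "annotator_a", "generator_2": "model_x"},
]

# The old pattern fails on a list of records: a list cannot be indexed by a column name.
try:
    records["annotator"]
except TypeError as err:
    print("old pattern fails:", err)

# The fixed pattern works for both a list of records and a DataFrame.
print(resolve_weights_dir(records, Path("weights_root")))
```

Converting once and then using only `df` keeps every downstream lookup independent of the input's original type.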
@@ -4,4 +4,4 @@ gpt4_1106_preview:
   completions_kwargs:
     model_name: "gpt-4-1106-preview"
     max_tokens: 4096
-  pretty_name: "GPT-4 Perview"
+  pretty_name: "GPT-4 Preview"
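The commit corrects the same display name in the model config and in both leaderboard CSVs, which suggests the names are matched as exact strings across files; the test that prompted the fix is not part of this diff. A minimal sketch of such a cross-file check, assuming exact string matching, with hypothetical helpers `read_leaderboard` and `unmatched_configs` and a hypothetical `PRETTY_NAMES` mapping built from the YAML:

```python
import csv
import io
from typing import Dict, List

# Two rows copied from the updated leaderboard CSV in this commit (others omitted).
LEADERBOARD_CSV = """name,length_controlled_winrate,win_rate,avg_length,link,samples,filter
GPT-4 Preview,50.0,50.0,2049,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_1106_preview/model_outputs.json,minimal
GPT-4,38.12808974440021,23.576789314782605,1365,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json,minimal
"""

# Hypothetical mapping from config key to its `pretty_name`, as in the YAML above.
PRETTY_NAMES = {"gpt4_1106_preview": "GPT-4 Preview"}


def read_leaderboard(text: str) -> List[Dict[str, str]]:
    """Parse leaderboard CSV text into one dict per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def unmatched_configs(rows: List[Dict[str, str]], pretty_names: Dict[str, str]) -> List[str]:
    """Return config keys whose pretty_name is not an exact `name` in the leaderboard."""
    names = {row["name"] for row in rows}
    return [key for key, pretty in pretty_names.items() if pretty not in names]


rows = read_leaderboard(LEADERBOARD_CSV)
print(unmatched_configs(rows, PRETTY_NAMES))  # prints []
```

Under this assumption, a typo fixed in only one file (for example, "GPT-4 Perview" left in the config) would make the check report `gpt4_1106_preview`, which is why every copy of the name changes together.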
