Skip to content

Commit

Permalink
Merge pull request #97 from lm-sys/category-hard
Browse files Browse the repository at this point in the history
Category hard
  • Loading branch information
CodingWithTim authored Jun 25, 2024
2 parents 93e05d1 + a601f52 commit efe12dd
Show file tree
Hide file tree
Showing 2 changed files with 37 additions and 1 deletion.
20 changes: 19 additions & 1 deletion blog/2024-04-19-arena-hard.md
Original file line number Diff line number Diff line change
Expand Up @@ -813,7 +813,7 @@ We hope to study deeper into the above limitations and biases in the later techn
We thank Matei Zaharia, Yann Dubois, Anastasios Angelopoulos, Lianmin Zheng, Lewis Tunstall, Nathan Lambert, Xuechen Li, Naman Jain, Ying Sheng, Maarten Grootendorst for their valuable feedback. We thank Siyuan Zhuang and Dacheng Li for the valuable review and debug of the code. We thank Microsoft [AFMR](https://www.microsoft.com/en-us/research/collaboration/accelerating-foundation-models-research/) for Azure OpenAI credits support. We also thank Together.ai & Anyscale for open model endpoint support.

## Citation
If you find Arena-Hard-Auto useful, please cite our paper below.
If you find Arena-Hard-Auto or BenchBuilder useful, please cite our papers below.
```
@misc{li2024crowdsourced,
title={From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline},
Expand All @@ -823,8 +823,26 @@ If you find Arena-Hard-Auto useful, please cite our paper below.
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{chiang2024chatbot,
title={Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference},
author={Wei-Lin Chiang and Lianmin Zheng and Ying Sheng and Anastasios Nikolas Angelopoulos and Tianle Li and Dacheng Li and Hao Zhang and Banghua Zhu and Michael Jordan and Joseph E. Gonzalez and Ion Stoica},
year={2024},
eprint={2403.04132},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
@misc{arenahard2024,
title = {From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline},
url = {https://lmsys.org/blog/2024-04-19-arena-hard/},
author = {Tianle Li*, Wei-Lin Chiang*, Evan Frick, Lisa Dunlap, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica},
month = {April},
year = {2024}
}
```


## Appendix
<img src="/images/blog/arena_hard/heatmap.png" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 120%"></img>
<p style="color:gray; text-align: center;">Appendix Figure 1: Similarity Heatmap of 50 Arena Hard Auto v0.1 Clusters</p>
Expand Down
18 changes: 18 additions & 0 deletions blog/2024-05-17-category-hard.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,24 @@ We have also open-sourced this de-duplication script on [Github](https://github.

## Citation
```
@misc{li2024crowdsourced,
title={From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline},
author={Tianle Li and Wei-Lin Chiang and Evan Frick and Lisa Dunlap and Tianhao Wu and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica},
year={2024},
eprint={2406.11939},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{chiang2024chatbot,
title={Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference},
author={Wei-Lin Chiang and Lianmin Zheng and Ying Sheng and Anastasios Nikolas Angelopoulos and Tianle Li and Dacheng Li and Hao Zhang and Banghua Zhu and Michael Jordan and Joseph E. Gonzalez and Ion Stoica},
year={2024},
eprint={2403.04132},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
@misc{arenahard2024,
title = {From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline},
url = {https://lmsys.org/blog/2024-04-19-arena-hard/},
Expand Down

0 comments on commit efe12dd

Please sign in to comment.