Merge pull request #97 from lm-sys/category-hard

Category hard
lm-sys · Jun 25, 2024 · efe12dd
2 parents 93e05d1 + a601f52
commit efe12dd

Showing 2 changed files with 37 additions and 1 deletion.
diff --git a/blog/2024-04-19-arena-hard.md b/blog/2024-04-19-arena-hard.md
@@ -813,7 +813,7 @@ We hope to study deeper into the above limitations and biases in the later techn
 We thank Matei Zaharia, Yann Dubois, Anastasios Angelopoulos, Lianmin Zheng, Lewis Tunstall, Nathan Lambert, Xuechen Li, Naman Jain, Ying Sheng, Maarten Grootendorst for their valuable feedback. We thank Siyuan Zhuang and Dacheng Li for the valuable review and debug of the code. We thank Microsoft [AFMR](https://www.microsoft.com/en-us/research/collaboration/accelerating-foundation-models-research/) for Azure OpenAI credits support. We also thank Together.ai & Anyscale for open model endpoint support.
 
 ## Citation
-If you find Arena-Hard-Auto useful, please cite our paper below.
+If you find Arena-Hard-Auto or BenchBuilder useful, please cite our papers below.
 ```
 @misc{li2024crowdsourced,
       title={From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline}, 
@@ -823,8 +823,26 @@ If you find Arena-Hard-Auto useful, please cite our paper below.
       archivePrefix={arXiv},
       primaryClass={cs.LG}
 }
+
+@misc{chiang2024chatbot,
+    title={Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference},
+    author={Wei-Lin Chiang and Lianmin Zheng and Ying Sheng and Anastasios Nikolas Angelopoulos and Tianle Li and Dacheng Li and Hao Zhang and Banghua Zhu and Michael Jordan and Joseph E. Gonzalez and Ion Stoica},
+    year={2024},
+    eprint={2403.04132},
+    archivePrefix={arXiv},
+    primaryClass={cs.AI}
+}
+
+@misc{arenahard2024,
+    title = {From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline},
+    url = {https://lmsys.org/blog/2024-04-19-arena-hard/},
+    author = {Tianle Li*, Wei-Lin Chiang*, Evan Frick, Lisa Dunlap, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica},
+    month = {April},
+    year = {2024}
+}
 ```
 
+
 ## Appendix
 <img src="/images/blog/arena_hard/heatmap.png" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 120%"></img>
 <p style="color:gray; text-align: center;">Appendix Figure 1: Similarity Heatmap of 50 Arena Hard Auto v0.1 Clusters</p>

diff --git a/blog/2024-05-17-category-hard.md b/blog/2024-05-17-category-hard.md
@@ -88,6 +88,24 @@ We have also open-sourced this de-duplication script on [Github](https://github.
 
 ## Citation
 ```
+@misc{li2024crowdsourced,
+      title={From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline}, 
+      author={Tianle Li and Wei-Lin Chiang and Evan Frick and Lisa Dunlap and Tianhao Wu and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica},
+      year={2024},
+      eprint={2406.11939},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG}
+}
+
+@misc{chiang2024chatbot,
+    title={Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference},
+    author={Wei-Lin Chiang and Lianmin Zheng and Ying Sheng and Anastasios Nikolas Angelopoulos and Tianle Li and Dacheng Li and Hao Zhang and Banghua Zhu and Michael Jordan and Joseph E. Gonzalez and Ion Stoica},
+    year={2024},
+    eprint={2403.04132},
+    archivePrefix={arXiv},
+    primaryClass={cs.AI}
+}
+
 @misc{arenahard2024,
     title = {From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline},
     url = {https://lmsys.org/blog/2024-04-19-arena-hard/},