From 0b84ea390e9c8a899955238e96cbe12616d4fc84 Mon Sep 17 00:00:00 2001
From: Wei-Lin Chiang
Date: Mon, 20 May 2024 09:52:45 -0700
Subject: [PATCH] update

---
 blog/2024-05-17-category-hard.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blog/2024-05-17-category-hard.md b/blog/2024-05-17-category-hard.md
index 10c88633..7de00bad 100644
--- a/blog/2024-05-17-category-hard.md
+++ b/blog/2024-05-17-category-hard.md
@@ -82,7 +82,7 @@ We are commited to continuously enhance the Chatbot Arena leaderboard and share
 ### Note: Enhancing Quality Through De-duplication
-To improve the overall quality of prompts in Chatbot Arena, we also implement a de-duplication pipeline. This new pipeline aims to remove overly redundant user prompts that might skew the distribution and affect the accuracy of our leaderboard. During our analysis, we noticed that many first-time users tend to ask similar greeting prompts, such as "hello," leading to an over-representation of these types of queries. To address this, we down-sample the top 0.01% most common prompts (approximately 100 prompts, mostly greetings in different languages) to the 99.99% percentile frequency (approximately 150 occurrences). After this process, about 6% of the votes are removed. We believe this helps maintain a diverse and high-quality set of prompts for evaluation.
+To improve the overall quality of prompts in Chatbot Arena, we also implement a de-duplication pipeline. This new pipeline aims to remove overly redundant user prompts that might skew the distribution and affect the accuracy of our leaderboard. During our analysis, we noticed that many first-time users tend to ask similar greeting prompts, such as "hello," leading to an over-representation of these types of queries. To address this, we down-sample the top 0.01% most common prompts (approximately 1000 prompts, mostly greetings in different languages) to the 99.9% percentile frequency (25 occurrences).
After this process, about 8.6% of the votes are removed. We believe this helps maintain a diverse and high-quality set of prompts for evaluation. We hope to encourage users to submit more unique & fresh prompts to reduce the risk of contamination. We have also open-sourced this de-duplication script on [Github](https://github.com/lm-sys/FastChat/tree/main/fastchat/serve/monitor) and publish the vote data with de-duplication tags in the [notebook](https://colab.research.google.com/drive/1KdwokPjirkTmpO_P1WByFNFiqxWQquwH#scrollTo=CP35mjnHfpfN). We will continue to monitor the impact of this de-duplication process on the leaderboard and make adjustments as necessary to ensure the diversity and quality of our dataset.
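
The down-sampling step described in the note can be sketched roughly as follows. This is a minimal illustration, not the script linked above: the function names `percentile_cutoff` and `dedup_flags`, the exact-string matching of prompts, the nearest-rank percentile, and the random choice of which duplicates to keep are all assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def percentile_cutoff(counts, pct):
    """Nearest-rank percentile of the per-prompt occurrence counts (assumed method)."""
    ordered = sorted(counts)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def dedup_flags(prompts, pct=99.9, seed=0):
    """Return (keep, cutoff): one boolean per vote, and the frequency cap applied.

    Prompts that occur more often than the pct-th percentile frequency are
    down-sampled to that frequency; every other vote is kept unchanged.
    """
    freq = Counter(prompts)
    cutoff = percentile_cutoff(freq.values(), pct)

    # Group vote indices by prompt text (exact match; no normalization here).
    positions = defaultdict(list)
    for i, prompt in enumerate(prompts):
        positions[prompt].append(i)

    rng = random.Random(seed)
    keep = [True] * len(prompts)
    for idxs in positions.values():
        if len(idxs) > cutoff:
            kept = set(rng.sample(idxs, cutoff))  # keep only `cutoff` occurrences
            for i in idxs:
                keep[i] = i in kept
    return keep, cutoff
```

For example, `dedup_flags(["hello"] * 50 + ["a", "b", "c"], pct=50)` caps "hello" at the median prompt frequency, so one "hello" vote survives alongside the three unique prompts. Returning flags rather than deleting rows mirrors the idea of publishing votes with de-duplication tags, so the filter can be switched on or off downstream.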