From 7861ee7bd8f2c063ae10f179828112b677c868d3 Mon Sep 17 00:00:00 2001
From: efrick2002
Date: Sun, 26 Nov 2023 22:10:14 -0800
Subject: [PATCH] fixed 2 words

---
 blog/starling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blog/starling.md b/blog/starling.md
index 2543351..5b202bc 100644
--- a/blog/starling.md
+++ b/blog/starling.md
@@ -77,7 +77,7 @@ The most challenging aspect of creating Nectar was mitigating the positional bia
 
 To address this, as shown in the second figure, we instructed GPT-4 to first conduct pairwise comparisons for all response pairs before compiling a 7-wise ranking. This approach moderately reduced the positional bias. We have also explored having GPT-4 score or judge each prompt individually before summarizing in a 7-wise ranking, but this method did not effectively diminish the bias.
 
-Further reduction of positional bias came with the introduction of a specific, and then a randomized, pairwise evaluation order, as demonstrated in the third and fourth figures, respectively. This approach proved most effective in counteracting positional bias, leading to the final methodology employed in curating the Nectar dataset. Further details regarding dataset preparation and analysis will be elaborated in our upcoming paper.
+Further reduction of positional bias came with the introduction of a specific, and then a randomized pairwise evaluation order, as demonstrated in the third and fourth figures, respectively. This approach proved most effective in counteracting positional bias, leading to the final methodology employed in curating the Nectar dataset. Further details regarding dataset preparation and analysis will be elaborated in our upcoming paper.
 
 We believe that Nectar will be a valuable resource for developers aiming to train more effective models using RLHF / RLAIF. 
 It also offers high-quality responses for a diverse range of prompts, and can provide researchers with deeper insights into RLHF / RLAIF and the interplay between synthetic and human data.