What's the reference policy of Preferred-FT in Figure 2? #70

zetian1025 · 2024-03-04T12:50:09Z

First of all, thanks for your great work!
I'm confused as you said you use the SFT model or Preferred-FT model as the reference policy when operating DPO training.
But for Preferred-FT in Figure 2, what's its reference policy? Or how the KL-Divergence is computed, Is the reference policy aligned?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What's the reference policy of Preferred-FT in Figure 2? #70

What's the reference policy of Preferred-FT in Figure 2? #70

zetian1025 commented Mar 4, 2024 •

edited

Loading

What's the reference policy of Preferred-FT in Figure 2? #70

What's the reference policy of Preferred-FT in Figure 2? #70

Comments

zetian1025 commented Mar 4, 2024 • edited Loading

zetian1025 commented Mar 4, 2024 •

edited

Loading