diff --git a/index.html b/index.html
index 08683c1..d50602e 100644
--- a/index.html
+++ b/index.html
@@ -153,7 +153,7 @@

Abstract

Self-Play Fine-Tuning (SPIN)

- +

@@ -220,7 +220,15 @@

Results

Average score of SPIN at different iterations on the HuggingFace Open LLM leaderboard.

- + +
+ + +

+ Test performance of SPIN based on zephyr-7b-sft-full across HuggingFace Open LLM Leaderboard datasets.
+ We also denote the average improvement over last iteration in the Average column.
+

+
@@ -237,6 +245,67 @@

+ +
+
+

Ablation Studies

+
+

+ We examine the effect of synthetic dataset size and training epochs within an iteration.
+ Our analysis demonstrates the effectiveness of the synthetic data used by SPIN compared to
+ the SFT data, as well as the necessity of iterative training in SPIN. Furthermore, to comprehensively
+ assess the performance improvements of SPIN, we perform additional evaluations on benchmark
+ tasks distinct from those in the Open LLM leaderboard.
+

+
+
+
+ + +
+
+
+ +
+
+

+ + +
diff --git a/static/images/ablation1.png b/static/images/ablation1.png
new file mode 100644
index 0000000..42c60ff
Binary files /dev/null and b/static/images/ablation1.png differ
diff --git a/static/images/ablation2.png b/static/images/ablation2.png
new file mode 100644
index 0000000..1416021
Binary files /dev/null and b/static/images/ablation2.png differ
diff --git a/static/images/tab.png b/static/images/tab.png
new file mode 100644
index 0000000..f0d565d
Binary files /dev/null and b/static/images/tab.png differ
diff --git a/static/images/tab2.png b/static/images/tab2.png
new file mode 100644
index 0000000..8817f1d
Binary files /dev/null and b/static/images/tab2.png differ