From cdcb07f26142472fe2567ab99d57657863d45b3d Mon Sep 17 00:00:00 2001
From: Andrei Fajardo <andrei@nerdai.io>
Date: Mon, 23 Sep 2024 13:54:33 -0400
Subject: [PATCH] README:

---
 README.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 78 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index b5ca87a..d2c9f93 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ ARC task.
   <img height="500" src="https://d3ddy8balm3goa.cloudfront.net/arc-task-solver-st-demo/arc-task-solver-app.svg" alt="cover">
 </p>
 
-### Solving An ARC Task
+### Solving an ARC Task
 
 Each ARC task consists of training examples, each of which consist of input and
 output pairs. There exists a common pattern between these input and output pairs,
@@ -91,7 +91,7 @@ designated text area. You can choose to use this Critique or supply your own by
 overwriting the text and applying the change. Once ready to produce the next
 prediction, hit the `Continue` button.
 
-### Saving Solutions For Fine-Tuning
+### Saving solutions for fine-tuning
 
 Any collaboration session involving the LLM and human can be saved and used to
 finetune an LLM. In this app, we use OpenAI LLMs, and so the finetuning examples
@@ -102,3 +102,79 @@ example that can be used for fine-tuning.
 <p align="center">
   <img height="500" src="https://d3ddy8balm3goa.cloudfront.net/arc-task-solver-st-demo/finetuning-arc-example.svg" alt="cover">
 </p>
+
+## Fine-tuning (with `arc-finetuning-cli`)
+
+After you've created your finetuning examples (you'll need at least 10 of them),
+you can submit a job to OpenAI to finetune an LLM on them. To do so, we provide a
+convenient command-line tool, powered by LlamaIndex plugins such as
+`llama-index-finetuning`.
+
+```sh
+arc finetuning cli tool.
+
+options:
+  -h, --help            show this help message and exit
+
+commands:
+  {evaluate,finetune,job-status}
+    evaluate            Evaluation of ARC Task predictions with LLM and ARCTaskSolverWorkflow.
+    finetune            Finetune OpenAI LLM on ARC Task Solver examples.
+    job-status          Check the status of finetuning job.
+```
+
+### Submitting a fine-tuning job
+
+To submit a fine-tuning job, use any of the following three `finetune` commands:
+
+```sh
+# submit a new finetune job using the specified llm
+arc-finetuning-cli finetune --llm gpt-4o-2024-08-06
+
+# submit a new finetune job that continues from a previously finetuned model
+arc-finetuning-cli finetune --llm gpt-4o-2024-08-06 --start-job-id ftjob-TqJd5Nfe3GIiScyTTJH56l61
+
+# submit a new finetune job that continues from the most recent finetuned model
+arc-finetuning-cli finetune --continue-latest
+```
+
+The commands above compile all of the individual finetuning JSON examples
+(stored in `finetuning_examples/`) into a single `jsonl` file that is then
+passed to the OpenAI fine-tuning API.
+
+### Checking the status of a fine-tuning job
+
+After submitting a job, you can check its status using the CLI commands below:
+
+```sh
+arc-finetuning-cli job-status -j ftjob-WYySY3iGYpfiTbSDeKDZO0YL -m gpt-4o-2024-08-06
+
+# or check the status of the latest job submission
+arc-finetuning-cli job-status --latest
+```
+
+## Evaluation
+
+You can evaluate the `ARCTaskSolverWorkflow` and a specified LLM on the ARC test
+dataset. You can even supply a fine-tuned LLM here.
+
+```sh
+# evaluate ARCTaskSolverWorkflow single attempt with gpt-4o
+arc-finetuning-cli evaluate --llm gpt-4o-2024-08-06
+
+# evaluate ARCTaskSolverWorkflow single attempt with a previously fine-tuned gpt-4o
+arc-finetuning-cli evaluate --llm gpt-4o-2024-08-06 --start-job-id ftjob-TqJd5Nfe3GIiScyTTJH56l61
+```
+
+You can also specify certain parameters to control the speed of execution and
+avoid running into `RateLimitError`s from OpenAI.
+
+```sh
+arc-finetuning-cli evaluate --llm gpt-4o-2024-08-06 --batch-size 5 --num-workers 3 --sleep 10
+```
+
+In the above command, `--batch-size` is the number of test cases handled in a
+single batch. In total, there are 400 test cases. Moreover, `--num-workers` is the
+maximum number of async calls allowed to the OpenAI API at any given moment.
+Finally, `--sleep` is the amount of time in seconds that execution pauses before
+moving on to the next batch of test cases.