From c01efa191a10c5aec5715014963638ae8ddd4f71 Mon Sep 17 00:00:00 2001
From: Ambika Joshi
Date: Wed, 11 Sep 2024 15:01:10 +0000
Subject: [PATCH] GITBOOK-171: bulk runner revisions

---
 guides/understanding-bulk-runner-and-evaluation/README.md | 6 +++---
 .../how-to-set-up-bulk-runner.md | 2 ++
 .../how-to-set-up-evaluations.md | 2 ++
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/guides/understanding-bulk-runner-and-evaluation/README.md b/guides/understanding-bulk-runner-and-evaluation/README.md
index f2eb599..4cf0dde 100644
--- a/guides/understanding-bulk-runner-and-evaluation/README.md
+++ b/guides/understanding-bulk-runner-and-evaluation/README.md
@@ -91,9 +91,9 @@ The iterative bulk runs and systematic comparison provide a framework for improv
 
 ## When do you use Bulk vs Bulk+Eval vs Eval?
 
-* **Bulk workflow only** - If you want to test your Copilot’s functionality for regression tests, monitoring and observability, and bugs.
-* **Bulk and Eval** - If you are testing improvements on your prompts, or updating your documents, want to consider A/B testing.
-* **Eval** **workflow only**- If you already have test data and want to use “LLM as Judge” to evaluate it
+* [**Bulk workflow only**](https://gooey.ai/bulk/farmerchat-bulk-evaluator-regression-only-ggzy9gld1eae/) - If you want to test your Copilot’s functionality for regression tests, monitoring and observability, and bugs.
+* [**Bulk and Eval** ](https://gooey.ai/bulk/farmerchat-bulk-evaluator-gpt-4o-mixtral-claude-vs-gemini-pro-15-b0o8aos3rj8y/)- If you are testing improvements on your prompts, or updating your documents, want to consider A/B testing.
+* [**Eval** **workflow only**](https://gooey.ai/eval/copilot-evaluator-artpuhzwvily/)- If you already have test data and want to use “LLM as Judge” to evaluate it
 
 ### Common terms
 
diff --git a/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md b/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md
index f418173..d47a1e4 100644
--- a/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md
+++ b/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-bulk-runner.md
@@ -2,6 +2,8 @@
 
 In this example scenario, we are setting up a simple bulk run to check regression for an AI Copilot in production.
 
+Check out the example run here: BULK RUNNER (Regression Only) https://gooey.ai/bulk/farmerchat-bulk-evaluator-regression-only-ggzy9gld1eae/ Check out the example run here: BULK RUNNER (Bulk and Evaluation) https://gooey.ai/bulk/farmerchat-bulk-evaluator-gpt-4o-mixtral-claude-vs-gemini-pro-15-b0o8aos3rj8y/
+
 ### Step 1: Select Gooey Workflows
 
 Choose the “SAVED” run from Gooey.AI Workflows that you would like to use.
diff --git a/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-evaluations.md b/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-evaluations.md
index 8488d49..9ed6c4c 100644
--- a/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-evaluations.md
+++ b/guides/understanding-bulk-runner-and-evaluation/how-to-set-up-evaluations.md
@@ -2,6 +2,8 @@
 
 In this example scenario, we are comparing and evaluating the quality of the answers of various AI Copilots that have all the same settings and functionalities except for different LLMs.
 
+Check out the example run here: Evaluation only https://gooey.ai/eval/copilot-evaluator-artpuhzwvily/
+
 ### Step 1: Select Gooey Workflows to evaluate
 
 Choose the “SAVED” run from Gooey.AI Workflows that you would like to use.