
Merge pull request #118 from Azure-Samples/testeval6
Add TOC to eval markdown
pamelafox authored Oct 23, 2024
2 parents 1929ba1 + ca9a3fd · commit 4d0e801
Showing 2 changed files with 20 additions and 7 deletions.
.github/workflows/evaluate.yaml (16 changes: 11 additions & 5 deletions)
@@ -170,20 +170,26 @@ jobs:
       - name: Summarize results
         if: ${{ success() }}
         run: |
-          echo "📊 Evaluation Results" >> $GITHUB_STEP_SUMMARY
-          python -m evaltools summary evals/results --output=markdown >> eval-results.md
-          cat eval-results.md >> $GITHUB_STEP_SUMMARY
+          echo "## Evaluation results" >> eval-summary.md
+          python -m evaltools summary evals/results --output=markdown >> eval-summary.md
+          echo "## Answer differences across runs" >> run-diff.md
+          python -m evaltools diff evals/results/baseline evals/results/pr${{ github.event.issue.number }} --output=markdown >> run-diff.md
+          cat eval-summary.md >> $GITHUB_STEP_SUMMARY
+          cat run-diff.md >> $GITHUB_STEP_SUMMARY
       - name: Comment on pull request
         uses: actions/github-script@v7
         with:
           script: |
             const fs = require('fs');
-            const summaryPath = "eval-results.md";
+            const summaryPath = "eval-summary.md";
             const summary = fs.readFileSync(summaryPath, 'utf8');
+            const runId = process.env.GITHUB_RUN_ID;
+            const repo = process.env.GITHUB_REPOSITORY;
+            const actionsUrl = `https://github.com/${repo}/actions/runs/${runId}`;
             github.rest.issues.createComment({
               issue_number: context.issue.number,
               owner: context.repo.owner,
               repo: context.repo.repo,
-              body: summary
+              body: `${summary}\n\n[Check the Actions tab for more details](${actionsUrl}).`
             })
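Assembled from the diff above, this is roughly how the two steps read after the change; the indentation and the surrounding job context are assumptions, since only the changed hunk is shown:

```yaml
      - name: Summarize results
        if: ${{ success() }}
        run: |
          echo "## Evaluation results" >> eval-summary.md
          python -m evaltools summary evals/results --output=markdown >> eval-summary.md
          echo "## Answer differences across runs" >> run-diff.md
          python -m evaltools diff evals/results/baseline evals/results/pr${{ github.event.issue.number }} --output=markdown >> run-diff.md
          cat eval-summary.md >> $GITHUB_STEP_SUMMARY
          cat run-diff.md >> $GITHUB_STEP_SUMMARY

      - name: Comment on pull request
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const summaryPath = "eval-summary.md";
            const summary = fs.readFileSync(summaryPath, 'utf8');
            const runId = process.env.GITHUB_RUN_ID;
            const repo = process.env.GITHUB_REPOSITORY;
            const actionsUrl = `https://github.com/${repo}/actions/runs/${runId}`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `${summary}\n\n[Check the Actions tab for more details](${actionsUrl}).`
            })
```

The net effect is that the job summary now carries both the evaluation summary table and a run-over-run answer diff, and the PR comment links back to the Actions run for details.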
docs/evaluation.md (11 changes: 9 additions & 2 deletions)
@@ -2,6 +2,13 @@

 Follow these steps to evaluate the quality of the answers generated by the RAG flow.
 
+* [Deploy a GPT-4 model](#deploy-a-gpt-4-model)
+* [Setup the evaluation environment](#setup-the-evaluation-environment)
+* [Generate ground truth data](#generate-ground-truth-data)
+* [Run bulk evaluation](#run-bulk-evaluation)
+* [Review the evaluation results](#review-the-evaluation-results)
+* [Run bulk evaluation on a PR](#run-bulk-evaluation-on-a-pr)
+
 ## Deploy a GPT-4 model


@@ -45,7 +52,7 @@ python evals/generate_ground_truth_data.py

 Review the generated data after running that script, removing any question/answer pairs that don't seem like realistic user input.
-## Evaluate the RAG answer quality
+## Run bulk evaluation
 Review the configuration in `evals/eval_config.json` to ensure that everything is correctly setup. You may want to adjust the metrics used. See [the ai-rag-chat-evaluator README](https://github.com/Azure-Samples/ai-rag-chat-evaluator) for more information on the available metrics.
@@ -72,6 +79,6 @@ Compare answers across runs by running the following command:
 python -m evaltools diff evals/results/baseline/
 ```
-## Run the evaluation on a PR
+## Run bulk evaluation on a PR
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
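The workflow that reacts to the `/evaluate` comment is not part of this diff; the use of `github.event.issue.number` above suggests it is triggered by an `issue_comment` event. A minimal sketch of what such a trigger could look like, with the event filter and job gate as illustrative assumptions rather than the repository's actual configuration:

```yaml
# Hypothetical sketch of a comment-triggered evaluation workflow.
# The event filter and gate below are assumptions; the repository's
# actual evaluate.yaml may differ.
on:
  issue_comment:
    types: [created]

jobs:
  evaluate:
    # Only react to "/evaluate" comments made on pull requests
    if: ${{ github.event.issue.pull_request && startsWith(github.event.comment.body, '/evaluate') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... evaluation steps, ending with the "Summarize results" and
      # "Comment on pull request" steps shown in the diff above
```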
