Add AI Evaluation tests for eShopSupport. #49

peterwald · 2024-11-18T20:18:10Z

Add an EvaluationTests project that demonstrates how to use Microsoft.Extensions.AI.Evaluation to run evaluations on LLM prompts and responses.

- Testing updates and cleanup
- Latest updates
- Update library
- Update evaluations
- Add JSON cleanup
- Add different kinds of evaluation tests, defer to an eval handler
- Fix writing of scenario results
- Update to latest Evaluation library API
- Update Evaluation API
- Simplify the API
- Update packages to public feed
- Add required data dep to Tokenizers
- Add check for diagnostic errors
- Revert changes to run against AzureOpenAI
- More fixes
- Fix using

shyamnamboodiripad · 2024-11-19T22:23:19Z

@peterwald Should we add a README.md file under the test/EvaluationTests folder that contains instructions for running the tests / viewing the report?

test/EvaluationTests/EvaluationTests.cs

shyamnamboodiripad · 2024-11-19T22:43:17Z

@luisquintanilla @peterwald should we also remove the src\Evaluator project now that we have EvaluationTests? This could be something to look into later on (post merge) as this could help to make the guidance on how to run evaluations less ambiguous overall.

I am guessing we don't want to do this yet since not all tests from the src\Evaluator have been ported over yet?

test/EvaluationTests/AnswerScoringEvaluator.cs

test/EvaluationTests/EvaluationTests.csproj

test/EvaluationTests/EvaluationTests.cs

luisquintanilla · 2024-11-20T04:58:49Z

> @luisquintanilla @peterwald should we also remove the src\Evaluator project now that we have EvaluationTests? This could be something to look into later on (post merge) as this could help to make the guidance on how to run evaluations less ambiguous overall.
>
> I am guessing we don't want to do this yet since not all tests from the src\Evaluator have been ported over yet?

Since we're recommending using the evaluations in the test directory, I think it's fine to remove it.

peterwald · 2024-11-20T15:47:28Z

> > @luisquintanilla @peterwald should we also remove the src\Evaluator project now that we have EvaluationTests? This could be something to look into later on (post merge) as this could help to make the guidance on how to run evaluations less ambiguous overall.
> > I am guessing we don't want to do this yet since not all tests from the src\Evaluator have been ported over yet?
>
> Since we're recommending using the evaluations in the test directory, I think it's fine to remove it.

There is more in the Evaluator project than what is done by the tests. I'll leave it here for now to not eliminate those options.

shyamnamboodiripad · 2024-11-22T23:41:15Z

> > > @luisquintanilla @peterwald should we also remove the src\Evaluator project now that we have EvaluationTests? This could be something to look into later on (post merge) as this could help to make the guidance on how to run evaluations less ambiguous overall.
> > > I am guessing we don't want to do this yet since not all tests from the src\Evaluator have been ported over yet?
> >
> > Since we're recommending using the evaluations in the test directory, I think it's fine to remove it.
>
> There is more in the Evaluator project than what is done by the tests. I'll leave it here for now to not eliminate those options.

Sounds good - should we log an issue to port over the remaining functionality and remove this eventually? @luisquintanilla

luisquintanilla · 2024-11-25T13:38:04Z

> Sounds good - should we log an issue to port over the remaining functionality and remove this eventually? @luisquintanilla

Yeah. That would be great.

Tracking here #50. Feel free to update / add details as needed.

luisquintanilla

LGTM. Thanks @peterwald @shyamnamboodiripad

One last question before merging. Is this using the latest version of the libraries? If not, can you bump the version and then I'll merge.

Thanks

peterwald added 2 commits November 18, 2024 15:15

Update to 0.9.5-preview package version

c135f0c

shyamnamboodiripad approved these changes Nov 19, 2024


shyamnamboodiripad requested a review from luisquintanilla November 19, 2024 22:12

timheuer reviewed Nov 19, 2024

test/EvaluationTests/EvaluationTests.cs (outdated, resolved)

shyamnamboodiripad reviewed Nov 19, 2024

test/EvaluationTests/EvaluationTests.cs (outdated, resolved)

luisquintanilla reviewed Nov 19, 2024

test/EvaluationTests/AnswerScoringEvaluator.cs (outdated, resolved)

test/EvaluationTests/EvaluationTests.csproj (outdated, resolved)

test/EvaluationTests/EvaluationTests.cs (outdated, resolved)

peterwald added 2 commits November 20, 2024 10:38

Add appsettings for configuration of EvaluationTests.

7f02e24

Add report generation to README

3ffb99e

luisquintanilla reviewed Nov 25, 2024


peterwald added 2 commits November 25, 2024 10:33

Updates for new M.E.AI APIs

1c9e548

Don't assert no errors, allow the report to display these

c69b413

luisquintanilla merged commit 690d081 into dotnet:main Nov 25, 2024
1 check passed
