test prompt change and temperature change #106

Closed
wants to merge 4 commits into from

Conversation

pamelafox
Contributor

Purpose

Test evaluation

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[ ] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff manually on my code.

@pamelafox
Contributor Author

/evaluate

Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.

metric            stat         baseline  pr106
gpt_groundedness  mean_rating  5.0       5.0
gpt_groundedness  pass_rate    1.0       1.0
gpt_relevance     mean_rating  5.0       5.0
gpt_relevance     pass_rate    1.0       1.0
answer_length     mean         978.9     991.5
latency           mean         2.51      2.05
citation_match    rate         1.0       1.0
num_questions     total        10        10

@pamelafox
Contributor Author

/evaluate

Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.

metric            stat         baseline  pr106
gpt_groundedness  mean_rating  5.0       5.0
gpt_groundedness  pass_rate    1.0       1.0
gpt_relevance     mean_rating  5.0       5.0
gpt_relevance     pass_rate    1.0       1.0
answer_length     mean         978.9     1051.1
latency           mean         2.51      2.18
citation_match    rate         1.0       1.0
num_questions     total        10        10

@pamelafox
Contributor Author

/evaluate

Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.

metric            stat         baseline  pr106
gpt_groundedness  mean_rating  5.0       5.0
gpt_groundedness  pass_rate    1.0       1.0
gpt_relevance     mean_rating  5.0       5.0
gpt_relevance     pass_rate    1.0       1.0
answer_length     mean         978.9     984.9
latency           mean         2.51      2.13
citation_match    rate         1.0       1.0
num_questions     total        10        10
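
The three /evaluate runs agree on every rating and rate, and differ only in answer_length and latency. With only 10 questions per run, it may help to compare the run-to-run spread against the gap from the baseline before reading much into either number. A minimal sketch of that arithmetic, with values copied from the three result tables above; the `summarize` helper and the dictionary names are illustrative assumptions, not part of the evaluation workflow:

```python
from statistics import mean

# Values copied from the three /evaluate result tables in this PR.
baseline = {"answer_length": 978.9, "latency": 2.51}
pr_runs = {
    "answer_length": [991.5, 1051.1, 984.9],
    "latency": [2.05, 2.18, 2.13],
}


def summarize(metric):
    """Return (mean across runs, max-min spread across runs, mean minus baseline)."""
    values = pr_runs[metric]
    avg = mean(values)
    spread = max(values) - min(values)
    return avg, spread, avg - baseline[metric]


for metric in pr_runs:
    avg, spread, delta = summarize(metric)
    print(f"{metric}: mean={avg:.2f} spread={spread:.2f} delta_vs_baseline={delta:+.2f}")
```

For answer_length the spread across runs is larger than the average gap from the baseline, so the length difference may be noise. For latency the spread is smaller than the gap, although the tables do not say whether the baseline was measured under the same conditions.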

pamelafox closed this on Oct 22, 2024