[98] Check parenting chatbot evaluation results #99

Merged: 4 commits, Oct 13, 2023

@beingkk beingkk commented Oct 10, 2023

Closes #98

Small PR to check results from our evaluation experiment.

All the code is in a notebook. I think that's fine for now; it doesn't need further refactoring.

To view the notebook properly on GitHub, you can enable Rich notebook diffs: click on your profile picture, select "Feature previews", then click "Enable".

@beingkk beingkk changed the title from "[98] Check parenting chatbot evaluation" to "[98] Check parenting chatbot evaluation results" on Oct 12, 2023
@beingkk beingkk requested a review from RFOxbury October 12, 2023 11:07

beingkk commented Oct 12, 2023

Hi @RFOxbury, hope all is well! Whenever you have time, would be great if you can sense-check that I've not made any mistakes when summarising the evaluation results.

There's no rush - end of this week or early next week will be fine.

Thank you!

@RFOxbury RFOxbury left a comment

I have gone through and looked most closely at get_win_matrix() and get_question_matrix(). I cannot find holes in the logic so I think it's all good!

@beingkk beingkk merged commit ae9c6cd into dev Oct 13, 2023
1 check passed
@beingkk beingkk deleted the 98_check_evals branch October 13, 2023 09:28
Successfully merging this pull request may close these issues.

Check evaluation results