
LLMEvaluator that evaluates model's output with LLM #268

Open
deep-diver opened this issue Dec 7, 2023 · 2 comments

Comments

@deep-diver
Contributor

This is a custom TFX component project idea.
I hope to get some feedback from @rcrowe-google, @hanneshapke, @sayakpaul, and @casassg.

Temporary name of the component: LLMEvaluator

Behaviour
: LLMEvaluator evaluates the trained model's performance via a designated LLM service (e.g. PaLM, Gemini, ChatGPT, ...) by comparing the model's outputs against the labels provided by ExampleGen.
: LLMEvaluator takes an instruction parameter that lets you specify the prompt sent to the LLM, since each LLM service may interpret the same prompt differently and the prompt should also vary from task to task.

Why
: Using an LLM service to evaluate a model has become common practice these days, especially when fine-tuning an open-source LLM such as LLaMA.
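
To make the behaviour above concrete, here is a minimal sketch of how LLMEvaluator could be written as a Python function-based custom TFX component. The Examples/Model inputs, the ModelEvaluation output, and the instruction parameter follow the description in this issue; the two helpers (iterate_predictions_and_labels and query_llm_service) are hypothetical placeholders for the prediction loop and the LLM call, not a real API.

```python
import json
import os

from tfx.dsl.component.experimental.annotations import (InputArtifact,
                                                        OutputArtifact,
                                                        Parameter)
from tfx.dsl.component.experimental.decorators import component
from tfx.types.standard_artifacts import Examples, Model, ModelEvaluation


def iterate_predictions_and_labels(examples, model):
  """Hypothetical helper: reads records from examples.uri, runs the trained
  model found at model.uri, and yields (prediction, label) string pairs."""
  raise NotImplementedError


def query_llm_service(instruction, prediction, label):
  """Hypothetical helper: asks the chosen LLM service (PaLM, Gemini,
  ChatGPT, ...) to score one (prediction, label) pair, returning a float."""
  raise NotImplementedError


@component
def LLMEvaluator(
    examples: InputArtifact[Examples],
    model: InputArtifact[Model],
    evaluation: OutputArtifact[ModelEvaluation],
    instruction: Parameter[str],
) -> None:
  """Scores the trained model's outputs against ExampleGen labels via an LLM."""
  scores = []
  for prediction, label in iterate_predictions_and_labels(examples, model):
    scores.append(query_llm_service(instruction, prediction, label))

  # Write a simple aggregate metric into the output artifact's URI.
  result = {'mean_llm_score': sum(scores) / max(len(scores), 1)}
  os.makedirs(evaluation.uri, exist_ok=True)
  with open(os.path.join(evaluation.uri, 'llm_evaluation.json'), 'w') as f:
    json.dump(result, f)
```

In a pipeline this would be wired up roughly as `LLMEvaluator(examples=example_gen.outputs['examples'], model=trainer.outputs['model'], instruction='...')`, but the exact artifact types and the shape of the evaluation output are open questions for this proposal.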

@hanneshapke
Contributor

@deep-diver Great component idea. How will you handle the different prompts for optimal performance?
Do you have code you could share?

@rcrowe-google
Collaborator

Could this be used for HELM? https://crfm.stanford.edu/helm/latest/
