
LLMEvaluator that evaluates model's output with LLM #268

Open
deep-diver opened this issue Dec 7, 2023 · 2 comments

Comments

@deep-diver
Contributor

This is a custom TFX component project idea.
I hope to get some feedback from @rcrowe-google, @hanneshapke, @sayakpaul, and @casassg.

Temporary name of the component: LLMEvaluator

Behaviour
: LLMEvaluator evaluates the trained model's performance via a designated LLM service (e.g. PaLM, Gemini, ChatGPT, ...) by comparing the model's outputs against the labels provided by ExampleGen.
: LLMEvaluator takes an instruction parameter that lets you specify the prompt sent to the LLM, since each LLM service may interpret the same prompt differently and the prompt should also vary from task to task.

Why
: Using an LLM service to evaluate a model has become common practice these days, especially when fine-tuning an open-source LLM such as LLaMA.
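
To make the behaviour above concrete, here is a minimal sketch of how LLMEvaluator could be written as a Python function-based custom TFX component. The Examples/Model inputs, the ModelEvaluation output, and the instruction parameter follow the description in this issue; the two helpers (iterate_predictions_and_labels and query_llm_service) are hypothetical placeholders for the prediction loop and the LLM call, not a real API.

```python
import json
import os

from tfx.dsl.component.experimental.annotations import (InputArtifact,
                                                        OutputArtifact,
                                                        Parameter)
from tfx.dsl.component.experimental.decorators import component
from tfx.types.standard_artifacts import Examples, Model, ModelEvaluation


def iterate_predictions_and_labels(examples, model):
  """Hypothetical helper: reads records from examples.uri, runs the trained
  model found at model.uri, and yields (prediction, label) string pairs."""
  raise NotImplementedError


def query_llm_service(instruction, prediction, label):
  """Hypothetical helper: asks the chosen LLM service (PaLM, Gemini,
  ChatGPT, ...) to score one (prediction, label) pair, returning a float."""
  raise NotImplementedError


@component
def LLMEvaluator(
    examples: InputArtifact[Examples],
    model: InputArtifact[Model],
    evaluation: OutputArtifact[ModelEvaluation],
    instruction: Parameter[str],
) -> None:
  """Scores the trained model's outputs against ExampleGen labels via an LLM."""
  scores = []
  for prediction, label in iterate_predictions_and_labels(examples, model):
    scores.append(query_llm_service(instruction, prediction, label))

  # Write a simple aggregate metric into the output artifact's URI.
  result = {'mean_llm_score': sum(scores) / max(len(scores), 1)}
  os.makedirs(evaluation.uri, exist_ok=True)
  with open(os.path.join(evaluation.uri, 'llm_evaluation.json'), 'w') as f:
    json.dump(result, f)
```

In a pipeline this would be wired up roughly as `LLMEvaluator(examples=example_gen.outputs['examples'], model=trainer.outputs['model'], instruction='...')`, but the exact artifact types and the shape of the evaluation output are open questions for this proposal.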

@hanneshapke
Contributor

@deep-diver Great component idea. How will you handle the different prompts for optimal performance?
Do you have code you could share?

@rcrowe-google
Collaborator

Could this be used for HELM? https://crfm.stanford.edu/helm/latest/
