A short program that performs a basic RAG evaluation
Retrieval-Augmented Generation (RAG) pairs a retrieval model with a generation model. Accurate retrieval is critical for generating contextually relevant responses, so a RAG system can be evaluated at several levels:
- Retrieval Evaluation:
  - Focuses on evaluating the relevance of retrieved documents.
  - Metrics: Precision@k, Recall@k, Mean Reciprocal Rank (MRR), etc. (see the sketch after this list).
- Generation Evaluation:
  - Measures the quality of generated text based on retrieved documents.
  - Metrics: BLEU, ROUGE, BERTScore, etc.
- End-to-End Evaluation:
  - Directly evaluates the RAG system's output for correctness and relevance.
  - Metrics: Human evaluation or task-specific benchmarks.
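As a minimal sketch of the retrieval metrics named above, the helpers below compute Precision@k, Recall@k, and MRR over ranked lists of document IDs. The function names and the list-of-IDs representation are illustrative assumptions, not part of the original program.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average reciprocal rank of the first relevant document per query (0 if none found)."""
    total = 0.0
    for retrieved, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```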
The evaluation here focuses on retrieval because the quality of retrieved documents is foundational to the RAG framework's success. Precision@1 and MRR are used as metrics for this evaluation.
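Putting it together, here is a short, self-contained sketch of that retrieval-focused evaluation. A toy word-overlap retriever stands in for a real retriever, and the helpers above report averaged Precision@1 and MRR. The corpus, queries, and gold labels are made up for illustration; swap in your own retriever and dataset for a real evaluation.

```python
# Toy corpus of document IDs -> text (illustrative only).
corpus = {
    "doc1": "The Eiffel Tower is located in Paris and was completed in 1889.",
    "doc2": "Python is a popular programming language for machine learning.",
    "doc3": "Paris is the capital city of France.",
}

# Each query maps to the set of documents considered relevant (gold labels).
queries = {
    "Where is the Eiffel Tower?": {"doc1"},
    "What is the capital of France?": {"doc3"},
}


def retrieve(query, corpus, k=3):
    """Rank documents by word overlap with the query (a stand-in retriever)."""
    q_terms = set(query.lower().split())
    scores = {
        doc_id: len(q_terms & set(text.lower().split()))
        for doc_id, text in corpus.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


ranked_lists, relevant_sets = [], []
for query, relevant in queries.items():
    ranked_lists.append(retrieve(query, corpus))
    relevant_sets.append(relevant)

# Average Precision@1 over all queries, plus MRR over the ranked lists.
p_at_1 = sum(
    precision_at_k(r, rel, k=1) for r, rel in zip(ranked_lists, relevant_sets)
) / len(queries)
mrr = mean_reciprocal_rank(ranked_lists, relevant_sets)

print(f"Precision@1: {p_at_1:.2f}")
print(f"MRR: {mrr:.2f}")
```

On this toy data both gold documents rank first, so both metrics come out to 1.0; with a real retriever and labeled dataset the same loop yields informative scores.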