This pipeline runs several LLM evaluators from DeepEval over a set of customer service enquiries to measure how well a RAG pipeline answers them.
The judge LLM is Azure OpenAI GPT-4o, and the RAG pipeline that generates the answers comes from https://github.com/yjching/Banking-RAG-Chatbot-Demo
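
The overall flow can be pictured as a loop: send each enquiry to the RAG pipeline, collect the answer and retrieved context, then score the result with every evaluator. The sketch below is only an illustration of that shape and uses no DeepEval or Azure OpenAI calls. `run_evaluation`, `EvalRecord`, `fake_rag` and `word_overlap` are hypothetical names, and the overlap score is a toy stand-in for a judge-LLM metric, not something this repo implements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a RAG pipeline returns (answer, retrieved context chunks),
# and an evaluator maps (enquiry, answer, context) to a score in [0, 1].
RagPipeline = Callable[[str], Tuple[str, List[str]]]
Evaluator = Callable[[str, str, List[str]], float]


@dataclass
class EvalRecord:
    enquiry: str
    answer: str
    context: List[str]
    scores: Dict[str, float]


def run_evaluation(
    enquiries: List[str],
    rag_pipeline: RagPipeline,
    evaluators: Dict[str, Evaluator],
) -> List[EvalRecord]:
    """Answer every enquiry with the RAG pipeline and score it with each evaluator."""
    records = []
    for enquiry in enquiries:
        answer, context = rag_pipeline(enquiry)
        scores = {
            name: evaluate(enquiry, answer, context)
            for name, evaluate in evaluators.items()
        }
        records.append(EvalRecord(enquiry, answer, context, scores))
    return records


def fake_rag(enquiry: str) -> Tuple[str, List[str]]:
    """Stand-in for the real RAG chatbot (assumption: it returns answer + context)."""
    context = ["Lost cards can be blocked in the mobile app."]
    return "You can block a lost card in the mobile app.", context


def word_overlap(enquiry: str, answer: str, context: List[str]) -> float:
    """Toy metric: fraction of answer words that also appear in the context.

    In the real pipeline this slot would be filled by a judge-LLM evaluator.
    """
    answer_words = set(answer.lower().split())
    context_words = set(" ".join(context).lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)


results = run_evaluation(
    ["How do I block a lost card?"],
    fake_rag,
    {"overlap": word_overlap},
)
for record in results:
    print(record.enquiry, record.scores)
```

Keeping the evaluators in a name-to-callable mapping means more metrics can be added without touching the loop.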