Evaluating my Haystack QA system returns an empty string in the answer column for most entries in the testing dataset #3371
-
Hi @MuatazMohammed, great to hear that you are trying out Haystack with your custom dataset. The F1 and EM values could be low for many different reasons, and we would need a bit more information from you to help. 🙂 Could you post a small example from your dataset? You don't need to share the full dataset; one example would be enough. Also, how large is your dataset, and how many queries does it contain?
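To illustrate why empty answers pull the scores down, here is a minimal sketch of SQuAD-style EM and F1 in plain Python. It is not Haystack's own implementation, and the helper names (`normalize`, `exact_match`, `f1_score`) are made up for this example; the point is only that an empty predicted answer scores 0 on both metrics whenever the gold answer is non-empty.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # If either side is empty, the score is 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# An empty prediction contributes zero to both metrics.
print(exact_match("", "Berlin"), f1_score("", "Berlin"))
# A partially overlapping prediction gets partial F1 credit but no EM.
print(exact_match("in Berlin", "Berlin"), f1_score("in Berlin", "Berlin"))
```

So if most rows in the answer column are empty strings, the averaged EM and F1 will be close to zero regardless of how good the few non-empty answers are. In that case it is worth checking first why no answer is returned for those queries.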
-
Evaluating the Haystack QA system on my testing dataset via pipeline.eval(params) returns very low F1 and EM scores.
What is wrong?
For context, I created a new document store containing my testing data for the evaluation. Is this the correct approach?