Evaluating my Haystack QA system returns an empty string in the answer column for most entries in the testing dataset #3371

MuatazMohammed · 2022-10-12T18:05:33Z

Evaluating my Haystack QA pipeline on my testing dataset via pipeline.eval(params) returns very low F1 and EM scores.
What is wrong?

For context, I created a new document store from my testing data for the evaluation. Is this the correct approach?

julian-risch · 2022-10-13T09:03:01Z

Maintainer

Hi @MuatazMohammed, great to hear that you are trying out Haystack with your custom dataset. The values for F1 and EM could be low for many different reasons, and we would need a bit more information from you to help you out. 🙂
If you haven't seen our tutorial about evaluation, I would recommend going through it: https://github.com/deepset-ai/haystack-tutorials/blob/main/tutorials/05_Evaluation.ipynb
For example, the tutorial shows how to generate an evaluation report, and it would be good if you could post the report here. You can search in the tutorial for pipeline.print_eval_report(saved_eval_result).

Maybe you could also post a small example from your dataset? You don't need to share the full dataset. One example would be enough. How large is your dataset by the way? How many queries are in your dataset?
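
For counting the queries and spot-checking one example before posting it, a small sketch like the following might help. It assumes the annotations were exported as SQuAD-style JSON (`data` / `paragraphs` / `qas` / `answers` with `answer_start`); the function name `summarize_squad` and the toy dataset are hypothetical and not part of Haystack.

```python
def summarize_squad(dataset):
    """Count questions and flag answers whose offset does not match the context."""
    n_questions = 0
    n_without_answer = 0
    misaligned_ids = []
    for article in dataset.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                n_questions += 1
                answers = qa.get("answers", [])
                if not answers:
                    n_without_answer += 1
                for answer in answers:
                    start = answer["answer_start"]
                    text = answer["text"]
                    # The labelled text should appear at exactly this offset.
                    if context[start:start + len(text)] != text:
                        misaligned_ids.append(qa.get("id"))
    return {
        "questions": n_questions,
        "without_answer": n_without_answer,
        "misaligned_ids": misaligned_ids,
    }


# Hypothetical toy data in the assumed SQuAD-style layout.
context = "Haystack is an open-source framework built by deepset."
toy_dataset = {
    "data": [{
        "title": "toy",
        "paragraphs": [{
            "context": context,
            "qas": [
                {"id": "q1", "question": "What is Haystack?",
                 "answers": [{"text": "an open-source framework",
                              "answer_start": context.index("an open-source framework")}]},
                # Deliberately wrong offset to show what a misaligned label looks like.
                {"id": "q2", "question": "Who built Haystack?",
                 "answers": [{"text": "deepset",
                              "answer_start": context.index("deepset") + 3}]},
                {"id": "q3", "question": "When was it released?", "answers": []},
            ],
        }],
    }]
}

summary = summarize_squad(toy_dataset)
print(summary)
```

Running the same function on the real export (loaded with `json.load`) would give the number of queries and point to any labels worth posting as an example.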

MuatazMohammed Oct 15, 2022
Author

First of all, thanks @julian-risch for replying.
I have about 6k English articles and about 7.5k Q&A pairs annotated using the Haystack annotation tool. This is the training dataset.
In addition, I have a testing dataset of about 1.5k articles and 2k Q&A pairs.

My QA model is a retriever-reader pipeline [Note: I used the pre-trained DPR and fine-tuned distilBERT-base].
After finishing the training, I tested it with random questions, and it returned approximately correct results.
When I ran an evaluation following Tutorial 5 with my test dataset, the reader report came back with empty cells for the answers in the CSV file.
[Note: I used the same pipeline as in training, but created a new document store from my testing dataset for the evaluation.]
Here are snapshots from the report.
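
To illustrate why empty answer cells would go hand in hand with the low scores, here is a minimal sketch of SQuAD-style EM and F1. The helper names (`normalize_answer`, `exact_match`, `f1_score`) are my own, and the exact normalization may differ in detail from what pipeline.eval computes.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    if not pred_tokens or not gold_tokens:
        # An empty side only scores when both sides are empty (no-answer case).
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# An empty prediction against a non-empty gold answer scores zero on both metrics.
print(exact_match("", "Berlin"), f1_score("", "Berlin"))
# A partially overlapping prediction still earns some F1 but no EM.
print(exact_match("the city of Berlin", "Berlin"), f1_score("the city of Berlin", "Berlin"))
```

Under these definitions, every query whose answer cell is empty contributes 0 to both EM and F1 unless the gold label is also empty, so a report with mostly empty answers would be expected to average out to very low scores.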

Thanks in advance
