Do medium and small models really have the same accuracy? #13199
For en_core_web_md and en_core_web_lg (3.7.1), the reported accuracy is identical by every metric (POS tagging, sentence segmentation, deps, ENTS_P, ENTS_F) except for one: ENTS_R. For the Korean models (3.7.0), everything is identical except for the metrics related to named entities. And many other models as of the time of writing also report identical accuracy on most metrics except for those related to named entities. This is obviously good news if it means that an application that doesn't deal with named entities can downgrade to a more efficient model and expect no measurable difference in accuracy on the tagger/parser/lemmatizer. But I would like to confirm: is there in fact no measurable difference in accuracy?
Replies: 1 comment
The small and medium models usually don't have similar accuracy, but the medium and large models are often very similar. You can see the raw scores in `nlp.meta["performance"]`, or for all published models in the `meta/` directory of https://github.com/explosion/spacy-models. (Be aware that the reported scores are for the dev set for most resources.) And since the types of errors may not be identical for two different models even if the accuracy is the same, you probably still want to run a detailed evaluation for your own data/task.
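For example, one way to check which metrics actually differ between two models is to diff their `meta["performance"]` dicts. The helper below is a hypothetical sketch (not part of spaCy's API); the metric keys shown mirror real ones like `tag_acc`, `ents_p`, `ents_r`, and `ents_f`, but the numbers are made up for illustration. With installed packages you would populate the dicts via `spacy.load(name).meta["performance"]`.

```python
def diff_performance(perf_a, perf_b, tol=1e-9):
    """Return {metric: (score_a, score_b)} for metrics whose scores differ.

    perf_a/perf_b have the same shape as nlp.meta["performance"]:
    a flat dict mapping metric names to floats.
    """
    diffs = {}
    for key in sorted(set(perf_a) | set(perf_b)):
        a, b = perf_a.get(key), perf_b.get(key)
        # Report metrics missing from one model, or differing beyond tol
        if a is None or b is None or abs(a - b) > tol:
            diffs[key] = (a, b)
    return diffs

# Illustrative numbers only (mirroring the md/lg observation above):
md_scores = {"tag_acc": 0.97, "ents_f": 0.85, "ents_r": 0.84}
lg_scores = {"tag_acc": 0.97, "ents_f": 0.85, "ents_r": 0.86}
print(diff_performance(md_scores, lg_scores))
# → {'ents_r': (0.84, 0.86)}
```

For a task-specific comparison, `spacy evaluate` against your own annotated data is the more reliable check.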