v1.2.0
- For fast tokenizers, use the offset mapping provided by the tokenizer (#338).

  Using the offset mapping instead of the heuristic alignment from spacy-alignments resolves unexpected and missing alignments such as those discussed in explosion/spaCy#6563, explosion/spaCy#10794 and explosion/spaCy#12023.

  ⚠️ Slow and fast tokenizers will no longer give identical results due to potential differences in the alignments between transformer tokens and spaCy tokens. We recommend retraining all models with fast tokenizers for use with spacy-transformers v1.2.

- Serialize the tokenizer use_fast setting (#339).
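
To illustrate the idea behind offset-based alignment, here is a minimal sketch, not the code from #338. It assumes each spaCy token and each transformer wordpiece comes with a `(start_char, end_char)` span over the same text, and aligns them by character overlap. The function name `align_by_offsets` and the hand-written spans are hypothetical; in practice, Hugging Face fast tokenizers can report such spans when called with `return_offsets_mapping=True`.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_char, end_char), end exclusive


def align_by_offsets(
    token_spans: List[Span], piece_spans: List[Span]
) -> List[List[int]]:
    """For each token span, collect the indices of the wordpieces whose
    character spans overlap it. Hypothetical helper, for illustration only."""
    alignment: List[List[int]] = []
    for tok_start, tok_end in token_spans:
        overlapping = [
            i
            for i, (p_start, p_end) in enumerate(piece_spans)
            # Skip empty spans (special tokens are typically reported as (0, 0)),
            # then keep pieces whose span intersects the token span.
            if p_start < p_end and p_start < tok_end and tok_start < p_end
        ]
        alignment.append(overlapping)
    return alignment


text = "spaCy's tokenizer"
# Assumed spaCy-style tokens: "spaCy", "'s", "tokenizer"
token_spans = [(0, 5), (5, 7), (8, 17)]
# Assumed wordpieces: [CLS], "spa", "Cy", "'", "s", "token", "izer", [SEP]
piece_spans = [(0, 0), (0, 3), (3, 5), (5, 6), (6, 7), (8, 13), (13, 17), (0, 0)]

alignment = align_by_offsets(token_spans, piece_spans)
print(alignment)  # [[1, 2], [3, 4], [5, 6]]
```

Because the spans come directly from the tokenizer rather than from string matching, the alignment does not depend on how the wordpiece strings are normalized, which is one plausible reason the results can differ from a heuristic alignment.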