v1.2.0

@adrianeboyd adrianeboyd released this 13 Jan 07:46
· 51 commits to master since this release
8b587a4
  • For fast tokenizers, use the offset mapping provided by the tokenizer (#338).

    Using the offset mapping instead of the heuristic alignment from spacy-alignments resolves unexpected and missing alignments such as those discussed in explosion/spaCy#6563, explosion/spaCy#10794 and explosion/spaCy#12023.

    ⚠️ Slow and fast tokenizers will no longer give identical results due to potential differences in the alignments between transformer tokens and spaCy tokens. We recommend retraining all models with fast tokenizers for use with spacy-transformers v1.2.

  • Serialize the tokenizer use_fast setting (#339).
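
The idea behind offset-based alignment in the first item can be illustrated with a minimal sketch. This is not the spacy-transformers implementation: the function name `align_by_offsets`, the example spans, and the convention that special tokens carry an empty `(0, 0)` span are assumptions made for illustration. Each spaCy token is matched to every wordpiece whose character span overlaps it, so no string-similarity heuristic is needed.

```python
from typing import List, Tuple

# (start_char, end_char) with an exclusive end, as in an offset mapping.
Span = Tuple[int, int]


def align_by_offsets(
    token_spans: List[Span], piece_offsets: List[Span]
) -> List[List[int]]:
    """For each token, return the indices of wordpieces that overlap it.

    Hypothetical helper: pieces with an empty span (assumed here to be
    special tokens such as [CLS] or [SEP]) are never aligned to a token.
    """
    alignment = []
    for tok_start, tok_end in token_spans:
        overlapping = [
            i
            for i, (p_start, p_end) in enumerate(piece_offsets)
            if p_start < p_end and p_start < tok_end and tok_start < p_end
        ]
        alignment.append(overlapping)
    return alignment


# Illustrative input for the text "unbelievable results".
# Tokens: "unbelievable" (0-12), "results" (13-20).
token_spans = [(0, 12), (13, 20)]
# Wordpieces: [CLS], "un", "believ", "able", "results", [SEP].
piece_offsets = [(0, 0), (0, 2), (2, 8), (8, 12), (13, 20), (0, 0)]

alignment = align_by_offsets(token_spans, piece_offsets)
print(alignment)
```

Because the spans come straight from the tokenizer, the alignment depends only on character positions in the original text. This is also why results can differ from the earlier heuristic alignment, and why retraining is recommended.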