Releases: openvinotoolkit/openvino_tokenizers

2024.5.0.0

20 Nov 11:07
e30c99f

What's Changed

New and Reimplemented Operations

Improvements and Compatibility

Build Changes

Full Changelog: 2024.4.1.0...2024.5.0.0

2024.4.1.0

30 Sep 13:20
74186cd
Pre-release

OpenVINO patch release.

What's Changed

Full Changelog: 2024.4.0.0...2024.4.1.0

2024.4.0.0

19 Sep 09:49
3dde884

What's Changed

Full Changelog: 2024.3.0.0...2024.4.0.0

2024.3.0.0

31 Jul 10:47
fb0157c

What's Changed

Improvements

Changes

  • Switch default skip tokens flag behavior by @slyalin in #160
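These notes do not state what the new default is. Purely as a conceptual illustration, the sketch below shows what a "skip special tokens" flag typically does during detokenization: IDs in a special-token set are dropped before the remaining IDs are mapped back to text. The toy vocabulary, the `detokenize` function, its `skip_special_tokens` parameter, and its default value are all hypothetical assumptions, not the openvino_tokenizers API.

```python
# Hypothetical toy vocabulary; not taken from any real tokenizer.
VOCAB = {0: "<pad>", 1: "<s>", 2: "</s>", 3: "Hello", 4: " world"}
SPECIAL_IDS = {0, 1, 2}


def detokenize(ids: list[int], skip_special_tokens: bool = True) -> str:
    """Map token IDs back to text, optionally dropping special tokens.

    The default of True is an assumption made for this sketch only.
    """
    pieces = [
        VOCAB[i]
        for i in ids
        if not (skip_special_tokens and i in SPECIAL_IDS)
    ]
    return "".join(pieces)


ids = [1, 3, 4, 2, 0]
print(detokenize(ids))                             # Hello world
print(detokenize(ids, skip_special_tokens=False))  # <s>Hello world</s><pad>
```

Flipping such a default changes the decoded text for existing callers, which is why a change like this is usually called out in release notes.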

Build, Packaging and CI

Full Changelog: 2024.2.0.0...2024.3.0.0

2024.2.0.0

17 Jun 13:19
c615ec5

What's Changed

  • Add support for left padding in WordPiece, BPE and tiktoken-based tokenizers
  • Enhanced handling of special tokens
  • Add support for padding to a particular length
  • New option to control whether special tokens are added during tokenization
  • Support Punctuation Pretokenizer
  • Enhance tokenizer postprocessing parser for better model coverage
  • Add StringToHashBucketFast TensorFlow translator
  • Optimize EqualStr and VocabEncoder operations
  • Add benchmarking script
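Left padding, padding to a fixed length, and the option to skip special tokens all act on the tokenizer's output tensors. The plain-Python sketch below is a conceptual illustration only, assuming hypothetical BOS/EOS/PAD IDs and a hypothetical `pad_batch` helper; it is not how openvino_tokenizers implements these features, and its parameter names are not the library's.

```python
from typing import List, Tuple

# Hypothetical special-token IDs, chosen for illustration only.
BOS_ID, EOS_ID, PAD_ID = 1, 2, 0


def pad_batch(
    batch: List[List[int]],
    max_length: int,
    padding_side: str = "right",
    add_special_tokens: bool = True,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (input_ids, attention_mask), each row exactly max_length long.

    Truncation happens after special tokens are added, so a long
    sequence may lose its EOS token in this simplified sketch.
    """
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    input_ids, attention_mask = [], []
    for ids in batch:
        if add_special_tokens:
            ids = [BOS_ID, *ids, EOS_ID]
        ids = ids[:max_length]                 # truncate from the right
        pad = [PAD_ID] * (max_length - len(ids))
        mask = [1] * len(ids)                  # 1 = real token, 0 = padding
        if padding_side == "left":
            input_ids.append(pad + ids)
            attention_mask.append([0] * len(pad) + mask)
        else:
            input_ids.append(ids + pad)
            attention_mask.append(mask + [0] * len(pad))
    return input_ids, attention_mask


ids, mask = pad_batch([[5, 6], [7]], max_length=5, padding_side="left")
print(ids)   # [[0, 1, 5, 6, 2], [0, 0, 1, 7, 2]]
print(mask)  # [[0, 1, 1, 1, 1], [0, 0, 1, 1, 1]]
```

Left padding keeps the last real token of every row in the final position, which is the layout decoder-only models generally expect when generating from a batch.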

Full Changelog: 2024.1.0.2...2024.2.0.0

2024.1.0.2

10 May 09:42
c754503

What's Changed

Full Changelog: 2024.1.0.1...2024.1.0.2

2024.1.0.1

08 May 15:59
37d20ce

What's Changed

  • Llama3 Tokenizer Support
  • Add not-add-special-tokens flag to CLI conversion tool

Full Changelog: 2024.1.0.0...2024.1.0.1

2024.1.0.0

25 Apr 13:04
ad37623

What's Changed

  • New operations:
    • TrieTokenizer
    • VocabEncoder
    • EqualStr
    • RaggedToSparse
    • RaggedToRagged
    • FuzeRagged
  • Update existing operations:
    • Add max_splits argument to RegexSplit
    • Add encoding argument to CaseFold
  • Add new and update existing TensorFlow translators for partial support of the TextVectorization layer.
  • RWKV tokenizer support.
  • New way to get OpenVINO Tokenizers: build from files. Supports the RWKV tokenizer.
  • Update tokenizer operation caching mechanism for OpenVINO model caching support
  • SentencePiece tokenizer changes and fixes:
    • Update to version 0.2.0
    • Use constant 0 as mask hide token by @as-suvorov in #90
    • SentencePiece BOS token detection
  • Fix multi-input model merging by @yas-sim in #53

New Contributors

Full Changelog: 2024.0.0.0...2024.1.0.0

2024.0.0.0

21 Mar 14:38
aa0587d

What's Changed

  • Improve regex support: filter out lookarounds, which re2 does not support
  • Improve model coverage: T5 tokenizers, Qwen2
  • Add tokenizer metadata (EOS token ID) to rt_info
  • Support TensorFlow Text MUSE model conversion and inference

New Contributors