2024.3.0.0
What's Changed
Improvements
- Fix Tokenization of Special Tokens in Sentencepiece by @apaniukov in #173
- Add Left Padding and Padding to Max Length by @apaniukov in #152
- Sentencepiece Tokenization Improvements by @apaniukov in #176
- BPE Fallback for Sentencepiece by @apaniukov in #181
- Update Sentencepiece Parsing by @apaniukov in #185
- Fix Decoding For Long Tokens by @apaniukov in #187
- Sentencepiece Left Padding by @apaniukov in #186
- Update Remaining Inputs Detection During Model Connection by @apaniukov in #190
- Update rt_info by @apaniukov in #191
- Truncate Left Side When Left Padding Is Used by @apaniukov in #192
- Add Separate Special Token Handling To Sentencepiece by @apaniukov in #198
- Support GLM-4 Tokenizer by @apaniukov in #202
- Use PCRE2 fallback for RegexNormalization by @pavel-esir in #203
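
As a rough illustration of the left padding and left truncation entries above (#152, #186, #192), the sketch below shows the general idea in plain Python. It is not the library's implementation or API: the function name `pad_or_truncate` and its parameters are hypothetical, and it assumes that truncation drops tokens from the same side that padding is added to, as the title of #192 suggests.

```python
from typing import List


def pad_or_truncate(
    ids: List[int],
    max_length: int,
    pad_id: int = 0,      # hypothetical pad token id, chosen for illustration
    side: str = "left",   # which side receives padding (and loses tokens)
) -> List[int]:
    """Return `ids` padded or truncated to exactly `max_length` tokens."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if max_length < 0:
        raise ValueError("max_length must be non-negative")

    if len(ids) > max_length:
        # Too long: drop tokens from the padding side, keep the rest.
        if side == "left":
            return ids[len(ids) - max_length:]
        return ids[:max_length]

    # Too short (or exact): fill the gap with pad tokens on the chosen side.
    padding = [pad_id] * (max_length - len(ids))
    return padding + ids if side == "left" else ids + padding


if __name__ == "__main__":
    print(pad_or_truncate([1, 2, 3], 5))            # left padding
    print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))   # left truncation
```

Left padding keeps the most recent tokens adjacent to the position where generation continues, which is why decoder-only models usually prefer it for batched inference.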
Changes
Build, Packaging and CI
- Package into correct dirs by @Wovchena in #148
- Set cmake policies by @ilya-lavrenov in #157
- Fix usage of protobuf_MODULE_COMPATIBLE by @ilya-lavrenov in #158
- Build release by default (#162) by @ilya-lavrenov in #163
- Fixed conda-forge on Windows (#164) by @ilya-lavrenov in #165
- Package into correct dirs (#155) by @Wovchena in #167
- New python build scheme by @ilya-lavrenov in #166
- Support build from OpenVINO wheel only by @mryzhov in #178
- Configure cmake similar to GenAI (#175) by @Wovchena in #180
- Patch icu external project by @mryzhov in #184
- [CI] Build from OV wheel by @mryzhov in #183
- [GHA] Set permissions read-all by @mryzhov in #189
- [CI] Fixed Jenkins artifacts by @mryzhov in #195
- [MERGE] reduced icudt.dll (#196) by @mryzhov in #201
Full Changelog: 2024.2.0.0...2024.3.0.0