What's Changed
New and Reimplemented Operations
- Add Skip Tokens Node by @apaniukov in #264
- Optimize CombineSegments by @pavel-esir in #265
- Add Charsmap Operation by @apaniukov in #267
- Improve BPE by @pavel-esir in #281
- Reimplement WordPiece tokenization by @pavel-esir in #298
Improvements and Compatibility
- Store tokenizer conversion params in rt_info / refactor passing params by @pavel-esir in #268
- Add package versions to rt_info by @pavel-esir in #292
- Fix GLM4 Tokenization by @apaniukov in #280
Build Changes
- Dynamic linking with MSVC runtime by @mryzhov in #260
- Linking with sentencepiece_train by @mryzhov in #272
Full Changelog: 2024.4.1.0...2024.5.0.0