Releases: gbenson/dom-tokenizers

0.0.17

19 Jun 20:01
Handle multiply-escaped input

0.0.16

10 Jun 22:40
Handle more apostrophe surrogates

0.0.15

07 Jun 22:29
Consolidation

0.0.13

29 May 21:18
Don't lowercase special tokens

0.0.12

25 May 00:20
- Change tokenization to be more like HTML
- Switch back to uncased base model

0.0.11

23 May 23:54
Major refactor

0.0.10

21 May 21:50
Introduce `DOMSnapshotPreTokenizer.hook_into()`

0.0.9

21 May 00:10
Transliterate non-ASCII input texts

0.0.8

20 May 00:20
Add tokenizer comparison script

0.0.7

19 May 14:34
Add whole-dataset tokenizer, for comparisons