Move `TransformersTokenizer` back to the Python package #81

rlouf · 2024-10-23T21:48:19Z

After #52, outlines-core no longer has tokenizer support, aside from the two copies of `TransformerTokenizer` in the test and benchmark code. What's the plan with regard to this?

If the plan is to use `adapt_tokenizer` to patch `transformers` tokenizers, it's not clear how that's an improvement over, for example, custom tokenizer wrapper classes and a conditional `transformers` dependency. In general, we could move `TransformerTokenizer` back to outlines-core and make `transformers` optional; outlines-core would then be usable with Llama-based tokenizers, and we wouldn't need two copies for testing.

Originally posted by @brandonwillard in #2 (comment)
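For comparison, the "wrapper class plus conditional `transformers` dependency" alternative could look roughly like the minimal sketch below. It is not outlines-core code: `TokenizerWrapper`, `StubTokenizer` and the attribute names `vocabulary`, `special_tokens` and `convert_token_to_string` are hypothetical, and the sketch assumes only that a Hugging Face tokenizer provides `get_vocab()`, `all_special_tokens`, `eos_token_id` and `convert_tokens_to_string()`. `transformers` is imported only inside `from_pretrained`, so the wrapper itself (and a test stub) works without it installed.

```python
from typing import Any, Dict, List, Set


class TokenizerWrapper:
    """Hypothetical wrapper: exposes what an index builder might need
    without mutating (monkey-patching) the underlying tokenizer object."""

    def __init__(self, tokenizer: Any) -> None:
        self._tokenizer = tokenizer
        # Token string -> token id, copied from the wrapped tokenizer.
        self.vocabulary: Dict[str, int] = dict(tokenizer.get_vocab())
        self.special_tokens: Set[str] = set(tokenizer.all_special_tokens)
        self.eos_token_id: int = tokenizer.eos_token_id

    @classmethod
    def from_pretrained(cls, name: str) -> "TokenizerWrapper":
        # `transformers` is only needed on this code path, so it can stay
        # an optional dependency of the package.
        try:
            from transformers import AutoTokenizer
        except ImportError as exc:
            raise ImportError(
                "Loading a tokenizer by name requires the optional "
                "`transformers` package."
            ) from exc
        return cls(AutoTokenizer.from_pretrained(name))

    def convert_token_to_string(self, token: str) -> str:
        # Decode a single token. SentencePiece-style vocabularies (e.g. Llama)
        # mark a leading space with U+2581, which single-token decoding can
        # drop; restore it so decoded token strings concatenate correctly.
        string = self._tokenizer.convert_tokens_to_string([token])
        if token.startswith("\u2581") and not string.startswith(" "):
            return " " + string
        return string


# Usage with a stub tokenizer: no `transformers` install required.
class StubTokenizer:
    all_special_tokens: List[str] = ["</s>"]
    eos_token_id = 2

    def get_vocab(self) -> Dict[str, int]:
        return {"<unk>": 0, "</s>": 2, "\u2581Hello": 3, "world": 4}

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        # Mimic SentencePiece decoding: U+2581 becomes a space, and the
        # leading space of the decoded text is stripped.
        return "".join(tokens).replace("\u2581", " ").lstrip(" ")


wrapper = TokenizerWrapper(StubTokenizer())
print(repr(wrapper.convert_token_to_string("\u2581Hello")))  # prints ' Hello'
print(wrapper.special_tokens)  # prints {'</s>'}
```

Wrapping, rather than setting new attributes on the caller's tokenizer as a patching helper would, leaves that object untouched and gives the tests and benchmarks a single class to exercise with a stub instead of two copies.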

rlouf · 2024-10-23T21:49:56Z

The clean solution is actually to use the `tokenizers` crate to remove the Python package's dependency on `transformers`. In the meantime, it is unreasonable to ask downstream libraries to implement their own version of `adapt_tokenizer`, since it is always required to use the package.

rlouf · 2024-11-18T19:09:16Z

This is made irrelevant by #91

rlouf assigned torymur Oct 23, 2024

rlouf mentioned this issue Oct 23, 2024

Simplify the code in python/outlines-core #2

Closed

torymur mentioned this issue Nov 5, 2024

Extend Vocabulary #88

Merged

rlouf closed this as completed Nov 18, 2024

rlouf closed this as not planned Nov 18, 2024
