Move TransformersTokenizer back to the Python package #81

Closed
rlouf opened this issue Oct 23, 2024 · 2 comments

rlouf commented Oct 23, 2024

After #52, outlines-core no longer has tokenizer support, aside from the two copies of TransformerTokenizer in the test and benchmark code. What's the plan with respect to this?

If the plan is to use adapt_tokenizer to patch transformers tokenizers, it's not clear how that's an improvement over, for example, custom tokenizer wrapper classes and a conditional transformers dependency. In general, we could move TransformerTokenizer back to outlines-core and make transformers optional; outlines-core would then be usable with llama-based tokenizers, and we wouldn't need two copies for testing.

Originally posted by @brandonwillard in #2 (comment)
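
To make the "wrapper class plus conditional transformers dependency" idea in the quoted comment concrete, here is a minimal sketch. The class name TokenizerWrapper and its attributes are hypothetical and are not the actual TransformerTokenizer from outlines or outlines-core; the only transformers calls assumed are AutoTokenizer.from_pretrained, get_vocab, eos_token_id, encode and decode. transformers is imported only when from_pretrained is called, so the module itself imports fine without it.

```python
import importlib
from typing import Any, Dict, List


class TokenizerWrapper:
    """Hypothetical wrapper, for illustration only (not the outlines-core class)."""

    def __init__(self, tokenizer: Any):
        # Any object exposing get_vocab(), eos_token_id, encode() and decode()
        # works here, so transformers itself is not needed to build a wrapper.
        self.tokenizer = tokenizer
        self.vocabulary: Dict[str, int] = tokenizer.get_vocab()
        self.eos_token_id = tokenizer.eos_token_id

    @classmethod
    def from_pretrained(cls, model_name: str, **kwargs: Any) -> "TokenizerWrapper":
        # Conditional dependency: transformers is imported only on this path.
        try:
            transformers = importlib.import_module("transformers")
        except ImportError as exc:
            raise ImportError(
                "transformers is an optional dependency; install it to load "
                "Hugging Face tokenizers by name."
            ) from exc
        return cls(transformers.AutoTokenizer.from_pretrained(model_name, **kwargs))

    def encode(self, text: str) -> List[int]:
        return self.tokenizer.encode(text)

    def decode(self, token_ids: List[int]) -> str:
        return self.tokenizer.decode(token_ids)
```

With this shape, tests and benchmarks could share one wrapper and pass in any duck-typed tokenizer, while users who want Hugging Face tokenizers would install transformers as an extra.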


rlouf commented Oct 23, 2024

The clean solution is actually to use the tokenizers crate to remove the dependency on transformers in the Python package. In the meantime, it is unreasonable to ask downstream libraries to implement their own version of adapt_tokenizer, since adapt_tokenizer is always required to use the package.


rlouf commented Nov 18, 2024

This is made irrelevant by #91

rlouf closed this as completed Nov 18, 2024
rlouf closed this as not planned Nov 18, 2024