Extend Vocabulary #88
With the exception of some less important tests I have in mind, and maybe further docs improvements, this is ready to be pre-reviewed. There are a few TODOs left in the vocabulary that I will follow up on separately: they change the already defined vocabulary interface, and this PR is already massive. A couple of questions for the follow-up PR, where I'm planning to change:
Any thoughts on the above, or reasons not to do it? Anything to watch out for from the python/outlines side or any other perspective? @brandonwillard @umut-sahin, maybe you can help me with these?
It's because the same token can have multiple entries in the tokenizer (e.g., in llama like tokenizers |
@umut-sahin Appreciate you taking a look here!
Yep, I understand this point from the token-as-a-String perspective, but what if we move on to tokens as bytes?
Of course 🙌
It's the same there, in that case we'll have |
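The point made in this exchange, that a single token can correspond to several tokenizer entries whether it is keyed as a string or as bytes, can be illustrated with a small sketch. This is not the vocabulary interface from this PR: the `ToyVocabulary` class, its methods, and the sample tokens and ids are all hypothetical and only show the one-token-to-many-ids shape.

```python
from collections import defaultdict
from typing import Dict, List, Union

# Hypothetical: a token may be keyed either as text or as raw bytes.
Token = Union[str, bytes]


class ToyVocabulary:
    """Illustrative token -> token ids mapping (not the real interface)."""

    def __init__(self) -> None:
        self._ids: Dict[Token, List[int]] = defaultdict(list)

    def insert(self, token: Token, token_id: int) -> None:
        # Append instead of overwriting, so duplicate entries are preserved.
        self._ids[token].append(token_id)

    def token_ids(self, token: Token) -> List[int]:
        # .get avoids creating an empty entry for an unknown token.
        return list(self._ids.get(token, []))


vocab = ToyVocabulary()
vocab.insert("a", 1)
vocab.insert("a", 7)    # made-up second entry for the same token
vocab.insert(b"a", 1)   # the same shape holds when tokens are bytes
vocab.insert(b"a", 7)
```

Under these assumptions, a lookup returns every id registered for the token, so nothing is lost when the tokenizer contains duplicates.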
One or two stylistic questions, but looks good!
ERROR tests/fsm/test_regex.py - RuntimeError: Failed to import transformers.models.auto.tokenization_auto because of the following error (look up to see its traceback): Failed to import transformers.generation.utils because of the following error (look up to see its traceback): numpy.core is deprecated and has been renamed to numpy._core. The numpy._core namespace contains private NumPy internals and its use is discouraged, as NumPy internals can change without warning in any release. In practice, most real-world usage of numpy.core is to access functionality in the public NumPy API. If that is the case, use the public NumPy API. If not, you are using NumPy internals. If you would still like to access an internal attribute, use numpy._core.multiarray.
This PR is now ready for review ✔️ The missed lines in the combined coverage report were checked and can be ignored.
Almost ready to be merged 🙌
@umut-sahin @rlouf All addressed, thanks for taking a look! 🙌
Looks great!
This is Part 1, which partially addresses #81 and closes #91.
In this PR only new logic is introduced with minimal changes to the current interface: