We need a pure Rust function, call it `build_regex_index`, that takes only a regex string and a vocabulary or tokenizer object and (possibly) returns a `HashMap` version of the current Python index.
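To make the target concrete, here is a minimal std-only sketch of the kind of index such a function might return. It assumes the regex has already been compiled into a DFA (in practice by a crate such as `regex-automata`; here a hypothetical hand-built `Dfa` struct over `char`s stands in). It also assumes the Python index has the shape state → {token id → next state}, which we mirror with nested `HashMap<u32, _>`s. For every DFA state it walks each token character by character and records the token if the walk never falls off the transition table. `Dfa`, `Index` and `build_index_from_dfa` are illustrative names, not an existing API; a real `build_regex_index` would compile the regex string first and then do something along these lines.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical minimal DFA over `char`s. In practice this would be produced
/// by a regex-compiling crate rather than built by hand.
struct Dfa {
    start: u32,
    /// (state, input char) -> next state; a missing entry means "no match".
    transitions: HashMap<(u32, char), u32>,
}

impl Dfa {
    /// Follows `token` one char at a time from `state`; returns the state
    /// reached, or `None` as soon as a char has no outgoing transition.
    fn walk(&self, state: u32, token: &str) -> Option<u32> {
        token
            .chars()
            .try_fold(state, |s, c| self.transitions.get(&(s, c)).copied())
    }

    /// Every state mentioned by the DFA, in ascending order.
    fn states(&self) -> BTreeSet<u32> {
        let mut states = BTreeSet::new();
        states.insert(self.start);
        for (&(from, _), &to) in &self.transitions {
            states.insert(from);
            states.insert(to);
        }
        states
    }
}

/// Assumed index shape: DFA state -> (token id -> state after the token).
type Index = HashMap<u32, HashMap<u32, u32>>;

/// Hypothetical helper: brute-force index over an already-compiled DFA.
/// Assumes the DFA has no dead states, so any successful walk is a valid prefix.
fn build_index_from_dfa(dfa: &Dfa, vocabulary: &HashMap<String, u32>) -> Index {
    let mut index = Index::new();
    for state in dfa.states() {
        let mut allowed = HashMap::new();
        for (token, &id) in vocabulary {
            // Empty tokens would trivially "match" in every state; skip them.
            if token.is_empty() {
                continue;
            }
            if let Some(next) = dfa.walk(state, token) {
                allowed.insert(id, next);
            }
        }
        if !allowed.is_empty() {
            index.insert(state, allowed);
        }
    }
    index
}

fn main() {
    // Hand-built DFA for the regex `[0-9]+`: state 0 is the start state,
    // state 1 is the only accepting state.
    let mut transitions = HashMap::new();
    for c in '0'..='9' {
        transitions.insert((0, c), 1);
        transitions.insert((1, c), 1);
    }
    let dfa = Dfa { start: 0, transitions };

    let vocabulary: HashMap<String, u32> = [("1", 0), ("23", 1), ("a", 2), ("4b", 3)]
        .iter()
        .map(|&(t, id)| (t.to_string(), id))
        .collect();

    let index = build_index_from_dfa(&dfa, &vocabulary);

    // Print in a stable order; "a" (id 2) and "4b" (id 3) never appear.
    let mut states: Vec<u32> = index.keys().copied().collect();
    states.sort();
    for s in states {
        let mut allowed: Vec<(u32, u32)> =
            index[&s].iter().map(|(&id, &next)| (id, next)).collect();
        allowed.sort();
        println!("state {}: {:?}", s, allowed);
    }
}
```

The nested map converts naturally to a Python `dict[int, dict[int, int]]`, but a run-time Rust consumer might prefer a flatter layout. The brute-force loop is O(states × vocabulary size × token length); it pins down semantics, not performance.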
There are a few things that need to be converted in order to do this:

- Determine the exact form of the return value.
  - What works best for returning a Python object and for future Rust-only use (i.e. Rust at run-time)?
- Determine whether a vocabulary or a tokenizer is better as input.
  - Do we need to allow both?
  - We eventually want tighter integration with `tokenizers`; should we start there? Also, could we get a Rust-based tokenizer from a Python-created `transformers` tokenizer object (which uses `tokenizers` under the hood, of course)?
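On the question of allowing both inputs, one option is sketched here under our own assumptions. The function could be generic over a small trait that both a plain vocabulary and a tokenizer wrapper implement, so the indexing code only ever sees (token string, id) pairs. `TokenSource`, `Vocabulary`, `ToyTokenizer` and `normalized_tokens` are hypothetical names; `ToyTokenizer` is a stand-in, not the `tokenizers` crate's `Tokenizer`.

```rust
use std::collections::HashMap;

/// Hypothetical trait: anything that can list its (token string, token id) pairs.
/// `build_regex_index` could be generic over this instead of fixing one input type.
trait TokenSource {
    fn token_pairs(&self) -> Vec<(String, u32)>;
}

/// A bare vocabulary: token string -> token id.
struct Vocabulary(HashMap<String, u32>);

impl TokenSource for Vocabulary {
    fn token_pairs(&self) -> Vec<(String, u32)> {
        self.0.iter().map(|(t, &id)| (t.clone(), id)).collect()
    }
}

/// Hypothetical tokenizer stand-in: id -> token string, plus the ids of
/// special tokens that should never be offered to the regex index.
struct ToyTokenizer {
    id_to_token: Vec<String>,
    special_ids: Vec<u32>,
}

impl TokenSource for ToyTokenizer {
    fn token_pairs(&self) -> Vec<(String, u32)> {
        self.id_to_token
            .iter()
            .enumerate()
            .map(|(id, t)| (t.clone(), id as u32))
            .filter(|(_, id)| !self.special_ids.contains(id))
            .collect()
    }
}

/// The single path both inputs share: normalize to pairs sorted by id.
fn normalized_tokens<S: TokenSource>(source: &S) -> Vec<(String, u32)> {
    let mut pairs = source.token_pairs();
    pairs.sort_by_key(|&(_, id)| id);
    pairs
}

fn main() {
    let vocab = Vocabulary(
        [("ab", 1), ("c", 2)]
            .iter()
            .map(|&(t, id)| (t.to_string(), id))
            .collect(),
    );
    let tok = ToyTokenizer {
        id_to_token: vec!["<eos>".to_string(), "ab".to_string(), "c".to_string()],
        special_ids: vec![0],
    };

    // Both inputs reduce to the same token list once special tokens are dropped.
    assert_eq!(normalized_tokens(&vocab), normalized_tokens(&tok));
    println!("{:?}", normalized_tokens(&tok));
}
```

With this shape, a later `tokenizers` integration would only need another `TokenSource` implementation rather than a change to the indexing code.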
This is a step towards #1.

Start by finding a suitable Rust crate for this functionality (e.g. `regex-automata`).

`reduced_vocabulary`
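The `reduced_vocabulary` item presumably refers to porting the Python helper of that name. The sketch below rests on our assumption that the helper skips special tokens, groups token ids by their decoded string, and sets aside tokens that decode to nothing. The signature, the `convert_token_to_string` closure parameter and the empty-token handling are assumptions for illustration, not a confirmed port.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical Rust port of `reduced_vocabulary`, under our assumptions:
/// skip special tokens, group token ids by decoded string, and collect the
/// ids of tokens that decode to an empty string separately.
fn reduced_vocabulary<F>(
    vocabulary: &HashMap<String, u32>,
    special_tokens: &HashSet<String>,
    convert_token_to_string: F,
) -> (HashMap<String, Vec<u32>>, HashSet<u32>)
where
    F: Fn(&str) -> String,
{
    let mut reduced: HashMap<String, Vec<u32>> = HashMap::new();
    let mut empty_token_ids = HashSet::new();
    for (token, &id) in vocabulary {
        if special_tokens.contains(token) {
            continue;
        }
        let decoded = convert_token_to_string(token.as_str());
        if decoded.is_empty() {
            empty_token_ids.insert(id);
        } else {
            reduced.entry(decoded).or_default().push(id);
        }
    }
    // Sort each id list so the result does not depend on HashMap iteration order.
    for ids in reduced.values_mut() {
        ids.sort_unstable();
    }
    (reduced, empty_token_ids)
}

fn main() {
    let vocabulary: HashMap<String, u32> =
        [("▁hi", 0), (" hi", 1), ("hi", 2), ("<s>", 3), ("", 4)]
            .iter()
            .map(|&(t, id)| (t.to_string(), id))
            .collect();
    let special_tokens: HashSet<String> = vec!["<s>".to_string()].into_iter().collect();

    // Hypothetical decoder: a SentencePiece-style "▁" marks a leading space.
    let (reduced, empty_ids) =
        reduced_vocabulary(&vocabulary, &special_tokens, |t: &str| t.replace('▁', " "));

    assert_eq!(reduced[" hi"], vec![0, 1]);
    assert_eq!(reduced["hi"], vec![2]);
    assert!(!reduced.contains_key("<s>"));
    assert!(empty_ids.contains(&4));
    println!("{} distinct strings, {} empty tokens", reduced.len(), empty_ids.len());
}
```

Grouping ids by decoded string means each distinct string is walked through the automaton once, rather than once per token id.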