Create Rust-only index construction function #9

Open
4 tasks
Tracked by #11
brandonwillard opened this issue Aug 21, 2024 · 0 comments · May be fixed by #125
Assignees
Labels
enhancement New feature or request TGI Related to the integration with `text-generation-inference`

Comments

@brandonwillard
Member

brandonwillard commented Aug 21, 2024

We need a pure Rust function (call it `build_regex_index`) that takes only a regex string and a vocabulary or tokenizer object and (possibly) returns a `HashMap` version of the current Python index.

This is a step towards #1.
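
To make the "`HashMap` version of the index" concrete, here is a minimal sketch of one possible shape. It assumes the index is a nested map from state to token id to next state; the aliases `StateId`, `TokenId`, and `Index` and the helpers `allowed_tokens`, `next_state`, and `example_index` are hypothetical names for illustration, not an agreed design.

```rust
use std::collections::HashMap;

// Hypothetical aliases: the issue leaves the exact return form open.
type StateId = u32;
type TokenId = u32;
// Assumed shape: state -> (token id -> next state).
type Index = HashMap<StateId, HashMap<TokenId, StateId>>;

/// Token ids allowed in `state`, sorted so the output is deterministic.
fn allowed_tokens(index: &Index, state: StateId) -> Vec<TokenId> {
    let mut ids: Vec<TokenId> = index
        .get(&state)
        .map(|transitions| transitions.keys().copied().collect())
        .unwrap_or_default();
    ids.sort_unstable();
    ids
}

/// State reached after consuming `token` in `state`, if that token is allowed.
fn next_state(index: &Index, state: StateId, token: TokenId) -> Option<StateId> {
    index.get(&state)?.get(&token).copied()
}

/// A tiny hand-built index, used only to exercise the helpers above.
fn example_index() -> Index {
    let mut index = Index::new();
    index.entry(0).or_default().insert(7, 1);
    index.entry(0).or_default().insert(9, 0);
    index.entry(1).or_default().insert(9, 1);
    index
}
```

With this shape, `allowed_tokens(&example_index(), 0)` lists the token ids that can follow state 0, and `next_state` returns `None` for any token that is not allowed. A nested `HashMap` of integers would also be simple to convert into a Python `dict`, but whether that is the best form for Rust-only use at run-time is exactly the open question in this issue.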

There are a few things that need to be decided or converted in order to do this:

  • Determine the exact form of the return value.
    What works best for returning a Python object and future Rust-only use (i.e. Rust at run-time)?
  • Determine if a vocabulary or tokenizer is better as input.
    Do we need to allow both?
    We eventually want tighter integration with `tokenizers`; should we start there? Also, could we get a Rust-based tokenizer from a Python-created `transformers` tokenizer object (one that uses `tokenizers` under the hood)?
  • Handle regex-to-FSM conversion in Rust #10
    Start by finding a suitable Rust crate for this functionality (e.g. regex-automata).
  • Convert reduced_vocabulary.
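
As a rough illustration of how these pieces could fit together, the sketch below builds an index by walking every vocabulary token through a DFA from every state. It makes several assumptions: the DFA is hand-written for the regex `[0-9]+` and stands in for the regex-to-FSM step (no regex crate is used here), the vocabulary is a plain slice of `(token, id)` pairs rather than a tokenizer object, and tokens are matched byte by byte. The names `step`, `walk`, and `build_index` are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical aliases; the real return type is still to be decided.
type StateId = u32;
type TokenId = u32;
type Index = HashMap<StateId, HashMap<TokenId, StateId>>;

/// Hand-written DFA for `[0-9]+`, standing in for a compiled regex.
/// State 0 is the start state; state 1 is the accepting state.
fn step(state: StateId, byte: u8) -> Option<StateId> {
    match (state, byte) {
        (0 | 1, b'0'..=b'9') => Some(1),
        _ => None,
    }
}

/// Feed every byte of `token` to the DFA, starting from `state`.
/// Returns `None` as soon as one byte has no transition.
fn walk(state: StateId, token: &str) -> Option<StateId> {
    token.bytes().try_fold(state, step)
}

/// For each state, record the tokens that can be consumed in full
/// and the state each of them leads to.
fn build_index(vocabulary: &[(&str, TokenId)], states: &[StateId]) -> Index {
    let mut index = Index::new();
    for &state in states {
        for &(token, token_id) in vocabulary {
            // Empty tokens would map every state to itself; skip them.
            if token.is_empty() {
                continue;
            }
            if let Some(next) = walk(state, token) {
                index.entry(state).or_default().insert(token_id, next);
            }
        }
    }
    index
}
```

For a vocabulary such as `[("1", 0), ("23", 1), ("a", 2), ("4b", 3)]` and states `[0, 1]`, only the all-digit tokens end up in the index. This brute-force walk costs one pass per state and token; anything like `reduced_vocabulary` would be an optimization on top of it and is not modeled here.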
brandonwillard added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Aug 21, 2024
brandonwillard self-assigned this on Aug 21, 2024
brandonwillard added the TGI (Related to the integration with `text-generation-inference`) label on Aug 21, 2024
torymur linked a pull request on Dec 20, 2024 that will close this issue
12 tasks
torymur removed the help wanted (Extra attention is needed) label on Dec 21, 2024
3 participants