Handle regex-to-FSM conversion in Rust #10
Comments
Prior to outlines-core I started hacking together a port of interegular, which I've started to add here in #8. I also have a portion of the regex-to-FSM parsing written that I can refactor and add to that branch. These changes would enable parsing to regex and returning the fsm from Not sure if this is the best approach (especially long term); however, it may be a feasible path to a working MVP. Maybe we can assess
Our main concern is the expansion in scope and costs that come with ports of dependencies. A Rust port of In other words, since the overhead for porting a dependency like this is relatively high, we're safer prioritizing attempts at using existing crates so that we can justify a port sooner. We could incrementally move forward, mostly in parallel, with the conversion of For example, we could almost certainly convert N.B.
Looking into
Just updating...
We need to perform the conversion of a regex string into a usable FSM in Rust. This involves all the functionality covered by `interegular` and the functions in `outlines_core.fsm.regex`.

Regarding the `interegular` functionality, let's start by finding a suitable Rust crate replacement. `regex-automata` appears to be the most promising so far.

To determine suitability, we need to
`outlines` are supported, and

If not, we'll need to port `make_deterministic_fsm`.

If we can't find a suitable existing crate, then we can write our own implementations/extensions, but ideally with a limited scope (e.g. use an existing crate for FSMs, but handle parts of the regex-to-FSM conversion ourselves).
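
To make the "deterministic labels" concern concrete, here is a minimal sketch under one assumption: that it means state ids should be reproducible no matter how the FSM was constructed. The `Dfa` type and `relabel_states` function are hypothetical names invented for this illustration. This is not a port of `make_deterministic_fsm`, and it does not use `interegular` or `regex-automata`; it only shows the general technique of renumbering states in breadth-first order.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

/// Hypothetical byte-labelled DFA with arbitrary state ids (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Dfa {
    initial: u32,
    finals: Vec<u32>,
    /// (state, byte) -> next state. A BTreeMap keeps iteration order stable.
    transitions: BTreeMap<(u32, u8), u32>,
}

/// Renumber states in breadth-first order from the initial state, following
/// outgoing edges in ascending byte order. Two DFAs with the same shape but
/// different ids map to the same result. Unreachable states are dropped.
/// Returns the relabelled DFA and the old-id -> new-id mapping.
fn relabel_states(dfa: &Dfa) -> (Dfa, HashMap<u32, u32>) {
    let mut old_to_new: HashMap<u32, u32> = HashMap::new();
    let mut queue = VecDeque::new();
    old_to_new.insert(dfa.initial, 0);
    queue.push_back(dfa.initial);

    while let Some(state) = queue.pop_front() {
        // All edges leaving `state`, visited in ascending byte order.
        for (_, &next) in dfa.transitions.range((state, 0)..=(state, u8::MAX)) {
            if !old_to_new.contains_key(&next) {
                let id = old_to_new.len() as u32;
                old_to_new.insert(next, id);
                queue.push_back(next);
            }
        }
    }

    // Rewrite every edge that starts at a reachable state.
    let transitions = dfa
        .transitions
        .iter()
        .filter(|((from, _), _)| old_to_new.contains_key(from))
        .map(|(&(from, byte), to)| ((old_to_new[&from], byte), old_to_new[to]))
        .collect();

    let mut finals: Vec<u32> = dfa
        .finals
        .iter()
        .filter_map(|s| old_to_new.get(s).copied())
        .collect();
    finals.sort_unstable();

    (Dfa { initial: 0, finals, transitions }, old_to_new)
}
```

If the chosen crate already hands out state ids in a reproducible order, a step like this would be redundant, which is the kind of thing the suitability check should settle.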
As mentioned above, we also need to port the functions in `outlines_core.fsm.regex`. Many of those functions revolve around the conversion of an existing string-based FSM into one that transitions on unicode bytes. Depending on the crate we use, some of those functions may be unnecessary (e.g. see the comment about deterministic labels above).

Also, there's at least one interface-related question:
`outlines.generate.fsm`? This module offers dispatch functions that take `interegular.fsm.FSM` objects. Can they be changed to take regex strings instead?

@rlouf @lapp0
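
As a rough illustration of the string-to-byte FSM conversion mentioned above, the sketch below expands a single character-labelled transition into a chain of byte-labelled transitions, one per UTF-8 byte. The `ByteTransitions` alias and the `add_char_edge` function are hypothetical names made up for this example, not the actual functions in `outlines_core.fsm.regex`. It assumes the character-level FSM is deterministic and that every byte edge in the table is created through this function.

```rust
use std::collections::BTreeMap;

/// Hypothetical transition table keyed by (state, byte).
type ByteTransitions = BTreeMap<(u32, u8), u32>;

/// Replace one char-labelled edge `from --ch--> to` with byte-labelled edges,
/// one per UTF-8 byte of `ch`. Intermediate states are allocated starting at
/// `next_free`, and characters that share leading bytes from the same state
/// share those intermediate states. Returns the next unused state id.
fn add_char_edge(
    table: &mut ByteTransitions,
    from: u32,
    ch: char,
    to: u32,
    mut next_free: u32,
) -> u32 {
    let mut buf = [0u8; 4];
    let encoded: &str = ch.encode_utf8(&mut buf);
    let bytes = encoded.as_bytes();

    let mut current = from;
    for (i, &byte) in bytes.iter().enumerate() {
        let is_last = i + 1 == bytes.len();
        let target = if is_last {
            // The final byte lands on the original destination state.
            to
        } else if let Some(&existing) = table.get(&(current, byte)) {
            // Reuse the intermediate state created for a shared byte prefix.
            existing
        } else {
            // Allocate a fresh intermediate state.
            let fresh = next_free;
            next_free += 1;
            fresh
        };
        table.insert((current, byte), target);
        current = target;
    }
    next_free
}
```

ASCII characters produce a single edge and no new states, while multi-byte characters add up to three intermediate states. A crate that already builds its automaton over bytes would make this step unnecessary.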