Make `Rule::re_str` public again. #477
Comments
I've also run into some issues with this However that project is quite the rabbit hole because I wanted to implement the lexer in terms of overlapping matches https://docs.rs/regex-automata/latest/regex_automata/#find-all-overlapping-matches which itself requires a ton of work in the But from that perspective, I cannot help but wonder whether something like making
@pmfirestone It occurred to me to ask, do you really need the field to be
@ratmice Good question! I only need to read the field, not write it.
In retrospect, probably all the fields in
No objections on my behalf.
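Since only read access is needed, one option raised by this exchange is a getter over a private field rather than a `pub` field. The following is a minimal sketch of that pattern, assuming a hypothetical `Rule` type with a single `re_str` field; it is not lrlex's actual definition, and the `rule` module and `new` constructor exist only for illustration.

```rust
pub mod rule {
    /// Hypothetical stand-in for a lexer rule (not lrlex's real `Rule`).
    pub struct Rule {
        // Private field: code outside the `rule` module cannot read or
        // write it directly.
        re_str: String,
    }

    impl Rule {
        /// Hypothetical constructor, used only for this illustration.
        pub fn new(re_str: &str) -> Self {
            Rule {
                re_str: re_str.to_owned(),
            }
        }

        /// Read-only accessor: callers borrow the regex source string
        /// but have no way to modify it.
        pub fn re_str(&self) -> &str {
            &self.re_str
        }
    }
}
```

Compared with a `pub` field, the accessor leaves the maintainers free to change how the regex is stored later without breaking callers.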
In commit c1992b, @ltratt depubbed a number of functions and fields. The commit message includes the following justification:
I think that day has arrived. I am implementing an incremental parsing algorithm, from this paper, page 34. This is part of my project to reimplement the existing Python version of the Syncode algorithm in Rust. @shubhamugare is the boss of the project, but I'm working on the Rust version.
In order to integrate cleanly with other logic in the implementation so far, I need to be able to convert from a `Lexeme` struct back to a regular expression (this happens in lines 17, 18, and 21 of Algorithm 4 of the linked paper). I then do some crunching on these regexes by turning them into DFAs and advancing one state at a time; this requires the regex representations of the lexemes in the input (cf. page 10: the algorithm simply assumes that this transformation is trivially possible, and in the current Python implementation, it is).
I worry that this might be an XY problem: perhaps I could manually track which regexes go with which `TIdx`s. However, it seems to me that making the `re_str` field of the `Rule` struct `pub` again (instead of `pub(super)` as it is now) would solve the problem for me much more gracefully than introducing such logic into my program. I will already have to use `LexerDef::set_rule_ids` to synchronize the rules' ids between the parser and the lexer, but this still doesn't allow me direct access to the underlying regexes. I also don't believe this can be gotten out of the `cfgrammar` crate, but maybe an accessor function can be added to the `YaccGrammar` struct to match `token_epp`, `token_name`, and `token_precedence`.
This is a somewhat unusual use case, I admit, but I believe that the particular application we are developing justifies considering the change. On the other hand, do you have any alternative suggestions for getting the regex back from a `TIdx`? It's very possible I've overlooked something! Thanks for the hard work and have a great day 😃.
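
For comparison, the manual-tracking workaround mentioned above could look roughly like the sketch below. It keeps a side table from token ids to regex source strings, filled in by the caller when the lexer rules are set up. Everything here is an assumption for illustration: `TokId` is a plain `u32` standing in for a `TIdx`, and `RegexTable`, `insert`, and `regex_for` are hypothetical names, not part of lrlex, lrpar, or cfgrammar.

```rust
use std::collections::HashMap;

/// Stand-in for a token id (an assumption for this sketch, not lrpar's `TIdx`).
pub type TokId = u32;

/// Side table recording which regex source string belongs to which token id.
#[derive(Default)]
pub struct RegexTable {
    by_tok: HashMap<TokId, String>,
}

impl RegexTable {
    /// Record the regex for a token id, returning the previous entry if one
    /// was already stored for that id.
    pub fn insert(&mut self, tok: TokId, re_str: &str) -> Option<String> {
        self.by_tok.insert(tok, re_str.to_owned())
    }

    /// Look up the regex source for a token id; `None` if it was never recorded.
    pub fn regex_for(&self, tok: TokId) -> Option<&str> {
        self.by_tok.get(&tok).map(String::as_str)
    }
}
```

The table has to be kept in sync with the lexer by hand, which is the extra logic that direct read access to `re_str` would avoid.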