Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Rule::re_str public again. #477

Open
pmfirestone opened this issue Dec 19, 2024 · 5 comments
Open

Make Rule::re_str public again. #477

pmfirestone opened this issue Dec 19, 2024 · 5 comments

Comments

@pmfirestone
Copy link

In commit c1992b, @ltratt depubbed a number of functions and fields. The commit message includes the following justification:

grmtools intentionally exposes some details that no-one currently uses, but they might one day

I think that day has arrived. I am implementing an incremental parsing algorithm, from this paper, page 34. This is part of my project to reimplement the existing Python version of the Syncode algorithm in Rust. @shubhamugare is the boss of the project, but I'm working on the Rust version.

In order to integrate cleanly with other logic in the implementation so far, I need to be able to convert from a Lexeme struct back to a regular expression (this happens in lines 17, 18, and 21 of Algorithm 4 of the linked paper): I then do some crunching on these regexes by turning them into DFAs and advancing one state at a time; this requires the regex representations of the lexemes in the input (cf. page 10: the algorithm simply assumes that this transformation is trivially possible, and in the current Python implementation, it is).

I worry that this might be an XY problem: I perhaps I could manually track which regexes go with which TIdxs. However, it seems to me that making the re_str field of the Rule struct pub again (instead of pub(super) as it is now) would solve the problem for me much more gracefully than introducing such logic into my program. I will already have to use LexerDef::set_rule_ids to synchronize the rules' ids between the parser and the lexer, but this still doesn't allow me direct access to the underlying regexes. I also don't believe this can be gotten out of the cfgrammar crate, but maybe an accessor function can be added to the YaccGrammar struct to match token_epp, token_name, and token_precedence.

This is a somewhat unusual use case, I admit, but I believe that the particular application we are developing justifies considering the change. On the other hand, do you have any alternative suggestions for getting the regex back from a TIdx? It's very possible I've overlooked something! Thanks for the hard work and have a great day 😃.

@ratmice
Copy link
Collaborator

ratmice commented Dec 19, 2024

I've also run into some issues with this Rule struct and it's private fields, in my case, I believe I was trying to replace the re: Regex type with re: regex_cursor::Regex from regex_cursor

However that project is quite the rabbit hole because I wanted to implement the lexer in terms of overlapping matches https://docs.rs/regex-automata/latest/regex_automata/#find-all-overlapping-matches which itself requires a ton of work in the regex-cursor library. (As such I've never actually gotten to the grmtools changes required).

But from that perspective, I cannot help but wonder whether something like making Rule a trait, with it's Regex type becoming some kind of a generic associated type. Anyhow, no grmtools changes here currently would make my project any more or less actionable, just chiming in that I've also had some issues with the opaqueness/privacy of this particular struct too.

@ratmice
Copy link
Collaborator

ratmice commented Dec 20, 2024

@pmfirestone It occurred to me to ask, do you really need the field to be pub for both read and write access, or would a function that returns the re_str field read-only suffice? I haven't looked through the complete algorithm linked in the paper, but from the description you gave it seemed like you might not need to set the re_str, and maybe something less than making it fully pub might be sufficient.

@pmfirestone
Copy link
Author

@ratmice Good question! I only need to read the field, not write it.

@ltratt
Copy link
Member

ltratt commented Dec 20, 2024

In retrospect, probably all the fields in lexer::Rule should be private but with public read-only accessors. I'm not sure how many people rely on those public fields now, though... Still, adding a read-only accessor for re_str and tok_id can't hurt AFAICS. Any objections?

@ratmice
Copy link
Collaborator

ratmice commented Dec 20, 2024

No objections on my behalf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants