Make `Rule::re_str` public again. #477

pmfirestone · 2024-12-19T15:41:14Z

In commit c1992b, @ltratt depubbed a number of functions and fields. The commit message includes the following justification:

grmtools intentionally exposes some details that no-one currently uses, but they might one day

I think that day has arrived. I am implementing an incremental parsing algorithm, from this paper, page 34. This is part of my project to reimplement the existing Python version of the Syncode algorithm in Rust. @shubhamugare is the boss of the project, but I'm working on the Rust version.

In order to integrate cleanly with other logic in the implementation so far, I need to be able to convert from a Lexeme struct back to a regular expression (this happens in lines 17, 18, and 21 of Algorithm 4 of the linked paper): I then do some crunching on these regexes by turning them into DFAs and advancing one state at a time; this requires the regex representations of the lexemes in the input (cf. page 10: the algorithm simply assumes that this transformation is trivially possible, and in the current Python implementation, it is).

I worry that this might be an XY problem: I perhaps I could manually track which regexes go with which TIdxs. However, it seems to me that making the re_str field of the Rule struct pub again (instead of pub(super) as it is now) would solve the problem for me much more gracefully than introducing such logic into my program. I will already have to use LexerDef::set_rule_ids to synchronize the rules' ids between the parser and the lexer, but this still doesn't allow me direct access to the underlying regexes. I also don't believe this can be gotten out of the cfgrammar crate, but maybe an accessor function can be added to the YaccGrammar struct to match token_epp, token_name, and token_precedence.

This is a somewhat unusual use case, I admit, but I believe that the particular application we are developing justifies considering the change. On the other hand, do you have any alternative suggestions for getting the regex back from a TIdx? It's very possible I've overlooked something! Thanks for the hard work and have a great day 😃.

The text was updated successfully, but these errors were encountered:

ratmice · 2024-12-19T20:48:07Z

I've also run into some issues with this Rule struct and it's private fields, in my case, I believe I was trying to replace the re: Regex type with re: regex_cursor::Regex from regex_cursor

However that project is quite the rabbit hole because I wanted to implement the lexer in terms of overlapping matches https://docs.rs/regex-automata/latest/regex_automata/#find-all-overlapping-matches which itself requires a ton of work in the regex-cursor library. (As such I've never actually gotten to the grmtools changes required).

But from that perspective, I cannot help but wonder whether something like making Rule a trait, with it's Regex type becoming some kind of a generic associated type. Anyhow, no grmtools changes here currently would make my project any more or less actionable, just chiming in that I've also had some issues with the opaqueness/privacy of this particular struct too.

ratmice · 2024-12-20T01:53:20Z

@pmfirestone It occurred to me to ask, do you really need the field to be pub for both read and write access, or would a function that returns the re_str field read-only suffice? I haven't looked through the complete algorithm linked in the paper, but from the description you gave it seemed like you might not need to set the re_str, and maybe something less than making it fully pub might be sufficient.

pmfirestone · 2024-12-20T04:51:58Z

@ratmice Good question! I only need to read the field, not write it.

ltratt · 2024-12-20T07:54:07Z

In retrospect, probably all the fields in lexer::Rule should be private but with public read-only accessors. I'm not sure how many people rely on those public fields now, though... Still, adding a read-only accessor for re_str and tok_id can't hurt AFAICS. Any objections?

ratmice · 2024-12-20T08:05:48Z

No objections on my behalf.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `Rule::re_str` public again. #477

Make `Rule::re_str` public again. #477

pmfirestone commented Dec 19, 2024

ratmice commented Dec 19, 2024

ratmice commented Dec 20, 2024

pmfirestone commented Dec 20, 2024

ltratt commented Dec 20, 2024

ratmice commented Dec 20, 2024

Make Rule::re_str public again. #477

Make Rule::re_str public again. #477

Comments

pmfirestone commented Dec 19, 2024

ratmice commented Dec 19, 2024

ratmice commented Dec 20, 2024

pmfirestone commented Dec 20, 2024

ltratt commented Dec 20, 2024

ratmice commented Dec 20, 2024

Make `Rule::re_str` public again. #477

Make `Rule::re_str` public again. #477