Can't get full parse tree without consolidations #180

Open
JoelMahon opened this issue Apr 2, 2021 · 1 comment

JoelMahon commented Apr 2, 2021

I'd like to parse with a flag (or something of that nature) that results in a FULL parse tree with no consolidations.

By consolidation I mean what occurs in this example:

default_rule = foo
foo = bar
bar = "fizz"

If you parse the string "fizz" with a grammar built from this PEG, the node tree will not contain a single foo or default_rule node, as far as I can tell.

The output is this (if I got it right):

<Node matching "fizz">
    <Node called "bar" matching "fizz">

Other nodes may be dropped as well. I'm less desperate to access those, but I think there should be a flag for them too, either a separate one or as part of the flag mentioned above.

foo could carry important semantic meaning that is lost, and if the visitor defines a visit_foo method, it will never be called. This is the case in my program, where I want to highlight every foo in a certain colour, and bars only indirectly, when they occur inside a foo.

I tried to find where the code does this consolidation. The closest I could find was NodeVisitor.lift_child, but overriding it seemed to have no effect, and I couldn't see it being used anywhere.


A workaround is this:

default_rule = foo ""
foo = bar ""
bar = "fizz"

Parsing "fizz", we get:

<Node matching "fizz">
    <Node called "default_rule" matching "fizz">
        <Node called "foo" matching "fizz">
            <Node called "bar" matching "fizz">
            <Node matching "">
        <Node matching "">
        <Node matching "">

I get the nodes I want, but unfortunately get some useless ones as well.

@createyourpersonalaccount

The docstring of Grammar mentions:

* It does all kinds of whizzy space- and time-saving optimizations, like
factoring up repeated subexpressions into a single object, which should
increase cache hit ratio. [Is this implemented yet?]

I can't spot the exact place where that optimization takes place either. As noted in the docstring,

You could also just construct a bunch of ``Expression`` objects yourself
and stitch them together into a language, but using a ``Grammar`` has some
important advantages:

which means you can stitch the Expression objects together yourself and sidestep this issue. However, there's a little hack that gets exactly what you want with no extra work:

>>> g = Grammar(
... r"""
... foo = bar / tag_this
... bar = "fizz"
... tag_this = !"" ""   # Never matches, useful for ensuring rule shows up in tree
... """
... )
>>> print(g.parse("fizz"))
<Node called "foo" matching "fizz">
    <Node called "bar" matching "fizz">
