Can't finish or takes forever with postgres16 grammar #473
I haven't had a chance to actually look through the grammar yet, but this could perhaps be another occurrence of #290.
I've had a little look at it now, and I don't think this is another occurrence of the issue I mentioned above. Here, at least, it does eventually finish: because of the shift/reduce conflicts, nimbleparse eventually errors (after about 2 min 10 s) and prints the state table. The state table is pretty big, at about 500k lines. However, if I panic before the state table is printed
Edit: So, at least from what I can observe, this is more a case of "takes forever" than of it never actually finishing.
It wouldn't surprise me if there's some low-hanging optimisation fruit here: I didn't put much effort into optimising grammar generation time (I did put some effort into reducing the size of the resulting tables, though).
Indeed, I haven't looked at it much myself either, as it has never really been a problem for me. A lot of the time is being taken up by
It might be that some form of assert or unsafe could hoist the bounds checking in that function out of the loop, if the optimizer isn't doing so already, such as
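The function in question is cut off above, so as a generic illustration only: in Rust, asserting the relevant lengths once before a loop can give the optimizer enough information to drop per-iteration bounds checks, and slicing a window once and iterating with `zip` avoids indexing altogether. The sketch below assumes a row-displacement style "does this row fit at this offset?" check; `fits_at_indexed`, `fits_at_sliced`, the `u16` cell type, and the `empty` sentinel are all hypothetical and are not sparsevec's actual code.

```rust
/// Hypothetical row-displacement check (illustrative names, not sparsevec's
/// API): can `row` be placed at `offset` in `packed` without overlapping a
/// non-empty cell? Indexed version with one up-front assert.
fn fits_at_indexed(packed: &[u16], row: &[u16], offset: usize, empty: u16) -> bool {
    // Establish once that `offset + i < packed.len()` for every `i` below.
    // This gives the optimizer what it needs to drop per-iteration bounds
    // checks, though whether it actually does so is not guaranteed.
    assert!(offset <= packed.len() && row.len() <= packed.len() - offset);
    for i in 0..row.len() {
        if row[i] != empty && packed[offset + i] != empty {
            return false;
        }
    }
    true
}

/// Same check without indexing: slice the window once (a single bounds
/// check, panicking if it would run past the end), then iterate with `zip`.
fn fits_at_sliced(packed: &[u16], row: &[u16], offset: usize, empty: u16) -> bool {
    let window = &packed[offset..offset + row.len()];
    window.iter().zip(row).all(|(&p, &r)| r == empty || p == empty)
}

fn main() {
    let empty: u16 = 0;
    let packed: [u16; 5] = [1, 0, 0, 2, 0];
    // row[1] and row[2] land on empty cells; row[0] is itself empty.
    assert!(fits_at_indexed(&packed, &[0, 5, 6], 0, empty));
    assert!(fits_at_sliced(&packed, &[0, 5, 6], 0, empty));
    // row[0] = 7 would overwrite packed[3] = 2.
    assert!(!fits_at_indexed(&packed, &[7, 0], 3, empty));
    assert!(!fits_at_sliced(&packed, &[7, 0], 3, empty));
}
```

Whether the assert variant actually removes the checks depends on the compiler version and optimisation level; the sliced variant sidesteps the question by not indexing inside the loop.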
@ratmice Thanks for the analysis! I had a quick look at sparsevec, and although I don't remember the algorithm at all (I think @ptersilie implemented this bit), one simple optimisation opportunity, which I could implement in 20 minutes, jumped out at me; I've done it in softdevteam/sparsevec#25. On my machine, this speeds this example up by over 30x. Warning: only lightly tested!
A side note: both before and after the sparsevec change, I notice that nimbleparse borks with:
I think that's a (separate) pretty-printing error that might not be too difficult to fix?
Yeah, I saw that too, but I have yet to investigate it or form any hypothesis beyond a wild guess: perhaps a conflict is occurring with the implicit start rule, which therefore has no span information? Even in that case, I would expect a pretty simple fix: remove the assert and do some special-case printing. Other than that, I have no idea how we could get a production without a span.
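The "remove the assert and do some special-case printing" idea can be sketched as follows. This is a toy under stated assumptions, not nimbleparse's code: `Span` and `describe_production` are hypothetical, and the sketch only shows the pattern of treating a missing or empty span as something to describe rather than something to assert away.

```rust
/// Hypothetical byte range of a production in the grammar source.
#[derive(Clone, Copy, Debug)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical conflict-report helper: rather than asserting that every
/// production has a span, fall back to placeholder text when it does not.
fn describe_production(src: &str, rule_name: &str, span: Option<Span>) -> String {
    // `str::get` returns None for out-of-range or non-char-boundary ranges,
    // so a bad span degrades to the placeholder instead of panicking.
    match span.and_then(|s| src.get(s.start..s.end)) {
        Some("") => format!("{rule_name}: <empty production>"),
        Some(text) => format!("{rule_name}: {text}"),
        None => format!("{rule_name}: <no source span>"),
    }
}

fn main() {
    let src = "A: 'a' B;";
    println!("{}", describe_production(src, "A", Some(Span { start: 3, end: 8 })));
    println!("{}", describe_production(src, "A", Some(Span { start: 3, end: 3 })));
    println!("{}", describe_production(src, "A", None));
}
```

The trade-off is that a genuine bug (a span that should exist but doesn't) now shows up as odd output rather than a crash, which is one reason the assert might be worth keeping in debug builds.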
Reopening, as I believe we still need to bump version numbers etc. before this will take effect in grmtools.
Because this is a library, if a user runs
I just did
Ah, sorry, I wasn't clear: I need to make a
@mingodad sparsevec is now updated, so hopefully you'll see the speed-up!
Hello @ltratt, thank you!
Glad to see the performance has improved :) We still need to fix that
There is definitely something weird going on with the span parsing for this file. I believe the assert is valid, in the sense that the rule on line 4840 does have productions, but the list of spans for those productions is empty; so if we, e.g., remove the assert, it will just fail to print the conflicting production within the rule. In this case, though, there is nothing, so I would chalk it up to something going wrong in the grammar parsing. The other weirdness I notice is this initial warning, which underlines
Edit: another thing to note is that
@ratmice I don't suppose you've had any luck with this yet?
Nope, I spent some time trying to minimize it, but so far none of my attempts have managed to reproduce the problem.
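For reference, a common way to automate this kind of minimization is greedy line-based reduction, a simplified relative of delta debugging: repeatedly delete one line at a time and keep the deletion whenever the bug still reproduces. This is a swapped-in technique for illustration, not what was used here (the minimal reproducer in this thread came from fuzzing). `reduce_lines` is hypothetical, and in practice the predicate would run nimbleparse on the candidate grammar and check whether the span assertion fires; that wiring is not shown.

```rust
/// Greedy line-based test-case reduction: repeatedly try deleting single
/// lines, keeping each deletion after which `still_fails` reports that the
/// bug still reproduces, until a full pass removes nothing.
fn reduce_lines<F>(input: &str, mut still_fails: F) -> String
where
    F: FnMut(&str) -> bool,
{
    let mut lines: Vec<&str> = input.lines().collect();
    let mut changed = true;
    while changed {
        changed = false;
        let mut i = 0;
        while i < lines.len() {
            let mut candidate = lines.clone();
            candidate.remove(i);
            let text = candidate.join("\n");
            if still_fails(text.as_str()) {
                // Keep the smaller input; the next line has shifted into
                // position `i`, so don't advance.
                lines = candidate;
                changed = true;
            } else {
                i += 1;
            }
        }
    }
    lines.join("\n")
}

fn main() {
    // Stand-in predicate: the "bug" reproduces while both markers remain.
    let input = "a\nx\nb\ny\nc";
    let reduced = reduce_lines(input, |s| s.contains('x') && s.contains('y'));
    println!("{reduced:?}");
}
```

Removing single lines is slow on a file the size of the postgres16 grammar (each pass runs the predicate once per line); ddmin proper removes chunks first and only falls back to single lines, which usually converges much faster.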
I wonder if the problem is in the yacc parser: maybe fuzzing grammar inputs would make the problem pop out?
That might be worth a shot. I think it will need some custom code that triggers the same assertion on that input, because the assertion is in nimbleparse (binary) code rather than lrpar (library) code. Hopefully that would also be an opportunity to fuzz faster. I don't know when I'll be able to work on that, since it's a holiday over here.
So using The following
The actual fuzzing test case used is below; note that it uses a modified version of
I'll have to leave it for another day to actually look into what is going on here, but it seems we have a minimal reproducer now.
That's very cool; hopefully the fix will pop out at you now!
I've had a little bit of a look at it now, adding a
This only became a problem when we added the fancy diagnostic printing code, though. I think the right thing to do might be to add a variant
The only thing that makes me slightly unsure is if
A possible alternative (which I haven't fully thought through) is to give each production (empty or not) a
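The exact proposal is cut off above, but as a loose illustration of the general idea of always recording a location for a production, even an empty one, here is a toy that assigns each alternative in a rule body a byte span and gives empty alternatives a zero-width span (start == end) instead of none. `alternative_spans` is hypothetical and is not grmtools' grammar parser; among other simplifications, a `|` inside a quoted token would be mis-split.

```rust
/// Hypothetical sketch: one byte span per alternative in a rule body, such
/// as the text between `:` and `;` in `A: 'a' B | ;`. Empty alternatives
/// get a zero-width span rather than no span at all.
fn alternative_spans(body: &str, body_offset: usize) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut alt_start = 0;
    // A sentinel `|` at the end lets the last alternative share the same
    // code path as the others.
    let sentinel = std::iter::once((body.len(), '|'));
    for (i, c) in body.char_indices().chain(sentinel) {
        if c != '|' {
            continue;
        }
        let alt = &body[alt_start..i];
        let leading_ws = alt.len() - alt.trim_start().len();
        let start = body_offset + alt_start + leading_ws;
        spans.push((start, start + alt.trim().len()));
        alt_start = i + 1; // `|` is one byte in UTF-8
    }
    spans
}

fn main() {
    let src = "A: 'a' B | ;";
    let colon = src.find(':').unwrap();
    let semi = src.rfind(';').unwrap();
    let body = &src[colon + 1..semi];
    for (s, e) in alternative_spans(body, colon + 1) {
        println!("{s}..{e}: {:?}", &src[s..e]);
    }
}
```

With a zero-width span, a diagnostic printer can still point at where the empty alternative sits in the source (here, just before the `;`) instead of having nothing to underline.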
I added PR #476, which tries the
It is also worth mentioning that there was no equivalent added for
Thoughts about the alternative proposal: I'm a little bit hesitant simply because, e.g., from the
While testing this project with a fresh clone and build, nimbleparse takes forever with a postgres16 grammar (I killed it); see the attached grammar/lexer/input. This is the output of bison:
Notice that the 8 shift/reduce conflicts are due to the changes made to the grammar so that it is accepted by nimbleparse, which doesn't accept empty rules with %prec; such rules are used in the postgres16 grammar:
You can also see how https://github.com/BenHanson/parsertl14 and several others handle it, with wasm in the browser, here: https://mingodad.github.io/parsertl-playground/playground/ (select "Postgresql parser (be patient)" from "Examples", then click "Parse" to see a parse tree for the content in "Input source").
postgres16.zip