Optimise by filtering out empty values. #25

ltratt · 2024-11-22T08:32:06Z

The fits function (and to a lesser extent apply) previously did vast amounts of pointless work on empty values that could not succeed. This commit does something very simple: it pre-filters out all the empty values so that fits has a lot less work to do.

On small grammars, the quantity of pointless work probably wasn't very noticeable, but on large grammars like PostgreSQL's, it became punishing. On my machine this commit takes nimbleparse's running time down from 101s to 3s.
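To illustrate the idea (this is a minimal sketch, not the code in this commit): assume a table is packed row by row using first-fit row displacement, with `u32` values and a designated empty value. The helper `non_empty` and the exact signatures of `fits`, `apply`, and `compress` below are hypothetical. The point is that each row is filtered once, so the inner loop of `fits` only ever visits entries that could actually clash.

```rust
/// Keep only the non-empty entries of a row as (column, value) pairs.
/// Hypothetical helper, named here purely for illustration.
fn non_empty(row: &[u32], empty: u32) -> Vec<(usize, u32)> {
    row.iter()
        .copied()
        .enumerate()
        .filter(|&(_, v)| v != empty)
        .collect()
}

/// Can the pre-filtered `row` be placed at `offset` in `packed`?
/// Only non-empty entries are checked, since empty ones can never clash.
fn fits(row: &[(usize, u32)], packed: &[u32], offset: usize, empty: u32) -> bool {
    row.iter().all(|&(col, _)| match packed.get(offset + col) {
        None => true,            // past the end of `packed`: free space
        Some(&p) => p == empty,  // slot exists: it must still be unused
    })
}

/// Write the pre-filtered `row` into `packed` at `offset`, growing as needed.
fn apply(row: &[(usize, u32)], packed: &mut Vec<u32>, offset: usize, empty: u32) {
    for &(col, v) in row {
        let idx = offset + col;
        if idx >= packed.len() {
            packed.resize(idx + 1, empty);
        }
        packed[idx] = v;
    }
}

/// First-fit packing: returns the packed vector and each row's displacement.
fn compress(rows: &[Vec<u32>], empty: u32) -> (Vec<u32>, Vec<usize>) {
    let mut packed = Vec::new();
    let mut offsets = Vec::with_capacity(rows.len());
    for row in rows {
        // Filter once per row, rather than skipping empties on every probe.
        let filtered = non_empty(row, empty);
        let mut offset = 0;
        while !fits(&filtered, &packed, offset, empty) {
            offset += 1;
        }
        apply(&filtered, &mut packed, offset, empty);
        offsets.push(offset);
    }
    (packed, offsets)
}
```

The search always terminates because any row fits at `offset == packed.len()`. A real implementation also needs bookkeeping to tell which row owns a packed slot on lookup, which this sketch leaves out.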

Fixes softdevteam/grmtools#473 (comment). @ratmice I have only very lightly tested this, so if you're able to verify that this doesn't change anything (other than performance) for your use cases, I'd be very grateful!

src/lib.rs

ratmice · 2024-11-22T23:26:32Z

Cool. While I'm not really familiar with this code or the algorithm, those are the only comments I had.
It appears to me that it should be isomorphic to the original implementation, so if it improves things...

That said, I'm happy to give @ptersilie time to weigh in on things, if he feels inclined?

ltratt · 2024-11-23T08:46:17Z

Just to check: this does the right thing on the grammars you care about? If so, you'll be an excellent reviewer and should feel free to merge (I've force pushed a squash).

ratmice · 2024-11-23T09:55:35Z

I've run it on a few grammars I care about, and everything seems to work.
I'd like to do some more testing, such as adding a nimbleparse option to dump the state table.
I'd then run that on my own grammars, as well as on some of the "Used by" dependents listed on GitHub at https://github.com/softdevteam/grmtools/network/dependents, comparing the output with the current output.

Sadly, a large portion of the grammars I have lying around don't produce a valid state table,
because they test various error conditions, since most of my usage these days is tooling work.

So I'll try and do that tomorrow.

ltratt · 2024-11-23T10:02:24Z

I think if it works on the ones you care about, we're probably good to go. I wouldn't object to a PR to add a -d (or whatever) option to nimbleparse to dump the statetable -- it should be straightforward as we already do that for the "bad" cases.

ratmice · 2024-11-23T21:49:51Z

Well, having run through things, adding the -d flag hasn't helped quite as much as I'd hoped.
The issue is that while the state graph's stidx's and the like seem to be deterministic,
they are printed by pp_core_states and the like in a non-deterministic order.

As such, the output below is part of a diff of the -d output from 2 runs of nimbleparse, both before this patch.

 Stategraph:
 0:   [^ -> . File, {'$'}]
-     'val' -> 3
-     'annotated' -> 4
-     File -> 1
-     'term' -> 6
      D -> 2
+     File -> 1
+     'annotated' -> 4
+     'val' -> 3
      'proof' -> 5
+     'term' -> 6

If I look at that through `diff -u st-old.txt st-old2.txt | grep '^[+-]' | sort`, it seems like the state graph is probably unchanged, in that the + lines appear to align with the - ones apart from the order of output. But that isn't a very satisfying or thorough test.
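The same rough check can be written as a small program. This is only a sketch of the "same lines, different order" comparison, with a hypothetical function name, and it shares the weakness noted above: once lines are sorted, it no longer knows which state an edge belonged to.

```rust
/// Returns true if the two dumps contain exactly the same lines
/// (with surrounding whitespace trimmed), ignoring the order of the lines.
/// Hypothetical helper: it compares text only and knows nothing about states.
fn same_lines_ignoring_order(a: &str, b: &str) -> bool {
    let mut la: Vec<&str> = a.lines().map(str::trim).collect();
    let mut lb: Vec<&str> = b.lines().map(str::trim).collect();
    la.sort_unstable();
    lb.sort_unstable();
    la == lb
}
```

Two dumps that differ only in edge order compare equal, while a changed target such as `'val' -> 3` versus `'val' -> 7` does not.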

Edit: I'll have a short investigation into how hard it would be to make the pp_core_states output deterministic.
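As a rough sketch of what deterministic output could look like: assuming the edges of a state live in a `HashMap` (an assumption for illustration, as are the `pp_edges` name and the `String`/`usize` types, none of which are grmtools's actual code), sorting the entries before formatting them gives the same output on every run.

```rust
use std::collections::HashMap;

/// Render a state's outgoing edges in a stable order.
/// `HashMap` iteration order is unspecified, so collect and sort first.
/// Hypothetical stand-in for the real pretty-printer.
fn pp_edges(edges: &HashMap<String, usize>) -> String {
    let mut sorted: Vec<(&String, &usize)> = edges.iter().collect();
    // Sort by symbol name; keys are unique, so the order is total.
    sorted.sort();
    sorted
        .iter()
        .map(|(sym, tgt)| format!("{sym} -> {tgt}"))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Using an ordered map such as `BTreeMap` would be another option, at the cost of changing the data structure rather than just the printing.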

ratmice · 2024-11-24T01:37:48Z

So I tested locally with the pretty printing fixed, and checked the state tables of my projects and a bunch of projects from GitHub, with no hint of any unexpected differences.

ltratt mentioned this pull request Nov 22, 2024

Can't finish or takes forever with postgres16 grammar softdevteam/grmtools#473

Open

ratmice reviewed Nov 22, 2024

ltratt force-pushed the filter_empty_values branch from 2d37f94 to 6bca7f0 Compare November 23, 2024 08:45

ltratt assigned ratmice Nov 23, 2024

ratmice added this pull request to the merge queue Nov 24, 2024

Merged via the queue into softdevteam:master with commit f285566 Nov 24, 2024
2 checks passed

ratmice mentioned this pull request Nov 24, 2024

nimbleparse: add -d flag to dump state graph softdevteam/grmtools#474

Merged
