
Size of parser stack items makes parsing slow #10

Open
ExpHP opened this issue Feb 14, 2021 · 4 comments
Labels: performance (Something is slow af)

Comments

ExpHP (Owner) commented Feb 14, 2021

One item that appears on benchmarks (currently measuring about 8% on anm-benchmark for th15/title.anm) is memcpy calls in lalrparser::__parse__Anything::__reduce:

[image: profiler output]

After some digging in binja and CE, it appears that the majority of memcpy samples counted were likely in these calls to __symbols.pop():

[image: disassembly view]

These (Location, __Symbol, Location) tuples are currently 256 bytes each. __Symbol is a union over all of the grammar's symbol types:

pub enum __Symbol<'input>
     {
        Variant0(Token<'input>),
        Variant1(&'input str),
        Variant2(::std::option::Option<Token<'input>>),
        Variant3(Sp<i32>),
        Variant4(::std::option::Option<Sp<i32>>),
        Variant5(()),
        ...
        Variant13(Sp<crate::parse::AnythingValue>),  // <-- largest variant
        ...
        Variant85(ast::Var),
        Variant86(ast::VarDeclKeyword),
    }

It's 256 bytes because Stmt is currently 208 bytes; then there's the 8-byte discriminant of AnythingValue, the 16 bytes added by Sp<...>, another 8 bytes for __Symbol's discriminant, and 8 bytes for each of the two Locations.

We could probably speed things up a little by making Stmt smaller. I'm not sure how much we can effectively do in the short term aside from boxing some things, which has its own cost; in the long term, arenas may help. Also, it would be fantabulous if we could figure out how to fix Spans to be only 8 bytes instead of an awkwardly-padded 12.

More importantly, in the meantime we want to be careful about letting Stmt grow any bigger.
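
(One cheap way to enforce that is a size-regression test, sketched below. This assumes it lives inside the crate; the module path and the 208-byte threshold are just taken from the numbers above.)

#[test]
fn stmt_stays_small() {
    // Hypothetical guard: fail CI if Stmt grows past the 208 bytes quoted
    // above. Bump the threshold consciously if it ever needs to grow.
    let size = std::mem::size_of::<crate::ast::Stmt>();
    assert!(size <= 208, "ast::Stmt grew to {} bytes", size);
}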

ExpHP added the performance label on Feb 14, 2021
ExpHP (Owner, Author) commented Nov 14, 2021

Current status:

  • Sp<Stmt> is currently 320 bytes
  • If I use black magic and sorcery to make Span and Option<Span> both 8 bytes, the size reduces to 256.

This 20% size reduction equated to a 10% runtime reduction when running on all 7 EoSD ECL files. (Note: WSL 2, files stored on NTFS.)

This "black magic and sorcery" actually makes it quite difficult to reason about things. I might consider sticking some boxes into Stmt instead...

ExpHP (Owner, Author) commented Nov 14, 2021

Boxing Stmt did not help.

I made all Stmt-producing productions return Box<ast::Stmt> or Box<ast::StmtKind> instead, and dereferenced each one as it was inserted into an AST node. I copied the symbol enum from generated.rs and verified that its size was now reduced to 192 bytes.

Unfortunately the runtime performance did not change from what it is currently.

Doing this required me to write this travesty in place of what used to be <stmts:(Sp<Stmt>)*>:

#[inline]
SpStmts0: Vec<Sp<ast::Stmt>> = {
    => vec![],
    SpStmts1,
};

SpStmts1: Vec<Sp<ast::Stmt>> = {
    <stmt:Sp<BoxStmt>> => vec![sp!(stmt.span => *stmt.value)],
    <stmts:SpStmts1> <stmt:Sp<BoxStmt>> => util::push(stmts, sp!(stmt.span => *stmt.value)),
};

An aside: it looks like inline (#[inline]) rules in the grammar still generate symbol variants that get placed on the stack.

I discovered this when I tried to clean up the above code using the following so that I could write <stmts:(Sp<Unbox<BoxStmt>>)*>:

// NOTE:  pub type UnboxType<T> = <T as core::ops::Deref>::Target;
#[inline]
Unbox<Rule>: util::UnboxType<Rule> = <r:Rule> => *r;

Unfortunately despite the #[inline] this still produces a variant:

Variant102(util::UnboxType<Box<ast::Stmt>>),

ExpHP (Owner, Author) commented Nov 14, 2021

Profiling again with the boxed Stmts:

[image: profiler output]

Doesn't look like a significant contribution from allocation; it's still all memcpy...

ExpHP (Owner, Author) commented Nov 14, 2021

I tried replacing the boxes with a qcell-based Pool abstraction: a tiny fixed-size pool of two reusable Option<ast::Stmt> and Option<ast::StmtKind> slots. The statements in the symbol enum effectively became &'a LCell<Option<T>>: pointer-sized values that could be created from (and returned into) a T without any allocation or synchronization.
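
(For concreteness, roughly the shape of that pool, assuming the qcell crate's LCell/LCellOwner API; the names are made up and a String stands in for the Stmt payload:)

use qcell::{LCell, LCellOwner};

fn main() {
    LCellOwner::scope(|mut owner| {
        // One reusable slot; the real pool had two (Stmt and StmtKind).
        let slot = owner.cell(None::<String>);

        // "Allocate": move the value into the slot and keep only the
        // pointer-sized &LCell handle.
        *owner.rw(&slot) = Some(String::from("stmt payload"));
        let handle = &slot;

        // "Free": take the value back out; the slot is reusable again,
        // with no heap allocation or synchronization involved.
        let value = owner.rw(handle).take().expect("slot was filled");
        assert_eq!(value, "stmt payload");
    });
}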

This still had no effect.

This seems paradoxical. It leads me to believe that perhaps it is not the size of the Symbol enum that is the problem, but rather the size of the ast structs in general (including Stmt). But if that were the case, then the loop desugaring/decompilation passes should show up much more prominently here...

I don't know.

Relevant branches:
