CodeGen: Optimize arithmetics for basic identities #1545

zeux · 2024-11-26T15:05:07Z

This change folds:

a * 1 => a
a / 1 => a
a * -1 => -a
a / -1 => -a
a * 2 => a + a
a / 2^k => a * 2^-k
a - 0 => a
a + (-0) => a
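The `a / 2^k => a * 2^-k` fold is safe because multiplying by the reciprocal of a power of two only changes the exponent, as long as the reciprocal is itself a normal double. Below is a minimal sketch of how such a check could look; `exactReciprocal` and its conservative exponent bounds are hypothetical names and values chosen for illustration, not taken from the actual patch.

```cpp
#include <cmath>

// Hypothetical helper (not from the patch): if k is +/-2^n and 1/k is a normal
// double, stores 1/k in `out` and returns true so that a / k can become a * out.
bool exactReciprocal(double k, double& out)
{
    int exp = 0;
    double m = std::frexp(k, &exp); // k == m * 2^exp with |m| in [0.5, 1)

    // Only powers of two have a mantissa of exactly +/-0.5.
    // This comparison also rejects 0, infinities and NaN.
    if (std::fabs(m) != 0.5)
        return false;

    // Assumed conservative bounds: keep both k and 1/k well inside the normal range.
    if (exp < -1000 || exp > 1000)
        return false;

    out = 1.0 / k; // exact, since k is a power of two
    return true;
}
```

For example, `exactReciprocal(8.0, r)` yields `r == 0.125`, while `3.0`, `0.0` and subnormal powers of two are rejected and the division is left alone.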

Note that the following folds are all invalid:

a + 0 => a (breaks for negative zero)
a - (-0) => a (breaks for negative zero)
a - a => 0 (breaks for Inf/NaN)
0 - a => -a (breaks for negative zero)
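The difference between the valid and invalid folds comes down to IEEE 754 signed zeros and non-finite values. The sketch below checks each candidate fold for a given input; the helper names are made up for illustration, and it assumes the compiler follows default IEEE semantics (no fast-math style flags).

```cpp
#include <cmath>

// True when x and y are the same value including the sign of zero; NaN matches NaN.
bool sameDouble(double x, double y)
{
    if (std::isnan(x) || std::isnan(y))
        return std::isnan(x) && std::isnan(y);
    return x == y && std::signbit(x) == std::signbit(y);
}

// Each function returns true if the fold preserves the result for this particular a.
bool foldSubZeroOk(double a)    { return sameDouble(a - 0.0, a); }    // a - 0    => a
bool foldAddNegZeroOk(double a) { return sameDouble(a + (-0.0), a); } // a + (-0) => a
bool foldAddZeroOk(double a)    { return sameDouble(a + 0.0, a); }    // a + 0    => a
bool foldSubNegZeroOk(double a) { return sameDouble(a - (-0.0), a); } // a - (-0) => a
bool foldSubSelfOk(double a)    { return sameDouble(a - a, 0.0); }    // a - a    => 0
bool foldZeroSubOk(double a)    { return sameDouble(0.0 - a, -a); }   // 0 - a    => -a
```

With `a = -0.0`, the first two checks hold while `a + 0` and `a - (-0)` produce `+0`. `a - a` yields NaN for infinities and NaN, and `0 - a` yields `+0` for `a = 0` whereas `-a` is `-0`.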

Various cases of UNM_NUM could be optimized (eg (-a) * (-b) = a * b),
but that doesn't happen in benchmarks.

While it would be possible to also fold multiplications with the constant on the left (k * v),
these do not happen in benchmarks and rarely happen in bytecode due
to type based optimizations. Maybe this can be improved with some sort of
IR canonicalization in the future if necessary.

I've considered moving some of these, like division strength reduction,
to IR translation (as this is where POW is lowered presently) but it didn't
seem better one way or the other.

This change improves performance on some benchmarks, e.g. trig and voxelgen,
and should be a strict uplift as it never generates more instructions or longer
latency chains. On Apple M2, without division->multiplication optimization, both
benchmarks see 0.1-0.2% uplift. Division optimization makes trig 3% faster; I expect
the gains on X64 will be more muted, but on Apple this seems to allow loop iterations
to overlap better by removing the division bottleneck.


zeux · 2024-11-27T01:03:13Z

After testing this on a math_map equivalent, I went ahead and added a-0/a+(-0) folding after all. With it, this code:

local function math_map(x: number, inmin: number, inmax: number, outmin: number, outmax: number): number
	return outmin + (x - inmin) * (outmax - outmin) / (inmax - inmin)
end

local function lerpopt(a, b, t)
	return math_map(t, 0, 1, a, b)
end

gets compiled into this:

 ldr         w17,[x25,#44]
 cmp         w17,#3
 b.ne        .L48
 ldr         d31,[x25,#32]
 str         d31,[x25,#96]
 movz        w17,#3
 str         w17,[x25,#108]
.L49:
 ldr         w17,[x25,#28]
 cmp         w17,#3
 b.ne        .L50
 ldr         w17,[x25,#12]
 cmp         w17,#3
 b.ne        .L50
 ldr         d30,[x25,#16]
 ldr         d29,[x25]
 fsub        d30,d30,d29
 fmul        d31,d31,d30
 fadd        d29,d29,d31
 str         d29,[x25,#48]
 movz        w17,#3
 str         w17,[x25,#60]
 ldr         x0,[x21,#3296]
 cbnz        x0,.L51
.L52:
 ldr         q0,[x25,#48]
 str         q0,[x25,#-16]
 mov         x1,x25
 movz        w2,#1
 b           .L7

... which is almost good, apart from an extra dead store in the first basic block that should have been elided; adding explicit type annotations to lerpopt fixes that, though.


vegorov-rbx · 2024-11-27T12:53:20Z

Thank you.

zeux added 4 commits November 26, 2024 21:03

tests: Expand conformance/basic with more arith checks

fb56d8b

Check various specials that we can't optimize and division by large power of two.

CodeGen: Add forgotten a/-1 case

73c88c6

Add forgotten math.h include

301b06c

vegorov-rbx approved these changes Nov 26, 2024


Add optimization of a-0 and a+(-0)

d83ff5a

This can sometimes be helpful after all.

zeux changed the title ~~CodeGen: Optimize arithmetics for basic multiplicative identities~~ CodeGen: Optimize arithmetics for basic identities Nov 27, 2024

Add a test for 0-a as well

0daedcd

0-a can't be simplified with -a as 0 becomes -0 under negation.

zeux requested a review from vegorov-rbx November 27, 2024 02:19

vegorov-rbx approved these changes Nov 27, 2024


vegorov-rbx merged commit b5801d3 into master Nov 27, 2024
8 checks passed

vegorov-rbx deleted the ncg-arithopt branch November 27, 2024 12:44
