CodeGen: Optimize arithmetics for basic identities #1545

Merged: 6 commits merged into master from ncg-arithopt on Nov 27, 2024

zeux (Collaborator) commented Nov 26, 2024

This change folds:

a * 1 => a
a / 1 => a
a * -1 => -a
a / -1 => -a
a * 2 => a + a
a / 2^k => a * 2^-k
a - 0 => a
a + (-0) => a
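As an informal sanity check (not part of the PR), the Python sketch below uses Python floats, which are IEEE 754 doubles like Luau numbers, to confirm that each fold in the list gives bit-identical results on a few special values: signed zeros, subnormals, values that overflow, Inf and NaN. The helper names and the value set are illustrative; k = 3 stands in for a general 2^k, and any two NaN results are treated as equivalent regardless of sign or payload, which is an assumption about what a fold must preserve.

```python
import math
import struct

def bits(x: float) -> int:
    """Raw IEEE 754 bit pattern, so +0.0 and -0.0 compare as different."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def same(x: float, y: float) -> bool:
    """Interchangeable if both are NaN, or the bit patterns match exactly."""
    if math.isnan(x) or math.isnan(y):
        return math.isnan(x) and math.isnan(y)
    return bits(x) == bits(y)

# (name, original expression, folded expression) for each fold in the list.
VALID_FOLDS = [
    ("a * 1 => a", lambda a: a * 1.0, lambda a: a),
    ("a / 1 => a", lambda a: a / 1.0, lambda a: a),
    ("a * -1 => -a", lambda a: a * -1.0, lambda a: -a),
    ("a / -1 => -a", lambda a: a / -1.0, lambda a: -a),
    ("a * 2 => a + a", lambda a: a * 2.0, lambda a: a + a),
    ("a / 2^3 => a * 2^-3", lambda a: a / 8.0, lambda a: a * 0.125),
    ("a - 0 => a", lambda a: a - 0.0, lambda a: a),
    ("a + (-0) => a", lambda a: a + -0.0, lambda a: a),
]

SPECIALS = [0.0, -0.0, 1.0, -1.5, 5e-324, 3e-310, 1e308, -1e308,
            math.inf, -math.inf, math.nan]

def failing_folds(folds, values=SPECIALS):
    """Return (fold name, value) pairs where the folded form differs."""
    return [(name, a) for name, before, after in folds
            for a in values if not same(before(a), after(a))]

print(failing_folds(VALID_FOLDS))  # []
```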

Note that the following folds are all invalid:

a + 0 => a (breaks for negative zero)
a - (-0) => a (breaks for negative zero)
a - a => 0 (breaks for Inf/NaN)
0 - a => -a (breaks for negative zero)
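The same style of check finds a counterexample for each invalid fold. This is a minimal, self-contained Python sketch with hypothetical helper names; the first failing value it reports for each fold matches the reason given in parentheses above.

```python
import math
import struct

def bits(x: float) -> int:
    """Raw IEEE 754 bit pattern, so +0.0 and -0.0 compare as different."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def same(x: float, y: float) -> bool:
    """Interchangeable if both are NaN, or the bit patterns match exactly."""
    if math.isnan(x) or math.isnan(y):
        return math.isnan(x) and math.isnan(y)
    return bits(x) == bits(y)

INVALID_FOLDS = [
    ("a + 0 => a", lambda a: a + 0.0, lambda a: a),
    ("a - (-0) => a", lambda a: a - -0.0, lambda a: a),
    ("a - a => 0", lambda a: a - a, lambda a: 0.0),
    ("0 - a => -a", lambda a: 0.0 - a, lambda a: -a),
]

VALUES = [1.0, -0.0, 0.0, math.inf, math.nan]

def counterexample(before, after, values=VALUES):
    """First value where the folded form differs, or None if none does."""
    for a in values:
        if not same(before(a), after(a)):
            return a
    return None

for name, before, after in INVALID_FOLDS:
    print(name, "breaks for", counterexample(before, after))
# a + 0 => a breaks for -0.0
# a - (-0) => a breaks for -0.0
# a - a => 0 breaks for inf
# 0 - a => -a breaks for 0.0
```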

Various cases of UNM_NUM could be optimized (e.g. (-a) * (-b) = a * b),
but these don't occur in benchmarks.
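For reference, the (-a) * (-b) = a * b identity is exact in IEEE 754: negating both operands flips both sign bits, so the product's sign and magnitude are unchanged. A small Python check over signed specials (a sketch; the NaN produced by 0 * Inf is skipped on both sides, since NaN signs aren't meaningful here):

```python
import itertools
import math
import struct

def bits(x: float) -> int:
    """Raw IEEE 754 bit pattern, so +0.0 and -0.0 compare as different."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

VALUES = [0.0, -0.0, 2.5, -3.0, 5e-324, 1e308, math.inf, -math.inf]

def neg_neg_mismatches():
    """Pairs (a, b) where (-a) * (-b) and a * b differ bit-wise."""
    out = []
    for a, b in itertools.product(VALUES, repeat=2):
        lhs, rhs = (-a) * (-b), a * b
        if math.isnan(lhs) and math.isnan(rhs):
            continue  # 0 * Inf yields NaN on both sides
        if bits(lhs) != bits(rhs):
            out.append((a, b))
    return out

print(neg_neg_mismatches())  # []
```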

While it would also be possible to fold the commuted forms with the constant
on the left (k * v), these don't occur in benchmarks and rarely occur in bytecode
due to type-based optimizations. Maybe this can be improved with some sort of
IR canonicalization in the future if necessary.
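A sketch of what such a canonicalization could look like, using a made-up operand representation and opcode strings modeled on the UNM_NUM naming rather than Luau's actual IR classes: for commutative operations, a constant left operand is moved to the right, so folds that only inspect the right-hand constant also catch k * v. IEEE 754 addition and multiplication are commutative (apart from which NaN propagates when both inputs are NaN), so the swap preserves results.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Const:
    """A numeric constant operand (hypothetical IR representation)."""
    value: float

@dataclass(frozen=True)
class Reg:
    """A non-constant value, identified by name (hypothetical)."""
    name: str

Operand = Union[Const, Reg]

# Hypothetical opcode names; only these are treated as commutative.
COMMUTATIVE = {"ADD_NUM", "MUL_NUM"}

def canonicalize(op: str, lhs: Operand, rhs: Operand) -> Tuple[str, Operand, Operand]:
    """For commutative ops, move a constant left operand to the right."""
    if op in COMMUTATIVE and isinstance(lhs, Const) and not isinstance(rhs, Const):
        return op, rhs, lhs
    return op, lhs, rhs

def fold_mul(lhs: Operand, rhs: Operand):
    """Two of the right-constant folds: v * 1 => v and v * 2 => v + v."""
    if isinstance(rhs, Const):
        if rhs.value == 1.0:
            return lhs
        if rhs.value == 2.0:
            return ("ADD_NUM", lhs, lhs)
    return ("MUL_NUM", lhs, rhs)

_, lhs, rhs = canonicalize("MUL_NUM", Const(2.0), Reg("v"))
print(fold_mul(lhs, rhs))  # ('ADD_NUM', Reg(name='v'), Reg(name='v'))
```

Non-commutative operations such as subtraction and division are left untouched, which is why 0 - a stays as written.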

I've considered moving some of these, like division strength reduction,
to IR translation (as this is where POW is lowered presently), but neither
location seemed clearly better.
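The a / 2^k => a * 2^-k strength reduction is exact only when the divisor is a power of two whose reciprocal is also representable; then a / c and a * (1/c) denote the same real number and round identically. A minimal Python sketch of that check (the function name is hypothetical, and accepting negative powers of two is this sketch's own choice; the rewrite stays exact for them):

```python
import math
from typing import Optional

def exact_reciprocal(c: float) -> Optional[float]:
    """Return 1/c if a / c can be rewritten as a * (1/c) exactly, else None.

    That holds when c is a (possibly negative) power of two and 1/c is
    itself a finite double.
    """
    if not math.isfinite(c) or c == 0.0:
        return None
    mantissa, exp = math.frexp(c)  # c == mantissa * 2**exp, 0.5 <= |mantissa| < 1
    if abs(mantissa) != 0.5:
        return None  # not a power of two, so 1/c would be rounded
    # Here |c| == 2**(exp - 1), so |1/c| == 2**(1 - exp); the largest finite
    # power of two is 2**1023, which requires exp >= -1022.
    if exp < -1022:
        return None
    return 1.0 / c

print(exact_reciprocal(8.0), exact_reciprocal(-0.25), exact_reciprocal(3.0))
# 0.125 -4.0 None
```

The exponent test rejects tiny divisors such as 2^-1074, the smallest subnormal, whose reciprocal 2^1074 exceeds the largest finite double.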

This change improves performance on some benchmarks, e.g. trig and voxelgen,
and should be a strict uplift, as it never generates more instructions or longer
latency chains. On Apple M2, without the division->multiplication optimization, both
benchmarks see a 0.1-0.2% uplift. The division optimization makes trig 3% faster; I expect
the gains on X64 to be more muted, but on Apple this seems to let loop iterations
overlap better by removing the division bottleneck.

zeux added 4 commits November 26, 2024 21:03
This change folds:

	a * 1 => a
	a / 1 => a
	a * -1 => -a
	a / -1 => -a
	a * 2 => a + a
	a / 2^k => a * 2^-k

Note that the following folds are all invalid:

	a + 0 => a (breaks for negative zero)
	a - (-0) => a (breaks for negative zero)
	a - a => 0 (breaks for NaN)

a - 0 could be folded into a but that doesn't happen in benchmarks.

Various cases of UNM_NUM could be optimized (eg (-a) * (-b) = a * b),
but that doesn't happen in benchmarks either.
Check various specials that we can't optimize and division by large power of two.
This can sometimes be helpful after all.
zeux (Collaborator, Author) commented Nov 27, 2024

After testing this on a math_map equivalent, I've gone ahead and added a-0/a+(-0) folding after all. With it, this code:

local function math_map(x: number, inmin: number, inmax: number, outmin: number, outmax: number): number
	return outmin + (x - inmin) * (outmax - outmin) / (inmax - inmin)
end

local function lerpopt(a, b, t)
	return math_map(t, 0, 1, a, b)
end

gets compiled into this:

 ldr         w17,[x25,#44]
 cmp         w17,#3
 b.ne        .L48
 ldr         d31,[x25,#32]
 str         d31,[x25,#96]
 movz        w17,#3
 str         w17,[x25,#108]
.L49:
 ldr         w17,[x25,#28]
 cmp         w17,#3
 b.ne        .L50
 ldr         w17,[x25,#12]
 cmp         w17,#3
 b.ne        .L50
 ldr         d30,[x25,#16]
 ldr         d29,[x25]
 fsub        d30,d30,d29
 fmul        d31,d31,d30
 fadd        d29,d29,d31
 str         d29,[x25,#48]
 movz        w17,#3
 str         w17,[x25,#60]
 ldr         x0,[x21,#3296]
 cbnz        x0,.L51
.L52:
 ldr         q0,[x25,#48]
 str         q0,[x25,#-16]
 mov         x1,x25
 movz        w2,#1
 b           .L7

... which is almost good, short of an extra dead store in the first basic block that should have been elided; explicit type annotations on lerpopt help fix that though.
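Tracing the folds by hand (an informal derivation, not compiler output): inlining lerpopt substitutes x = t, inmin = 0, inmax = 1, outmin = a, outmax = b, and the identities above reduce the expression to one subtract, one multiply and one add, which appears to line up with the fsub/fmul/fadd sequence in the listing.

```latex
\begin{aligned}
\text{math\_map}(t, 0, 1, a, b)
  &= a + \frac{(t - 0)\,(b - a)}{1 - 0} \\
  &= a + \frac{t\,(b - a)}{1} && \text{($t - 0 \Rightarrow t$; $1 - 0$ is constant-folded)} \\
  &= a + t\,(b - a)           && \text{($x / 1 \Rightarrow x$)}
\end{aligned}
```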

zeux changed the title from "CodeGen: Optimize arithmetics for basic multiplicative identities" to "CodeGen: Optimize arithmetics for basic identities" Nov 27, 2024
0-a can't be simplified to -a: for a = 0, negation produces -0 while 0 - 0 is +0.
zeux requested a review from vegorov-rbx November 27, 2024 02:19
vegorov-rbx merged commit b5801d3 into master Nov 27, 2024
8 checks passed
vegorov-rbx deleted the ncg-arithopt branch November 27, 2024 12:44
vegorov-rbx (Collaborator) commented:

Thank you.
