Fix `bad 'bp' value` warning #2281

EvgeniiMekhanik · 2024-11-15T22:48:12Z

We use rbp register in mpi_mul_mont_mod_p256_x86_64 function to save result of mulxq operation. That breaks breaks frame pointer convention and breaks stack traces. The same problem described here:
https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/ Change using of rbp to rdi fixes promlem.

krizhanovsky

Just a comment for now - it seems more attention is needed

krizhanovsky · 2024-11-19T18:47:07Z

tls/bignum_x86-64.S

@@ -1658,7 +1658,6 @@ SYM_FUNC_END(mpi_from_mont_p256_x86_64)
 */


Please update the copyright year

krizhanovsky · 2024-11-19T20:18:26Z

tls/bignum_x86-64.S

+	pushq	%rdi
+	mulxq	(%rcx), %rdi, %rax
+	adoxq	%rdi, %r11
+	popq	%rdi


I'm not sure about push and pop in the middle of the function - please compare the dependency flow of this function and the original with llvm-mca. Also put attention on other performance characteristics of the function, they must not get any worse.

it seems we don't need to store rdi at all. It seems there is a mistake in the function comment as well as for MONT_REDUCE: it seems we do not use the 1st function argument in rdi at all and just use the registers as for result. See for example the computation M = 3(X + Z^2)(X - Z^2) in ecp256_double_jac). Maybe there are other cases and I'm wrong. If I'm right, then we do not need to store rdi results and can just eliminate your push and pop operations. Having that we don't need to bother with rbp, the function should become faster. If so, please fix the function comment and MONT_REDUCE that we use rdx, not rdi, for computation and store results in rdi. It seems MONT_REDUCE also uses rdi for results only.

This is not our code, the modern wolfssl also does the same with rbp (see _sp_256_mont_mul_avx2_4 in wolfcrypt/src/sp_x86_64_asm.S and apparently they don't have the assembly issue - maybe we can just use some flag to eliminate the problem? The thing is that each instruction is very important in assembly crypto (this is super hot code!) and it's usually beneficial if we can use as many registers as we can, so if we don't need rbp in the assembly code, then it could be beneficial to use it for some computation. Maybe, not necessary in this particular place, but also in others.

In https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/ they just removed the code as it is not performance crucial - this is very different for our case

In MONT_REDUCE we do movq %r12, (%rdi) so if rdi was changed we write value to incorrect place

About wolfssl it seems that it is not a kernel library, which works in irq context.
Here https://lists.openwall.net/linux-kernel/2017/11/22/255 is described that it is impossible to use rbp register with CONFIG_FRAME_POINTER from kenrel config and correct frame unwinding from interrupts

May be we can not use rbp with CONFIG_FRAME_POINTER and kasan.

old

Iterations: 100 Instructions: 7400 Total Cycles: 4813 Total uOps: 14200 Dispatch Width: 6 uOps Per Cycle: 2.95 IPC: 1.54 Block RThroughput: 23.7

new

Iterations: 100 Instructions: 7400 Total Cycles: 4812 Total uOps: 14200 Dispatch Width: 6 uOps Per Cycle: 2.95 IPC: 1.54 Block RThroughput: 23.7

We use `rbp` register in `mpi_mul_mont_mod_p256_x86_64` function to save result of `mulxq` operation. That breaks frame pointer convention and breaks stack traces. The same problem described here: https://lists.openwall.net/linux-kernel/2017/11/22/255 We can't use %rbp with CONFIG_FRAME_POINTER, so check if CONFIG_FRAME_POINTER is set and use %rdi register instead of %rbp.

const-t

We should spill %rdi because it contains result destination address, seems ifdef'ing of CONFIG_FRAME_POINTER is good solution, but with this solution we should use CONFIG_UNWINDER_ORC by default, in our CI as well. Also, it would be great to depict it in the wiki.

EvgeniiMekhanik requested review from krizhanovsky and const-t November 15, 2024 22:48

EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-2268 branch 2 times, most recently from 3a37329 to 4b57b1b Compare November 15, 2024 22:50

krizhanovsky reviewed Nov 19, 2024

View reviewed changes

EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-2268 branch from 4b57b1b to 5b76bb3 Compare November 24, 2024 02:21

EvgeniiMekhanik requested a review from krizhanovsky November 24, 2024 02:22

const-t reviewed Nov 25, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `bad 'bp' value` warning #2281

Fix `bad 'bp' value` warning #2281

EvgeniiMekhanik commented Nov 15, 2024

krizhanovsky left a comment

krizhanovsky Nov 19, 2024

krizhanovsky Nov 19, 2024

krizhanovsky Nov 19, 2024

EvgeniiMekhanik Nov 20, 2024

EvgeniiMekhanik Nov 20, 2024

EvgeniiMekhanik Nov 21, 2024

EvgeniiMekhanik Nov 21, 2024

const-t left a comment

		@@ -1658,7 +1658,6 @@ SYM_FUNC_END(mpi_from_mont_p256_x86_64)
		*/

Fix bad 'bp' value warning #2281

Are you sure you want to change the base?

Fix bad 'bp' value warning #2281

Conversation

EvgeniiMekhanik commented Nov 15, 2024

krizhanovsky left a comment

Choose a reason for hiding this comment

krizhanovsky Nov 19, 2024

Choose a reason for hiding this comment

krizhanovsky Nov 19, 2024

Choose a reason for hiding this comment

krizhanovsky Nov 19, 2024

Choose a reason for hiding this comment

EvgeniiMekhanik Nov 20, 2024

Choose a reason for hiding this comment

EvgeniiMekhanik Nov 20, 2024

Choose a reason for hiding this comment

EvgeniiMekhanik Nov 21, 2024

Choose a reason for hiding this comment

EvgeniiMekhanik Nov 21, 2024

Choose a reason for hiding this comment

const-t left a comment

Choose a reason for hiding this comment

Fix `bad 'bp' value` warning #2281

Fix `bad 'bp' value` warning #2281