Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bad 'bp' value warning #2281

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

EvgeniiMekhanik
Copy link
Contributor

We use rbp register in mpi_mul_mont_mod_p256_x86_64 function to save result of mulxq operation. That breaks breaks frame pointer convention and breaks stack traces. The same problem described here:
https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/ Change using of rbp to rdi fixes promlem.

@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-2268 branch 2 times, most recently from 3a37329 to 4b57b1b Compare November 15, 2024 22:50
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment for now - it seems more attention is needed

@@ -1658,7 +1658,6 @@ SYM_FUNC_END(mpi_from_mont_p256_x86_64)
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the copyright year

pushq %rdi
mulxq (%rcx), %rdi, %rax
adoxq %rdi, %r11
popq %rdi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm not sure about push and pop in the middle of the function - please compare the dependency flow of this function and the original with llvm-mca. Also put attention on other performance characteristics of the function, they must not get any worse.
  2. it seems we don't need to store rdi at all. It seems there is a mistake in the function comment as well as for MONT_REDUCE: it seems we do not use the 1st function argument in rdi at all and just use the registers as for result. See for example the computation M = 3(X + Z^2)(X - Z^2) in ecp256_double_jac). Maybe there are other cases and I'm wrong. If I'm right, then we do not need to store rdi results and can just eliminate your push and pop operations. Having that we don't need to bother with rbp, the function should become faster. If so, please fix the function comment and MONT_REDUCE that we use rdx, not rdi, for computation and store results in rdi. It seems MONT_REDUCE also uses rdi for results only.
  3. This is not our code, the modern wolfssl also does the same with rbp (see _sp_256_mont_mul_avx2_4 in wolfcrypt/src/sp_x86_64_asm.S and apparently they don't have the assembly issue - maybe we can just use some flag to eliminate the problem? The thing is that each instruction is very important in assembly crypto (this is super hot code!) and it's usually beneficial if we can use as many registers as we can, so if we don't need rbp in the assembly code, then it could be beneficial to use it for some computation. Maybe, not necessary in this particular place, but also in others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/ they just removed the code as it is not performance crucial - this is very different for our case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MONT_REDUCE we do movq %r12, (%rdi) so if rdi was changed we write value to incorrect place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About wolfssl it seems that it is not a kernel library, which works in irq context.
Here https://lists.openwall.net/linux-kernel/2017/11/22/255 is described that it is impossible to use rbp register with CONFIG_FRAME_POINTER from kenrel config and correct frame unwinding from interrupts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we can not use rbp with CONFIG_FRAME_POINTER and kasan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old

Iterations:        100
Instructions:      7400
Total Cycles:      4813
Total uOps:        14200

Dispatch Width:    6
uOps Per Cycle:    2.95
IPC:               1.54
Block RThroughput: 23.7

new

Iterations:        100
Instructions:      7400
Total Cycles:      4812
Total uOps:        14200

Dispatch Width:    6
uOps Per Cycle:    2.95
IPC:               1.54
Block RThroughput: 23.7

We use `rbp` register in `mpi_mul_mont_mod_p256_x86_64`
function to save result of `mulxq` operation. That breaks
frame pointer convention and breaks stack traces. The same
problem described here:
https://lists.openwall.net/linux-kernel/2017/11/22/255
We can't use %rbp with CONFIG_FRAME_POINTER, so check
if CONFIG_FRAME_POINTER is set and use %rdi register
instead of %rbp.
Copy link
Contributor

@const-t const-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should spill %rdi because it contains result destination address, seems ifdef'ing of CONFIG_FRAME_POINTER is good solution, but with this solution we should use CONFIG_UNWINDER_ORC by default, in our CI as well. Also, it would be great to depict it in the wiki.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants