Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bad 'bp' value warning #2281

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions tls/bignum_x86-64.S
Original file line number Diff line number Diff line change
Expand Up @@ -1658,7 +1658,9 @@ SYM_FUNC_END(mpi_from_mont_p256_x86_64)
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the copyright year

SYM_FUNC_START_32(mpi_mul_mont_mod_p256_x86_64)
pushq %rbx
#if defined(__KERNEL__) && !defined(CONFIG_FRAME_POINTER)
pushq %rbp
#endif
pushq %r12
pushq %r13
pushq %r14
Expand Down Expand Up @@ -1727,8 +1729,15 @@ SYM_FUNC_START_32(mpi_mul_mont_mod_p256_x86_64)
# A[3] * B[0]
movq 24(%rsi), %rdx
adcxq %rax, %r12
#if defined(__KERNEL__) && !defined(CONFIG_FRAME_POINTER)
mulxq (%rcx), %rbp, %rax
adoxq %rbp, %r11
#else
pushq %rdi
mulxq (%rcx), %rdi, %rax
adoxq %rdi, %r11
popq %rdi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm not sure about push and pop in the middle of the function - please compare the dependency flow of this function and the original with llvm-mca. Also put attention on other performance characteristics of the function, they must not get any worse.
  2. it seems we don't need to store rdi at all. It seems there is a mistake in the function comment as well as for MONT_REDUCE: it seems we do not use the 1st function argument in rdi at all and just use the registers as for result. See for example the computation M = 3(X + Z^2)(X - Z^2) in ecp256_double_jac). Maybe there are other cases and I'm wrong. If I'm right, then we do not need to store rdi results and can just eliminate your push and pop operations. Having that we don't need to bother with rbp, the function should become faster. If so, please fix the function comment and MONT_REDUCE that we use rdx, not rdi, for computation and store results in rdi. It seems MONT_REDUCE also uses rdi for results only.
  3. This is not our code, the modern wolfssl also does the same with rbp (see _sp_256_mont_mul_avx2_4 in wolfcrypt/src/sp_x86_64_asm.S and apparently they don't have the assembly issue - maybe we can just use some flag to eliminate the problem? The thing is that each instruction is very important in assembly crypto (this is super hot code!) and it's usually beneficial if we can use as many registers as we can, so if we don't need rbp in the assembly code, then it could be beneficial to use it for some computation. Maybe, not necessary in this particular place, but also in others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/ they just removed the code as it is not performance crucial - this is very different for our case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MONT_REDUCE we do movq %r12, (%rdi) so if rdi was changed we write value to incorrect place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About wolfssl it seems that it is not a kernel library, which works in irq context.
Here https://lists.openwall.net/linux-kernel/2017/11/22/255 is described that it is impossible to use rbp register with CONFIG_FRAME_POINTER from kenrel config and correct frame unwinding from interrupts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we can not use rbp with CONFIG_FRAME_POINTER and kasan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old

Iterations:        100
Instructions:      7400
Total Cycles:      4813
Total uOps:        14200

Dispatch Width:    6
uOps Per Cycle:    2.95
IPC:               1.54
Block RThroughput: 23.7

new

Iterations:        100
Instructions:      7400
Total Cycles:      4812
Total uOps:        14200

Dispatch Width:    6
uOps Per Cycle:    2.95
IPC:               1.54
Block RThroughput: 23.7

#endif
adoxq %rax, %r12
# A[3] * B[2]
mulxq 16(%rcx), %rdx, %rax
Expand All @@ -1748,7 +1757,9 @@ SYM_FUNC_START_32(mpi_mul_mont_mod_p256_x86_64)
popq %r14
popq %r13
popq %r12
#if defined(__KERNEL__) && !defined(CONFIG_FRAME_POINTER)
popq %rbp
#endif
popq %rbx
retq
SYM_FUNC_END(mpi_mul_mont_mod_p256_x86_64)
Expand Down