-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bad 'bp' value
warning
#2281
base: master
Are you sure you want to change the base?
Conversation
3a37329
to
4b57b1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment for now - it seems more attention is needed
@@ -1658,7 +1658,6 @@ SYM_FUNC_END(mpi_from_mont_p256_x86_64) | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the copyright year
pushq %rdi | ||
mulxq (%rcx), %rdi, %rax | ||
adoxq %rdi, %r11 | ||
popq %rdi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I'm not sure about
push
andpop
in the middle of the function - please compare the dependency flow of this function and the original withllvm-mca
. Also put attention on other performance characteristics of the function, they must not get any worse. - it seems we don't need to store
rdi
at all. It seems there is a mistake in the function comment as well as forMONT_REDUCE
: it seems we do not use the 1st function argument inrdi
at all and just use the registers as for result. See for example the computationM = 3(X + Z^2)(X - Z^2)
inecp256_double_jac)
. Maybe there are other cases and I'm wrong. If I'm right, then we do not need to storerdi
results and can just eliminate yourpush
andpop
operations. Having that we don't need to bother withrbp
, the function should become faster. If so, please fix the function comment andMONT_REDUCE
that we userdx
, notrdi
, for computation and store results inrdi
. It seemsMONT_REDUCE
also usesrdi
for results only. - This is not our code, the modern wolfssl also does the same with
rbp
(see_sp_256_mont_mul_avx2_4
inwolfcrypt/src/sp_x86_64_asm.S
and apparently they don't have the assembly issue - maybe we can just use some flag to eliminate the problem? The thing is that each instruction is very important in assembly crypto (this is super hot code!) and it's usually beneficial if we can use as many registers as we can, so if we don't needrbp
in the assembly code, then it could be beneficial to use it for some computation. Maybe, not necessary in this particular place, but also in others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/ they just removed the code as it is not performance crucial - this is very different for our case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In MONT_REDUCE
we do movq %r12, (%rdi)
so if rdi was changed we write value to incorrect place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About wolfssl it seems that it is not a kernel library, which works in irq context.
Here https://lists.openwall.net/linux-kernel/2017/11/22/255 is described that it is impossible to use rbp register with CONFIG_FRAME_POINTER from kenrel config and correct frame unwinding from interrupts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be we can not use rbp with CONFIG_FRAME_POINTER and kasan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old
Iterations: 100
Instructions: 7400
Total Cycles: 4813
Total uOps: 14200
Dispatch Width: 6
uOps Per Cycle: 2.95
IPC: 1.54
Block RThroughput: 23.7
new
Iterations: 100
Instructions: 7400
Total Cycles: 4812
Total uOps: 14200
Dispatch Width: 6
uOps Per Cycle: 2.95
IPC: 1.54
Block RThroughput: 23.7
We use `rbp` register in `mpi_mul_mont_mod_p256_x86_64` function to save result of `mulxq` operation. That breaks frame pointer convention and breaks stack traces. The same problem described here: https://lists.openwall.net/linux-kernel/2017/11/22/255 We can't use %rbp with CONFIG_FRAME_POINTER, so check if CONFIG_FRAME_POINTER is set and use %rdi register instead of %rbp.
4b57b1b
to
5b76bb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should spill %rdi
because it contains result destination address, seems ifdef'ing of CONFIG_FRAME_POINTER
is good solution, but with this solution we should use CONFIG_UNWINDER_ORC
by default, in our CI as well. Also, it would be great to depict it in the wiki.
We use
rbp
register inmpi_mul_mont_mod_p256_x86_64
function to save result ofmulxq
operation. That breaks breaks frame pointer convention and breaks stack traces. The same problem described here:https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/ Change using of
rbp
tordi
fixes promlem.