Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canary validation in tdh_sys_lp_init() #8

Open
pansilup opened this issue Nov 27, 2024 · 5 comments
Open

Canary validation in tdh_sys_lp_init() #8

pansilup opened this issue Nov 27, 2024 · 5 comments

Comments

@pansilup
Copy link

Summary:
I have an issue related to the use of stack canaries in the TDX module's execution of the seamcall, TDH.SYS.LP.INIT. My analysis points to a scenario where canary validation at the end of tdh_sys_lp_init() fails when the FS base is changed in the middle of the function. I appreciate your help in clarifying this.

Background Information Considered:

  1. According to the TDX module's source code, the build process enables "fstack-protector-strong" (stack canaries), as seen in this line of the compiler_defs.mk file.
    SECV_FLAGS = -Wdouble-promotion -Wshadow -Wconversion -Wmissing-prototypes -Wpointer-arith -Wuninitialized -Wunreachable-code -Wunused-function -Werror -D_FORTIFY_SOURCE=2 -fno-zero-initialized-in-bss -fstack-protector-strong
  2. The FS base in the initial per-LP SEAMCALL VMCS points to the start of the 'sys-info table' (i.e., the SEAM range start).
  3. According to the "Intel TDX Module Platform-Scope First-Time Initialization" sequence in the TDX Base Architecture Specification, TDH.SYS.INIT is executed on one LP, followed by the execution of TDH.SYS.LP.INIT on each LP.

Execution Sequence
Suppose there are 2 LPs, LP0 and LP1. SEAMCALLs are issued sequentially.

  1. Sys init
    on LP0: TDH.SYS.INIT
    Updates the last global data page with a canary value.
    This data page consists of a copy of the original sys-info table.
    Changes fs base of LP0 to the last global data page address.
  2. Per LP init after completing the above.
    On LP0: TDH.SYS.LP.INIT
    no issue here
    On LP1: TDH.SYS.LP.INIT

Discussion:
Consider the following binary code from libtdx.so applicable for tdh_sys_lp_init() function.

0000000000021780 <tdh_sys_lp_init>: start
   ...
 Get the canary value and save it on the stack
   21795: mov    %fs:0x28,%rax
   2179e: mov    %rax,0x20(%rsp)
   ...
          call tdx_local_init()
                change fs base (to last global pg addr in data rgn)
                mov    $0x6c06,%ecx
                vmwrite %rax,%rcx
   ...
  Validate the canary value saved on the stack   
   20e36: mov    %fs:0x28,%rcx
   20e3f: cmp    0x20(%rsp),%rcx
   20e44: jne    20f90 <tdh_sys_lp_init+0x1d0>
   ...
<tdh_sys_lp_init>: end

   20f90: callq  1a30 <__wrap___stack_chk_fail>

-My scenario pertains to the third SEAMCALL in the sequence above.

  • For LP1, TDH.SYS.LP.INIT is the first SEAMCALL on the LP.
  • Therefore, when the SEAMCALL is issued on LP1, the FS base points to the start of the sys-info table, as set up in LP1's VMCS by pseamldr.
  • At the start of tdh_sys_lp_init(), the canary value saved on the stack is taken from the address fs:0x28, i.e., "sys-info table + 0x28".
  • Later, as part of tdx_local_init(), the FS base is changed (i.e., FS is updated to point to the last global data page).
  • Finally, at the end of the tdh_sys_lp_init() function, the canary on the stack is compared with the reference canary from fs:0x28.
  • Since the FS base has changed, the reference canary is now different from the value saved on the stack. In fact, the reference canary is now the canary initialized by TDH.SYS.INIT at the address "last global data page + 0x28".
  • As a result, it appears that the canary validation fails without stack smashing, and the control flow is directed to a UD2.

Questions:
I have three questions:

  1. Regarding Platform Initialization and Canary Validation:
    My real TDX platform does not have any issues with platform initialization, and I assume none of the other users are facing such issues either. However, based on the analysis above, it appears that the canary validation at the end of tdh_sys_lp_init() should fail on any LP where the very first SEAMCALL on the LP happens to be TDH.SYS.LP.INIT. Logically, this makes sense, as I’m under the impression that the FS base must not be changed in the middle of a function if the function includes stack canary operations at both the beginning and the end. Do you have any thoughts on this? Is there any part of my analysis where I might have gone astray?

  2. About the libtdx.so Compilation:
    Is the production libtdx.so built without the fstack-protector-strong option?

  3. About the Canary Location in sys-info Table:
    It seems that the canary value is initialized by TDH.SYS.INIT in a copy of the sys-info table, which is copied onto the last global data page. In the struct sysinfo_table_s, the offset of the canary is 0x28. This offset is consistent with the offset to the FS base we observer when the canary is referenced in libtdx.so. How does the compilation process know where (i.e., at an offset of 0x28 from the FS base) the TDX module stores the canary value?

Thank you.

@sergey687
Copy link

Hello! Thank you for being interested in TDX.
Regarding your questions:

  1. The FS Base update is done in the VMCS only, with VMWRITE. Which means that it will have any effect only on the next SEAMCALL (on the same LP), so until the SEAMRET the current LP still runs with the same FSBASE that it started with.

  2. Production TDX is built with the stack protector also.

  3. 0x28 is the default offset from FS that Clang 9 compiler is looking for when it builds a binary with stack protector.

@pansilup
Copy link
Author

Hi @sergey687, Thanks for the clarification. Cheers !!!

@pansilup
Copy link
Author

pansilup commented Nov 29, 2024

@sergey687 I would like to ask a follow up question.
So, the TDH.SYS.LP.INIT executed on the LP on which the TDH.SYS.INIT was run(in my example, LP0) uses the correct canary setup in the global data. However none of the other TDH.SYS.LP.INIT SEAMCALLs executed on the remaining LPs use the canary setup by TDH.SYS.INIT as the FS on those LPs still points to the sys_info table. Those SEAMCALLs get a value for the canary from sys_info + 0x28, which I believe is not a random value (I think it falls on the socket_cpuid_table[] in sys_info). Any thoughts ? Please correct if my understanding is not correct.
Thank you.

@sergey687
Copy link

@pansilup
Yes you are correct. The TDH.SYS.LP.INIT on each LP have to execute with a non-random value of canary. And only after the TDH.SYS.LP.INIT completes, each LP will execute any subsequent SEAMCALLs with a correct randomized canary.
It's a known issue that we have in this version of TDX due to some architectural definition and compiler constraints that we had.

@pansilup
Copy link
Author

pansilup commented Dec 3, 2024

Hi @sergey687 Thanks.
Cheers !!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants