[PAL/Linux-SGX] AEX-Notify 2/5: Inject signals into enclave in regular context #2032

Open: wants to merge 1 commit into base: dimakuv/aex-notify-part1

@dimakuv dimakuv commented Oct 15, 2024

Description of the changes

Part 2 in AEX-Notify series.

Previously, if the enclave was interrupted by a sync signal (e.g., SIGILL) or an async signal (e.g., SIGTERM), the untrusted-runtime signal handler injected the signal directly into the enclave. Specifically, the untrusted runtime ran in the signal-handling context and called sgx_raise(), which performed EENTER to enter the in-enclave stage-1 signal handler; the enclave then performed EEXIT back into the untrusted-runtime signal-handling context. The untrusted runtime then performed sigreturn to return to its regular context, jumping to the AEP (Asynchronous Exit Pointer). At the AEP, ERESUME was called to resume enclave execution in the stage-2 signal handler.

In other words, the following invariants held:

  • In-enclave stage-1 signal handler (in SSA 1) always executed in the signal-handling context of the untrusted runtime.
  • In-enclave stage-2 signal handler (in SSA 0) always executed in regular context of the untrusted runtime.

As preparation for AEX-Notify support, this commit breaks the above strong coupling of contexts: the in-enclave stage-1 signal handler must now execute in the regular context of the untrusted runtime.

In particular, this commit changes the signal-handling logic as follows. Instead of immediately delivering a sync/async signal into the enclave, the untrusted runtime's signal handler records the signal in a thread-local variable (last_sync_signal or last_async_signal) and returns. When the host kernel returns from the signal handler to regular context, it jumps to the AEP, which is augmented with new logic: it checks whether any signal is pending (i.e., last_sync_signal or last_async_signal is non-zero). If a signal is pending, the new AEP logic performs EENTER so that the in-enclave stage-1 handler executes. Once the stage-1 handler is done, it performs EEXIT, and the AEP logic finishes with ERESUME as usual. From this point on, the flow is the same as before: the enclave is resumed in the in-enclave stage-2 handler.
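
The record-then-deliver flow described above can be sketched as a small model in plain C. This is only an illustration, not the code in this PR: the struct `pending_signals` and the functions `record_signal()` and `aep_take_pending()` are hypothetical names, the real variables are thread-local rather than passed by pointer, and the choice to pick a sync signal before an async one (one signal per AEP pass) is an assumption. EENTER and ERESUME are not executed; the return value only says which path the AEP would take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two per-thread slots named in the description.
 * In the real runtime these are thread-local variables; 0 means "no signal pending". */
struct pending_signals {
    int last_sync_signal;
    int last_async_signal;
};

/* Untrusted signal handler (signal-handling context): record the signal and return.
 * It no longer enters the enclave itself. */
void record_signal(struct pending_signals* p, int signum, bool is_sync) {
    assert(p != NULL && signum != 0);
    if (is_sync)
        p->last_sync_signal = signum;
    else
        p->last_async_signal = signum;
}

/* AEP logic (regular context): returns the signal to inject via EENTER, or 0 if
 * nothing is pending and the AEP should go straight to ERESUME.
 * Assumption: sync signals are picked before async ones, one signal per AEP pass. */
int aep_take_pending(struct pending_signals* p) {
    assert(p != NULL);
    int signum = 0;
    if (p->last_sync_signal != 0) {
        signum = p->last_sync_signal;
        p->last_sync_signal = 0;
    } else if (p->last_async_signal != 0) {
        signum = p->last_async_signal;
        p->last_async_signal = 0;
    }
    return signum;
}
```

In this model, a handler invocation such as `record_signal(&p, SIGTERM, false)` only fills a slot; the next call to `aep_take_pending(&p)` returns that signal (the EENTER path), and the call after that returns 0 (the plain ERESUME path).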

There is one corner case: an async signal can arrive while the enclave is executing the stage-1 handler (in SSA 1). In this case, the async signal flow is triggered in the untrusted runtime, and the AEP logic that runs after the async signal tries to EENTER. Since SSA 1 is already in use inside the enclave and SSA 2 is forbidden by SGX hardware, this (nested) EENTER raises a #GP fault, which translates into SIGSEGV and is delivered to the untrusted runtime's signal handler. We augment the SIGSEGV (aka PAL_EVENT_MEMFAULT) signal handler to catch this particular case and ignore the fault: the async signal is recorded again in the last_async_signal variable but cannot be delivered right now. It will be delivered on some later AEX event.
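
The corner case can be modeled the same way. The sketch below is a hedged illustration, not the handler added by this PR: `thread_model`, its `stage1_running` flag, the `inject_result` enum and `try_inject_async()` are all hypothetical, and the assumption that the AEP clears the slot before attempting EENTER is mine. The failed nested EENTER (#GP, then SIGSEGV) is represented by a plain branch instead of a real fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread model. */
struct thread_model {
    int  last_async_signal; /* name from the description; 0 = nothing pending */
    bool stage1_running;    /* hypothetical: SSA 1 is already in use by the stage-1 handler */
};

enum inject_result {
    INJECT_NOTHING_PENDING, /* AEP goes straight to ERESUME */
    INJECT_DONE,            /* EENTER succeeded, stage-1 handler ran */
    INJECT_DEFERRED         /* nested EENTER faulted, signal kept for a later AEX */
};

/* Models the AEP attempt to deliver a pending async signal. */
enum inject_result try_inject_async(struct thread_model* t) {
    assert(t != NULL);
    int signum = t->last_async_signal;
    if (signum == 0)
        return INJECT_NOTHING_PENDING;

    /* Assumption: the AEP consumes the slot before attempting EENTER. */
    t->last_async_signal = 0;

    if (t->stage1_running) {
        /* Real hardware: nested EENTER raises #GP, the kernel delivers SIGSEGV, and the
         * memfault handler recognizes the case. Here: record the signal again and
         * ignore the fault. */
        t->last_async_signal = signum;
        return INJECT_DEFERRED;
    }
    return INJECT_DONE;
}
```

With `stage1_running` set, a pending SIGTERM stays in `last_async_signal` and the call returns `INJECT_DEFERRED`; once the flag is cleared, the next call delivers the signal and empties the slot.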

See also related PRs and discussions:

How to test this PR?

CI is enough.



@dimakuv dimakuv left a comment


Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
Must be applied on top of #2025. Blocking.



@dimakuv dimakuv left a comment


Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


-- commits line 21 at r1:
Here and everywhere: the correct spelling is AEX-Notify, with a dash. Change.


pal/src/host/linux-sgx/host_exception.c line 141 at r1 (raw file):

         *
         * We do not deliver the signal immediately to the enclave (but instead mark it as pending)
         * because we want to support AEX Notify hardware feature in SGX. In particular, AEX Notify

Here and everywhere: the correct spelling is AEX-Notify, with a dash. Change.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
@dimakuv dimakuv force-pushed the dimakuv/aex-notify-part1 branch 2 times, most recently from b9b90ed to b272b9d Compare October 22, 2024 07:12
@dimakuv dimakuv changed the title [PAL/Linux-SGX] Inject signals into enclave in regular context [PAL/Linux-SGX] AEX-Notify 2/5: Inject signals into enclave in regular context Oct 22, 2024
@dimakuv dimakuv force-pushed the dimakuv/aex-notify-part2 branch from bca2d41 to 438a1bb Compare October 22, 2024 07:17

@dimakuv dimakuv left a comment


Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


-- commits line 21 at r1:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Here and everywhere: the correct spelling is AEX-Notify, with a dash. Change.

Done


pal/src/host/linux-sgx/host_exception.c line 141 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Here and everywhere: the correct spelling is AEX-Notify, with a dash. Change.

Done


dimakuv commented Oct 22, 2024

Jenkins, retest Jenkins-Direct-24.04-Sanitizers please (ppoll01 LTP test failed, unrelated to this PR as this PR only changes gramine-sgx and not gramine-direct). Also, this LTP failure is like this one: #1996 (comment)


dimakuv commented Oct 22, 2024

Jenkins, retest Jenkins-SGX-22.04-EDMM please (cryptography.exceptions.InternalError: Unknown OpenSSL error). Not related to this PR, see #2023 (comment)
