add check for UNWIND_PATCH_PAC_INTO_SCS, which reduces security compared to using both PAC + SCS #105
Comments
Hello @thestinger, Thanks a lot for the explanation. As I understand you, GrapheneOS uses:
So you recommend checking that CONFIG_UNWIND_PATCH_PAC_INTO_SCS is disabled to avoid security degradation. Am I right? By the way, could you please have a look at this part of the Linux Kernel Defence Map:
Yes, that's correct. AOSP or the stock OS on the Pixel 8 uses PAC without SCS via CONFIG_UNWIND_PATCH_PAC_INTO_SCS and Clang CFI without BTI enabled. GrapheneOS uses PAC + SCS and Clang CFI + BTI. BTI would be useless if CFI had full coverage but it doesn't since they had to exclude a fair bit of stuff for compatibility with the architecture such as things like exception tables. They also excluded certain hooks for Android from Clang CFI for performance reasons, but that part of the exclusions will hopefully go away when the traditional Clang CFI is replaced by kCFI. kCFI should get closer to full coverage but as long as there's anything excluded it's still at least minimally useful.
Yes, since PAC is a sidegrade from SCS by itself. SCS is a deterministic mitigation itself and currently depends on ASLR to protect the deterministic metadata (shadow stack). PAC is purely probabilistic and the strength depends on the memory configuration which is quite annoying since a larger address space with better ASLR and more importantly lots of room for address space based mitigations reduces PAC security.
That looks correct. The PAC instructions at the start of functions are interpreted as BTI instructions for performance reasons, to avoid needing BTI instructions in those functions. This means non-leaf functions which get protected by PAC don't need their own BTI instruction, but it also means that all non-leaf functions are considered indirectly callable even if the compiler can figure out they aren't, such as functions marked static without their address taken. It doesn't really matter much since it's incredibly coarse either way, but PAC + BTI makes BTI a bit more coarse.

It might also be worth distinguishing probabilistic vs. deterministic. Clang CFI (traditional or kCFI) and most of RAP is deterministic based on type signatures. RAP also has a probabilistic return defense via a form of XOR canary (Samsung also had something similar to the latter but I'm unsure if they still do). PAC is purely probabilistic. If you can predict/leak the values, you can bypass it. SCS is deterministic itself but lacks the write protection for the shadow stack that Intel CET provides, so it depends on ASLR for protecting that against arbitrary writes, but writes to the stack are protected against deterministically. I'd still call it deterministic for the main value but it does depend on ASLR for the broader threat model it doesn't do well against (and ASLR is much weaker in the kernel).

I personally dislike the approach used for PAC and think they made a major mistake not providing a shadow stack and a different approach for protecting data. PAC is at odds with using bits for other purposes such as memory tagging and a larger address space. It's purely probabilistic. It also requires a lot of work to integrate, unlike memory tagging which only needs support in heap memory allocators such as malloc and allocations made by the compiler. MTE is also primarily aimed at detecting the initial memory corruption, not protecting specific targets but rather stopping the memory corruption occurring at all.
It would be possible to use MTE to protect specific things but the main use is tagging every allocation which could have an overflow or use-after-free including stack allocations when using stack MTE. PAC is still worth using since it's there... but especially when using it only for protecting return values as is the case on Linux currently, it's such a disappointment. It would be so much better having deterministic hardware shadow stack support, more tagging bits for MTE and other mitigations focused on deterministic protections. We don't quite know what to do about PAC right now. If SCS didn't rely on ASLR to protect the shadow stack, we could just disable PAC in the kernel itself. SCS is trickier to fully deploy in userspace than the kernel so using PAC there is easier. It only demonstrates how much nicer the hardware shadow stack approach would be. It's not too late for ARM to add that. |
We're considering undoing this. The issue is that as part of GKI, they moved scheduler customizations to using hooks in the core kernel code which call into dynamically loaded kernel modules. This adds the overhead of calls into dynamic kernel modules, which is increased with certain configuration options such as the full arm64 KASLR implementation for modules (not very valuable, since it only randomizes modules separately from the base kernel, which wouldn't happen without using modules anyway). Clang CFI before kCFI is particularly expensive for this case. I'm not sure how much kCFI will help with it. Pixel 8 is using the 5.15 LTS branch so there's no kCFI yet unless they backport it. They might move Pixels to the 6.1 LTS branch since they even have a test branch for the Pixel 6 based on 6.1 but it's not clear. New kernels have lots of regressions, and previous Pixels didn't have the 5 and now 7 year support lifetimes that current ones do, where moving to at least 1 new kernel branch starts to seem mandatory.
We've also determined that enabling BTI is broken with CONFIG_UNWIND_PATCH_PAC_INTO_SCS enabled for the Pixel 8 kernel, but this issue is likely fixed in mainline already or may not have ever been a problem there. They implemented Clang CFI, CONFIG_UNWIND_PATCH_PAC_INTO_SCS, etc. downstream first and then ported them to mainline later to be upstreamed, so sometimes there are actually regressions in the mainline implementation compared to the initial GKI branch implementation. It's quite a mess. CFI is really only just becoming usable in mainline, particularly for x86. They were missing lots of required fixes for undefined behavior caught by CFI and other issues, especially on x86, until recently. kCFI should result in broader adoption due to better performance, so maybe it will get much better soon if traditional distributions actually start using it, which they haven't so far.
It would be nice if the recommendation to use this was at least removed, since it's encouraging downgrading security if you have both SCS and PAC enabled. The checker considers it a failure when checking the GrapheneOS kernel even though we're doing something more secure by having both enabled.
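The distinction being asked for can be sketched as a small standalone classifier over a kernel `.config`. This is only an illustration, not the checker's actual code or API: the function names `parse_kconfig` and `return_protection` are hypothetical, and it assumes that `CONFIG_SHADOW_CALL_STACK` and `CONFIG_ARM64_PTR_AUTH_KERNEL` are the options corresponding to SCS and kernel PAC in the kernel being checked.

```python
def parse_kconfig(text):
    """Parse .config text into {option: value}.

    Lines of the form '# CONFIG_X is not set' map to None.
    """
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def return_protection(options):
    """Classify the return-address protection described in this thread.

    Assumption: the option names below match the kernel being checked.
    """
    scs = options.get("CONFIG_SHADOW_CALL_STACK") == "y"
    pac = options.get("CONFIG_ARM64_PTR_AUTH_KERNEL") == "y"
    patch = options.get("CONFIG_UNWIND_PATCH_PAC_INTO_SCS") == "y"
    if scs and pac and not patch:
        return "PAC + SCS"
    if scs and pac and patch:
        # SCS instructions are only patched in when PAC is unavailable.
        return "PAC, with SCS as fallback only"
    if scs:
        return "SCS only"
    if pac:
        return "PAC only"
    return "none"


# Hypothetical config fragment with both mitigations kept enabled.
both = """\
CONFIG_SHADOW_CALL_STACK=y
CONFIG_ARM64_PTR_AUTH_KERNEL=y
# CONFIG_UNWIND_PATCH_PAC_INTO_SCS is not set
"""
print(return_protection(parse_kconfig(both)))
```

Under this sketch, a config with all three options set to `y` would be reported as the weaker "PAC, with SCS as fallback only" case instead of being treated as the recommended state.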
Currently, there is no consensus about this feature: KSPP/kspp.github.io#2 Refers to #105
@thestinger, thanks.
The UNWIND_PATCH_PAC_INTO_SCS configuration option disables ShadowCallStack when PAC is supported by the hardware. It does this by removing the SCS instructions and dynamically patching PAC instructions into SCS instructions when PAC is unavailable.

PAC is a purely probabilistic security feature which can be bypassed through brute force attacks. PAC normally has 16 bits in the default configuration with a 39-bit address space and 4k pages, but it drops to 7 bits with a 48-bit address space. It's even lower in some of the other configurations. SCS is a deterministic security feature, but it lacks a way to protect the shadow stack from arbitrary writes. It's difficult to say which is better, but having both enabled is clearly better for security than only PAC.
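The two figures quoted above are consistent with the PAC width being 55 minus the configured virtual address size. The sketch below assumes that relationship; `pac_bits` and `search_space` are hypothetical helper names, and the calculation ignores everything else that affects a real attack, such as whether a failed guess crashes the target.

```python
def pac_bits(va_bits):
    """PAC width in bits, assuming it is 55 minus the virtual address size.

    This assumption reproduces the figures quoted in the thread:
    16 bits for a 39-bit address space and 7 bits for a 48-bit one.
    """
    if not 0 < va_bits < 55:
        raise ValueError("va_bits must be between 1 and 54")
    return 55 - va_bits


def search_space(bits):
    """Number of distinct PAC values an attacker would have to guess among."""
    return 2 ** bits


for va in (39, 48):
    bits = pac_bits(va)
    space = search_space(bits)
    print(f"{va}-bit VA: {bits}-bit PAC, {space} values, "
          f"1/{space} chance per blind guess")
```

Going from a 39-bit to a 48-bit address space shrinks the search space from 65536 values to 128, which is the trade-off between address space size and PAC strength described above.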
SCS has higher overhead than PAC, but it was deemed acceptable enough to deploy it on Pixels in production long before PAC was available. Going from SCS to SCS + PAC isn't a big deal. When PAC is enabled, it adds entry/exit instructions to each function, and the entry instruction replaces the BTI instruction in non-leaf functions since it counts as the BTI instruction too. BTI is enabled by default, but Google is currently disabling it for Android in the kernel because they use the overlapping Clang CFI feature (which will be replaced by Clang's kCFI implementation).
We're choosing to enable SCS in addition to PAC for GrapheneOS because we're concerned about going from a deterministic mitigation to a probabilistic one, and SCS was deemed cheap enough before so it should still be fine on significantly better hardware. GrapheneOS is choosing to enable BTI in addition to Clang CFI because there are indirect calls excluded from Clang CFI for architectural compatibility reasons. Google also excluded certain hooks for performance reasons. We'd prefer to have kCFI already deployed along with architecture support to get full coverage, but we have to use what's available. We currently enable a 48-bit address space, which reduces PAC from 16 bits to 7 bits, so we're having to reconsider doing that. We don't like the design of the PAC feature and would greatly prefer having 8-bit or wider MTE tags (instead of only 4 bits) along with a hardware shadow stack like Intel CET for deterministic return protection instead of probabilistic PAC. PAC can be used for more than protecting returns, but currently it's only used for protecting returns. There are better ways to do things than PAC and we find it unfortunate ARM went with this for performance reasons, which Google is going along with too.