perf: improve ManagedAuthenticatedEncryptor Decrypt() and Encrypt() flow #59424
base: main
Conversation
Review threads were opened on the following files (some marked outdated or resolved):
- src/DataProtection/DataProtection/src/Managed/ManagedAuthenticatedEncryptor.cs
- src/DataProtection/DataProtection/src/KeyManagement/KeyManagementOptions.cs
- src/DataProtection/DataProtection/src/SP800_108/ManagedSP800_108_CTR_HMACSHA512.cs
Code review notes
Scenario notes
Is the goal to improve the performance of the [real-world?] AntiForgery benchmark, or to improve the performance of DataProtection in a standalone benchmark? The PR description (and attached graph) make it sound like improving the performance of the crank-based benchmark is the goal, but no throughput measurement is provided for the changes in this PR. Please provide that graph: it would supply evidence that these changes have real-world impact and aren't just microbenchmark improvements.
Thanks for the detailed answer @GrabYourPitchforks! First, I ran the Antiforgery benchmark multiple times, and I provided the results in the PR description.
Re №3: I ran a BenchmarkDotNet comparison for stackalloc with dynamic vs. constant length (also with or without …).
Re №2: Thanks for clarifying it. I will create issues on dotnet/runtime explaining which APIs I would like to have so that DataProtection's flow doesn't use …
Re №1: Could you please describe how the attack surface of the application is increased if pooled buffers are used? Does that mean that pooling is easier to inject into, for example via reflection? Actually, even if we will not be using pooled byte arrays, if I work with corelib to introduce APIs supporting …
…r comments + distinguish .net10 and .netFx impls
Here is the progress:
@GrabYourPitchforks, let me know what you think about №1, №4 and №5, please.
@@ -55,6 +57,217 @@ public static void DeriveKeys(byte[] kdk, ArraySegment<byte> label, ArraySegment
}
}

#if NET10_0_OR_GREATER
.NET already has SP800108 in the box. We should probably just use the one that is built-in to .NET if it is available.
https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.sp800108hmaccounterkdf
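As a rough sketch of what using the built-in KDF could look like (the key, label, and context values here are illustrative, not the actual DataProtection inputs):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: derive 64 bytes of subkey material from a
// key-derivation key using the framework's one-shot SP800-108 CTR HMAC KDF
// (System.Security.Cryptography.SP800108HmacCounterKdf, available on .NET 8+).
byte[] kdk = RandomNumberGenerator.GetBytes(64);            // key-derivation key
byte[] label = Encoding.UTF8.GetBytes("example-label");     // illustrative label
byte[] context = Encoding.UTF8.GetBytes("example-context"); // illustrative context

byte[] derived = SP800108HmacCounterKdf.DeriveBytes(
    kdk, HashAlgorithmName.SHA512, label, context,
    derivedKeyLengthInBytes: 64);
```

There are also span-based `DeriveBytes` overloads that write into a caller-supplied destination, which would avoid the output allocation entirely.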
I'm interested in the answer to this; my assumption was that it's just more copies on the heap, so a higher probability of key material appearing in a memory dump of the service.
Using pooled arrays increases the attack surface by allowing other components of the system to have managed references to sensitive key material. It is unfortunately quite common to see buggy components misuse the array pool. And to be fair, the bug would be in those components, not within ASP.NET. The consequences of such bugs span the gamut of issues: data corruption, information disclosure, etc. If we allow the pool to contain sensitive data, then the stakes are raised, as it were. These buggy components might now inadvertently disclose sensitive plaintext data, or even active key material. Leaking the latter especially would cause the collapse of other security features within ASP.NET applications, including allowing attackers to carry out CSRF attacks or even mint their own administrative auth tokens.

Pinning is similar. By keeping the sensitive data contained within a pinned array, we ensure that no copies of it are made before we have a chance to clear it, which limits the chances of buggy code (either through direct memory access or by leaking uninitialized memory from the array pool) disclosing it.

To be sure, this is all defense-in-depth. In an otherwise properly functioning application, the security of DataProtection does not rely on any of this. These behaviors are solely to minimize the chances of other bugs in the system manifesting in such a way that they undermine DataProtection. If you'd like to think of it another way: the information processed by DataProtection is so sensitive, and failure would be so wildly catastrophic, that this is an attempt to isolate the component from other failures within the same process.

Any perf-vs-security tradeoff is always at the discretion of the feature team. And the metrics used to judge these tradeoffs are obviously subjective! But it is nevertheless a tradeoff, and the product team needs to stand behind a statement akin to:
-- Fun fact! The original design of this component didn't even store the keys in-proc. It actually stored the keys out-of-proc to further reduce the risk of a buggy component within the process leaking keys. The only reason we didn't move forward with it at GA is that it didn't work in Azure Web Sites. At the time, their sandbox saw our IPC mechanism as a potential threat, and it protected the stability of the machine by throttling our IPC channel to 100 transactions per second. That's 100 …

The design was very secure, but it was unworkably slow, to the point where no real-world customer could ever depend on it. So for the sake of practicality we ended up with the design we have today.
The goal of this PR is to bring Linux performance closer to Windows performance for the DataProtection scenario. Below is the picture of the Antiforgery benchmarks on Windows vs. Linux machines.

Results: DataProtection Benchmark
The benchmark I am relying on to show the result numbers is here; it basically builds a default ServiceProvider, adds DataProtection via .AddDataProtection(), and calls IDataProtector.Protect() or IDataProtector.Unprotect().

Results: Antiforgery Benchmark
Note: the results below don't have the latest numbers; I will update them once we change to the new API.

However, since we are originally looking at improving the Antiforgery performance on Linux, I ran the Antiforgery benchmark including locally built DLLs from this PR.

Current ASP.NET Core gives these stats on the benchmark:

The RPS of runs with the changed DLLs varies from run to run, therefore I ran it 10 times.

But memory usage is stable, with these values:

These numbers suggest roughly a 10% reduction in max app allocation size and a ~5% RPS improvement.
Optimization details

1. I looked into the Unprotect method of ManagedAuthenticatedEncryptor and spotted MemoryStream usage and multiple Buffer.BlockCopy calls. I also saw some shuffling of byte[] data, which I think can be skipped and performed in such a way that some allocations are avoided. In order to be as safe as possible, I created a separate DataProtectionPool which provides an API to rent and return byte arrays; it does not intersect with ArrayPool<byte>.Shared.
2. ManagedSP800_108_CTR_HMACSHA512.DeriveKeys is changed to the explicit ManagedSP800_108_CTR_HMACSHA512.DeriveKeysHMACSHA512, because _kdkPrfFactory is hardcoded to use HMACSHA512 anyway. There is a static API allowing hashing without an allocation; the kdk byte[] is rented from the pool: HMACSHA512.TryHashData(kdk, prfInput, prfOutput, out _);
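A minimal sketch of this allocation-free hashing pattern (the buffer sizes and the use of ArrayPool here are illustrative; the actual PR uses its own DataProtectionPool):

```csharp
using System;
using System.Buffers;
using System.Security.Cryptography;

byte[] kdk = RandomNumberGenerator.GetBytes(64); // illustrative key-derivation key

// Rent the PRF input buffer instead of allocating it per call.
byte[] prfInput = ArrayPool<byte>.Shared.Rent(128);
try
{
    // ... fill prfInput with the label/context/counter data ...

    // HMACSHA512 output is always 64 bytes, so it can live on the stack.
    Span<byte> prfOutput = stackalloc byte[64];
    HMACSHA512.TryHashData(kdk, prfInput.AsSpan(0, 128), prfOutput, out _);

    // ... consume prfOutput ...
    CryptographicOperations.ZeroMemory(prfOutput); // wipe stack copy of key material
}
finally
{
    // clearArray: true wipes the buffer before it goes back to the pool.
    ArrayPool<byte>.Shared.Return(prfInput, clearArray: true);
}
```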
3. Avoided the usage of DeriveKeysWithContextHeader, which allocates a separate intermediate array for contextHeader and context. Instead, the spans operationSubkey and validationSubkey are passed directly into ManagedSP800_108_CTR_HMACSHA512.DeriveKeys.
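The idea of deriving straight into the target spans can be sketched as follows (sizes are illustrative, and the commented-out DeriveKeys call stands in for the PR's span-based overload, which is not shown here):

```csharp
using System;
using System.Buffers;

const int KeySizeInBytes = 32;    // illustrative encryption subkey size
const int MacKeySizeInBytes = 64; // illustrative validation subkey size

// One rented buffer holds both subkeys; slices of it are passed around
// instead of allocating a separate array per subkey.
byte[] subkeys = ArrayPool<byte>.Shared.Rent(KeySizeInBytes + MacKeySizeInBytes);
try
{
    Span<byte> operationSubkey = subkeys.AsSpan(0, KeySizeInBytes);
    Span<byte> validationSubkey = subkeys.AsSpan(KeySizeInBytes, MacKeySizeInBytes);

    // Hypothetical span-based overload: DeriveKeys writes directly into these
    // slices, skipping the intermediate contextHeader/context arrays.
    // ManagedSP800_108_CTR_HMACSHA512.DeriveKeys(kdk, label, context,
    //     operationSubkey, validationSubkey);
}
finally
{
    ArrayPool<byte>.Shared.Return(subkeys, clearArray: true);
}
```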
4. ManagedSP800_108_CTR_HMACSHA512.DeriveKeysHMACSHA512 had 2 more arrays (prfInput and prfOutput), which are now rented (via DataProtectionPool) or even stackalloc'ed. They are returned to the pool with the clearArray: true flag to make sure key material is removed from memory after usage.
5. In the Decrypt() flow I am again using the HashAlgorithm.TryComputeHash overload, which works on Span<byte>, compared to the previously used HashAlgorithm.ComputeHash.
6. In the Decrypt() flow I changed to SymmetricAlgorithm.DecryptCbc() instead of CryptoTransform.TransformBlock(), with the same idea of using the Span<byte> API instead of another byte[] allocation.
7. The Encrypt() flow reuses optimizations №1, №2 and №3 as well. Encrypt() previously relied on MemoryStream and CryptoStream to write data into the result buffer, but I am now pre-calculating the length and doing a single allocation of the result array: var outputArray = new byte[keyModifierLength + ivLength + cipherTextLength + macLength]; All required data is copied into outputArray via APIs supporting Span<byte>.

All listed optimizations are included in the net10 TFM, but only some (№2, №3 and №6) are used in the netstandard2.0 and netFx TFMs which DataProtection also targets.

Related to #59287
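A simplified sketch of the single-allocation Encrypt() layout described in №7, using the span-based AES-CBC one-shots mentioned in №6 (the sizes, the MAC length, and the payload layout here are illustrative, not the exact DataProtection wire format):

```csharp
using System;
using System.Security.Cryptography;

static byte[] EncryptSketch(Aes aes, ReadOnlySpan<byte> plaintext, ReadOnlySpan<byte> keyModifier)
{
    int ivLength = aes.BlockSize / 8;                                 // 16 for AES
    int cipherTextLength = aes.GetCiphertextLengthCbc(plaintext.Length);
    const int macLength = 32;                                         // e.g. an HMACSHA256 tag

    byte[] iv = RandomNumberGenerator.GetBytes(ivLength);

    // Single allocation for the whole payload: | keyModifier | iv | ciphertext | mac |
    var output = new byte[keyModifier.Length + ivLength + cipherTextLength + macLength];
    keyModifier.CopyTo(output);
    iv.CopyTo(output.AsSpan(keyModifier.Length));

    // The span-based one-shot writes the ciphertext directly into the output
    // buffer, with no MemoryStream/CryptoStream and no intermediate arrays.
    aes.EncryptCbc(plaintext, iv, output.AsSpan(keyModifier.Length + ivLength, cipherTextLength));

    // ... compute the MAC over the preceding bytes into the final macLength bytes ...
    return output;
}
```

Decryption mirrors this: slice the same offsets out of the payload and call aes.DecryptCbc(ciphertext, iv) instead of driving a CryptoTransform.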