[Tiered Caching] Add a memory-efficient key lookup store for use in tiered cache #10874
Signed-off-by: Sagar <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
This PR is stalled because it has been open for 30 days with no activity.
Description
Adds a roaring bitmap (RBM)-based key lookup store, designed to hold the integer hashcodes of keys for a future disk cache in a memory-efficient way. Values are reduced by a modulo before being stored in the RBM, because very sparse RBMs are not memory-efficient. The tradeoff is some rare collisions (~0.5% of values for the recommended modulo 2^28 in a store with 10^7 values). To handle this, we also maintain a hash set of the values that have had collisions. Values with no collisions can be safely removed without risking any false negatives in contains(); values that have collided are left in place. The keystore has an optional memory cap and will not add more values once it grows too large. To enforce the cap, we also had to build an RBM size estimator based on performance test data, since the built-in one is very inaccurate for randomly distributed data like hashes.
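The add/contains/remove behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `ModuloKeyStoreSketch` and its methods are hypothetical, `java.util.BitSet` stands in for the roaring bitmap so the example needs no external dependency, and the memory cap and size estimator are left out. It also assumes the modulo is a power of two so the reduction can be done with a bit mask.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a modulo-based key lookup store.
// BitSet is only a stand-in for a roaring bitmap.
class ModuloKeyStoreSketch {
    private final int mask;                                  // modulo - 1, modulo is a power of two
    private final BitSet slots = new BitSet();               // one bit per reduced value
    private final Set<Integer> collided = new HashSet<>();   // reduced values seen more than once

    ModuloKeyStoreSketch(int modulo) {
        if (modulo <= 0 || Integer.bitCount(modulo) != 1) {
            throw new IllegalArgumentException("modulo must be a positive power of two");
        }
        this.mask = modulo - 1;
    }

    // Reduce a hashcode into [0, modulo); masking also handles negative hashcodes.
    private int transform(int value) {
        return value & mask;
    }

    // Returns true if the slot was newly set. If the slot was already set, the
    // slot is recorded as collided. A repeated add of the same value cannot be
    // told apart from a true collision, so it is treated the same way.
    boolean add(int value) {
        int slot = transform(value);
        if (slots.get(slot)) {
            collided.add(slot);
            return false;
        }
        slots.set(slot);
        return true;
    }

    // May return a false positive after a collision, but never a false negative.
    boolean contains(int value) {
        return slots.get(transform(value));
    }

    // Clears the slot only when no collision was recorded for it; otherwise
    // another key may still map to the slot and clearing it would cause a
    // false negative for that key.
    boolean remove(int value) {
        int slot = transform(value);
        if (!slots.get(slot) || collided.contains(slot)) {
            return false;
        }
        slots.clear(slot);
        return true;
    }
}
```

With a modulo of 16, for example, the values 5 and 21 reduce to the same slot, so adding both marks that slot as collided and a later `remove(5)` leaves it set. The choice to never clear collided slots trades a little memory for the guarantee that `contains()` has no false negatives.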
This implementation is generic and can be used in other places as well.
We investigated different data structures (RBMs, sorted int[], hash sets, and a hybrid combination of all of these) for memory footprint and access time before settling on an RBM with this choice of modulo. While a sorted int[] can be more memory-efficient than an RBM when it contains fewer than about 50,000 values, it is much slower because it requires a binary search to add or access elements. Hash sets are more memory-efficient below about 5,000 values. However, we are mostly concerned with memory efficiency when the store contains many keys, so we decided against adding the complexity of switching between data structures. We expect to allocate 5% of the on-heap cache size to this keystore, which would allow many domains to store 2-10 million key hashes, based on a domain scan that looks for domains which will benefit from tiered caching.
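For comparison, the sorted int[] alternative mentioned above might look like the sketch below. It is only an illustration of that approach, not code that was benchmarked for this PR, and `SortedIntStoreSketch` is a hypothetical name. Every lookup goes through `Arrays.binarySearch`, and every insert additionally shifts the tail of the array to keep it sorted.

```java
import java.util.Arrays;

// Hypothetical sketch of the sorted int[] alternative.
class SortedIntStoreSketch {
    private int[] values = new int[16];
    private int size = 0;

    // Binary search over the filled prefix of the array.
    boolean contains(int value) {
        return Arrays.binarySearch(values, 0, size, value) >= 0;
    }

    // Finds the insertion point by binary search, then shifts later elements
    // right by one to keep the array sorted. Returns false for duplicates.
    boolean add(int value) {
        int idx = Arrays.binarySearch(values, 0, size, value);
        if (idx >= 0) {
            return false;
        }
        int insertAt = -idx - 1;
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        values[insertAt] = value;
        size++;
        return true;
    }

    int size() {
        return size;
    }
}
```

The array stores exactly 4 bytes per value plus any unused capacity, which is why it can be compact at small sizes, while the per-operation search and shift make it less attractive as the store grows.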
Related Issues
Resolves #10309
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.