[Tiered Caching] Add a memory-efficient key lookup store for use in tiered cache #12527

Closed

peteralfonsi
Contributor

Description

Adds a roaring bitmap (RBM)-based key lookup store, designed to store the integer hash codes of keys for a future disk cache in a memory-efficient way. It stores values in the RBM with a modulo applied, because very sparse RBMs are not memory-efficient. This comes with a tradeoff: some rare collisions (~0.5% of values for the recommended modulo of 2^28 in a store with 10^7 values). To handle this, we also maintain a hash set of values that have had collisions. Values with no collisions can be safely removed without risking any false negatives in contains(). The keystore has an optional memory cap and will not add more values once it grows too large. To enable this, we also had to build an RBM size estimator based on performance test data, since the built-in one is very inaccurate for randomly distributed data like hashes:
[Screenshot, 2023-10-23]

This implementation is generic and can be used in other places as well.
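The modulo, collision-tracking and safe-removal rules described above can be sketched as follows. This is a minimal sketch under stated assumptions. `java.util.BitSet` stands in for `RoaringBitmap`, so it shows the bookkeeping but not the memory savings. The class name `ModuloKeyStore` is hypothetical. The entry-count cap is also hypothetical; the PR instead caps memory using its fitted size estimator.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: BitSet stands in for RoaringBitmap, and maxEntries stands in
// for the PR's memory-based cap (the real store uses an RBM size estimator).
class ModuloKeyStore {
    private final int mask;        // modulo - 1; modulo must be a power of two
    private final BitSet bits;     // stand-in for the RBM of transformed values
    private final int maxEntries;  // hypothetical cap; stop adding once reached
    private int size = 0;          // number of distinct transformed values stored
    // transformed value -> number of add() calls that mapped to it (>= 2 once collided)
    private final Map<Integer, Integer> collisionCounts = new HashMap<>();

    ModuloKeyStore(int modulo, int maxEntries) {
        if (modulo <= 0 || Integer.bitCount(modulo) != 1) {
            throw new IllegalArgumentException("modulo must be a positive power of two");
        }
        this.mask = modulo - 1;
        this.bits = new BitSet(modulo);
        this.maxEntries = maxEntries;
    }

    // For a power-of-two modulo, hash & (modulo - 1) equals Math.floorMod(hash, modulo),
    // so negative hash codes also land in [0, modulo).
    int transform(int hash) {
        return hash & mask;
    }

    boolean isFull() {
        return size >= maxEntries;
    }

    /** Returns true if a new transformed value was stored. */
    boolean add(int hash) {
        int t = transform(hash);
        if (bits.get(t)) {
            // Another add() already mapped here: record the collision. A repeated add of
            // the same hash also counts, since only transformed values are stored.
            collisionCounts.merge(t, 2, (old, unused) -> old + 1);
            return false;
        }
        if (isFull()) {
            return false;
        }
        bits.set(t);
        size++;
        return true;
    }

    /** May return false positives (collisions); no false negatives for added hashes. */
    boolean contains(int hash) {
        return bits.get(transform(hash));
    }

    /**
     * Removes only values that never collided, so contains() keeps reporting every other
     * added hash. Assumes remove() is only called for hashes that were previously added.
     */
    boolean remove(int hash) {
        int t = transform(hash);
        if (!bits.get(t) || collisionCounts.containsKey(t)) {
            return false;
        }
        bits.clear(t);
        size--;
        return true;
    }
}
```

With this conservative rule, a collided value stays in the store after removal. That costs a little memory and can add false positives, but it never causes a false negative.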

We investigated different data structures (RBMs, sorted int[], hash sets, or a hybrid combination of these) for memory footprint and access time before settling on an RBM with this choice of modulo. A sorted int[] can be more memory-efficient than an RBM when it contains fewer than about 50,000 values, but it is much slower because it requires a binary search to add or access elements. Hash sets are more memory-efficient below about 5,000 values. However, we are mostly concerned with memory efficiency when the store contains many keys, so we decided against the added complexity of switching between data structures. We expect to allocate 5% of the on-heap cache size to this keystore. Based on a domain scan that looks for domains that will benefit from tiered caching, this would allow many domains to store 2-10 million key hashes:
[Screenshot, 2023-10-06]
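To illustrate the access-time point about sorted int[] above, here is a hedged sketch of that alternative. The class `SortedIntStore` is hypothetical and not part of the PR. Lookups use `Arrays.binarySearch`, which is O(log n). Each insert also has to shift the tail of the array with `System.arraycopy`, which is O(n) per add in the worst case.

```java
import java.util.Arrays;

// Hypothetical sorted-array key store, shown only to illustrate its access cost.
class SortedIntStore {
    private int[] values = new int[16];
    private int size = 0;

    boolean contains(int v) {
        // Binary search over the occupied prefix: O(log n)
        return Arrays.binarySearch(values, 0, size, v) >= 0;
    }

    boolean add(int v) {
        int idx = Arrays.binarySearch(values, 0, size, v);
        if (idx >= 0) {
            return false; // already present
        }
        int insertAt = -(idx + 1); // binarySearch encodes the insertion point as -(point) - 1
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail right by one slot: O(n) per insert in the worst case
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        values[insertAt] = v;
        size++;
        return true;
    }

    int size() {
        return size;
    }
}
```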

Related Issues

Resolves #10309

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • [N/A] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [N/A] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Mar 4, 2024

Compatibility status:

Checks if related components are compatible with change 1f7f460

Incompatible components

Skipped components

Compatible components

Peter Alfonsi added 2 commits March 4, 2024 11:43
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Peter Alfonsi added 3 commits March 4, 2024 11:43
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Contributor

github-actions bot commented Mar 4, 2024

❌ Gradle check result for 4965cd0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Mar 4, 2024

❌ Gradle check result for 2e01c31: FAILURE

Signed-off-by: Peter Alfonsi <[email protected]>
Contributor

github-actions bot commented Mar 4, 2024

❌ Gradle check result for b4c8d87: FAILURE

Signed-off-by: Peter Alfonsi <[email protected]>
Contributor

github-actions bot commented Mar 4, 2024

❌ Gradle check result for 928da04: FAILURE

Signed-off-by: Peter Alfonsi <[email protected]>
Contributor

github-actions bot commented Mar 4, 2024

❌ Gradle check result for 5b26d35: FAILURE

Signed-off-by: Peter Alfonsi <[email protected]>
Contributor

github-actions bot commented Mar 5, 2024

✅ Gradle check result for af35c6d: SUCCESS


codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 89.84772%, with 20 lines in your changes missing coverage. Please review.

Project coverage is 71.38%. Comparing base (9814eb9) to head (af35c6d).
Report is 346 commits behind head on main.

Current head af35c6d differs from the pull request's most recent head 1f7f460.

Please upload reports for the commit 1f7f460 to get more accurate results.

File | Patch % | Lines
...pensearch/cache/keystore/RBMIntKeyLookupStore.java | 90.32% | 8 Missing and 7 partials ⚠️
.../opensearch/cache/store/disk/EhcacheDiskCache.java | 70.58% | 2 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12527      +/-   ##
============================================
- Coverage     71.41%   71.38%   -0.03%     
- Complexity    59953    59981      +28     
============================================
  Files          4980     4982       +2     
  Lines        282131   282323     +192     
  Branches      40937    40960      +23     
============================================
+ Hits         201476   201539      +63     
- Misses        63928    64088     +160     
+ Partials      16727    16696      -31     


@peteralfonsi
Contributor Author

@sohami and @msfroh, would you mind taking a look at this PR when you have time?

@harshavamsi added the v2.13.0 label (Issues and PRs related to version 2.13.0) Mar 7, 2024
Comment on lines 117 to 120
public static final Setting.AffixSetting<Boolean> USE_RBM_KEYSTORE_SETTING = Setting.suffixKeySetting(
    EhcacheDiskCache.EhcacheDiskCacheFactory.EHCACHE_DISK_CACHE_NAME + "use_keystore",
    (key) -> Setting.boolSetting(key, true, NodeScope)
);
Contributor

@sgup432 Mar 8, 2024

Should be .use_keystore, note the dot prefix. Same for below.

Also, instead of defining it as above, should we have something like the following?

public static final Setting.AffixSetting<String> KEYSTORE_SETTING = Setting.suffixKeySetting(
    EhcacheDiskCache.EhcacheDiskCacheFactory.EHCACHE_DISK_CACHE_NAME + ".keystore",
    (key) -> Setting.simpleString(key, "rbm", NodeScope)
);

Here we would specify which keystore to use, rather than having an explicit setting for RBM.

"rbm" would be one of the implementations.
We can set this to null or "" if we don't want to use a keystore. In that case, we will create a mock keystore that does nothing.

This way we can eventually have different kinds of keystores and load them from their respective plugins (if needed). For now, we can keep this inside the same plugin.

@sohami What do you think?
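A sketch of the name-based selection suggested above, under stated assumptions. The names `IntKeyStore`, `NoopKeyStore`, `HashSetIntKeyStore` and `KeyStores.create` are all hypothetical simplifications, not the PR's `KeyLookupStore<T>` API. A `HashSet` stands in for the RBM implementation so the example runs without the RoaringBitmap dependency. An empty or null name yields a store that records nothing and reports itself as full. That way, a get path that checks `contains(hash) || isFull()` always falls through to disk.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified stand-in for the PR's KeyLookupStore<T> interface.
interface IntKeyStore {
    boolean add(int hash);
    boolean contains(int hash);
    boolean remove(int hash);
    boolean isFull();
}

// "Mock" store that records nothing; isFull() is true so a
// "contains(hash) || isFull()" check always falls through to disk.
class NoopKeyStore implements IntKeyStore {
    public boolean add(int hash) { return false; }
    public boolean contains(int hash) { return false; }
    public boolean remove(int hash) { return false; }
    public boolean isFull() { return true; }
}

// Stand-in for the "rbm" implementation so the sketch runs without RoaringBitmap.
class HashSetIntKeyStore implements IntKeyStore {
    private final Set<Integer> values = new HashSet<>();
    public boolean add(int hash) { return values.add(hash); }
    public boolean contains(int hash) { return values.contains(hash); }
    public boolean remove(int hash) { return values.remove(hash); }
    public boolean isFull() { return false; }
}

final class KeyStores {
    private KeyStores() {}

    // Maps the value of a hypothetical "<cache name>.keystore" setting to an implementation.
    static IntKeyStore create(String name) {
        if (name == null || name.isEmpty()) {
            return new NoopKeyStore();
        }
        switch (name.toLowerCase(Locale.ROOT)) {
            case "rbm":
                return new HashSetIntKeyStore();
            default:
                throw new IllegalArgumentException("Unknown keystore: " + name);
        }
    }
}
```

Failing fast on an unknown name is a design choice in this sketch. Silently falling back to the no-op store would also be possible.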

Contributor Author

Good catch on the dot prefix!

Signed-off-by: Peter Alfonsi <[email protected]>
Contributor

github-actions bot commented Mar 8, 2024

❌ Gradle check result for 1f7f460: FAILURE

Collaborator

@sohami left a comment

@peteralfonsi @sgup432 I have posted some general questions before reviewing the implementation of the RBM keystore.


// For roaring bitmaps
api 'org.roaringbitmap:RoaringBitmap:0.9.49'
runtimeOnly 'org.roaringbitmap:shims:0.9.49'
Collaborator

Any specific reason not to use 1.x instead?

Contributor Author

I wrote this way back at the beginning of working on tiered caching, when 0.9.49 was the most up-to-date version. Let me switch this to the 1.0 version and check that nothing changes.

* int values. The internal representations may have collisions. Example transformations include a modulo
* or -abs(value), or some combination.
*/
public interface KeyLookupStore<T> {
Collaborator

This is specific to the Ehcache plugin, so it should be in that plugin's package.

/**
* Key for whether to use RBM keystore
*/
public static final String USE_KEYSTORE_KEY = "use_keystore";
Collaborator

This key is for the setting that defines the keystore name, so it would be better to rename it to keystore_name.

* It also maintains a hash set of values which have had collisions. Values which haven't had collisions can be
* safely removed from the store. The fraction of collided values should be low,
* about 0.5% for a store with 10^7 values and a modulo of 2^28.
* The store estimates its memory footprint and will stop adding more values once it reaches its memory cap.
Collaborator

What will happen to those values after the store reaches its memory cap?

Contributor Author

Currently they just stay in memory, and the disk cache always goes to disk to look for a key (since it could get a false negative from a full keystore). We could also change it so the values are deleted to free memory, but the keystore is still marked as being full.
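The lookup rule described here can be sketched as follows. It matches the `keystore.contains(key.hashCode()) || keystore.isFull()` check in the `EhcacheDiskCache` get path. The class `GatedDiskCache` is hypothetical, as are the in-memory `Map` standing in for the disk tier and the entry-count cap. The point of the sketch is narrow: a negative `contains()` skips the disk read only while the store is not full. Once the store is full, keys written after it stopped accepting values are missing from it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical disk-tier wrapper that consults a keystore before reading from disk.
class GatedDiskCache<K, V> {
    private final Map<K, V> disk = new HashMap<>();        // stand-in for the on-disk tier
    private final Set<Integer> keystore = new HashSet<>(); // stand-in for the RBM keystore
    private final int keystoreCapacity;                    // hypothetical entry cap
    int diskReads = 0;                                     // simulated disk accesses

    GatedDiskCache(int keystoreCapacity) {
        this.keystoreCapacity = keystoreCapacity;
    }

    boolean keystoreFull() {
        return keystore.size() >= keystoreCapacity;
    }

    void put(K key, V value) {
        disk.put(key, value);
        if (!keystoreFull()) {
            keystore.add(key.hashCode()); // once full, new keys are no longer recorded
        }
    }

    V get(K key) {
        // A miss in a non-full keystore proves absence; a full keystore proves nothing.
        if (keystore.contains(key.hashCode()) || keystoreFull()) {
            diskReads++;
            return disk.get(key);
        }
        return null;
    }
}
```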

Contributor

the disk cache always goes to disk to look for a key

This can cause frequent disk access if only keys that are not present in the RBM are accessed. We could add some form of eviction policy to the RBM to prevent this.

* The modulo increases the density of values, which makes RBMs more memory-efficient. The recommended modulo is ~2^28.
* It also maintains a hash set of values which have had collisions. Values which haven't had collisions can be
* safely removed from the store. The fraction of collided values should be low,
* about 0.5% for a store with 10^7 values and a modulo of 2^28.
Collaborator

@sohami Mar 11, 2024

So the collision rate will depend on the hashCode of the key and on the modulo chosen for the store. Also, I see that the hash code used here is essentially the hashCode of the Java key objects, which I don't think has a very low collision rate. Or am I missing something here?

Contributor Author

Yes, the 0.5% number comes from the modulo alone. I didn't realize Java's hash codes had a high collision rate. If that's a concern, note that the new pluggable caches like EhcacheDiskCache always use a wrapper object (ICacheKey) as the key, so we could change the hashCode function to whatever we like.
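Because the key is already wrapped, its `hashCode()` could apply a bit-mixing step before the keystore takes the modulo. The hedged sketch below uses the MurmurHash3 32-bit finalizer (`fmix32`) as one possible choice. `MixedKey` is a hypothetical wrapper, not the PR's `ICacheKey`. The finalizer is a bijection on 32-bit ints, so it cannot reduce collisions between keys whose 32-bit hash codes are already equal. What it does is make the low bits kept by the power-of-two modulo depend on all input bits. That helps when the underlying `hashCode()` values differ mainly in their high bits.

```java
import java.util.Objects;

// Hypothetical key wrapper whose hashCode() mixes the wrapped key's hash code.
final class MixedKey<K> {
    private final K key;

    MixedKey(K key) {
        this.key = Objects.requireNonNull(key);
    }

    // MurmurHash3 fmix32 finalizer: bijective on int, spreads high bits into low bits.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    @Override
    public int hashCode() {
        return fmix32(key.hashCode());
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MixedKey && key.equals(((MixedKey<?>) o).key);
    }
}
```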

V value = null;
if (keystore.contains(key.hashCode()) || keystore.isFull()) {
try {
value = cache.get(key);
Collaborator

Do we have any numbers for get calls on this Ehcache-based implementation? I want to understand how much overhead these get calls have when the key is absent from Ehcache, to see whether we really need an RBM here. I would expect some of these optimizations to live in the Ehcache implementation itself (though not necessarily).

}
int transformedValue = transform(value);

writeLock.lock();
Contributor

Can we optimize locking here to increase write concurrency, similar to how ConcurrentHashMap locks over different buckets?
Typical RBMs also have multiple segments, so we could consider locking over individual segments.

Contributor

After a quick glance at the implementation, it looks like RoaringBitmap leaves thread safety during add operations to the caller. It might be a good idea to open an issue on their repo asking for a concurrent alternative.
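One way to act on the striping suggestion is sketched below, under stated assumptions. The store is split into a fixed number of stripes, chosen by the high bits of the transformed value, and each stripe has its own `ReentrantReadWriteLock`. A 32-bit RoaringBitmap itself groups values into containers keyed by their high 16 bits, so striping on high bits follows the same layout. `StripedKeyStore` is hypothetical, and `BitSet` stands in for a per-stripe `RoaringBitmap`.

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical striped store: writers to different stripes do not block each other.
class StripedKeyStore {
    private final int stripeBits; // log2(number of stripes)
    private final int valueBits;  // log2(modulo)
    private final BitSet[] stripes;
    private final ReentrantReadWriteLock[] locks;

    StripedKeyStore(int valueBits, int stripeBits) {
        if (stripeBits < 0 || stripeBits > valueBits || valueBits > 30) {
            throw new IllegalArgumentException("need 0 <= stripeBits <= valueBits <= 30");
        }
        this.valueBits = valueBits;
        this.stripeBits = stripeBits;
        int n = 1 << stripeBits;
        this.stripes = new BitSet[n];
        this.locks = new ReentrantReadWriteLock[n];
        for (int i = 0; i < n; i++) {
            stripes[i] = new BitSet(1 << (valueBits - stripeBits));
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    private int transform(int hash) { return hash & ((1 << valueBits) - 1); }

    // High bits of the transformed value pick the stripe; low bits index within it.
    private int stripeOf(int t) { return t >>> (valueBits - stripeBits); }
    private int offsetOf(int t) { return t & ((1 << (valueBits - stripeBits)) - 1); }

    boolean add(int hash) {
        int t = transform(hash);
        int s = stripeOf(t);
        locks[s].writeLock().lock();
        try {
            boolean added = !stripes[s].get(offsetOf(t));
            stripes[s].set(offsetOf(t));
            return added;
        } finally {
            locks[s].writeLock().unlock();
        }
    }

    boolean contains(int hash) {
        int t = transform(hash);
        int s = stripeOf(t);
        locks[s].readLock().lock();
        try {
            return stripes[s].get(offsetOf(t));
        } finally {
            locks[s].readLock().unlock();
        }
    }
}
```

Collision tracking and the memory estimate would also need per-stripe handling. This sketch leaves both out.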

CounterMetric numCollisions = collidedIntCounters.get(transformedValue);
if (numCollisions == null) { // First time the transformedValue has had a collision
numCollisions = new CounterMetric();
numCollisions.inc(2); // initialize to 2, since the first collision means 2 keys have collided
Contributor

nit: Shouldn't 2 keys colliding count as one collision?

protected final int modulo;
private final int modulo_bitmask;
// Since our modulo is always a power of two we can optimize it by ANDing with a particular bitmask
KeyStoreStats stats;
Contributor

Capturing the stats can be abstracted to a parent class for reuse by different future implementations of the key lookup store.
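A hedged sketch of that refactor follows. A hypothetical abstract base class owns the counters, and subclasses implement only the storage primitives. The counter names are illustrative and are not the PR's `KeyStoreStats` fields. `LongAdder` is used in place of OpenSearch's `CounterMetric`.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical base class: stats bookkeeping lives here, storage lives in subclasses.
abstract class AbstractKeyLookupStore {
    private final LongAdder size = new LongAdder();
    private final LongAdder addAttempts = new LongAdder();
    private final LongAdder collisions = new LongAdder();

    /** Stores the hash; returns false if its (transformed) value was already present. */
    protected abstract boolean doAdd(int hash);

    /** Removes the hash if the implementation considers it safe; returns true if removed. */
    protected abstract boolean doRemove(int hash);

    public abstract boolean contains(int hash);

    public final boolean add(int hash) {
        addAttempts.increment();
        boolean added = doAdd(hash);
        if (added) {
            size.increment();
        } else {
            collisions.increment(); // value already present (duplicate or collision)
        }
        return added;
    }

    public final boolean remove(int hash) {
        boolean removed = doRemove(hash);
        if (removed) {
            size.decrement();
        }
        return removed;
    }

    public long size() { return size.sum(); }
    public long addAttempts() { return addAttempts.sum(); }
    public long collisions() { return collisions.sum(); }
}

// Minimal concrete subclass for illustration, backed by a HashSet.
class HashSetKeyStore extends AbstractKeyLookupStore {
    private final Set<Integer> values = new HashSet<>();
    protected boolean doAdd(int hash) { return values.add(hash); }
    protected boolean doRemove(int hash) { return values.remove(hash); }
    public boolean contains(int hash) { return values.contains(hash); }
}
```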

KeyStoreStats stats;
private RoaringBitmap rbm;
private HashMap<Integer, CounterMetric> collidedIntCounters;
private HashMap<Integer, HashSet<Integer>> removalSets;
Contributor

This might not be friendly to the CPU cache, since Java's default HashMap implementation uses a linked list for colliding entries in a bucket and converts it into a red-black tree after a certain threshold.

How about using another global RBM to handle collisions ?
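Here is a sketch of that alternative, assuming only membership (not a per-value count) is needed to decide whether removal is safe. A second bitmap marks transformed values that have collided, replacing the `HashMap<Integer, CounterMetric>`. `BitSet` stands in for both RoaringBitmaps, and `TwoBitmapKeyStore` is a hypothetical name. Dropping the counts means a collided value is never removed. That matches the conservative rule in the PR description, but it gives up the ability to clear a collided value once all of its keys are gone.

```java
import java.util.BitSet;

// Hypothetical store: one bitmap for values, one for values that have collided.
class TwoBitmapKeyStore {
    private final int mask;
    private final BitSet values;
    private final BitSet collided;

    TwoBitmapKeyStore(int modulo) {
        if (modulo <= 0 || Integer.bitCount(modulo) != 1) {
            throw new IllegalArgumentException("modulo must be a positive power of two");
        }
        this.mask = modulo - 1;
        this.values = new BitSet(modulo);
        this.collided = new BitSet(modulo);
    }

    boolean add(int hash) {
        int t = hash & mask;
        if (values.get(t)) {
            collided.set(t); // remember the collision; no per-value counter
            return false;
        }
        values.set(t);
        return true;
    }

    boolean contains(int hash) {
        return values.get(hash & mask);
    }

    // A collided value may be shared by several keys, so it is never cleared here.
    // Assumes remove() is only called for hashes that were previously added.
    boolean remove(int hash) {
        int t = hash & mask;
        if (!values.get(t) || collided.get(t)) {
            return false;
        }
        values.clear(t);
        return true;
    }

    int collidedCount() {
        return collided.cardinality();
    }
}
```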

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot (bot) added and removed the stalled label (Issues that have stalled) May 1, 2024
@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot (bot) added and removed the stalled label (Issues that have stalled) Jun 3, 2024
Labels
enhancement (Enhancement or improvement to existing feature or request), Search (Search query, autocomplete ...etc), v2.13.0 (Issues and PRs related to version 2.13.0)
Projects
Status: Planned work items
Development

Successfully merging this pull request may close these issues.

[Tiered Caching] [Milestone 1] KeyLookup store for disk cache
5 participants