-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tiered Caching] Add a memory-efficient key lookup store for use in tiered cache #12527
[Tiered Caching] Add a memory-efficient key lookup store for use in tiered cache #12527
Conversation
Compatibility status:Checks if related components are compatible with change 1f7f460 Incompatible componentsSkipped componentsCompatible components |
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
Signed-off-by: Peter Alfonsi <[email protected]>
4965cd0
to
2e01c31
Compare
❌ Gradle check result for 4965cd0: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 2e01c31: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for b4c8d87: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for 928da04: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for 5b26d35: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Peter Alfonsi <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #12527 +/- ##
============================================
- Coverage 71.41% 71.38% -0.03%
- Complexity 59953 59981 +28
============================================
Files 4980 4982 +2
Lines 282131 282323 +192
Branches 40937 40960 +23
============================================
+ Hits 201476 201539 +63
- Misses 63928 64088 +160
+ Partials 16727 16696 -31 ☔ View full report in Codecov by Sentry. |
public static final Setting.AffixSetting<Boolean> USE_RBM_KEYSTORE_SETTING = Setting.suffixKeySetting( | ||
EhcacheDiskCache.EhcacheDiskCacheFactory.EHCACHE_DISK_CACHE_NAME + "use_keystore", | ||
(key) -> Setting.boolSetting(key, true, NodeScope) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be .use_keystore
, note the dot prefix. Same for below.
Also, Instead of defining like above should we have something like below?
public static final Setting.AffixSetting<String> KEYSTORE_SETTING = Setting.suffixKeySetting(
EhcacheDiskCache.EhcacheDiskCacheFactory.EHCACHE_DISK_CACHE_NAME + ".keystore",
(key) -> Setting.simpleStringSetting(key, "rbm", NodeScope)
);
Here we will mention "which" keystore to be used rather than having an explicit setting on RBM.
"rbm"
would be one of the implementation
And we can set this to null or ""
if we don't want to use it. In this case we will create a mock keyStore which does nothing.
This way we can eventually have different kind of keystores in the future and load them from respective plugins(if needed). As of now we can keep this inside the same plugin.
@sohami What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on the dot prefix!
Signed-off-by: Peter Alfonsi <[email protected]>
❌ Gradle check result for 1f7f460: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peteralfonsi @sgup432 Have posted some general question before reviewing the implementation of the RBMKeystore
|
||
// For roaring bitmaps | ||
api 'org.roaringbitmap:RoaringBitmap:0.9.49' | ||
runtimeOnly 'org.roaringbitmap:shims:0.9.49' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason to not use 1.x instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this way back at the beginning of working on tiered caching, and at that point 0.9.49 was the most up to date version. Let me switch this to the 1.0 version and check nothing changes.
* int values. The internal representations may have collisions. Example transformations include a modulo | ||
* or -abs(value), or some combination. | ||
*/ | ||
public interface KeyLookupStore<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is specific to Ehcache plugin so should be in that plugin package
/** | ||
* Key for whether to use RBM keystore | ||
*/ | ||
public static final String USE_KEYSTORE_KEY = "use_keystore"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This key is for setting that defines keystore name so better to rename as keystore_name
* It also maintains a hash set of values which have had collisions. Values which haven't had collisions can be | ||
* safely removed from the store. The fraction of collided values should be low, | ||
* about 0.5% for a store with 10^7 values and a modulo of 2^28. | ||
* The store estimates its memory footprint and will stop adding more values once it reaches its memory cap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will happen to those values after the store reaches memory cap ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently they just stay in memory, and the disk cache always goes to disk to look for a key (since it could get a false negative from a full keystore). We could also change it so the values are deleted to free memory, but the keystore is still marked as being full.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the disk cache always goes to disk to look for a key
This can cause frequent disk access if only the keys not present in RBM are accessed. We can have some form of eviction policy on the RBM to ensure this doesn't happen.
* The modulo increases the density of values, which makes RBMs more memory-efficient. The recommended modulo is ~2^28. | ||
* It also maintains a hash set of values which have had collisions. Values which haven't had collisions can be | ||
* safely removed from the store. The fraction of collided values should be low, | ||
* about 0.5% for a store with 10^7 values and a modulo of 2^28. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So collision will be dependent on the hashCode
of key and the modulo chosen for store. Also I see that hashcode used here is essentially the hashcode of java key objects which I don't think has very low collision rate. Or am I missing anything here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the 0.5% number is just from the modulo. I didn't realize Java's hash codes had a high collision rate, but if that's a concern, for the new pluggable caches like EhcacheDiskCache we always use a wrapper object (ICacheKey) as the key. So we could change the hashcode function to whatever we like.
V value = null; | ||
if (keystore.contains(key.hashCode()) || keystore.isFull()) { | ||
try { | ||
value = cache.get(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any numbers for get
calls from this Ehcache based implementation ? I wanted to understand how much is the overhead of these get
calls when key is absent in Ehcache, to see if we really need RBM here. Would expect probably some of these optimizations to be in Ehcache impl itself (but not necessary)
} | ||
int transformedValue = transform(value); | ||
|
||
writeLock.lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we optimise locking here similar to how Concurrent Hash map locks over different buckets to increase write concurrency ?
Typical RBMs also has multiple segments so we can consider locking over individual segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking a quick glance at the implementation, looks like RoaringBitmap leaves handling thread safety during add
operation to the caller. Maybe it will be a nice idea to open an issue on their repo to provide a concurrent alternative.
* It also maintains a hash set of values which have had collisions. Values which haven't had collisions can be | ||
* safely removed from the store. The fraction of collided values should be low, | ||
* about 0.5% for a store with 10^7 values and a modulo of 2^28. | ||
* The store estimates its memory footprint and will stop adding more values once it reaches its memory cap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the disk cache always goes to disk to look for a key
This can cause frequent disk access if only the keys not present in RBM are accessed. We can have some form of eviction policy on the RBM to ensure this doesn't happen.
CounterMetric numCollisions = collidedIntCounters.get(transformedValue); | ||
if (numCollisions == null) { // First time the transformedValue has had a collision | ||
numCollisions = new CounterMetric(); | ||
numCollisions.inc(2); // initialize to 2, since the first collision means 2 keys have collided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Shouldn't 2 keys colliding count as one collision ?
protected final int modulo; | ||
private final int modulo_bitmask; | ||
// Since our modulo is always a power of two we can optimize it by ANDing with a particular bitmask | ||
KeyStoreStats stats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capturing the stats can be abstracted to a parent class for reuse by different future implementations of the key lookup store.
KeyStoreStats stats; | ||
private RoaringBitmap rbm; | ||
private HashMap<Integer, CounterMetric> collidedIntCounters; | ||
private HashMap<Integer, HashSet<Integer>> removalSets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not be friendly for the CPU cache as default Java implementations use Linked List for collisions and then convert into a RedBlack Tree after a certain threshold.
How about using another global RBM to handle collisions ?
This PR is stalled because it has been open for 30 days with no activity. |
This PR is stalled because it has been open for 30 days with no activity. |
Description
Adds a roaring bitmap-based key lookup store, designed to store integer hashcodes of keys for a future disk cache in a memory-efficient way. It stores values in the RBM with a modulo, as very sparse RBMs are not memory-efficient. This comes with a tradeoff of some rare collisions (~0.5% of values for the recommended modulo 2^28 in a store with 10^7 values). To handle this, we also maintain a hash set of values which have had collisions. Values with no collisions can be safely removed without risking any false negatives in contains(). The keystore has an optional memory cap and will not add more values once it grows too large. To enable this, we also had to make an RBM size estimator based on performance test data, since the built-in one is very inaccurate for randomly-distributed data like hashes:
This implementation is generic and can be used in other places as well.
We investigated different data structures (RBMs, sorted int[], hash sets, or a hybrid combination of all of these) for memory footprint and access time before settling on an RBM with this choice of modulo. While a sorted int[] can be more memory-efficient than an RBM when it contains less than about 50,000 values, it is much slower because it requires binary search to add or access elements. Hash sets are more memory efficient below about 5,000 values. However, we are mostly concerned with memory efficiency when the store contains many keys, so we decided against adding the complexity of switching between data structures. We expect to allocate 5% of the on-heap cache size to this keystore, which would allow many domains to store 2-10 million key hashes, based on a domain scan that looks for domains which will benefit from tiered caching:
Related Issues
Resolves #10309
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.