-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Separate index cache and data cache #4438
Conversation
<tr> | ||
<td><h5>lookup.hash-load-factor</h5></td> | ||
<td style="word-wrap: break-word;">0.75</td> | ||
<td>Float</td> | ||
<td>The index load factor for lookup.</td> | ||
</tr> | ||
<tr> | ||
<td><h5>lookup.index-cache-max-memory-size</h5></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just introduce an option like state.backend.rocksdb.memory.high-prio-pool-ratio
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
@@ -915,6 +915,13 @@ public class CoreOptions implements Serializable { | |||
.defaultValue(MemorySize.parse("256 mb")) | |||
.withDescription("Max memory size for lookup cache."); | |||
|
|||
public static final ConfigOption<Double> LOOKUP_CACHE_HIGH_PRIO_POOL_RATIO = | |||
key("lookup.cache.high-prio-pool-ratio") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think lookup.cache.high-priority-pool-ratio may be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
public CacheManager( | ||
Cache.CacheType cacheType, MemorySize maxMemorySize, double highPrioPoolRatio) { | ||
Preconditions.checkArgument( | ||
highPrioPoolRatio >= 0 && highPrioPoolRatio < 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to pay attention to the accuracy of floating-point numbers here, which may lead to judgment errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e, I think it's ok here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -61,9 +60,9 @@ private byte[] readFrom(long offset, int length) throws IOException { | |||
} | |||
|
|||
public MemorySegment getBlock( | |||
long position, int length, Function<byte[], byte[]> decompressFunc) { | |||
long position, int length, Function<byte[], byte[]> decompressFunc, boolean index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be isIndex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -423,6 +423,12 @@ | |||
<td>Double</td> | |||
<td>Define the default false positive probability for lookup cache bloom filters.</td> | |||
</tr> | |||
<tr> | |||
<td><h5>lookup.cache.high-priority-pool-ratio</h5></td> | |||
<td style="word-wrap: break-word;">0.25</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the source of this default value? 0.1 or 0.25?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if 0.1, the size is 25MB, 0.25 is 64MB. When I test, the bloom filter may occupy 5-10MB. So, I give a bigger (64MB) default index cache here.
static CacheKey forPosition(RandomAccessFile file, long position, int length) { | ||
return new PositionCacheKey(file, position, length); | ||
static CacheKey forPosition(RandomAccessFile file, long position, int length, boolean isIndex) { | ||
return new PositionCacheKey(file, position, length, isIndex); | ||
} | ||
|
||
static CacheKey forPageIndex(RandomAccessFile file, int pageSize, int pageIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this method? The invoker should always set the isIndex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for the pageIndex
, we always store it in the data cache now. Because we are unsure if this page is an index page.
Hi @Aitozi , consider modifying old hash store, can you test it in your production env? I am not sure if this modification is good for old hash store. |
Sure, will test it first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Purpose
Linked issue: close #4437
Tests
API and Format
Documentation