-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tuning Lucene Codec Block sizes to Optimize Performance #12029
Tuning Lucene Codec Block sizes to Optimize Performance #12029
Conversation
21f71af
to
8908ce8
Compare
Compatibility status:Checks if related components are compatible with change a3fdc2a Incompatible componentsIncompatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer-rca.git] Skipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git] |
❌ Gradle check result for 21f71af: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
26bbf9e
to
3d3a133
Compare
❌ Gradle check result for 26bbf9e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 8908ce8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 3d3a133: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
3d3a133
to
f9013a2
Compare
❌ Gradle check result for f9013a2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
f9013a2
to
c2a617f
Compare
❌ Gradle check result for c2a617f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sarthak Aggarwal <[email protected]>
c2a617f
to
9dbf157
Compare
❌ Gradle check result for 9dbf157: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sarthak Aggarwal <[email protected]>
❌ Gradle check result for 5b40132: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Sarthak Aggarwal <[email protected]>
4226705
to
a3fdc2a
Compare
❌ Gradle check result for 4226705: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #12029 +/- ##
============================================
+ Coverage 71.36% 71.39% +0.02%
- Complexity 59468 59514 +46
============================================
Files 4925 4927 +2
Lines 279458 279503 +45
Branches 40631 40636 +5
============================================
+ Hits 199446 199557 +111
+ Misses 63437 63333 -104
- Partials 16575 16613 +38 ☔ View full report in Codecov by Sentry. |
public static final CompressionMode BEST_COMPRESSION_MODE = new DeflateWithPresetDictCompressionMode(); | ||
|
||
// Shoot for 10 sub blocks of 8kB each. | ||
private static final int BEST_SPEED_BLOCK_LENGTH = 10 * 16 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says 8kB while the value is 16kB.
What is the effect of this change w.r.t i) disk size, ii) indexing performance and iii) search performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what is the ongoing maintenance burden of this change? It looks like we'll be on the hook to implement new codec wrappers on every Lucene minor version. And is it plausible that future version of Lucene codecs might not expose the configurability we rely on here and then leave us in a bad spot regarding compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks folks for taking a look. This PR should have ideally been in draft stage ~ have moved it to draft.
Thanks for bringing these points up, I'm working on the fresh benchmarks, and correctness as of now.
Will get back on these questions in some time!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we'll be on the hook to implement new codec wrappers on every Lucene minor version.
Yes, we will be maintaining new codec wrappers. Let me think if there is another way to get around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should we let customers control the page size? For example, MySQL lets customers configure the page size (https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_page_size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure we think through the maintenance and compatibility implications of a change like this before we merge it as a fully supported option.
This PR is stalled because it has been open for 30 days with no activity. |
This PR is stalled because it has been open for 30 days with no activity. |
Hi @sarthakaggarwal97 are you still planning on moving forward with this change? I see you created an associated issue (great!) but am not sure what the current status of this is. |
@scrawfor99 thanks for reaching out. I have documented my findings and experiments over here for reference: #3769 (comment) 8K Block size provides the best performance in terms of search latency over stored fields, and for users who are concerned about storage, OpenSearch supports Zstandard via custom-codecs plugin which gives nice trade offs between storage and latency. I feel there is not enough merit to make this change as it will introduce additional maintenance effort for the community upon every lucene version upgrade. I will close this PR for now. If the community feels that users can benefit from exposing this option, I am happy to take this forward. |
Description
Adds support to configure chunkSize aka Block Size for the
default
andbest_compression
codecs.Related Issues
Resolves #7475, #3769
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.