Support multi ranges traversal when doing date histogram rewrite optimization #13317
Signed-off-by: bowenlan-amzn <[email protected]>
Codecov Report
Attention: Patch coverage is
Coverage diff (main vs. #13317):
Metric       main      #13317    +/-
Coverage     71.42%    71.55%    +0.13%
Complexity   59978     60991     +1013
Files        4985      5050      +65
Lines        282275    286987    +4712
Branches     40946     41591     +645
Hits         201603    205347    +3744
Misses       63999     64618     +619
Partials     16673     17022     +349
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-13317-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ed33488aa426bd618685729fc638adad763f6ff7
# Push it to GitHub
git push --set-upstream origin backport/backport-13317-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…mization (opensearch-project#13317) Signed-off-by: bowenlan-amzn <[email protected]> remove unnecessary change log Signed-off-by: bowenlan-amzn <[email protected]>
…ram rewrite optimization (#13317) (#13533) * Support multi ranges traversal when doing date histogram rewrite optimization (#13317) Signed-off-by: bowenlan-amzn <[email protected]> remove unnecessary change log Signed-off-by: bowenlan-amzn <[email protected]> * Update rest tests skip version to include 2.15 Signed-off-by: bowenlan-amzn <[email protected]> --------- Signed-off-by: bowenlan-amzn <[email protected]>
…am rewrite optimization (#13317) (#13522) * Support multi ranges traversal when doing date histogram rewrite optimization (#13317) Signed-off-by: bowenlan-amzn <[email protected]> remove unnecessary change log Signed-off-by: bowenlan-amzn <[email protected]> * Update rest tests skip version to include 2.15 Signed-off-by: bowenlan-amzn <[email protected]> --------- Signed-off-by: bowenlan-amzn <[email protected]>
Confirmed the performance improvements from all 4 workloads.
Description
When intersecting and visiting a BKD tree, the point values, and the min/max values of the inner nodes, are visited in ever-increasing order.
The ranges rewritten from a date histogram are also ever-increasing, and each range connects directly to the next.
So a single traversal can populate the results for multiple ranges, which ensures that every point and inner node is visited only once.
The traversal moves on to the next range only when it meets a value that falls outside the current range, so it only ever collects into one range at a time.
Such a value can be met either when visiting a new node or while visiting the points inside a leaf node.
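The single-pass idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code in this PR: the tree is flattened into a list of non-empty leaves in increasing value order, values are plain integers, and the names `Leaf`, `histogram_ranges`, and `count_per_range` are hypothetical. It does not use Lucene's API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Leaf:
    """Hypothetical stand-in for a BKD leaf: a non-empty, sorted list of values."""
    values: List[int]

    @property
    def min(self) -> int:
        return self.values[0]

    @property
    def max(self) -> int:
        return self.values[-1]


def histogram_ranges(lo: int, hi: int, interval: int) -> List[Tuple[int, int]]:
    """Contiguous half-open ranges [start, end), like fixed-interval histogram buckets."""
    return [(s, min(s + interval, hi)) for s in range(lo, hi, interval)]


def count_per_range(leaves: List[Leaf], ranges: List[Tuple[int, int]]) -> List[int]:
    """Count values per range in one pass; leaves and ranges are both increasing."""
    counts = [0] * len(ranges)
    i = 0  # index of the single range currently being collected into
    for leaf in leaves:
        # A new node may start beyond the current range: move forward.
        while i < len(ranges) and leaf.min >= ranges[i][1]:
            i += 1
        if i == len(ranges):
            break  # every range is finished, stop early
        lo, hi = ranges[i]
        if leaf.max < lo:
            continue  # whole leaf lies before the first range
        if leaf.min >= lo and leaf.max < hi:
            counts[i] += len(leaf.values)  # whole leaf inside: no per-point work
            continue
        # The leaf crosses a range boundary: visit its points one by one.
        for v in leaf.values:
            while i < len(ranges) and v >= ranges[i][1]:
                i += 1  # this value is outside the current range
            if i == len(ranges):
                break
            if v >= ranges[i][0]:
                counts[i] += 1
    return counts
```

For example, with leaves `[1, 2, 3]`, `[4, 5, 9]`, `[10, 11, 12]`, `[13, 25, 31]` and ranges `histogram_ranges(0, 30, 10)`, the first three leaves are each counted as a whole and only the last leaf is visited point by point. The range index `i` never moves backwards, which is why no point or node is visited twice.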
Note that the traversal logic is hard to cover in tests: exercising it requires ingesting thousands of documents or more, because every leaf node contains 512 docs.
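To make the test-size arithmetic concrete, here is a rough estimate of how many leaf nodes a given number of documents produces. It assumes one point per document, a single segment, and fully packed leaves of 512 points as stated above; the function name and these assumptions are illustrative only.

```python
import math

ASSUMED_POINTS_PER_LEAF = 512  # leaf size stated in the description


def estimated_leaf_count(num_docs: int, points_per_leaf: int = ASSUMED_POINTS_PER_LEAF) -> int:
    """Rough number of leaves needed to hold num_docs points (assumes packed leaves)."""
    return math.ceil(num_docs / points_per_leaf)
```

Under these assumptions, 512 documents fit in a single leaf, so a test at that size never moves from one node to the next; around 5000 documents gives roughly 10 leaves, which is enough for the traversal to cross both node boundaries and range boundaries.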
Benchmark Results
The filter rewrite threshold is set to 70k, so every operation uses the optimization.
The JVM memory is 1g.
r6g.xlarge instance (4 vCPU, 32G memory), single-node cluster.
For each workload, the old optimized results come first, followed by the new optimized results with multi-range traversal. The last set, where present, shows the results with the optimization disabled, which falls back to the default aggregation workflow.
pmc
big5
nyc_taxis
http_logs
Related Issues
Resolves #13171 #13345
Check List
[ ] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.