[Proposal] Support sub aggregation in filter rewrite optimization #12602
bowenlan-amzn added the Search:Aggregations and Search:Performance labels and removed the untriaged label (Mar 11, 2024)
getsaurabh02 added the v2.16.0 label and removed the v2.15.0 label (Jun 6, 2024)
This was referenced Jun 18, 2024
Picking up this issue.
This was referenced Aug 3, 2024
github-project-automation bot moved this to 2.17 (First RC 09/03, Release 09/17) in OpenSearch Project Roadmap (Aug 30, 2024)
getsaurabh02 changed the title from "Support sub aggregation in filter rewrite optimization" to "[Proposal] Support sub aggregation in filter rewrite optimization" (Sep 6, 2024)
Follow-up task of #9310.
Currently, the filter rewrite optimization does not support sub-aggregations; it only applies to a single date histogram aggregation.
This severely limits the scenarios where the optimization can be used. It would be great if we could find a way to support sub-aggregations while still applying the filter rewrite optimization.
When I previously applied the optimization to composite aggregation, I noticed one possible path. There is an established pattern for deferring sub-aggregation collection, in which collection is done in two passes. The first pass gets the docIdSet for each bucket; the second pass runs the sub-aggregation collection over each bucket's docIdSet.
OpenSearch/server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
Lines 648 to 673 in 246557c
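The two-pass pattern described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not OpenSearch's implementation: it uses no Lucene or OpenSearch classes, the names `DeferredTwoPassSketch`, `firstPass`, and `secondPass` are hypothetical, a document is reduced to an `int` doc ID with a timestamp and one numeric field, and the "sub-aggregation" is a simple per-bucket sum.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of deferred (two-pass) sub-aggregation collection.
// Not OpenSearch code: documents are plain arrays indexed by doc ID.
class DeferredTwoPassSketch {

    // Pass 1: assign each doc to a date histogram bucket (key = rounded-down timestamp)
    // and record only the doc IDs per bucket; no sub-aggregation work happens here.
    // In the proposal, this is the step the filter rewrite would replace with an index lookup.
    static Map<Long, BitSet> firstPass(long[] timestamps, long interval) {
        Map<Long, BitSet> docIdSets = new TreeMap<>();
        for (int docId = 0; docId < timestamps.length; docId++) {
            long bucketKey = Math.floorDiv(timestamps[docId], interval) * interval;
            docIdSets.computeIfAbsent(bucketKey, k -> new BitSet(timestamps.length)).set(docId);
        }
        return docIdSets;
    }

    // Pass 2: replay each bucket's saved doc IDs into the sub-aggregation (here: sum of a field).
    static Map<Long, Long> secondPass(Map<Long, BitSet> docIdSets, long[] values) {
        Map<Long, Long> sums = new TreeMap<>();
        for (Map.Entry<Long, BitSet> e : docIdSets.entrySet()) {
            long sum = 0;
            BitSet docs = e.getValue();
            for (int d = docs.nextSetBit(0); d >= 0; d = docs.nextSetBit(d + 1)) {
                sum += values[d];
            }
            sums.put(e.getKey(), sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long[] timestamps = {5, 12, 18, 3, 25};
        long[] values = {1, 20, 300, 4000, 50000};
        Map<Long, BitSet> docIdSets = firstPass(timestamps, 10);
        System.out.println(secondPass(docIdSets, values)); // {0=4001, 10=320, 20=50000}
    }
}
```

Deferring means the sub-aggregation never sees a document until its bucket is known, so the first pass could change how it finds matching docs while the second pass stays the same. The price is holding the per-bucket doc ID sets in memory between the two passes.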
In theory, the performance improvement still comes from using the index structure, rather than iterating over documents, to find the matching docs for each bucket at the date histogram level. Sub-aggregation collection over these matching docs is expected to run at the same speed as it does today. There would be some memory cost from holding the docIdSets until the second pass runs.
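To make "index structure instead of iteration" concrete, here is a hedged sketch. It assumes, as a simplification, that the index exposes timestamps sorted by value together with their doc IDs, loosely analogous to traversing a points index by value range; it is not Lucene's API, and `SortedIndexBucketsSketch`, `lowerBound`, and `docsInRange` are hypothetical names. For each bucket range, two binary searches locate the matching run of entries, so only matching documents are added to the bucket's doc ID set.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: a value-sorted "index" of (timestamp, docId) pairs.
// Not Lucene code; it only illustrates range lookup versus a full per-doc scan.
class SortedIndexBucketsSketch {
    final long[] sortedValues; // timestamps in ascending order
    final int[] docIds;        // docIds[i] is the doc holding sortedValues[i]
    final int maxDoc;

    SortedIndexBucketsSketch(long[] timestampsByDoc) {
        maxDoc = timestampsByDoc.length;
        Integer[] order = new Integer[maxDoc];
        for (int i = 0; i < maxDoc; i++) order[i] = i;
        // Build the sorted view once, as an index would at write time.
        Arrays.sort(order, (x, y) -> Long.compare(timestampsByDoc[x], timestampsByDoc[y]));
        sortedValues = new long[maxDoc];
        docIds = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            docIds[i] = order[i];
            sortedValues[i] = timestampsByDoc[order[i]];
        }
    }

    // First position whose value is >= key (binary search lower bound).
    int lowerBound(long key) {
        int lo = 0, hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Doc IDs whose timestamp falls in [from, to): the bounds come from binary search,
    // then only the matching entries are visited and saved for a later sub-aggregation pass.
    BitSet docsInRange(long from, long to) {
        BitSet docs = new BitSet(maxDoc);
        for (int i = lowerBound(from), end = lowerBound(to); i < end; i++) {
            docs.set(docIds[i]);
        }
        return docs;
    }

    public static void main(String[] args) {
        SortedIndexBucketsSketch index = new SortedIndexBucketsSketch(new long[]{5, 12, 18, 3, 25});
        BitSet bucket10 = index.docsInRange(10, 20);
        System.out.println(bucket10); // {1, 2}
        // Memory held until the second pass: BitSet.size() bits (maxDoc rounded up to a multiple of 64).
        System.out.println(bucket10.size() + " bits allocated");
    }
}
```

For a plain doc count, the same structure could return `end - start` without materializing any doc IDs; sub-aggregations need the doc IDs themselves, which is where the memory cost comes from.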
Ultimately, we expect this to improve the performance of these 2 operations from the big5 workload, both of which use sub-aggregations.
Some other issues will also improve sub-aggregation performance, but they approach it from the indexing side, by computing special index structures that speed up sub-aggregations. This approach, in contrast, focuses on query-time improvement.
#3734
#12498