Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add things to know for concurrent segment search #6362

Merged
merged 7 commits into from
Feb 8, 2024

Conversation

jed326
Copy link
Contributor

@jed326 jed326 commented Feb 6, 2024

Description

Add "things to know" section for concurrent segment search.

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Collaborator

@kolchfa-aws kolchfa-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jed326! Just a couple of suggestions.

_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
@hdhalter hdhalter added 3 - Tech review PR: Tech review in progress release-notes PR: Include this PR in the automated release notes v2.12.0 labels Feb 7, 2024
@jed326 jed326 force-pushed the cs-things-to-know branch from 1fcb205 to 07686f3 Compare February 7, 2024 20:15
Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Jay Deng <[email protected]>
Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jed326 @kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
Parent aggregations on [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) fields do not support the concurrent search model. Thus, if a search request contains a parent aggregation, the aggregation will be executed using the non-concurrent path even if concurrent segment search is enabled at the cluster level.
The following aggregations do not support the concurrent search model. If a search request contains one of these aggregations, the request will be executed using the non-concurrent path even if concurrent segment search is enabled at the cluster level or index level.
- Parent aggregations on [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) fields. See [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/9316).
- Sampler and Diversified Sampler aggregations. See [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/110750).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirm that "Diversified Sampler" is meant to be capitalized and should not be formatted differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably just be "Diversified sampler"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it a common noun and should just be "diversified sampler"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the aggregation type is "Diversified sampler". Similar to "Significant terms" vs. "Terms" aggregations.

See: https://opensearch.org/docs/latest/aggregations/bucket/diversified-sampler/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, it should be in code font, then.

_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved

### Terms aggregations

When performing concurrent segment search, the `shard_size` parameter will be applied at the segment slice level. This may introduce additional document count error, which is correctly calculated by the `doc_count_error_upper_bound` response parameter for such cases.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "additional" necessary here, or should it just be "a document count error"? I don't believe we've referenced an initial error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jed326 Could you address this question?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-concurrent search doc count error is introduced when the shards eliminate candidate buckets and we just added some more detailed documentation for this in #6355.
The "additional" here refers to how in additional to the doc count error introduced by the shards, there is (potentially) additional document count error introduced by the slices for concurrent segment search.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then if we want to keep the word "additional," we need to explain that, or the reader may ask "Where was the first error?"

_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved

When performing concurrent segment search, the `shard_size` parameter will be applied at the segment slice level. This may introduce additional document count error, which is correctly calculated by the `doc_count_error_upper_bound` response parameter for such cases.

For more details on how `shard_size` can affect both `doc_count_error_upper_bound` and collect buckets, see [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/11680#issuecomment-1885882985).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace "collect" with the appropriate word. Collector? Collection?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jed326 Could you address this question?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be "collected"

_search-plugins/concurrent-segment-search.md Outdated Show resolved Hide resolved
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws kolchfa-aws merged commit 3369768 into opensearch-project:main Feb 8, 2024
3 checks passed
@hdhalter hdhalter added 3 - Done Issue is done/complete and removed 3 - Tech review PR: Tech review in progress labels Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Done Issue is done/complete release-notes PR: Include this PR in the automated release notes v2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants