
Fix SimpleNestedExplainIT.testExplainMultipleDocs flakiness #12776

Merged
merged 1 commit into from
Mar 22, 2024

Conversation

neetikasinghal
Contributor

@neetikasinghal commented Mar 19, 2024

Description

The explain itself is coming from Lucene: https://github.com/apache/lucene/blob/a6f70ad2bb0b682eb65feb522358ee6d16cad766/lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java#L432-L440

Here's some info on the docIds in lucene:

  • A docId in Lucene is not unique to the index; it is only unique to a segment. Lucene does this mainly to optimize writing and compression. Since a docId is only unique within a segment, how can a doc be uniquely identified at the index level? The solution is simple: the segments are ordered. As a simple example, suppose an index has two segments with 100 docs each. The docIds within each segment are 0-99, but at the index level the docIds of the second segment map to 100-199.

  • DocIds are unique within a segment, assigned sequentially from zero, but that does not mean they are contiguous: when a doc is deleted, a gap remains.

  • The docId corresponding to a document can change, typically when segments are merged.
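As a rough illustration of the first point (a toy model, not Lucene's actual API — `toIndexLevel` and its parameters are hypothetical), converting a segment-local docId to an index-level docId just adds the total doc count of the preceding segments as a base offset:

```java
public class DocIdMapping {
    // Hypothetical helper: maps a segment-local docId to an index-level docId
    // by adding the cumulative doc count of all preceding (ordered) segments.
    static int toIndexLevel(int segmentOrdinal, int localDocId, int[] segmentDocCounts) {
        int base = 0;
        for (int i = 0; i < segmentOrdinal; i++) {
            base += segmentDocCounts[i];
        }
        return base + localDocId;
    }

    public static void main(String[] args) {
        int[] counts = {100, 100};                        // two segments, 100 docs each
        System.out.println(toIndexLevel(0, 0, counts));   // doc 0 of segment 0 -> 0
        System.out.println(toIndexLevel(1, 0, counts));   // doc 0 of segment 1 -> 100
        System.out.println(toIndexLevel(1, 99, counts));  // doc 99 of segment 1 -> 199
    }
}
```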

The test fails on the assertion that validates the range of the docIds. The range changes because indexRandomForMultipleSlices ingests and then deletes several bogus documents, which can trigger background merges and change the docIds matched by the search query.
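To see why a merge can shift the asserted range (again a toy model, not Lucene code — `merge` is a hypothetical stand-in): after deletions leave gaps, a merge compacts the surviving docs and renumbers them consecutively from zero, so a doc's id, and therefore any id range a test asserts on, can change underneath the test:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeRenumbering {
    // Toy model of a merge: deleted slots are dropped and the surviving
    // docs are renumbered consecutively from zero.
    static List<Integer> merge(List<Integer> liveDocIds) {
        List<Integer> renumbered = new ArrayList<>();
        for (int i = 0; i < liveDocIds.size(); i++) {
            renumbered.add(i); // survivors get fresh, consecutive ids
        }
        return renumbered;
    }

    public static void main(String[] args) {
        // Segment held docs 0..4; doc 2 was deleted, leaving a gap.
        List<Integer> live = new ArrayList<>(List.of(0, 1, 3, 4));
        System.out.println(merge(live)); // prints [0, 1, 2, 3] — old doc 3 is now doc 2
    }
}
```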

One solution is to not delete the bogus documents as part of indexRandomForMultipleSlices, perform the search for both concurrent and non-concurrent search, and delete the bogus docs at the end. This is done as part of this PR.

Validated that the test passes across 500 runs.

Relates: #11681

Related Issues

Resolves #12318

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@neetikasinghal
Contributor Author

@reta @jed326 please review

Contributor

❌ Gradle check result for f9cb4f5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Mar 19, 2024

Compatibility status:

Checks if related components are compatible with change 8e3688e

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git]

@jed326
Collaborator

jed326 commented Mar 20, 2024

FYI I think the "resolves" issue is wrong:

Collaborator

@jed326 jed326 left a comment


Perhaps I do not understand the problem correctly, but if the issue here is that background merges are happening between the concurrent and non-concurrent search queries, would a simpler solution be to add a small sleep after indexing is completed so we wait for background merges to complete before we perform the search request?
Or we could even force merge to 2 segments ourselves?

It seems like making deletion of bogus docs optional is a pretty heavy duty solution to what seems like a pretty simple problem.

Contributor

✅ Gradle check result for fce9d57: SUCCESS


codecov bot commented Mar 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.38%. Comparing base (b15cb0c) to head (d346c9f).
Report is 71 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12776      +/-   ##
============================================
- Coverage     71.42%   71.38%   -0.04%     
- Complexity    59978    60282     +304     
============================================
  Files          4985     5011      +26     
  Lines        282275   283659    +1384     
  Branches      40946    41117     +171     
============================================
+ Hits         201603   202501     +898     
- Misses        63999    64425     +426     
- Partials      16673    16733      +60     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added bug Something isn't working flaky-test Random test failure that succeeds on second run Search Search query, autocomplete ...etc labels Mar 20, 2024
@neetikasinghal
Contributor Author

add a small sleep after indexing is completed so we wait for background merges to complete before we perform the search request?
Or we could even force merge to 2 segments ourselves?

@jed326 I thought the same, but it looks like even a sleep of 5 seconds or a force merge (which defeats the purpose of indexing multiple docs) doesn't guarantee successful runs of the tests.

Hence, I came up with this approach of not deleting the bogus docs before the search.

@jed326
Collaborator

jed326 commented Mar 20, 2024

Hence, I came up with this approach of not deleting the bogus docs before the search.

Instead of adding bogus docs could we just add more "real" docs instead?

@neetikasinghal
Contributor Author

Hence, I came up with this approach of not deleting the bogus docs before the search.

Instead of adding bogus docs could we just add more "real" docs instead?

I think it would be the same as calling the indexRandomForSlices function with deletion of bogus docs set to false, which is what currently happens. This would cause code duplication.

@jed326
Collaborator

jed326 commented Mar 20, 2024

I think it would be the same as calling the indexRandomForSlices function with deletion of bogus docs set to false,

This would ingest bogus docs, while what I was suggesting is to modify the test to ingest more (looks like only 1 more would be needed?) non-bogus docs.

Overall it feels like not deleting bogus docs is a sledgehammer approach here that I don't think is generically useful for other tests, so in my opinion a better solution is to modify the singular test that is flaky.

@neetikasinghal
Contributor Author

I think it would be the same as calling the indexRandomForSlices function with deletion of bogus docs set to false,

This would ingest bogus docs, while what I was suggesting is to modify the test to ingest more (looks like only 1 more would be needed?) non-bogus docs.

Overall it feels like not deleting bogus docs is a sledgehammer approach here that I don't think is generically useful for other tests, so in my opinion a better solution is to modify the singular test that is flaky.

ok, so I modified the change to ingest one more doc and use just one shard for this test class, so that multiple slices are created and there is no need to call indexRandomForMultipleSlices for this test. As you said, this also lets us avoid modifying that function. @jed326 thanks for your feedback.

Contributor

❌ Gradle check result for 8e3688e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Collaborator

@jed326 jed326 left a comment


Awesome, I think this is a much more straightforward solution. Thanks!

Contributor

✅ Gradle check result for d346c9f: SUCCESS

@neetikasinghal
Contributor Author

@reta could you also take a look and merge if it looks good to you?

@dblock dblock merged commit 4010ff1 into opensearch-project:main Mar 22, 2024
33 of 40 checks passed
@dblock dblock added the backport 2.x Backport to 2.x branch label Mar 22, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 22, 2024
Signed-off-by: Neetika Singhal <[email protected]>
(cherry picked from commit 4010ff1)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta pushed a commit that referenced this pull request Mar 22, 2024
…12859)

(cherry picked from commit 4010ff1)

Signed-off-by: Neetika Singhal <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@neetikasinghal neetikasinghal self-assigned this Mar 28, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Labels
backport 2.x Backport to 2.x branch bug Something isn't working flaky-test Random test failure that succeeds on second run Search Search query, autocomplete ...etc skip-changelog
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] org.opensearch.search.nested.SimpleNestedExplainIT.testExplainMultipleDocs is flaky
4 participants