Perform unreferenced file cleanup for any operation failure and mute flaky test #12128
Compatibility status: Checks if related components are compatible with change 56424ec
Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git]
Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/alerting.git]
❌ Gradle check result for 22ba8eb: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…flaky tests
Signed-off-by: RS146BIJAY <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅

@@             Coverage Diff              @@
##               main   #12128      +/-   ##
============================================
+ Coverage     71.30%   71.34%   +0.04%
- Complexity    59393    59507     +114
============================================
  Files          4925     4925
  Lines        279540   279539       -1
  Branches      40646    40645       -1
============================================
+ Hits         199333   199448     +115
+ Misses        63580    63517      -63
+ Partials      16627    16574      -53

☔ View full report in Codecov by Sentry.
Signed-off-by: rishavz_sagar <[email protected]>
❕ Gradle check result for 56424ec: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
// clean up all unreferenced files on best effort basis created during failed merge and reset the
// shard state back to last Lucene Commit
if (shouldCleanupUnreferencedFiles() && isOperationFailureDueToIOException(failure)) {
    logger.info("Cleaning up unreferenced files created during failed merge due to: {}", reason);
Should this be a warning?
Because of this, we might end up doing extra cleanups on other shards as well. Please evaluate whether we can avoid that and clean up only on the responsible shard instead.
This PR is stalled because it has been open for 30 days with no activity.
if (shouldCleanupUnreferencedFiles() && isOperationFailureDueToIOException(failure)) {
    logger.info("Cleaning up unreferenced files created during failed merge due to: {}", reason);
Do we need to log if the exception isn't an IOException?
@RS146BIJAY: are you still working on this change?
Description
Currently, we clean up unreferenced files only when the last write was performed by a merge and that write filled the disk and caused the shard to fail. If some other operation performs the last write and fills the disk to 100% while a merge is ongoing, the merge is simply aborted and no cleanup is performed, because when the shard is closed, the failure is attributed to that other operation rather than to the segment merge.
To fix this, we need to clean up unreferenced files whenever any operation fails because the disk is full.
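The intended decision can be sketched in Java as a minimal, self-contained example. The method names `shouldCleanupUnreferencedFiles` and `isOperationFailureDueToIOException` mirror the snippet under review, but their bodies, the `cleanupEnabled` flag, the `shouldCleanupAfterFailure` helper and the class itself are hypothetical and are not OpenSearch's actual implementation. What the sketch shows is that cleanup is decided from the failure alone (is there an `IOException` somewhere in its cause chain?), not from which operation, merge or otherwise, performed the last write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: method names mirror the PR snippet, but all bodies are assumptions.
final class UnreferencedFileCleanupSketch {

    // Assumed stand-in for whatever setting enables unreferenced file cleanup.
    private final boolean cleanupEnabled;

    public UnreferencedFileCleanupSketch(boolean cleanupEnabled) {
        this.cleanupEnabled = cleanupEnabled;
    }

    boolean shouldCleanupUnreferencedFiles() {
        return cleanupEnabled;
    }

    // Walks the cause chain and reports whether any cause is an IOException.
    // The depth limit guards against cyclic cause chains.
    static boolean isOperationFailureDueToIOException(Throwable failure) {
        Throwable current = failure;
        for (int depth = 0; current != null && depth < 32; depth++) {
            if (current instanceof IOException) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    // Decides cleanup from the failure itself, regardless of whether a merge
    // or some other write operation was the one that failed.
    boolean shouldCleanupAfterFailure(Throwable failure) {
        return shouldCleanupUnreferencedFiles() && isOperationFailureDueToIOException(failure);
    }

    public static void main(String[] args) {
        UnreferencedFileCleanupSketch engine = new UnreferencedFileCleanupSketch(true);
        Throwable diskFull = new UncheckedIOException(new IOException("No space left on device"));
        Throwable unrelated = new IllegalStateException("unrelated failure");
        System.out.println(engine.shouldCleanupAfterFailure(diskFull));  // true
        System.out.println(engine.shouldCleanupAfterFailure(unrelated)); // false
    }
}
```

Walking the cause chain instead of checking only the top-level exception is an assumption about how a disk-full error might be wrapped before it reaches the failure handler (for example in an `UncheckedIOException`).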
Related Issues
#12054
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.