[Backport 2.x] Remove compounding retries within PrimaryShardReplicationSource #12800

Merged
merged 1 commit into from
Mar 20, 2024

opensearch-trigger-bot[bot]

Backport 11644d5 from #12043.

This change removes the retries inside PrimaryShardReplicationSource and relies on a single retry point at the start of replication.
That retry happens in SegmentReplicationTargetService's processLatestReceivedCheckpoint, which runs after a replication attempt fails or succeeds.
The timeout on the removed retries is the cause of flaky failures in SegmentReplication's bwc test within IndexingIT, which can occur on node disconnect.
The retries persist for over a minute against the same primary node after it has been relocated or shut down, causing the test to time out.
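To illustrate why nested retries compound, here is a minimal sketch. It is not OpenSearch code: `RetrySketch`, `retry`, `nestedCalls`, and `singleLayerCalls` are hypothetical names, and the attempt counts are arbitrary. It only counts how many calls reach an always-failing primary when an inner retry loop sits inside an outer one, compared with retrying in one place.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetrySketch {
    /** Runs the action up to maxAttempts times; returns true on the first success. */
    static boolean retry(int maxAttempts, Supplier<Boolean> action) {
        for (int i = 0; i < maxAttempts; i++) {
            if (action.get()) {
                return true;
            }
        }
        return false;
    }

    /** Counts calls to an always-failing primary when an inner retry is nested in an outer one. */
    static int nestedCalls(int outerAttempts, int innerAttempts) {
        AtomicInteger calls = new AtomicInteger();
        retry(outerAttempts, () -> retry(innerAttempts, () -> {
            calls.incrementAndGet(); // simulated transport call that always fails
            return false;
        }));
        return calls.get();
    }

    /** Counts calls to an always-failing primary when retries happen in one place only. */
    static int singleLayerCalls(int attempts) {
        AtomicInteger calls = new AtomicInteger();
        retry(attempts, () -> {
            calls.incrementAndGet(); // simulated transport call that always fails
            return false;
        });
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("nested: " + nestedCalls(3, 5));
        System.out.println("single: " + singleLayerCalls(3));
    }
}
```

With these assumed counts the nested form makes 15 calls and the single-layer form makes 3. If each failed call also waits for a timeout, the total wait grows by the same factor.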

This change also simplifies the cancellation flow on the target service before the shard is closed.
Previously we would "request" a cancel, which did not remove the target from the ongoing replications collection until a cancellation failure was thrown.
The transport calls from PrimaryShardReplicationSource are no longer wrapped in CancellableThreads by the client, so a call to "cancel" will not throw.
Instead, we now immediately remove the target and decref/close it.
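The simplified cancellation flow can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `CancelSketch`, `Target`, and `OngoingReplications` are hypothetical stand-ins, and the reference counting is reduced to a single counter. The point is that cancel removes the target from the collection and releases it in one step, without waiting for an exception to trigger the cleanup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CancelSketch {
    /** Hypothetical ref-counted replication target; closes when the count reaches zero. */
    static final class Target {
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean closed;

        void decRef() {
            if (refs.decrementAndGet() == 0) {
                closed = true; // release resources here in a real target
            }
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical collection of in-flight replications keyed by an id. */
    static final class OngoingReplications {
        private final Map<Long, Target> targets = new ConcurrentHashMap<>();

        void start(long id, Target target) {
            targets.put(id, target);
        }

        /** Removes the target immediately and releases it; returns false if it was not present. */
        boolean cancel(long id) {
            Target target = targets.remove(id);
            if (target == null) {
                return false;
            }
            target.decRef();
            return true;
        }

        int size() {
            return targets.size();
        }
    }
}
```

Because `remove` either returns the target or null, a second cancel for the same id is a no-op and the target is released at most once through this path.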

Signed-off-by: Marc Handalian <[email protected]>
(cherry picked from commit 11644d5)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

mch2 commented Mar 20, 2024

This backport unmutes testIndexingWithSegRep, which had been flaky. I will run it several times to confirm it is no longer flaky before merging.


Compatibility status:

Checks if related components are compatible with change 406d725

Incompatible components

Skipped components

Compatible components

- https://github.com/opensearch-project/asynchronous-search.git
- https://github.com/opensearch-project/performance-analyzer-rca.git
- https://github.com/opensearch-project/cross-cluster-replication.git
- https://github.com/opensearch-project/job-scheduler.git
- https://github.com/opensearch-project/reporting.git
- https://github.com/opensearch-project/security.git
- https://github.com/opensearch-project/geospatial.git
- https://github.com/opensearch-project/performance-analyzer.git
- https://github.com/opensearch-project/common-utils.git
- https://github.com/opensearch-project/k-nn.git
- https://github.com/opensearch-project/anomaly-detection.git
- https://github.com/opensearch-project/custom-codecs.git
- https://github.com/opensearch-project/observability.git
- https://github.com/opensearch-project/index-management.git
- https://github.com/opensearch-project/flow-framework.git
- https://github.com/opensearch-project/neural-search.git
- https://github.com/opensearch-project/alerting.git
- https://github.com/opensearch-project/ml-commons.git
- https://github.com/opensearch-project/notifications.git
- https://github.com/opensearch-project/security-analytics.git
- https://github.com/opensearch-project/sql.git


✅ Gradle check result for 406d725: SUCCESS

@dblock dblock merged commit 7fa96d2 into 2.x Mar 20, 2024
53 of 78 checks passed
@github-actions github-actions bot deleted the backport/backport-12043-to-2.x branch March 20, 2024 21:09