Fix flaky test SegmentReplicationWithNodeToNodeIndexShardTests#testReplicaClosesWhileReplicating_AfterGetCheckpoint (opensearch-project#12695)

This fixes a race condition in the test where the primary shard can still hold an open file ref while it is shutting down.
This happens because the test fetches file refs inside the resolveCheckpointInfoResponseListener method right after calling beforeIndexShardClosed.
The beforeIndexShardClosed call resolves the replication listeners immediately, which leaves open the possibility
that the primary attempts to shut down before those refs are closed. We could resolve this using latches, but this test doesn't need to simulate a primary response at all, so this change removes it entirely.
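To make the race concrete, here is a minimal Java sketch. It is a simplified model, not OpenSearch code: FakePrimary, runRound, and the open-file-ref counter are hypothetical stand-ins for the primary shard, the test body, and the refs fetched in resolveCheckpointInfoResponseListener. It compares the original ordering (refs opened after the listener resolves), the latch-based alternative mentioned above, and the approach this commit takes (no simulated primary response).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the race described in the commit message; all names are hypothetical.
final class FileRefRaceSketch {

    /** Hypothetical stand-in for a primary shard that counts its open file references. */
    static final class FakePrimary {
        private final AtomicInteger openFileRefs = new AtomicInteger();

        void acquireFileRef() { openFileRefs.incrementAndGet(); }

        void releaseFileRef() { openFileRefs.decrementAndGet(); }

        /** Shutting down is "clean" only if no file refs are still open. */
        boolean closeCleanly() { return openFileRefs.get() == 0; }
    }

    /**
     * Simulates one test run and reports whether the primary shut down cleanly.
     *
     * @param simulatePrimaryResponse if true, a file ref is opened after the
     *        replication listener resolves (the step this commit removes)
     * @param waitForRelease if true, the test thread waits on a latch until the
     *        ref is released (the latch-based alternative)
     */
    static boolean runRound(boolean simulatePrimaryResponse, boolean waitForRelease) {
        FakePrimary primary = new FakePrimary();
        CountDownLatch listenerResolved = new CountDownLatch(1);
        CountDownLatch refsReleased = new CountDownLatch(1);

        Thread responder = new Thread(() -> {
            // Closing the replica resolves the replication listener first...
            listenerResolved.countDown();
            if (simulatePrimaryResponse) {
                // ...and only afterwards does the simulated primary response open a ref.
                primary.acquireFileRef();
                try {
                    Thread.sleep(20); // widen the window in which the ref is held
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    primary.releaseFileRef();
                }
            }
            refsReleased.countDown();
        });
        responder.start();

        try {
            listenerResolved.await(); // the test observes the cancellation
            if (waitForRelease) {
                refsReleased.await(); // latch fix: wait until refs are released
            }
            boolean clean = primary.closeCleanly(); // the test shuts the primary down
            responder.join();
            return clean;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while simulating shutdown", e);
        }
    }
}
```

Calling runRound(true, false) repeatedly can return either true or false depending on thread scheduling, which mirrors the flakiness. runRound(true, true) and runRound(false, false) always return true; dropping the simulated response, as this commit does, avoids both the race and the extra synchronization.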

Signed-off-by: Marc Handalian <[email protected]>
mch2 authored Mar 18, 2024
1 parent 21b28f2 commit 5e2034c
Showing 1 changed file with 0 additions and 3 deletions.
@@ -110,7 +110,6 @@ public void testReplicaClosesWhileReplicating_AfterGetCheckpoint() throws Except
IndexShard primary = shards.getPrimary();
final IndexShard replica = shards.getReplicas().get(0);

final int numDocs = shards.indexDocs(randomInt(10));
primary.refresh("Test");

final SegmentReplicationSourceFactory sourceFactory = mock(SegmentReplicationSourceFactory.class);
@@ -124,7 +123,6 @@ public void getCheckpointMetadata(
) {
// trigger a cancellation by closing the replica.
targetService.beforeIndexShardClosed(replica.shardId, replica, Settings.EMPTY);
resolveCheckpointInfoResponseListener(listener, primary);
}

@Override
@@ -141,7 +139,6 @@ public void getSegmentFiles(
};
when(sourceFactory.get(any())).thenReturn(source);
startReplicationAndAssertCancellation(replica, primary, targetService);

shards.removeReplica(replica);
closeShards(replica);
}