[BUG] Segment Replication - Resize API check can incorrectly pass when determining replica staleness. #10342
Labels
bug
Something isn't working
Indexing:Replication
Issues and PRs related to core replication framework eg segrep
Describe the bug
Today, with SegRep, we add a check during resize operations to ensure replicas are caught up before the operation continues.
To fix #10123 we need to flip how these metrics are computed so that staleness is computed on the replica. This means these APIs are subject to a delay: a replica must first receive the latest primary checkpoint before it can report accurate staleness, so until then the resize check can incorrectly pass.
To give an accurate picture of staleness, we need to fetch the on-reader SegmentInfos version from each shard and compare it with that of its primary. I'm thinking we create a separate transport-level API to do this and report back, because the cat SegRep API is a lot heavier and is intended to give an at-a-glance view.
Expected behavior
The check should consider the SegmentInfos version instead of an estimated bytes computation.
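A rough sketch of the comparison this would make, to show the idea rather than a proposed implementation. `ReplicaStalenessCheck`, `ShardVersion`, and `replicasCaughtUp` are hypothetical names, not existing OpenSearch classes, and the sketch assumes the new transport action has already collected one on-reader SegmentInfos version per shard copy:

```java
import java.util.List;

// Hypothetical sketch: none of these types exist in OpenSearch.
public final class ReplicaStalenessCheck {

    /** One shard copy's report of its on-reader SegmentInfos version (assumed shape). */
    public static final class ShardVersion {
        final String allocationId;
        final boolean primary;
        final long segmentInfosVersion;

        public ShardVersion(String allocationId, boolean primary, long segmentInfosVersion) {
            this.allocationId = allocationId;
            this.primary = primary;
            this.segmentInfosVersion = segmentInfosVersion;
        }
    }

    /**
     * Returns true only if every replica reports a SegmentInfos version that is
     * not behind the primary's. Assumes exactly one primary entry in the list.
     */
    public static boolean replicasCaughtUp(List<ShardVersion> shards) {
        long primaryVersion = shards.stream()
            .filter(s -> s.primary)
            .mapToLong(s -> s.segmentInfosVersion)
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("no primary shard reported"));

        // Compare versions directly instead of estimating bytes behind.
        return shards.stream()
            .filter(s -> !s.primary)
            .allMatch(s -> s.segmentInfosVersion >= primaryVersion);
    }
}
```

Because the versions are read from the shards at check time, this comparison would not depend on the replica having already received the latest primary checkpoint.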