[Segment Replication] Differentiate post merge checkpoints #10872

Open
1 of 2 tasks
mch2 opened this issue Oct 23, 2023 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request Indexing:Replication Issues and PRs related to core replication framework eg segrep

mch2 commented Oct 23, 2023

Is your feature request related to a problem? Please describe.
Today, segment replication (SegRep) uses the ReplicationCheckpoint to compute replica staleness, report stats, and enforce backpressure on lagging replicas. Lag caused by checkpoints that are strictly post-merge refreshes, where the searchable doc count does not change, should be reported separately in metrics and excluded from backpressure computations.

Describe the solution you'd like
Lag from post-merge checkpoints should still be surfaced, but it should be excluded from backpressure computations.
/_cat/segment_replication should still show ongoing lag, with a separate column for the lag in syncing to a merged checkpoint.
This should also be surfaced through the _nodes/stats API as separate metrics for both bytes and time lag.

Perhaps we can restructure the cat SR API to show one row for each checkpoint the replica is lagging behind, plus an isMerge column.
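
To make the proposal concrete, here is a minimal Java sketch of splitting lag by checkpoint kind. Every type and field name in it (PendingCheckpoint, ReplicaLag, LagCalculator, the column names) is hypothetical and not part of OpenSearch. It also assumes each pending checkpoint carries its own byte count and age, which may not match how ReplicationCheckpoint tracks this today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LagDemo {
    public static void main(String[] args) {
        // One small non-merge checkpoint and one large, older post-merge checkpoint.
        List<PendingCheckpoint> pending = Arrays.asList(
            new PendingCheckpoint(5, 100, 1_000, false),
            new PendingCheckpoint(6, 5_000_000, 60_000, true));
        ReplicaLag lag = LagCalculator.compute(pending);
        System.out.println("backpressure: " + LagCalculator.exceedsBackpressureLimit(lag, 5_000));
        LagCalculator.catRows("[index][0]", pending).forEach(System.out::println);
    }
}

/** Hypothetical view of one checkpoint a replica has not yet synced to. */
final class PendingCheckpoint {
    final long version;      // checkpoint version (illustrative)
    final long bytesBehind;  // bytes still to copy for this checkpoint
    final long ageMillis;    // time since the primary published it
    final boolean isMerge;   // true for a post-merge refresh (doc count unchanged)

    PendingCheckpoint(long version, long bytesBehind, long ageMillis, boolean isMerge) {
        this.version = version;
        this.bytesBehind = bytesBehind;
        this.ageMillis = ageMillis;
        this.isMerge = isMerge;
    }
}

/** Lag totals for one replica, kept separate for merge and non-merge checkpoints. */
final class ReplicaLag {
    long bytes, millis;            // lag that affects searchable docs
    long mergeBytes, mergeMillis;  // lag from post-merge checkpoints only
}

final class LagCalculator {
    /** Sums bytes and takes the oldest checkpoint age, per checkpoint kind. */
    static ReplicaLag compute(List<PendingCheckpoint> pending) {
        ReplicaLag lag = new ReplicaLag();
        for (PendingCheckpoint cp : pending) {
            if (cp.isMerge) {
                lag.mergeBytes += cp.bytesBehind;
                lag.mergeMillis = Math.max(lag.mergeMillis, cp.ageMillis);
            } else {
                lag.bytes += cp.bytesBehind;
                lag.millis = Math.max(lag.millis, cp.ageMillis);
            }
        }
        return lag;
    }

    /** Backpressure looks only at non-merge lag; merge lag is reported but ignored here. */
    static boolean exceedsBackpressureLimit(ReplicaLag lag, long maxLagMillis) {
        return lag.millis > maxLagMillis;
    }

    /** One row per pending checkpoint, including an isMerge column. */
    static List<String> catRows(String shardId, List<PendingCheckpoint> pending) {
        List<String> rows = new ArrayList<>();
        rows.add("shardId checkpoint bytes_behind time_lag_ms isMerge");
        for (PendingCheckpoint cp : pending) {
            rows.add(shardId + " " + cp.version + " " + cp.bytesBehind
                + " " + cp.ageMillis + " " + cp.isMerge);
        }
        return rows;
    }
}
```

In the demo, the replica is a full minute behind on a merged checkpoint, yet the backpressure check sees only the one-second non-merge lag. The merge lag remains visible in the per-checkpoint rows and in the separate mergeBytes and mergeMillis totals.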

Describe alternatives you've considered
Leaving the behavior as is, or silently excluding these checkpoints.

@mch2 mch2 added enhancement Enhancement or improvement to existing feature or request untriaged labels Oct 23, 2023
@anasalkouz anasalkouz added Indexing:Replication Issues and PRs related to core replication framework eg segrep and removed untriaged labels Nov 9, 2023

mch2 commented Dec 14, 2023

Rather than differentiating checkpoints, we should write an implementation of IndexWriter.IndexReaderWarmer similar to Lucene NRT's PreCopyMergedSegmentWarmer.

At this point the gaps are as follows:

  • Support copying files that are not associated with any ReplicationCheckpoint out to replicas. This should be abstracted to support both node-to-node and remote-store-based copy.
  • Add a new function on IndexShard that initiates the copy and does not ack until all replicas are current.
  • Pass that function to an implementation of IndexReaderWarmer that can be wired into EngineConfig, so that the warmer can be set in IndexWriterConfig.
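
The three gaps above could fit together roughly as in the following sketch. It deliberately avoids real Lucene and OpenSearch types so that it compiles on its own. MergedSegmentWarmer is a stand-in for Lucene's IndexWriter.IndexReaderWarmer, whose actual warm method takes a LeafReader and declares IOException. MergedSegmentPublisher, PreCopyWarmer, and InMemoryPublisher are hypothetical names. In a real implementation the warmer would presumably be registered via IndexWriterConfig.setMergedSegmentWarmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class WarmerDemo {
    public static void main(String[] args) {
        InMemoryPublisher publisher = new InMemoryPublisher(2);
        MergedSegmentWarmer warmer = new PreCopyWarmer(publisher);
        // Simulates the writer invoking the warmer once a merge has produced segment _3.
        warmer.warm("_3", Arrays.asList("_3.cfs", "_3.cfe", "_3.si"));
        System.out.println(publisher.replicaFiles);
    }
}

/** Stand-in for Lucene's IndexWriter.IndexReaderWarmer (simplified signature). */
interface MergedSegmentWarmer {
    void warm(String segmentName, List<String> files);
}

/**
 * Hypothetical copy function exposed through IndexShard. The files are not tied to any
 * ReplicationCheckpoint, and the call must not return until every replica has them.
 */
interface MergedSegmentPublisher {
    void copyToReplicasAndWait(String segmentName, List<String> files);
}

/** Warmer that pre-copies a newly merged segment's files before the merge is committed to readers. */
final class PreCopyWarmer implements MergedSegmentWarmer {
    private final MergedSegmentPublisher publisher;

    PreCopyWarmer(MergedSegmentPublisher publisher) {
        this.publisher = publisher;
    }

    @Override
    public void warm(String segmentName, List<String> files) {
        // Blocks the merge thread until all replicas hold the merged files.
        publisher.copyToReplicasAndWait(segmentName, files);
    }
}

/** In-memory publisher for the demo; a real one would use node-to-node or remote store copy. */
final class InMemoryPublisher implements MergedSegmentPublisher {
    final List<Set<String>> replicaFiles = new ArrayList<>();

    InMemoryPublisher(int replicaCount) {
        for (int i = 0; i < replicaCount; i++) {
            replicaFiles.add(new LinkedHashSet<>());
        }
    }

    @Override
    public void copyToReplicasAndWait(String segmentName, List<String> files) {
        // Synchronous copy, so returning from this method stands in for the ack.
        for (Set<String> replica : replicaFiles) {
            replica.addAll(files);
        }
    }
}
```

Hiding the copy behind a single interface keeps the warmer unaware of the transport, which is what the first gap asks for: the same warmer could be handed a node-to-node publisher or a remote-store publisher.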
