[BUG] Breaking changes for CCR plugin in OS 2.0 #2482
Comments
@saratvemulapalli What was this deprecated for? |
@saikaranam-amazon the change was intentional because soft deletes are no longer optional, and history no longer lives in the translog; it has to be fetched from the Lucene index instead. Could you try exploring the option of using:
Please see the conversation history on #2077 (comment). Give this a shot and let us know how it goes. I'll keep the issue open. |
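For illustration only (this is a rough sketch, not the engine's actual code): with soft deletes, prior operations remain in the Lucene index and can be read back by sequence number instead of being pulled from a translog file. Field names like `_seq_no` and `_source` mirror what OpenSearch uses internally, but the exact field layout here is an assumption.

```java
// Illustrative sketch: read a range of operations back from a Lucene index that
// retains history via soft deletes, instead of reading them from the translog.
// Assumes _seq_no is indexed as a LongPoint with numeric doc values and _source
// is a stored binary field, similar to OpenSearch's internal mapping.
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

import java.nio.file.Paths;

public class LuceneHistoryReader {
    public static void main(String[] args) throws Exception {
        long fromSeqNo = 0, toSeqNo = 1_000;
        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
             // Open the reader without a soft-deletes wrapper so that
             // soft-deleted (historical) documents remain visible.
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Select the operations to replay by sequence-number range,
            // returning them in seq_no order.
            Query range = LongPoint.newRangeQuery("_seq_no", fromSeqNo, toSeqNo);
            Sort bySeqNo = new Sort(new SortField("_seq_no", SortField.Type.LONG));
            TopDocs hits = searcher.search(range, 10_000, bySeqNo);
            for (ScoreDoc hit : hits.scoreDocs) {
                // The stored _source holds the original document to replay.
                BytesRef source = searcher.doc(hit.doc).getBinaryValue("_source");
                System.out.println(hit.doc + " -> " + (source == null ? "<none>" : source.utf8ToString()));
            }
        }
    }
}
```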
@saratvemulapalli Please see this issue for context: #1100. Please provide the option to fetch translog ops based on the history source. |
Yes, that was the case previously as well; we had modified it to hold on to the translog. This method is not an alternative to the previous one, as they don't give the same performance. |
Gentle reminder! - Any update? |
@saikaranam-amazon Does the option suggested above by @saratvemulapalli not work? |
No @dblock, we will need the previous method for the reason above. |
@saikaranam-amazon What do you propose? @andrross @kartg Could I please get you two to pitch in here? What should we do? |
Please add the previous method back to support retrieval based on the history source. Peer recovery can still use the new method. |
@saikaranam-amazon from what I understand, it looks like we are not writing to the translog anymore. From what you are saying, there is definitely a performance impact when using Lucene indices, but it feels like that's the only way. @nknize is our understanding correct? What would you suggest? |
@saikaranam-amazon also, in terms of numbers, do you happen to know how much worse it is to read from the Lucene index vs. reading from the translog? |
@saratvemulapalli Yes, captured in this issue: #1100
|
What version of Lucene did you benchmark with? /cc @andrross For clarification, operations are still available and can be replayed; they are just stored in a Lucene index instead of a translog file, hence "HistorySource" being removed as an option. |
I just checked the commit. You were benchmarking with Lucene 8.9, which had a performance regression bug in stored fields decompression that was fixed in Lucene 8.10. I suggest updating CCR to use the Lucene index source changes and rerunning the benchmarks. |
Ack. Let me check and run with 8.10 |
For 2.0 core: ran with the latest 2.0.0 (alpha) artifacts (April 6th, 2022). Instance: i3.xlarge. Test setup:
Seeing a regression of ~10% (avg) between the diff and reference runs (consider avg. CPU and total time for the fetch operations from the leader index). cc: @nknize |
Are you using the OpenSearch default codec, or are you relying on Rally to set the codec for you? |
No, ESRally didn't set the codec. The index is created with the default codec. |
Can you link to the line where you're setting the codec? Did you run with any other tracks besides the NOAA set? |
No, I didn't set any codec for the index. It is the default codec that every new index picks up on creation in OpenSearch. |
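(For illustration, not from our benchmark setup: the codec is a per-index setting chosen at index creation time. A minimal sketch of what explicitly overriding it would look like, assuming the stock `index.codec` setting; omitting it keeps the speed-oriented default.)

```java
import org.opensearch.common.settings.Settings;

public class CodecSettingExample {
    public static void main(String[] args) {
        // Illustrative only: "index.codec" is a per-index setting. Leaving it out
        // keeps the speed-oriented default codec; "best_compression" trades some
        // speed for smaller stored fields.
        Settings indexSettings = Settings.builder()
            .put("index.number_of_shards", 1)
            .put("index.codec", "best_compression")
            .build();
        System.out.println(indexSettings);
    }
}
```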
Just a couple of thoughts here.
|
This was used for the translog logic and not for the new method from OpenSearch core.
I would like to add another option: add the methods back to the core package for 2.0, with a deprecation path in a later 2.x release. Regarding the previous runs and the various datasets used:
The setup is the same as before.
|
Thank you for posting this, @saikaranam-amazon. A few quick questions:
|
Since public preview is right around the corner and CCR will need to migrate to segrep for 3.0+ anyway, how about a middle ground for 2.0+:
Feel free to thumbs-up #2872 if you agree. |
Sorry, I am new to the CCR implementation, but my biggest question is why we would rely on replaying indexing operations from |
|
@saikaranam-amazon please also confirm that you agree with the entire plan above, especially #2872 (#2886 (comment)). If we don't do this, I'd rather not reintroduce the deprecated methods and instead ship with the regression. |
Concur with dB. We do not want to add technical debt without a plan to remove it. |
@dblock @nknize Thanks for the interim workaround of #2886. We will have a plan in place to move away from the deprecated method in OS 2.1 (refactoring core and the plugin will need some design work). For merging CCR into core, however, we can explore options for OS 3.0 (Java migration, segrep adoption, etc.). We might have to settle on a hybrid model where functionality like security stays in the replication plugin while the replication engine moves to core. We also need to keep longer-term items such as active-active replication in mind while rearchitecting. |
Thanks for the feedback, Mike. There were a bunch of considerations for CCR relying on the translog at the time.
That said, we totally realize the immense gains we can get with segment replication (thanks to your blogs as well as other success stories). With segment replication making its way into OpenSearch for same-cluster replicas, it is a good time to revisit CCR based on segrep. |
We need to do this in 2.x instead of opening more core extension points to override mechanisms we shouldn't be overriding; we've already been burnt by that once when KNN and CCR were incompatible due to both overriding the Engine.
As a first step we can refactor CCR into core and secure the endpoints separately in the security module. This fits with the design of core in 1.x/2.x, and the secured endpoints will move to core once security becomes a first-class citizen.
💯 We can implement this in a later phase, but CCR should be baked into the design considerations of segrep from the beginning to minimize the early tech-debt surface area /cc @mch2 @kartg @anasalkouz @mikemccand |
Just dropping some thoughts out loud on this one: this sounds like a good use-case fit for the streaming index API (the future home for the xlog and user-defined durability policies). One idea is to have CCR filters in the streaming index API route copies of indexing requests to the follower cluster when the filter matches. Alternatively, there could be co-located IndexWriters in the streaming index API of the leader cluster that filter the documents and flush follower segments to be replicated by segrep on the follower cluster only. In the former case you're using a form of docrep for inter-cluster replication but paying the penalty of additional network traffic (and possible failures). In the latter you're parallelizing the writes on the leader node and using segrep for both inter- and intra-cluster replication. Incidentally, this also provides the "fail early" protection we have today by indexing in memory and only replicating when flushing. |
@saikaranam-amazon Care to add a summary with links to various PRs and close? |
+1, thanks for the details. More replies below:
Lucene has a powerful
That's a good point -- if you somehow need CCR more frequently than refreshes, then replicating segments might be an issue. But does this really happen in practice?
Actually this is a big benefit of NRT segment replication -- the primary and replica ARE homogeneous. If a replica is promoted, due to either the primary node going down or simple load-balancing freedom in the cluster (the master can pick a different replica with colder CPU than the hot primary to protect overall cluster throughput), it's the exact same segments. This is the same reason why scan/scroll can also be safely routed to any primary or its replicas with no loss of accuracy.
NRT segment replication fixes that too -- it can load balance nearly arbitrarily across primary + its replicas on every "pull next bytes" request. Just needs the right lease logic to keep the point-in-time snapshots lit.
I would rather we deprecate the inefficient and fragile "reindex all docs" approach and encourage users to switch to NRT segment replication. I agree keeping both options long-term is a bad choice.
Thanks. I've long felt (decade at least) that it was crazy that ES chose document replication. Lucene is naturally write-once at the file level and that makes for an obvious replication unit. And it yields awesome transactional semantics to users, point-in-time searching and efficient incremental backups, like ZFS's choice of write-once block level storage yielding efficient whole-filesystem snapshots to users. |
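As a rough illustration of that write-once property (a sketch using plain Lucene APIs, not OpenSearch's actual replication or snapshot code): Lucene's SnapshotDeletionPolicy can pin a commit point, and the commit's file list is exactly the immutable set of segment files a replica or an incremental backup would copy; files already present on the destination can simply be skipped.

```java
// Sketch only: pin a Lucene commit point and list the immutable segment files
// that a replica or incremental backup would need to copy.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SegmentSnapshotExample {
    public static void main(String[] args) throws Exception {
        SnapshotDeletionPolicy policy =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir,
                 new IndexWriterConfig(new StandardAnalyzer()).setIndexDeletionPolicy(policy))) {
            Document doc = new Document();
            doc.add(new TextField("body", "hello segment replication", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();

            // Pin the commit: its files cannot be deleted until the snapshot is
            // released, so a replica can copy them at its own pace (the "lease"
            // logic mentioned above).
            IndexCommit commit = policy.snapshot();
            try {
                for (String file : commit.getFileNames()) {
                    System.out.println("copy if missing: " + file);
                }
            } finally {
                policy.release(commit);
            }
        }
    }
}
```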
This is true! I wonder, then: with segment replication is there even a need for CCR? Couldn't we just snapshot in the leader cluster, restore in the follower, and use FLR on the follower to filter unneeded docs? Is there a use case I'm missing where CCR is needed at all if we're using a segment-level replication design? It seems more appropriate for a document-level design. |
+1 to explore this. Let the cloud storage layer (S3 or whatever) handle the geo-distribution of the bits. Snapshot in one geo and restore in the other. |
@nknize - +1 - opened a dedicated GitHub issue to explore this in more detail. @mikemccand - Thanks, let me address the points below.
True, definitely worth considering. One of the reasons for considering filtering at the source while replicating was to provide a security posture where filtered-out documents never leave the cluster. However, given that current CCR doesn't implement filtering yet, we could seek input from the community in the RFC to see if there is a real need for this use case.
Since CCR targets disaster recovery use cases, the thought was to ensure ack'd writes are replicated as soon as possible.
Yes, agreed. However, since we didn't have a segment replication implementation for same-cluster replicas at the time, we were limited by the cons of non-homogeneous segments.
+1
Yes. The gains demonstrated in segment replication look great. |
Hi team, as this issue was tracking 2.0 CCR support (which is resolved), can we go ahead and close it? We will continue to discuss core migration under #2872 and segrep under #3020. @saikaranam-amazon can we please open one more issue on removing the dependency on getHistory in 2.1? Thanks all. |
@nknize @mikemccand Thanks. Nice discussion and thoughts - continuing the discussion further in 373. |
Yes, we do have the same documents across the runs, and we did set the codec that is optimized for speed in both runs.
Sure. Opened issue under CCR to track this: opensearch-project/cross-cluster-replication#375
Understood. Based on the previous runs and the subsequent runs performed for this issue (#2482), we do see a performance impact and have already opened an issue to track the work for providing extension points (opensearch-project/cross-cluster-replication#375).
The analysis and next steps will be captured in #3020, and the discussion can continue in that issue. |
Agreed on the eventual state for CCR. We will address the plugin migration (to core as a module) and segment replication in upcoming releases, as migration to core will require a lot of effort (Kotlin/Java migration, security module, etc.) and planning around the already scheduled 2.x releases. We will continue the discussion on moving the plugin into core as a module in #2872, and on segment replication under CCR in #3020. |
For 2.0, we have added the deprecated translog operations back under the engine, and the PR (#2886) is merged for 2.0. Based on the further discussion, we have the following issues for the different tracks:
Please continue the relevant discussions in the above issues. Closing this one. |
Describe the bug
CCR depends on IndexShard#getHistoryOperations to retrieve translog operations for the leader index.
In 2.0, this method was removed in PR #2077.
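For readers unfamiliar with the API in question, a hypothetical sketch of the call pattern the plugin relied on; the exact method signature shown here is an assumption for illustration and may not match the removed method:

```java
// Hypothetical sketch of the 1.x-style call pattern CCR relied on; the exact
// getHistoryOperations signature is assumed and may not match the removed method.
import org.opensearch.index.shard.IndexShard;
import org.opensearch.index.translog.Translog;

public class ChangesReader {
    // Pull operations for the leader shard starting at a given sequence number
    // and hand them to the follower for replay.
    static void readChanges(IndexShard leaderShard, long fromSeqNo) throws Exception {
        try (Translog.Snapshot snapshot =
                 leaderShard.getHistoryOperations("ccr", fromSeqNo)) {   // removed in 2.0 (#2077)
            Translog.Operation op;
            while ((op = snapshot.next()) != null) {
                // Each operation (index/delete/no-op) would be shipped to the follower index.
                System.out.println(op.seqNo() + " " + op.opType());
            }
        }
    }
}
```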
Expected behavior