[Segment Replication] Allow shard idle on indices with zero replicas #7761
@andrross @Bukhtawar I know there was concern on #7736 about adding bimodal behavior with and without replicas. I think the perf benefit here is large enough to accept this with a warning to users (and docs). I tried adding validation between search.idle.after and the replica count, but this further coupled the settings, and because they are defined in separate files it did not cover all cases, particularly adding or removing replicas after index creation. I've re-opened this PR as #8173 and added a warning when search.idle is updated from its default value.
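A minimal sketch of the kind of warning described above, assuming the check runs whenever index.search.idle.after is updated and that its default is 30s. The class and method names are hypothetical and are not OpenSearch APIs. The replica count is deliberately not consulted, since replicas can be added or removed after index creation.

```java
import java.util.Optional;

// Hypothetical sketch; these names are illustrative and are not OpenSearch APIs.
final class SearchIdleWarning {
    enum ReplicationType { DOCUMENT, SEGMENT }

    // Assumed default value of index.search.idle.after.
    static final String DEFAULT_SEARCH_IDLE_AFTER = "30s";

    /**
     * Returns a warning when index.search.idle.after is moved off its default on a
     * SEGMENT replication index. The replica count is not checked, because replicas
     * can be added or removed at any time after index creation.
     */
    static Optional<String> onSettingUpdate(ReplicationType type, String newValue) {
        if (type == ReplicationType.SEGMENT && !DEFAULT_SEARCH_IDLE_AFTER.equals(newValue)) {
            return Optional.of("index.search.idle.after [" + newValue + "] has no effect on "
                + "SEGMENT replication indices that have replicas");
        }
        return Optional.empty();
    }
}
```

For example, onSettingUpdate(SEGMENT, "5m") yields a warning, while the same update on a DOCUMENT index yields none.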
Thanks @mch2. Wondering if we could do something better. As in, still honour … It's also possible that no indexing has been done on the shard, in which case primary forwarding might be moot. But once we move to a polling mechanism, where replicas periodically check for newer checkpoints, they could also get a "hint" in the response to distinguish "no-indexing shard_idle" from "indexing but shard_idle" and decide accordingly whether primary forwarding is needed. Alternatively, we could have the coordinator's shard-selection logic prefer the primary shard first, so that requests are sent to the primary, which refreshes on demand and replicates segments before requests can be routed to replicas. Essentially, I don't want indexing-heavy workloads to suffer because they started with a replica for availability. We could even consider a simpler approach with no forwarding to the primary at all: return stale records for the initial request while it forces an on-demand refresh on the primary.
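The coordinator-side alternative can be sketched as an ordering step over a shard's copies. This assumes a hypothetical ShardCopy record and a search-idle flag supplied by the caller; it only illustrates the idea and is not how OpenSearch's shard routing is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of coordinator-side routing; ShardCopy is not an OpenSearch type.
final class PrimaryFirstRouting {
    record ShardCopy(String nodeId, boolean primary) {}

    /**
     * Orders a shard's copies for a search. When the shard is search idle, the primary
     * is tried first so it can refresh on demand and start segment replication before
     * replicas serve traffic. Otherwise the incoming order is left unchanged.
     */
    static List<ShardCopy> orderForSearch(List<ShardCopy> copies, boolean shardIsSearchIdle) {
        List<ShardCopy> ordered = new ArrayList<>(copies);
        if (shardIsSearchIdle) {
            // Sort key false (primary) sorts before true (replica); List.sort is stable,
            // so replicas keep their original relative order.
            ordered.sort(Comparator.comparing((ShardCopy c) -> !c.primary()));
        }
        return ordered;
    }
}
```

Keeping the sort stable means any existing replica preference (for example, adaptive replica selection) is preserved behind the primary.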
Are you suggesting we ping the primary before serving the search request, wait for segments to be replicated, and then issue the request on the replica? Or simply ping the primary to start the replication cycle and then perform the request, knowing it would be stale? I'd prefer the latter here, and have users route critical requests to the primary where staleness is not an option.
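The two options differ only in whether the replica waits on the primary. A sketch, assuming a hypothetical PrimaryClient whose refreshAndPublish() future completes with the checkpoint the primary published, and using the replica's searchable checkpoint as a stand-in for running the actual query. None of these names are OpenSearch APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; PrimaryClient and the checkpoint bookkeeping are illustrative only.
final class IdleReplicaSearch {
    interface PrimaryClient {
        // Ask the primary to refresh and publish; completes with the published checkpoint.
        CompletableFuture<Long> refreshAndPublish();
    }

    private final PrimaryClient primary;
    // Checkpoint of the segments this replica can currently search.
    final AtomicLong searchableCheckpoint;

    IdleReplicaSearch(PrimaryClient primary, long initialCheckpoint) {
        this.primary = primary;
        this.searchableCheckpoint = new AtomicLong(initialCheckpoint);
    }

    // Stand-in for segment copy finishing on this replica.
    private void applyCheckpoint(long checkpoint) {
        searchableCheckpoint.accumulateAndGet(checkpoint, Math::max);
    }

    /** Option 1: block until the primary's new segments are available, then search. */
    long searchAfterCatchUp() {
        applyCheckpoint(primary.refreshAndPublish().join());
        return searchableCheckpoint.get();
    }

    /** Option 2: start the replication cycle, but serve the current (possibly stale) view now. */
    long searchPossiblyStale() {
        long servedFrom = searchableCheckpoint.get();
        primary.refreshAndPublish().thenAccept(this::applyCheckpoint);
        return servedFrom;
    }
}
```

Option 2 keeps search latency unchanged, but the first read after an idle period may be well behind the primary.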
Even with our existing push mechanism, we would know whether the replica has received a published checkpoint from the primary, and therefore whether we are in no-indexing or indexing shard_idle, and could route the request accordingly. We might also be able to determine staleness by comparing the global checkpoint from the translog to the replica's local processed checkpoint.
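A sketch of the checkpoint comparison, assuming the replica has access to the global checkpoint (from its translog) and its own processed checkpoint as plain sequence numbers. The helper names and the rule "caught up means no-indexing idle" are assumptions made for illustration, not OpenSearch behavior.

```java
// Hypothetical sketch; these helpers are illustrative and are not OpenSearch APIs.
final class ReplicaStaleness {
    enum IdleMode { NO_INDEXING_IDLE, INDEXING_IDLE }

    /**
     * Operations the replica's searchable view is behind, measured as the global
     * checkpoint (from the translog) minus the replica's locally processed checkpoint.
     */
    static long opsBehind(long globalCheckpoint, long replicaProcessedCheckpoint) {
        return Math.max(0L, globalCheckpoint - replicaProcessedCheckpoint);
    }

    /** Idle with nothing pending vs. idle while writes have not yet become searchable. */
    static IdleMode classify(long globalCheckpoint, long replicaProcessedCheckpoint) {
        return opsBehind(globalCheckpoint, replicaProcessedCheckpoint) == 0
            ? IdleMode.NO_INDEXING_IDLE
            : IdleMode.INDEXING_IDLE;
    }

    /** Only an indexing-idle replica needs to ping the primary before (or while) serving a search. */
    static boolean shouldPingPrimary(long globalCheckpoint, long replicaProcessedCheckpoint) {
        return classify(globalCheckpoint, replicaProcessedCheckpoint) == IdleMode.INDEXING_IDLE;
    }
}
```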
The thinking behind disabling search_idle in the first place is that the benefits of Segment Replication would still outweigh those of search_idle for indexing workloads (related issue). The workload wouldn't suffer compared to docrep in terms of throughput; however, it would increase refresh/flush/merge counts when the refresh interval is low. I do like the simple approach of pinging the primary to refresh and start the cycle while serving a stale read from the replica. This is not much different from setting a high refresh interval with SR today and issuing a search. The risk is that the read could be significantly behind.
One other thing I'm thinking through is that, in an idle state, when a primary does eventually refresh, it would produce larger segments and more data that needs to be copied out to nodes and/or the remote store. With SR and remote store pressure, this may increase the likelihood of backpressure kicking in, particularly with synchronous remote store uploads.
To be clear, this would also change search_idle behavior: when the request does eventually come in, we would serve a stale read rather than only adding latency. This may be more of a problem for system indices.
@Bukhtawar I am working on a better solution here. Until then, are we OK with #8173 to mitigate the perf issue with 0 replicas?
@Bukhtawar: Can you please respond to this query? We would like to move forward with this solution if there are no concerns.
Opened a separate issue to explore supporting idle with segrep; closing this as completed.
Moving conversation from #7736 to a new issue.
Today, indices using segment replication silently ignore shard idle. This is intentional: with segment replication, replicas are refreshed only externally, through their primary, so we cannot depend on a search request happening to hit the primary in order to trigger a segment copy.
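To make the constraint concrete, the toy model below simulates one primary and one replica. It assumes that a search on a search-idle copy refreshes that copy only if the copy can produce its own segments (a primary, or any document-replication copy), and that a segment-replication primary copies its searchable view to the replica when it refreshes. The classes are hypothetical and heavily simplified (the replica's translog is ignored). With zero replicas every search lands on the primary, which is why shard idle can still work in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model; these classes are hypothetical and are not OpenSearch internals.
final class IdleRefreshModel {
    enum ReplicationType { DOCUMENT, SEGMENT }

    static final class ShardCopy {
        final boolean primary;
        final ReplicationType type;
        final List<String> unrefreshed = new ArrayList<>(); // indexed but not yet visible
        final List<String> searchable = new ArrayList<>();  // visible to searches
        ShardCopy replica; // set on the primary only; one replica for simplicity

        ShardCopy(boolean primary, ReplicationType type) {
            this.primary = primary;
            this.type = type;
        }

        /** A search on a search-idle copy refreshes it only if the copy can refresh itself. */
        List<String> search() {
            if (primary || type == ReplicationType.DOCUMENT) {
                refresh();
            }
            return List.copyOf(searchable);
        }

        void refresh() {
            searchable.addAll(unrefreshed);
            unrefreshed.clear();
            // Segment replication: the primary copies its segments out after refreshing.
            if (primary && type == ReplicationType.SEGMENT && replica != null) {
                replica.searchable.clear();
                replica.searchable.addAll(searchable);
            }
        }
    }

    /** Document replication indexes on every copy; segment replication indexes only on the primary. */
    static void write(ShardCopy primary, ShardCopy replica, String doc) {
        primary.unrefreshed.add(doc);
        if (primary.type == ReplicationType.DOCUMENT) {
            replica.unrefreshed.add(doc);
        }
    }
}
```

In this model, searching the segment-replication replica first returns nothing; only a search that reaches the primary makes the document visible on both copies.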
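There are a few issues with the current implementation.
index.search.idle.after: when set on an index using segment replication, it is ignored silently.
Indices using segment replication with zero replicas take a performance hit compared to document replication. Benchmark setup (results not preserved): 40 shards, 0 replicas; baseline = docrep, contender = segrep.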
Expected behavior
There should not be a performance hit for indices using SEGMENT replication type when there are no replicas.
Users should be aware that any specified value for index.search.idle.after will have no impact if they are using SEGMENT replication type with replicas.
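The expected behavior reduces to a small predicate. A sketch, assuming it is evaluated against the index's current replica count (number_of_replicas can change after creation) rather than only at index creation; the names are hypothetical, not OpenSearch APIs.

```java
// Hypothetical sketch of the expected behaviour; names are illustrative, not OpenSearch APIs.
final class SearchIdlePolicy {
    enum ReplicationType { DOCUMENT, SEGMENT }

    /**
     * Whether index.search.idle.after should take effect for a shard. Under SEGMENT
     * replication a replica cannot refresh itself, so idling is only safe when there
     * are no replicas and every search therefore reaches the primary.
     */
    static boolean honorsSearchIdle(ReplicationType type, int numberOfReplicas) {
        return type == ReplicationType.DOCUMENT || numberOfReplicas == 0;
    }
}
```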