Let me explain the scenario. We have a set of Redis pods (say 3 master shards, each with 1 replica) behind a Kubernetes Service IP. Initially, when connecting with hiredis-cluster, we pass the Service IP as the target address. hiredis-cluster then discovers the node IPs behind the Service via the 'cluster nodes' command and updates its internal map so that the newly learned IPs (the 3 master IPs) become the reachable addresses.
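For context, this is roughly how the client is seeded today. A minimal sketch against the public hircluster.h API; the Service address `redis-service.default.svc:6379` is just a placeholder for our setup:

```c
/* Minimal sketch of the current setup: the Kubernetes Service address is
 * only used as a seed. The address below is a placeholder for illustration. */
#include <stdio.h>
#include <hircluster.h>

int main(void) {
    redisClusterContext *cc = redisClusterContextInit();

    /* Seed with the Service address; after connecting, hiredis-cluster
     * runs CLUSTER NODES and keeps only the learned node IPs in its
     * internal map, so the Service address itself is no longer retained. */
    redisClusterSetOptionAddNodes(cc, "redis-service.default.svc:6379");

    if (redisClusterConnect2(cc) != REDIS_OK) {
        fprintf(stderr, "Connect error: %s\n", cc->errstr);
        redisClusterFree(cc);
        return 1;
    }

    redisReply *reply = redisClusterCommand(cc, "SET %s %s", "foo", "bar");
    if (reply != NULL)
        freeReplyObject(reply);

    redisClusterFree(cc);
    return 0;
}
```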
But suppose at some point all 3 masters (and their replicas) go down simultaneously. I wanted to understand what the recovery mechanism is in such a scenario. Since hiredis-cluster does not keep the Kubernetes Service IP stored, it will not be able to relearn the new IPs (after the pods come back up), if my understanding is correct.
Is re-initialising hiredis-cluster the only way forward?
Interesting. As of now, your assessment is correct: a re-initialization is needed in this scenario.
The initially given addresses are replaced by the addresses learned from the Redis cluster itself.
Is this triggered by tests or is it a likely scenario with the Redis operator you are using?
What do you believe is the best way to handle this scenario?
Should the initially added nodes be kept and used as a last resort when attempting to get the slot information from the cluster?
...or should there be an additional API for this?
Yes, this particular scenario was attempted as part of some resiliency tests.
I was thinking that storing the Kubernetes Service FQDN/IP that is initially passed to hiredis-cluster in the cache would be the best option. Since the Service is not expected to change (in power-failure-type scenarios), if we tie it into the current cluster discovery mechanism, the client could fall back to the Kubernetes Service when all previously learned nodes return errors, or something along those lines.
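Roughly what we do in application code today as a workaround, sketched against the public hircluster.h API (the Service address and helper names are illustrative, not anything hiredis-cluster provides):

```c
/* Sketch of the suggested fallback, done in the application for now:
 * keep the original Kubernetes Service address and rebuild the cluster
 * context from it when all previously learned nodes are unreachable.
 * The address and helper names below are illustrative only. */
#include <stdio.h>
#include <hircluster.h>

static const char *k8s_service_addr = "redis-service.default.svc:6379";

static redisClusterContext *connect_via_service(void) {
    redisClusterContext *cc = redisClusterContextInit();
    redisClusterSetOptionAddNodes(cc, k8s_service_addr);
    if (redisClusterConnect2(cc) != REDIS_OK) {
        fprintf(stderr, "Reconnect via Service failed: %s\n", cc->errstr);
        redisClusterFree(cc);
        return NULL;
    }
    return cc;
}

/* Run a command; on failure, drop the stale context (whose node map only
 * contains the old pod IPs) and rediscover the cluster via the Service. */
static redisReply *get_with_fallback(redisClusterContext **cc, const char *key) {
    redisReply *reply = redisClusterCommand(*cc, "GET %s", key);
    if (reply == NULL) {
        redisClusterFree(*cc);
        *cc = connect_via_service();
        if (*cc == NULL)
            return NULL;
        reply = redisClusterCommand(*cc, "GET %s", key);
    }
    return reply;
}
```

Having the library itself keep the seed addresses as a last resort would obviously be nicer than repeating this in every application.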
zuiderkwast changed the title from "Behaviour when all redis master shards go down simultaneously." to "Behaviour when all redis nodes go down simultaneously." on Mar 16, 2023