Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different RedisFailover's sentinels join together #6

Open
dbackeus opened this issue Nov 12, 2024 · 5 comments
Open

Different RedisFailover's sentinels join together #6

dbackeus opened this issue Nov 12, 2024 · 5 comments

Comments

@dbackeus
Copy link

dbackeus commented Nov 12, 2024

I'm reopening what was probably the most critical bug before the fork: spotahome#550

It can result in complete data loss for a cluster. Not just in theory, it happened in our production k8s cluster. Eg. if cluster A gets mixed up with cluster B and cluster A ends up being elected master for all nodes, the previously existing data of cluster B will get completely overwritten by cluster A's.

I still believe the appropriate solution is to start relying on hostnames instead of injecting pod IP's into config as mentioned in spotahome#550 (comment)

@samof76
Copy link
Collaborator

samof76 commented Dec 4, 2024

This will fix it? #3

Also was planning to implement, network policies so no egress is allow from the sentinels except within the namespace.

@samof76
Copy link
Collaborator

samof76 commented Dec 4, 2024

@dbackeus Can you please simulate with those changes? in the #3

@dbackeus
Copy link
Author

dbackeus commented Dec 4, 2024

Bad timing, we ditched the operator in favour of our own templates two weeks back. If you're interested here's what we ended up with: reclaim-the-stack/get-started@736a3af

So I won't be spending time on testing. This would be my feedback though:

Your PR should prevent sentinels from accidentally electing new masters across redis clusters, which is certainly better than what we have currently.

However, it can still be confusing that cross cluster sentinels start interacting, and it would not prevent the same problem from occurring on the Redis side.

Eg. redis-cluster-A-1 is following redis-cluster-A-2. Some disaster occurs forcing some pods to reschedule and you end up with redis-cluster-B-1 getting the IP of redis-cluster-A-1 and causing redis-cluster-A-2 to re-sync from cluster B.

The root problem is relying on IP's in the first place. Hostnames is really the way to go in Kubernetes since IP's are not reliable / deterministic.

@samof76
Copy link
Collaborator

samof76 commented Dec 5, 2024

I agree with you. This definitely not solve the entire problem, within the cluster, and across if you have unique names for your redis failover, which we have currently, as do not and will not run into this issue. I would definitely pick this one as feature to be enableb by a flag in the CRD. Do you have comments on this?

@samof76 samof76 pinned this issue Dec 5, 2024
@dbackeus
Copy link
Author

No additional comments, no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants