This repository has been archived by the owner on Jan 26, 2023. It is now read-only.
Redis replicas can take a while to sync data when they're being added to an existing cluster with a lot of data. It would be nice to show operators that even if the node is not ok it's at least working on getting there.
I'll expand on this. In a cluster redeployment, the node whose logs we are watching was a master, and its replica 656dc9b7acaefd7065849db76bf0648460aa83e9 was promoted while the node was shut down. After the node comes back online, it becomes a replica of the new master.
1436339:M 23 Jun 2022 16:31:49.560 # Configuration change detected. Reconfiguring myself as a replica of 656dc9b7acaefd7065849db76bf0648460aa83e9
Once the restarted node reports healthy, Nomad moves on to 656dc9b7acaefd7065849db76bf0648460aa83e9 and shuts it down while the sync is still in progress:
Jun 23, '22 09:33:04 -0700 Killed Task successfully killed
Jun 23, '22 09:33:04 -0700 Terminated Exit Code: 0
Jun 23, '22 09:33:02 -0700 Killing Sent interrupt. Waiting 5m0s before force killing
This shows up in the Redis replica as:
1436339:S 23 Jun 2022 16:33:04.065 # I/O error trying to sync with MASTER: connection lost
1436339:S 23 Jun 2022 16:33:06.803 # Error condition on socket for SYNC: (null)
1436339:S 23 Jun 2022 16:33:09.823 # Error condition on socket for SYNC: (null)
1436339:S 23 Jun 2022 16:33:17.878 # Error condition on socket for SYNC: (null)
1436339:S 23 Jun 2022 16:33:18.783 # Currently unable to failover: Disconnected from master for longer than allowed. Please check the 'cluster-replica-validity-factor' configuration option.
And then we lose data.
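To make the idea concrete, here is a minimal sketch of how a health check could tell "still syncing" apart from "ok", so an orchestrator does not move on to the next node too early. It only parses the text returned by Redis's `INFO replication` command and looks at the `role`, `master_link_status`, and `master_sync_in_progress` fields. The function names and the three state labels are made up for illustration and are not part of this project or of Nomad.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse the key:value lines of a Redis INFO reply into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_state(info_text: str) -> str:
    """Classify a node as 'ok', 'syncing', or 'disconnected'.

    The labels are hypothetical; a real check would map them onto
    whatever statuses the orchestrator understands.
    """
    info = parse_info(info_text)
    # Masters have no upstream link to wait for.
    if info.get("role") != "slave":
        return "ok"
    # A replica is only caught up once its link to the master is up.
    if info.get("master_link_status") == "up":
        return "ok"
    # Link is down, but a sync is running: not ok, yet making progress.
    if info.get("master_sync_in_progress") == "1":
        return "syncing"
    return "disconnected"


if __name__ == "__main__":
    sample = (
        "# Replication\r\n"
        "role:slave\r\n"
        "master_link_status:down\r\n"
        "master_sync_in_progress:1\r\n"
    )
    print(replica_state(sample))
```

With a check like this, the restarted node above would have reported `syncing` rather than healthy at 16:33:02, which is the signal the deployment would need to hold off on killing the new master.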