Upgrading FTL can lead to ASM followers failing to sync #2011

Closed
matt2e opened this issue Jul 9, 2024 · 0 comments · Fixed by #2028

@matt2e
Collaborator

matt2e commented Jul 9, 2024

An FTL upgrade happens and 3 leases expire.
The ASM follower then repeatedly fails to sync: Unable to sync asm-follower: error getting secrets list from leader: unavailable: read tcp 10.1.141.197:58350->10.1.104.37:8892: read: connection reset by peer
This keeps happening for about 4 minutes after the upgrade.
The controller was then manually restarted again to try to resolve it, and 4 leases expire.
The errors stop.

@matt2e matt2e self-assigned this Jul 9, 2024
@ftl-robot ftl-robot mentioned this issue Jul 9, 2024
matt2e added a commit that referenced this issue Jul 10, 2024
fixes #2011

Cause: 
- The coordinator only runs its coordination logic when something calls
`Get()`.
- If the leader/follower coordinator's `Get()` function was not called, an
expired leader/follower would never be replaced.
- The ASM follower syncs with whichever node it believes is the leader, even
if that leader is no longer alive. Even after a new leader takes over, the
follower is not swapped out for a valid one until `Get()` is called, so it
keeps churning away, failing to sync.

Fix:
- Proactively call `Get()` periodically so that coordination occurs even
if nothing else would otherwise call `Get()`.