fix: don't log at error level in lease timeout #2151

Merged: 1 commit into main from stuartwdouglas/error-filtering, Aug 12, 2024

Conversation

stuartwdouglas
Collaborator

We don't want error alerts if a controller is just failing over. This PR adds some infrastructure to allow a lower level of error reporting when the errors all occur within the lease TTL.

fixes #2133

@stuartwdouglas stuartwdouglas requested review from a team and matt2e and removed request for a team July 24, 2024 07:34
@ftl-robot ftl-robot mentioned this pull request Jul 24, 2024
@stuartwdouglas stuartwdouglas force-pushed the stuartwdouglas/error-filtering branch 2 times, most recently from a0dfc60 to cfd52fc Compare July 24, 2024 23:23
return false
}
return true
} else {
Collaborator

Looks like it'd be neater to do if !ok and then return early
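
The shape of that suggestion can be sketched as follows. The surrounding code is not visible in this excerpt, so the map lookup and both function names here are hypothetical stand-ins that only show the refactor, not the code under review.

```go
package main

// allowedNested mirrors the nested if/else shape from the diff context.
// The lookup itself is a hypothetical stand-in.
func allowedNested(flags map[string]bool, key string) bool {
	if v, ok := flags[key]; ok {
		if !v {
			return false
		}
		return true
	} else {
		return false
	}
}

// allowedEarly handles the missing case first and returns early, which
// removes one level of nesting and the trailing else branch.
func allowedEarly(flags map[string]bool, key string) bool {
	v, ok := flags[key]
	if !ok {
		return false
	}
	return v
}
```

Both functions return the same result for every input; the second simply keeps the main path unindented.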

@@ -51,7 +53,14 @@ func (f *asmFollower) sync(ctx context.Context, values *xsync.MapOf[Ref, SyncedV
IncludeValues: &includeValues,
}))
if err != nil {
return fmt.Errorf("error getting secrets list from leader: %w", err)
if f.errorFilter.ReportLeaseError() {
Collaborator

I do think this is too general, as was mentioned on this morning's call. What @alecthomas said about distinguishing responses that include errors from network errors feels like a good way to differentiate.

Collaborator Author

Do we have an example somewhere of how to determine the type of error? The library docs mention connect.Error, but that does not seem to match what is in the Datadog logs.

I never thought I would miss Java checked exceptions, but they do make situations like this easier.

Collaborator

@wesbillman wesbillman Jul 25, 2024

Maybe not useful for this case, but here's an example of checking connect errors: https://github.com/TBD54566975/ftl/blob/main/backend/controller/ingress/handler.go#L55-L62

Collaborator Author

Thanks, I have updated it to use the return codes to decide whether it should attempt to filter.

Collaborator

@matt2e matt2e left a comment

Looks like a good direction, @stuartwdouglas, as this does feel like something that would be useful for more things that end up using leader/follower.

I think you mentioned on the call this morning that you thought it might be useful for anything that uses leases? I'm less sure about that, but maybe I haven't thought of the use cases, or maybe I'm misremembering. Either way, my feeling is that the code here is in the right place, but I'm interested to hear if you had other ideas.

@stuartwdouglas stuartwdouglas force-pushed the stuartwdouglas/error-filtering branch 6 times, most recently from 726d3ef to 8dd27a1 Compare July 29, 2024 00:40
@stuartwdouglas stuartwdouglas force-pushed the stuartwdouglas/error-filtering branch from 8dd27a1 to dc6903d Compare August 6, 2024 22:08
@alecthomas
Collaborator

@matt2e are you happy with this now?

Collaborator

@matt2e matt2e left a comment

👏 looks good

@stuartwdouglas stuartwdouglas added this pull request to the merge queue Aug 12, 2024
Merged via the queue into main with commit 1fa898f Aug 12, 2024
67 checks passed
@stuartwdouglas stuartwdouglas deleted the stuartwdouglas/error-filtering branch August 12, 2024 02:40
Development

Successfully merging this pull request may close these issues.

Lower ASM follower sync error log level for expected initial case
4 participants