Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use exponential backoff for connection retries #65

Merged
merged 3 commits into from
Nov 22, 2024

Conversation

jrisc
Copy link
Collaborator

@jrisc jrisc commented Nov 18, 2024

Calls to socket.connect() are non-blocking, hence all subsequent calls to socket.sendall() will fail if the target KDC service is temporarily or indefinitely unreachable. Since the kdcproxy task uses busy-looping, it results in the journal to be flooded with warning logs.

This pull request introduces a per-socket reactivation delay which increases exponentially as the number of reties is incremented, until timeout is reached (i.e. 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s, ...).

(Also replaces the default root logger with a dedicated kdcproxy one)

@jrisc jrisc force-pushed the unreachable_kdc branch 2 times, most recently from a8d5c15 to eee16d0 Compare November 18, 2024 09:14
Copy link
Member

@simo5 simo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jrisc jrisc force-pushed the unreachable_kdc branch 2 times, most recently from 6cd07a4 to 4284384 Compare November 18, 2024 16:02
@jrisc jrisc changed the title Use exponential backoff for connection retries [WIP] Use exponential backoff for connection retries Nov 18, 2024
@jrisc
Copy link
Collaborator Author

jrisc commented Nov 18, 2024

I was trying to add the connection type, IP address, and port number in the warning message, but apparently such information cannot be fetched from the socket in Python 2. So I am doing to make some additional changes to report the timeout back to the parent function (where the address is available) to report the timeout rather than each connection failure.

@simo5
Copy link
Member

simo5 commented Nov 18, 2024

We still care about python 2 ?

@jrisc
Copy link
Collaborator Author

jrisc commented Nov 18, 2024

Yes, for RHEL 7 ELS until 2028.

@jrisc
Copy link
Collaborator Author

jrisc commented Nov 20, 2024

I made some improvements to log the timeout only once per Application.__await_reply() call. It is now logged this way:

WARNING:kdcproxy:Exchange with udp:[10.0.46.123]:88 failed: Timeout while sending packets after 2.00s and 4 tries: [Errno 32] Broken pipe

@simo5 I tested this code for Python 2. Could you enabled the CI tests so we can confirm it works with Python 3 too?

@jrisc jrisc changed the title [WIP] Use exponential backoff for connection retries Use exponential backoff for connection retries Nov 20, 2024
@jrisc jrisc force-pushed the unreachable_kdc branch 2 times, most recently from c054d97 to 75eb76f Compare November 21, 2024 17:14
Calls to socket.connect() are non-blocking, hence all subsequent calls
to socket.sendall() will fail if the target KDC service is temporarily
or indefinitely unreachable. Since the kdcproxy task uses busy-looping,
it results in the journal to be flooded with warning logs.

This commit introduces a per-socket reactivation delay which increases
exponentially as the number of reties is incremented, until timeout is
reached (i.e. 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s, ...).

Signed-off-by: Julien Rische <[email protected]>
Signed-off-by: Julien Rische <[email protected]>
Signed-off-by: Julien Rische <[email protected]>
@jrisc jrisc merged commit 0606ca5 into latchset:main Nov 22, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants