fix: Set a maximum retries for Docker driver to avoid deadlock. #15026

Draft: wants to merge 1 commit into main

jeschkies
Contributor

What this PR does / why we need it:
There's a long ticket claiming that the Docker driver would deadlock when the configured Loki endpoint becomes unreachable. The root cause is that the Loki client retries forever until it can reach Loki again. This looks like a deadlock.

This issue is documented, along with a workaround. However, users still struggle with it. That's why this change proposes making the workaround the default behavior.

Which issue(s) this PR fixes:
Fixes #2361

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@@ -153,7 +151,7 @@ func parseConfig(logCtx logger.Info) (*config, error) {
 	clientConfig.URL = flagext.URLValue{URL: url}

 	// parse timeout
-	if err := parseDuration(cfgTimeoutKey, logCtx, func(d time.Duration) { clientConfig.Timeout = d }); err != nil {
+	if err := parseDurationWithDefault(cfgTimeoutKey, logCtx, func(d time.Duration) { clientConfig.Timeout = d }, 10*time.Second); err != nil {
Contributor


why not just set the default on the config struct ?

Contributor Author


You mean here? That's also used by Promtail.


jeschkies changed the title from "[fix] Set a maximum retries for Docker driver to avoid deadlock." to "fix: Set a maximum retries for Docker driver to avoid deadlock." on Nov 20, 2024
BackoffConfig: backoff.Config{
MinBackoff: client.MinBackoff,
MaxBackoff: client.MaxBackoff,
MaxRetries: client.MaxRetries,
Contributor


I mean here

Contributor Author


Interesting. The default is 10, so the driver should not deadlock; it can just take almost an hour until the client gives up. For production use, this should be fine.

Contributor

cyriltovena, Dec 13, 2024


Yeah, that's what I call a deadlock. Nobody wants to wait more than 5s :)
