feat: always output config reloader file even if application is not ready #1076

TheSpiritXIII · 2024-07-16T19:05:32Z

This change causes the config-reloader to start running before Prometheus is running.

The problem is that the config-reloader never runs if the Prometheus instance starts crash-looping. This can create a small delay before Prometheus actually starts scraping metrics. Since the config-reloader won't update configs until Prometheus is ready, today it has to wait for the Prometheus instance to recover first: the Prometheus instance starts up with the old configuration, and only afterwards is it updated with the new configuration.

When we move flags to the configuration side, this becomes critical, because the rule-evaluator will not start up without the appropriate configuration. Since it never starts up, the config-reloader never writes the configuration. Each is essentially waiting for the other, like a deadlock. Without this change, PR #1059 fails.

As mentioned earlier, the config-reloader must hit a reload URL. As a "hack", I made it hit a readiness endpoint served by itself that functionally does nothing. When Prometheus comes up, it starts hitting Prometheus' readiness endpoint.

cmd/config-reloader/main.go

pintohutch · 2024-08-02T02:53:14Z

I am a little confused, given that configuration reloading can happen at any time during the Prometheus runtime. We also have the initContainer ensuring an empty config is present before Prometheus starts, to prevent transient crashlooping.

Can you elaborate on why this is necessary or why the other PR is failing without this?


TheSpiritXIII · 2024-08-02T14:39:22Z

@pintohutch thanks, I expanded on the PR description. Please let me know if it doesn't make sense!

bwplotka

Thanks a lot!

I think the description is outdated. We decided to reload by restarting the internal rule manager, so this PR is not strictly blocking security hardening. HOWEVER, I fully agree we should do this for other reasons (logic and faster scrape/recording).

I disagree with the way it's done here, I proposed some alternatives (:

cc @dashpole for knowledge sharing 👍🏽 and @bernot-dev for opinions (:

Skipping this for 0.14 (if my understanding is correct, we don't need this)

bwplotka · 2024-09-18T08:41:48Z

cmd/config-reloader/main.go

+	tempReloader := newReloader(nil, &url.URL{
+		Scheme: "http",
+		// Since Prometheus is not ready, we won't be able to hit the reload URL so hit itself.
+		Host: listenAddr.String(),


Yea, I would advocate to stop hacking like that (: Bit too much.

All the reloader code is upstream. I technically maintain it with the Thanos team. Either we fork/vendor that code or we fix it there; this is too hacky.

On top of that, literally nothing technically fails (as in, stops working) if you just run the normal reloader as we have it now (without creating a new one), even if Prometheus is down. The reload will retry, increment a metric and log an error, which is expected if Prometheus is slow to start.

To me all we need to do is either:

Actually do what was proposed 1y+ ago: just remove the readiness check. The same arguments for why it's not necessary apply now.

Keep the readiness check but start the reloader early in the background, if we want the readiness crash (I don't think that's useful).

Change the reloader upstream so there is a lib function that only generates the config; easy.

Embed the config-reloader code into rule-eval.

TheSpiritXIII force-pushed the TheSpiritXIII/config-ready branch 2 times, most recently from 6c4038c to 42d972c Compare July 17, 2024 14:56

TheSpiritXIII force-pushed the TheSpiritXIII/config-output branch from c2dac78 to 2ba5c2e Compare July 17, 2024 19:22

TheSpiritXIII marked this pull request as ready for review July 31, 2024 14:02

TheSpiritXIII requested review from pintohutch, bwplotka and bernot-dev July 31, 2024 14:02

bernot-dev reviewed Jul 31, 2024


TheSpiritXIII mentioned this pull request Aug 2, 2024

feat: add config-reloader readiness #1075

Open

TheSpiritXIII force-pushed the TheSpiritXIII/config-ready branch from 42d972c to 28e6cf4 Compare August 2, 2024 14:30

feat: always output config reloader file even if application is not r…

de41003

…eady

TheSpiritXIII force-pushed the TheSpiritXIII/config-output branch from 2ba5c2e to de41003 Compare August 2, 2024 14:38

TheSpiritXIII removed request for pintohutch and bwplotka August 5, 2024 17:53

bwplotka requested changes Sep 18, 2024
