External hostname is not updated if an ingress is added after relating a charm to Prometheus #368

simskij · 2022-09-21T14:13:02Z

Bug Description

See title. If you first relate to Traefik and then to Prometheus, it all works as expected. The other way around, no cigar.

To Reproduce

Environment

Relevant log output

Additional context

We could have used the ingress established/revoked events, but unfortunately these are fired prematurely.

sed-i · 2022-10-13T07:33:43Z

Reproduction

After relating a charm to Traefik, its metrics endpoint is not updated, and Prometheus reports health: down because the charm is no longer reachable via its local IP.

am, prom and trfk deployed, all in active/idle.
juju relate am:self-metrics-endpoint prom
- curl 10.1.13.182:9090/api/v1/targets | jq '.data.activeTargets'
  - "scrapeUrl": "http://10.1.13.180:9093/metrics",
  - "globalUrl": "http://10.1.13.180:9093/metrics",
  - "health": "up",
juju relate am trfk
- curl 10.1.13.182:9090/api/v1/targets | jq '.data.activeTargets'
  - "scrapeUrl": "http://10.1.13.180:9093/metrics",
  - "globalUrl": "http://10.1.13.180:9093/metrics",
  - "health": "down",
juju run-action trfk/0 show-proxied-endpoints --wait
- proxied-endpoints: '{"am": {"url": "http://10.128.0.3:80/wlcm2-am"}}'

update-status didn't help, because the library doesn't observe it automatically for sidecar charms, and Alertmanager didn't pass any custom refresh events.
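The curl | jq check in the reproduction can also be scripted, e.g. to re-run it after each juju relate. This is a minimal sketch using only the Python standard library. It assumes the standard Prometheus HTTP API response shape for /api/v1/targets, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_targets(prometheus_url: str, timeout: float = 10.0) -> dict:
    """GET <prometheus_url>/api/v1/targets and decode the JSON body."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def summarize_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (scrapeUrl, health) for every active target, like jq '.data.activeTargets'."""
    return [(t["scrapeUrl"], t["health"]) for t in payload["data"]["activeTargets"]]


def unhealthy_targets(payload: dict) -> list[str]:
    """Scrape URLs whose health is anything other than "up" (e.g. "down", "unknown")."""
    return [url for url, health in summarize_targets(payload) if health != "up"]
```

For example, unhealthy_targets(fetch_targets("http://10.1.13.182:9090")) would be expected to list the Alertmanager scrape URL after juju relate am trfk, matching the health: down seen above.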

Proposal 1

Users of MetricsEndpointProvider must be instructed to always pass custom refresh events (refresh_event):

self.metrics = MetricsEndpointProvider(
    self,
    # ...
    refresh_event=[  # needed for ingress
        self.ingress.on.ready_for_unit,
        self.ingress.on.revoked_for_unit,
        self.on.update_status,
    ],
)

Proposal 2

MetricsEndpointProvider should always observe update-status by default.

Proposal 3

MetricsEndpointProvider should update relation data on every re-init.
That is, the MetricsEndpointProvider constructor should call self._set_scrape_job_spec on every instantiation, instead of registering it as an observer.
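Why Proposal 3 is order-independent: the ops framework instantiates the charm, and therefore every library object created in its __init__, afresh for every hook. A write in the constructor therefore also runs during the hook in which the ingress URL first becomes available. The toy model below sketches only that ordering problem; ToyModel, ToyMetricsProvider, the event names and the single url field are hypothetical and do not reflect the real prometheus_scrape API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

POD_URL = "http://10.1.13.180:9093"            # unit address from the reproduction
INGRESS_URL = "http://10.128.0.3:80/wlcm2-am"  # proxied endpoint from show-proxied-endpoints


@dataclass
class ToyModel:
    """Hypothetical stand-in for the state a unit sees when a hook runs."""
    related: bool = False                               # metrics-endpoint relation exists
    ingress_url: Optional[str] = None                   # ingress relation is ready
    relation_data: dict = field(default_factory=dict)  # what Prometheus would read


class ToyMetricsProvider:
    """Toy provider, NOT the real MetricsEndpointProvider."""

    def __init__(self, model: ToyModel, write_on_init: bool):
        self.model = model
        if write_on_init:
            # Proposal 3: refresh relation data on every (re-)instantiation.
            self._set_scrape_job_spec()

    def _set_scrape_job_spec(self) -> None:
        if self.model.related:
            self.model.relation_data["url"] = self.model.ingress_url or POD_URL

    def handle(self, event: str) -> None:
        # Observer-only behaviour: data is written only on the relation's own event.
        if event == "metrics_endpoint_relation_joined":
            self._set_scrape_job_spec()


def run_hooks(events: list[str], write_on_init: bool) -> Optional[str]:
    model = ToyModel()
    for event in events:
        # State that Juju has already applied by the time this hook fires.
        if event == "ingress_ready":
            model.ingress_url = INGRESS_URL
        elif event == "metrics_endpoint_relation_joined":
            model.related = True
        # Like ops, build a fresh object for every hook, then dispatch the event to it.
        ToyMetricsProvider(model, write_on_init).handle(event)
    return model.relation_data.get("url")
```

With write_on_init=False (observer only), relating to Prometheus first and adding the ingress afterwards leaves the stale pod URL in place, while the reverse order works; with write_on_init=True both orders end with the ingress URL.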

Proposal 4

Shift the responsibility to the user by introducing an update_endpoint method, as we do in PrometheusRemoteWriteProvider.

Ideas? @dstathis @Abuelodelanada @rbarry82
cc: @PietroPasotti

rbarry82 · 2022-10-13T11:39:48Z

I think proposal #3 is preferable by far. It's idempotent, users don't have to do anything at all, and it doesn't depend on update-status-interval or on other events firing. It can also easily be removed from the library constructor once the stripPrefix middleware lands in Traefik, which should make this problem more or less disappear entirely (at least from an in-model/in-cluster perspective, as well as for any external targets which have routable endpoints and don't need a path specified by a reverse proxy).

sed-i · 2022-10-13T13:02:56Z

Tested manually and the combination of:

Update metrics endpoint relation data every re-init #384 and
fetch-lib alertmanager-k8s-operator#103 (with the modified prom lib)

solves the issue.

@simskij, with which charm did you experience this? You may need to update the charm code:

fetch-lib for prometheus_scrape
pass external_url to MetricsEndpointProvider
use the correct port number for the job: port = urlparse(self._external_url).port or 80
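The last item can be sketched as a small helper that turns the external URL (for example the proxied endpoint http://10.128.0.3:80/wlcm2-am reported by show-proxied-endpoints) into a Prometheus scrape job. The function name is hypothetical and this is not how prometheus_scrape builds its jobs internally; only the keys (scheme, metrics_path, static_configs) are standard Prometheus scrape_config fields.

```python
from urllib.parse import urlparse


def scrape_job_from_external_url(external_url: str) -> dict:
    """Hypothetical helper: build a Prometheus scrape job for a unit reached via ingress."""
    parsed = urlparse(external_url)
    # Same idea as "port = urlparse(self._external_url).port or 80", but defaulting
    # to 443 for https URLs, since an https URL without an explicit port means 443.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    # Assumption: the proxy forwards the path prefix (e.g. /wlcm2-am) unchanged,
    # so it has to stay part of metrics_path.
    metrics_path = parsed.path.rstrip("/") + "/metrics"
    return {
        "scheme": parsed.scheme or "http",
        "metrics_path": metrics_path,
        "static_configs": [{"targets": [f"{parsed.hostname}:{port}"]}],
    }
```

Falling back to 443 for https is a deliberate deviation from the one-liner above; for plain-http ingress both behave identically.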

simskij · 2022-10-13T16:09:15Z

I saw it with the loki datasource in grafana after deploying it as a bundle.

sed-i · 2022-10-13T20:38:50Z

> I saw it with the loki datasource in grafana after deploying it as a bundle.

If it's a Loki datasource issue, then perhaps it's not related to prometheus_scrape?

Maybe we need to manually call update_source in Loki?
BTW, update_source seems very different from refresh_event.
@dstathis @rbarry82

rbarry82 · 2022-10-13T21:31:15Z

> Maybe we need to manually call update_source in loki? BTW, update_source seems very different from refresh_event. @dstathis @rbarry82

update_source is just a superset of _set_unit_details that also allows passing additional fields. It was added explicitly so that consumers can say "I have an ingress now, so update out-of-band in case GrafanaSourceProvider._source_url from the constructor is out of date".

Since Loki already uses the property in the constructor, yes, update_source would be called when an ingress is added. That allows setting/updating the Grafana relation data immediately after ingress_ready, rather than waiting for some other event to re-trigger the constructor. We could do the same thing in grafana_source as is done here, but in Loki's codebase it would make sense to add the call just after update_endpoint(...), since the semantics are the same. The Prometheus libraries have just obsessively avoided having any public API at all that could be used for this purpose.
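The pattern described here (a public update method that the consumer calls once the ingress is ready, instead of waiting for another hook to re-run the constructor) can be illustrated with a toy model. ToySourceProvider, ToyLokiCharm, the url key and the addresses are all hypothetical; the sketch mirrors the semantics described above, not the actual grafana_source code or its update_source signature.

```python
from __future__ import annotations

from typing import Optional

POD_URL = "http://10.1.13.181:3100"  # hypothetical unit address on Loki's default port


class ToySourceProvider:
    """Hypothetical stand-in for a Grafana datasource provider (not the real library)."""

    def __init__(self, relation_data: dict, source_url: str):
        self._relation_data = relation_data
        self._source_url = source_url  # captured once, in the constructor

    def on_relation_joined(self) -> None:
        # Only relation events write data, so a later ingress change goes unnoticed...
        self._set_unit_details()

    def _set_unit_details(self, **extra: str) -> None:
        self._relation_data["url"] = self._source_url
        self._relation_data.update(extra)

    def update_source(self, source_url: Optional[str] = None, **extra: str) -> None:
        # ...unless the charm calls this public superset of _set_unit_details out of band.
        if source_url:
            self._source_url = source_url
        self._set_unit_details(**extra)


class ToyLokiCharm:
    def __init__(self, relation_data: dict):
        self.ingress_url: Optional[str] = None
        self.grafana_source = ToySourceProvider(relation_data, self.external_url)

    @property
    def external_url(self) -> str:
        return self.ingress_url or POD_URL

    def on_ingress_ready(self, url: str) -> None:
        self.ingress_url = url
        # Refresh right away instead of waiting for another hook to rebuild the charm.
        self.grafana_source.update_source(source_url=self.external_url)
```

Here on_relation_joined publishes the constructor-time pod URL, and on_ingress_ready replaces it with the ingress URL immediately.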

simskij · 2022-10-17T10:11:58Z

My bad, I saw it in Prometheus too, but it seems to have been resolved now.

simskij added the Status: Triage and Type: Bug labels on Sep 21, 2022.

This was referenced on Oct 13, 2022:
- Update metrics endpoint relation data every re-init #384 (Merged)
- Always refresh metrics endpoint reldata on update-status #385 (Closed)
- fetch-lib canonical/alertmanager-k8s-operator#103 (Merged)

sed-i closed this as completed in #384 on Oct 13, 2022.

sed-i mentioned this issue on Oct 18, 2022:
- Need to fetch-lib prometheus_scrape canonical/alertmanager-k8s-operator#105 (Closed)
