Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External hostname is not updated if an ingress is added *after* relating a charm to Prometheus #368

Closed
simskij opened this issue Sep 21, 2022 · 7 comments · Fixed by #384

Comments

@simskij
Copy link
Member

simskij commented Sep 21, 2022

Bug Description

See title. If you first relate to Prometheus and then to Traefik, it all works as expected. The other way around, no cigar.

To Reproduce

Environment

Relevant log output

-

Additional context

We could have used the ingress established/revoked events, but these are unfortunately fired prematurely

@sed-i
Copy link
Contributor

sed-i commented Oct 13, 2022

Reproduction

After relating a charm to traefik, its metrics endpoint is not updated and prometheus reports health: down because it is no longer reachable via the local ip.

  1. am, prom, trfk deployed and all in active/idle.
  2. juju relate am:self-metrics-endpoint prom
  3. juju relate am trfk
  4. juju run-action trfk/0 show-proxied-endpoints --wait

Proposal 1

Users of MetricsEndpointProvider must be instructed to always set custom refresh events

self.metrics = MetricsEndpointProvider(
    # ...
    refresh_event=[  # needed for ingress
        self.ingress.on.ready_for_unit,
        self.ingress.on.revoked_for_unit,
        self.on.update_status,
    ]

Proposal 2

MetricsEndpointProvider should always observe update-status by default.

Proposal 3

MetricsEndpointProvider should update relation data every re-init.
I.e. the contructor MetricsEndpointProvider should call self._set_scrape_job_spec every instantiation, instead of registring it as an observer.

Proposal 4

Roll the responsibility to the user by introducing an update_endpoint method like we do in PrometheusRemoteWriteProvider.

Ideas? @dstathis @Abuelodelanada @rbarry82
cc: @PietroPasotti

@rbarry82
Copy link
Contributor

I think proposal #3 is preferable by far. It's idempotent, users don't have to do anything at all, it doesn't depend on update-status-interval or calling other events, and it can easily be removed from the library constructor when stripPrefix middleware lands in traefik, which makes this problem more or less disappear entirely (at least from an in-model/cluster perspective, as well as any external targets which have routable endpoints and don't need a path specified by any reverse proxy).

@sed-i
Copy link
Contributor

sed-i commented Oct 13, 2022

Tested manually and the combination of:

solves the issue.

With which charm did you experience this @simskij ? You may need to update charm code:

  • fetch-lib for prometheus_scrape
  • pass external_url to MetricsEndpointProvider
  • use correct port number for the job port = urlparse(self._external_url).port or 80

@simskij
Copy link
Member Author

simskij commented Oct 13, 2022

I saw it with the loki datasource in grafana after deploying it as a bundle.

@sed-i
Copy link
Contributor

sed-i commented Oct 13, 2022

I saw it with the loki datasource in grafana after deploying it as a bundle.

If it's a loki datasource issue then perhaps it's not related to prometheus_scrape?

Maybe we need to manually call update_source in loki?
BTW, update_source seems very different from refresh_event.
@dstathis @rbarry82

@rbarry82
Copy link
Contributor

Maybe we need to manually call update_source in loki? BTW, update_source seems very different from refresh_event. @dstathis @rbarry82

update_source is just a superset of _set_unit_details which also allows passing additional fields, and was added explicitly for consumers to say "I have an ingress now, so update out-of-band in case GrafanaSourceProvider._source_url from the constructor is out of date".

Since Loki already uses the property in the constructor, update_source would be called when an ingress is added, yes, which allows setting/updating the Grafana relation data immediately after ingress_ready rather than waiting for some other event to re-trigger the constructor. We could do the same thing in grafana_source as is done here, but it would make sense from Loki's codebase to add it just after update_endpoint(...), since the semantics are the same. The Prometheus libraries have just obsessively avoiding having any public API at all which could be used for this purpose.

@simskij
Copy link
Member Author

simskij commented Oct 17, 2022

My bad, I saw it in Prometheus too, but it seems to have been resolved now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants