Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus client metrics support #711

Closed
wants to merge 49 commits into from

Conversation

libretto
Copy link
Contributor

@libretto libretto commented Sep 6, 2023

This code introduces the Metrics class, which allows clients migrating to Karapace from alternative products to collect metrics similar to those they are accustomed to. It supports using Prometheus as an alternative to StatsD for collecting statistics in Karapace.

@libretto libretto requested review from a team as code owners September 6, 2023 22:48
@libretto
Copy link
Contributor Author

@jjaakola-aiven @aiven-anton could You review this PR too?

@libretto
Copy link
Contributor Author

@aiven-anton done. Review please.

@libretto
Copy link
Contributor Author

libretto commented Jan 11, 2024

BTW. Is the usage of .gauge() and .timing() in the following functions acceptable?

def are_we_master(self, is_master: bool) -> None:
    if not self.is_ready or self.stats_client is None:
        return
    if not isinstance(self.stats_client, StatsClient):
        raise RuntimeError("no StatsClient available")
    self.stats_client.gauge("master-slave-role", int(is_master))


def latency(self, latency_ms: float) -> None:
    if not self.is_ready or self.stats_client is None:
        return
    if not isinstance(self.stats_client, StatsClient):
        raise RuntimeError("no StatsClient available")
    self.stats_client.timing("latency_ms", latency_ms)

aiven-anton
aiven-anton previously approved these changes Jan 25, 2024
Copy link
Contributor

@eliax1996 eliax1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some tests?

@libretto
Copy link
Contributor Author

libretto commented May 1, 2024

Can we have some tests?

We can certainly add some tests. However, the current version of Karapace Stats lacks test coverage, so we have no existing framework to guide us. At this point, it seems that only unit tests are feasible. I'm unsure how to implement integration tests for this part task.

@eliax1996
Copy link
Contributor

Can we have some tests?

We can certainly add some tests. However, the current version of Karapace Stats lacks test coverage, so we have no existing framework to guide us. At this point, it seems that only unit tests are feasible. I'm unsure how to implement integration tests for this part task.

Let's kick off with the unit tests. I'll provide an example soon on how to execute an integration test. It's crucial to have a tangible test that ensures our output its compliant to the standard format.
We need to verify that compliant consumers can effectively utilize and extract the metrics.

Without integration tests, even if the feature is technically sound, we lack a mechanism preventing us from the introduction of code that might break the consumers



class Metrics(metaclass=Singleton):
stats_client: StatsClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stats_client: StatsClient | None = None, I've run this locally with metrics_extended set to False and its failing because the stats_client its not going to be set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and unit test for this case is added.

@nosahama
Copy link
Contributor

nosahama commented Jun 17, 2024

Thoughts:

  • The PR description makes mention of only prometheus, it's confusing to see statsd implementations.
  • It'd be nice to see also somewhere in the code, all possible metrics and their related tags if any, that way I as a programming would be working based off an interface than just passing string values to the metrics handler.
  • I feel this could be simplified by providing only Prometheus support and using integrations when needed which should also be left to the users, i.e. if StatsD metrics are then needed, one could add the related mapping and deploy for themselves a statsd-exporter.
  • If properly tested and benchmarked, it doesn't hurt to run the WSGI server (stats) as it's own thread, but it doesn't feel right in 2024 to write this much application code just to export some metrics with all the tools available.
  • Prometheus is pull based, and as we already have a webserver running, we need to add an endpoint to scrape the metrics and probably a directory for the prometheus multiprocess registry.
  • Even though we want to provide similar metrics as confluent, we do not need to provide all of them via the application, we can provide a parent metric and then use promql and recording rules to get the rates, avg, etc.

I will continue the work by @eliax1996 and try to add a sample metric and also some integration test.

@libretto
Copy link
Contributor Author

libretto commented Jun 17, 2024

Thoughts:

* The PR description makes mention of only prometheus, it's confusing to see statsd implementations.

* It'd be nice to see also somewhere in the code, all possible metrics and their related tags if any, that way I as a programming would be working based off an interface than just passing string values to the metrics handler.

* I feel this could be simplified by providing only Prometheus support and using integrations when needed which should also be left to the users, i.e. if StatsD metrics are then needed, one could add the related mapping and deploy for themselves a [statsd-exporter](https://github.com/prometheus/statsd_exporter).

* If properly tested and benchmarked, it doesn't hurt to run the WSGI server (stats) as it's own thread, but it doesn't feel right in 2024 to write this much application code just to export some metrics with all the tools available.

* Prometheus is pull based, and as we already have a webserver running, we need to add an endpoint to scrape the metrics and probably a directory for the prometheus multiprocess registry.

I will continue the work by @eliax1996 and try to add a sample metric and also some integration test.

This PR introduces the Metrics class, aiming to implement metrics similar to those found in comparable products. We intend to support both StatsD and Prometheus. Are you suggesting that we should enable the concurrent use of both?
I just revised the PR description for better clarity.

@nosahama
Copy link
Contributor

nosahama commented Jun 17, 2024

Thoughts:

* The PR description makes mention of only prometheus, it's confusing to see statsd implementations.

* It'd be nice to see also somewhere in the code, all possible metrics and their related tags if any, that way I as a programming would be working based off an interface than just passing string values to the metrics handler.

* I feel this could be simplified by providing only Prometheus support and using integrations when needed which should also be left to the users, i.e. if StatsD metrics are then needed, one could add the related mapping and deploy for themselves a [statsd-exporter](https://github.com/prometheus/statsd_exporter).

* If properly tested and benchmarked, it doesn't hurt to run the WSGI server (stats) as it's own thread, but it doesn't feel right in 2024 to write this much application code just to export some metrics with all the tools available.

* Prometheus is pull based, and as we already have a webserver running, we need to add an endpoint to scrape the metrics and probably a directory for the prometheus multiprocess registry.

I will continue the work by @eliax1996 and try to add a sample metric and also some integration test.

This PR introduces the Metrics class, aiming to implement metrics similar to those found in comparable products. We intend to support both StatsD and Prometheus. Are you suggesting that we should enable the concurrent use of both? I just revised the PR description for better clarity.

I do not think it is necessary to have both StatsD and Prometheus within the application itself, especially if there are tools to integrate one on top of the other. Nevertheless, we already provide statsd support within the code, this is something to discuss internally, we'd go ahead and add prometheus metrics and then reason around what to do with the current stats implementation which both calls statsd and sentry at the same time.

@nosahama
Copy link
Contributor

@libretto I will close this PR once #902 is merged.
In #902 we've introduced the building blocks for instrumenting the service using Prometheus. Following this, we can add more metrics as needed, until it is sufficient enough to provide compatibility with Confluent SR metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants