feature, observability: Added Prometheus Metrics & Instrumentation #902

Merged
merged 4 commits into from
Jun 19, 2024

Conversation

nosahama
Contributor

@nosahama nosahama commented Jun 18, 2024

About this change - What it does

Adds observability to the Karapace services (registry and REST) via Prometheus metrics and instrumentation.

Why this way

Using Prometheus, we can take advantage of its labels, dimensional data model, and the available tools and ecosystem. This provides the building blocks for adding observability to the service, with a focus on extensibility and testing (unit and integration).

Caveats

  • Right now, we have added a few metric types, mainly for HTTP requests, but this can be extended with more metrics for the different operations of the system, e.g. schema reader metrics.

  • We use the karapace prefix to act as a namespace and identifier for the service metrics.

  • A sample Prometheus service is added to the docker compose setup to show the metrics pulled via prometheus.scrape_configs. The UI is shown in the screenshot below.

  • To support StatsD, we can add a statsd-exporter service to the docker compose setup to test the provided mapping.

  • We've added scrape jobs for both the karapace-rest and karapace-registry services. Both services expose metrics via the /metrics endpoint. The screenshot below shows the local Prometheus UI:

    [Screenshot: local Prometheus UI, 2024-06-18]
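To illustrate what a scrape of /metrics could return with the karapace prefix, here is a minimal, dependency-free sketch of a labelled counter rendered in the Prometheus text exposition format. It is not the code from this PR: `CounterSketch`, the metric name `http_requests_total`, and the label names `method`, `path` and `status` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class CounterSketch:
    """Hypothetical stand-in for a labelled Prometheus counter (not the PR's class)."""

    def __init__(self, name, documentation, labelnames, namespace="karapace"):
        # The namespace is prepended to the metric name, e.g. karapace_http_requests_total.
        self.name = f"{namespace}_{name}"
        self.documentation = documentation
        self.labelnames = tuple(labelnames)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        """Increase the counter for one combination of label values."""
        if set(labels) != set(self.labelnames):
            raise ValueError(f"expected labels {self.labelnames}, got {tuple(labels)}")
        key = tuple(labels[name] for name in self.labelnames)
        self._values[key] += amount

    def render(self):
        """Render the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.documentation}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            labels = ",".join(f'{n}="{v}"' for n, v in zip(self.labelnames, key))
            lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


# Assumed metric and label names, for illustration only.
requests_total = CounterSketch(
    "http_requests_total", "Total HTTP requests.", ["method", "path", "status"]
)
requests_total.inc(method="GET", path="/subjects", status="200")
requests_total.inc(method="GET", path="/subjects", status="200")
print(requests_total.render())
```

Each distinct combination of label values becomes its own time series, which is what makes the dimensional data model mentioned above useful for slicing by method, path or status.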

Follow ups

  • Decide whether to toggle metrics enablement via environment variables
  • Add more metrics to the service; check out the Confluent metrics for SR
  • Configure buckets for the Histogram metric
  • Add Grafana dashboards to the docker compose setup
  • Test the StatsD exporter, if we still want to support StatsD
  • Add a few Prometheus recording rules and alerts
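On the histogram bucket follow-up: Prometheus histograms are cumulative, so each bucket counts every observation less than or equal to its upper bound (the `le` label), with a final `+Inf` bucket that counts everything. The sketch below shows, without any dependencies, how a choice of bounds distributes latency observations. The function name `cumulative_buckets` and the sample bounds are assumptions for illustration. In the Python `prometheus_client` library, bounds are passed through the `buckets` argument of `Histogram`.

```python
import math
from bisect import bisect_left


def cumulative_buckets(observations, upper_bounds):
    """Return {upper_bound: cumulative_count}, mimicking Prometheus `le` buckets."""
    bounds = sorted(upper_bounds) + [math.inf]  # the +Inf bucket is always present
    per_bucket = [0] * len(bounds)
    for value in observations:
        # bisect_left finds the first bound >= value, i.e. the smallest
        # bucket whose upper bound still includes the observation.
        per_bucket[bisect_left(bounds, value)] += 1

    counts = {}
    running = 0
    for bound, n in zip(bounds, per_bucket):
        running += n
        counts[bound] = running
    return counts


# Hypothetical request latencies in seconds and candidate bucket bounds.
latencies = [0.02, 0.07, 0.3, 2.0]
print(cumulative_buckets(latencies, [0.05, 0.1, 0.5]))
```

If most observations land in the `+Inf` bucket, or all of them in the first one, the bounds are too coarse for useful quantile estimates and are worth adjusting to the latencies the service actually sees.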

@nosahama nosahama requested review from a team as code owners June 18, 2024 12:54
@nosahama nosahama force-pushed the nosahama/prometheus-metrics branch 2 times, most recently from ccedf97 to 926af08 Compare June 18, 2024 13:31

github-actions bot commented Jun 18, 2024

Coverage report

This PR does not seem to contain any modification to coverable code.

eliax1996
eliax1996 previously approved these changes Jun 19, 2024
Contributor

@eliax1996 eliax1996 left a comment

Well done, other than this I see no reason for not merging! Great job

Review thread on karapace/instrumentation/prometheus.py (outdated, resolved)
Review thread on tests/unit/instrumentation/test_prometheus.py (resolved)
@nosahama nosahama force-pushed the nosahama/prometheus-metrics branch 3 times, most recently from 40bc4a3 to 84f1c3b Compare June 19, 2024 11:00
@nosahama nosahama force-pushed the nosahama/prometheus-metrics branch 3 times, most recently from 84aa69c to b07e65b Compare June 19, 2024 11:48
nosahama added 4 commits June 19, 2024 14:06
- we add the `PrometheusInstrumentation` class to house available metrics
- we use a middleware to automatically instrument the HTTP request metrics, i.e. total, in progress, latency, etc.
- we add unit tests

[EC-299]
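The middleware approach described in the commit message can be sketched roughly as follows. This is a dependency-free illustration, not the `PrometheusInstrumentation` class from this PR: the class name `HTTPMetricsSketch`, the dict-shaped request and response, and the choice to record a 500 status when the handler raises are all assumptions. The `(request, handler)` signature mirrors the common async middleware shape.

```python
import asyncio
import time
from collections import Counter


class HTTPMetricsSketch:
    """Hypothetical container for request metrics (total, in progress, latency)."""

    def __init__(self):
        self.requests_total = Counter()  # keyed by (method, path, status)
        self.in_progress = Counter()     # keyed by (method, path)
        self.latency_seconds = {}        # (method, path) -> list of durations

    async def middleware(self, request, handler):
        """Wrap a handler so every request is counted and timed."""
        key = (request["method"], request["path"])
        self.in_progress[key] += 1
        start = time.perf_counter()
        status = 500  # assumption: an unhandled exception counts as a 500
        try:
            response = await handler(request)
            status = response["status"]
            return response
        finally:
            # Runs on success and on failure, so the in-progress gauge
            # never leaks and every request gets a latency observation.
            self.in_progress[key] -= 1
            elapsed = time.perf_counter() - start
            self.latency_seconds.setdefault(key, []).append(elapsed)
            self.requests_total[(*key, status)] += 1


async def list_subjects(request):
    """Hypothetical handler standing in for a real endpoint."""
    await asyncio.sleep(0)
    return {"status": 200}


metrics = HTTPMetricsSketch()
asyncio.run(metrics.middleware({"method": "GET", "path": "/subjects"}, list_subjects))
print(metrics.requests_total)
```

Doing the bookkeeping in a `finally` block keeps the instrumentation out of the handlers themselves, so new endpoints are covered without any extra code.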
@nosahama nosahama force-pushed the nosahama/prometheus-metrics branch from b07e65b to addaaa2 Compare June 19, 2024 12:06
@eliax1996 eliax1996 enabled auto-merge June 19, 2024 12:58
@eliax1996 eliax1996 merged commit 370ee46 into main Jun 19, 2024
9 checks passed
@eliax1996 eliax1996 deleted the nosahama/prometheus-metrics branch June 19, 2024 13:14
2 participants