Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: limit scrape_duration_seconds visualization to CHT instance #73

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2052,7 +2052,7 @@
"uid": "PBFA97CFB590B2093"
},
"editorMode": "code",
"expr": "scrape_duration_seconds",
"expr": "scrape_duration_seconds{instance=\"$cht_instance\"}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"expr": "scrape_duration_seconds{instance=\"$cht_instance\"}",
"expr": "scrape_duration_seconds unless (scrape_duration_seconds{job=\"cht\", instance!=\"$cht_instance\"} or scrape_duration_seconds{job=\"postgres\", cht_instance!=\"$cht_instance\"})",

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Hi @jkuester, in our case, the monitoring stack is used beyond CHT and the changes suggested still include targets unrelated to CHT. It is also not quite clear to me why I should be interested in scrape duration for grafana for example, whilst I'm interested in processes/metrics that directly relate to CHT.

In my opinion, we should be explicit with the jobs we intend to target then exclude out-of-context instances though I'm still keen on understanding your POV.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm okay, I think I see where you are coming from here. This whole Monitoring Stack row was originally intended to be just an at-a-glance health-check for the Watchdog processes themselves (so Grafana, Prometheus, and the exporters). That being said, I understand the value in being able to easily check the scrape history for the CHT instance in context. It is almost like we are technically trying to achieve different goals:

  1. See overall health of Prometheus by seeing the historical trends for the for the scrape times of all targets. This lets you easily identify outliers or understand when a target is not getting scraped as expected.
  2. See the historical scrape performance for the particular CHT instance in context (informing, among other things, the freshness of the metrics for that instance).

Here is my proposal: lets have separate panels to address both of there goals! In this PR, we can leave the panel in the Monitoring Stack row as it was and instead add a new panel on the CHT Admin Details dashboard (maybe up at the top outside of all the existing rows?) that just shows the scrape data for the CHT instance in context (along with the Postgres instance for that instance if it exists). Then, I have logged #78 which can be addressed later to move the whole Monitoring Stack row out of the CHT Admin Overview dashboard.

Let me know what you think!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the Monitoring Stack row should be moved out of the CHT Admin dashboard.

"instant": false,
"legendFormat": "__auto",
"range": true,
Expand Down