Gaps in metrics #6322

arnitolog · 2024-02-06T23:30:00Z

Hello,

I have a cluster of agents that scrape metrics via Prometheus ServiceMonitor CRDs.
I'm investigating the cause of gaps in some of my metrics. They appear to happen when a new agent starts and takes over scraping these metrics, or when the agent that was scraping them dies.
Here is the metric that I investigated:

The gap is 5 minutes, and here is a graph showing that a new agent was started at that time:

There are no errors in the logs.
So the question is: are there any configuration options or arguments that control how the agent cluster handles agents joining or leaving? Or are there any other suggestions for ensuring high availability for metrics?

tpaschalis (Maintainer) · 2024-02-08T15:56:50Z

Hey, thanks for reaching out. This can be a problem with both clustering and hashmod sharding, although it may be a little easier to notice with clustering because it can scale dynamically.

I've opened https://github.com/grafana/agent/issues/6333 to track a solution to this.
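
To illustrate why a change in the number of agents can move scrape targets from one owner to another, here is a minimal Python sketch of hashmod-style sharding. It is an assumption-laden illustration, not the agent's actual implementation: the hash function, the target addresses, and the helper names (`shard_for`, `assignments`, `moved_targets`) are all hypothetical, and clustering distributes targets with a different scheme. The point is only that ownership depends on the shard count, so a join or leave can hand a target to an agent that has not scraped it before.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a target to a shard by hashing its address and taking the modulus.

    Assumption: MD5 over the address string, first 8 bytes, big-endian.
    """
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assignments(targets, num_shards):
    """Return a dict of target -> owning shard index."""
    return {t: shard_for(t, num_shards) for t in targets}


def moved_targets(targets, old_shards, new_shards):
    """List the targets whose owner changes when the shard count changes."""
    before = assignments(targets, old_shards)
    after = assignments(targets, new_shards)
    return [t for t in targets if before[t] != after[t]]


if __name__ == "__main__":
    # Hypothetical target addresses, for illustration only.
    targets = [f"10.0.0.{i}:9090" for i in range(1, 21)]
    moved = moved_targets(targets, old_shards=3, new_shards=4)
    print(f"{len(moved)} of {len(targets)} targets changed owner")
```

Every target in the `moved` list is picked up by a different agent after the change, which is the moment at which a gap like the one described above could appear.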

arnitolog (Author) · 2024-02-08T15:59:18Z

Thanks, @tpaschalis.
