prometheus.operator.podmonitors does not discover all targets consistently #5839
Labels: bug
What's wrong?
We are encountering an issue with Grafana Agent v0.37.2 in Flow mode with prometheus.operator.podmonitors. Clustering is enabled for both the agent and the component. One of the PodMonitor CRs discovers a different number of pods over time. We run this CR in multiple clusters. It discovers all targets, then after ~12 hours it discovers only some of them for ~12 hours; after that, it discovers all targets again, and the pattern repeats. We are sure that the number of pods stayed the same during that period.
Note that this happens with only one CR; all the other scrape jobs are fine.
One possibly relevant detail: we have another CR whose name is similar to the problematic one. The working one is called “demo/v1-otelcollector” and the non-working one is called “demo/v1-otelcollector-tempo”, so the working CR's name is a prefix of the non-working CR's name. I wonder whether this naming is triggering a bug.
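To make the naming hypothesis concrete: if any code path matched PodMonitor keys by string prefix instead of exact equality, a lookup for the shorter name would also select the longer one. The sketch below is purely hypothetical and is not taken from the component's source; the helper names are made up for illustration. It only shows how a "namespace/name" prefix check conflates the two CRs while an exact comparison does not.

```python
# Hypothetical illustration only; NOT the actual prometheus.operator.podmonitors code.
CRS = ["demo/v1-otelcollector", "demo/v1-otelcollector-tempo"]


def matches_prefix(key: str, candidate: str) -> bool:
    # Hypothetical buggy check: any key that starts with `key` counts as a match.
    return candidate.startswith(key)


def matches_exact(key: str, candidate: str) -> bool:
    # Exact comparison of the full "namespace/name" key.
    return candidate == key


def lookup(key: str, match) -> list:
    # Return every CR key that the given matcher considers equal to `key`.
    return [c for c in CRS if match(key, c)]


prefix_hits = lookup("demo/v1-otelcollector", matches_prefix)
exact_hits = lookup("demo/v1-otelcollector", matches_exact)

print(prefix_hits)  # ['demo/v1-otelcollector', 'demo/v1-otelcollector-tempo']
print(exact_hits)   # ['demo/v1-otelcollector']
```

If the real component behaved like the prefix variant, state belonging to one CR could leak into the other; this would need to be confirmed against the actual source.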
We did not see the same issue in static mode with the operator.
Steps to reproduce
Create two Deployments first; these will be monitored by the PodMonitors.
Then create the PodMonitors in the same namespace.
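The reproduction steps can be sketched as a small script that emits the manifests as a JSON v1 List. Only the namespace and CR names come from this report; the container image, the "app" label key, the replica count, and the port name/number (8888, assumed here as the collector's metrics port) are hypothetical placeholders.

```python
import json

NAMESPACE = "demo"
NAMES = ["v1-otelcollector", "v1-otelcollector-tempo"]
# Hypothetical image; replace with whatever the real Deployments run.
IMAGE = "otel/opentelemetry-collector:latest"


def deployment(name: str, replicas: int = 3) -> dict:
    # Hypothetical "app" label; each Deployment gets a label equal to its own name.
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "collector",
                        "image": IMAGE,
                        # Named port that the PodMonitor below scrapes (assumed number).
                        "ports": [{"name": "metrics", "containerPort": 8888}],
                    }]
                },
            },
        },
    }


def pod_monitor(name: str) -> dict:
    # PodMonitor selecting only the pods of the Deployment with the same name.
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "podMetricsEndpoints": [{"port": "metrics"}],
        },
    }


# Deployments first, then PodMonitors, in the same namespace.
items = [deployment(n) for n in NAMES] + [pod_monitor(n) for n in NAMES]
manifest = {"apiVersion": "v1", "kind": "List", "items": items}

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

The output can be piped into `kubectl apply -f -`, after which the number of targets discovered for each PodMonitor can be compared over a 24-hour window.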
System information
No response
Software version
Grafana Agent v0.37.2
Configuration
No response
Logs
No response