Add alert if an OpenSearch scrape fails (#507)
If a scrape fails, the unit may not be in a healthy state. OpenSearch currently has no metric reporting that a node is down. For example, if the systemd service is stopped on one node, the cluster (N nodes) drops the faulty node because of connectivity issues, and the metrics show that the cluster now has N-1 nodes without saying that a node has failed. With this new alert, at least a notification appears if a node stops being responsive.

How to test:
- Deploy opensearch units
- Stop the opensearch daemon in one of the units

The grafana-agent injects the juju topology into the alert rule, so the expression `up < 1` filters just for OpenSearch apps:

![image](https://github.com/user-attachments/assets/d09b22be-a571-4ec2-b76d-a654186df327)

The alert will trigger:

![image](https://github.com/user-attachments/assets/a1ace958-1188-4b2b-840c-febfd639dd57)
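For readers unfamiliar with the rule format, an alert built on the `up < 1` expression might look roughly like the sketch below. Only the expression comes from the commit message; the group name, alert name, `for` duration, severity label, and annotation text are illustrative assumptions and may differ from the rule actually added in this change.

```yaml
# Hypothetical Prometheus alert rule; every name and value except `expr` is assumed.
groups:
  - name: opensearch-scrape          # assumed group name
    rules:
      - alert: OpenSearchScrapeFailed   # assumed alert name
        # `up` is set to 0 by Prometheus when a scrape of the target fails.
        # No label matchers are written here because, per the commit message,
        # grafana-agent injects the juju topology into the rule.
        expr: up < 1
        for: 5m                      # assumed grace period before the alert fires
        labels:
          severity: critical         # assumed severity
        annotations:
          summary: "Scrape failed for {{ $labels.instance }}"
          description: >-
            The metrics endpoint on {{ $labels.instance }} could not be scraped.
            The OpenSearch node may be down or unresponsive.
```

Leaving the expression free of hard-coded labels keeps the rule reusable across deployments, since the scoping to the OpenSearch application is applied at injection time rather than in the rule file.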