We need to configure our Alertmanager to send us alerts on Discord such that we
can be informed of anything not being right as part of the monitoring setup on
lovelace.
As discussed in the dev-ops channel, I think we can reach a configuration here that utilizes our existing High-Availability AlertManager setup.
We can set up token access for Prometheus on Ansible machines to push alerts through to the Kubernetes HA AlertManager.
Some notes:
- This does not stop us from routing alerts differently or pushing to different destinations based on severity. We still maintain full granular control over alert routing, even more so since we centralise it.
- We can write a small healthcheck (systemd timer, cronjob, etc.) to check whether the AlertManager server is healthy and responding to requests. If it is not, we can trigger a rudimentary alert from netcup reporting that the AlertManager is down. I don't think this needs to be a separate instance of AlertManager, as that feels overcomplex and leads to duplication of routing configuration.
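As a rough illustration of the healthcheck idea, here is a minimal Python sketch that a systemd timer or cronjob could run. It assumes the AlertManager is reachable over HTTP and exposes its standard `/-/healthy` endpoint, and that the rudimentary alert is a plain message posted to a Discord webhook. Both URLs, the function names and the `User-Agent` value are hypothetical placeholders, not values from our setup.

```python
import json
import urllib.error
import urllib.request

# Hypothetical placeholders: neither URL comes from the actual deployment.
ALERTMANAGER_HEALTH_URL = "https://alertmanager.example.invalid/-/healthy"
DISCORD_WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts and non-2xx responses (HTTPError).
        return False


def build_discord_request(webhook_url, message):
    """Build a POST request carrying a plain-text Discord webhook message."""
    body = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: an explicit User-Agent avoids rejection of the default one.
            "User-Agent": "alertmanager-healthcheck",
        },
        method="POST",
    )


def run_check(health_url, webhook_url):
    """Check AlertManager once; post to Discord and return 1 if it is unhealthy."""
    if is_healthy(health_url):
        return 0
    request = build_discord_request(
        webhook_url, "AlertManager health check failed from netcup."
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
    return 1
```

The timer or cron entry would call `run_check(ALERTMANAGER_HEALTH_URL, DISCORD_WEBHOOK_URL)` and could use the return value as the exit code. The sketch has no retry or de-duplication logic, so a real check would probably want to avoid posting on every failed run.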
Just to clarify this from a discussion on Discord, this is about adding a "dead
man's switch" alert that will route to Discord in case the Netcup Prometheus
instance can't contact the Alertmanager in Kubernetes properly. To cover this
case we want to:
- add alerts in Prometheus in case it cannot talk to Alertmanager properly
  (there are built-in metrics for this exported by Prometheus)
- add a local alertmanager configured to send alerts to Discord
- configure the local alertmanager & Prometheus instances such that only the
  newly added alerts from above are routed to the local alertmanager and
  everything else is still routed to the Kubernetes alertmanager.
jchristgit changed the title from "Set up Prometheus alerting on Discord in Ansible" to "Netcup Prometheus alerting for unreachable Alertmanagers" on May 1, 2024