I just found out about the sloth project today, and after a lot of reading of the docs I think it is exactly what I am looking for. The one question still on my mind is how I can add a weekly recurring maintenance window to the SLO calculations, so that an outage during this window is not counted against the SLO at all.
Greetings
At the very least, you first need Prometheus to record maintenance windows: either some system that reports them as a metric or, if the window is at a fixed time, a recording rule built with the day_of_week and hour functions.
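A minimal sketch of such a recording rule for a fixed weekly window. The schedule (Sundays, 02:00 to 03:59 UTC), the group name, and the series name maintenance_period are assumptions chosen for illustration, not something sloth provides:

```yaml
# Prometheus rule file (sketch). The schedule is a made-up example:
# Sundays, 02:00 to 03:59 UTC.
groups:
  - name: maintenance            # hypothetical group name
    rules:
      - record: maintenance_period
        # day_of_week() returns 0 for Sunday and hour() returns 0-23, both in UTC.
        # The "bool" modifier makes each comparison return 0 or 1 instead of
        # filtering, so the product is 1 only when all three conditions hold.
        expr: |
          (day_of_week() == bool 0)
            * (hour() >= bool 2)
            * (hour() < bool 4)
```

The rule yields a single series that is 1 inside the window and 0 outside it. Since both functions work in UTC, a window defined in local time would need its hours shifted by hand, including across daylight saving changes.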
Then I'd probably cut my losses and just define an inhibit rule to avoid sending alerts during the maintenance period (maybe with a buffer around the period). The SLO calculations and dashboards would still take errors during the maintenance period into account. I would find that advantageous, though, as I'd encourage limiting the impact of maintenance periods anyway.
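Alertmanager inhibit rules need a firing source alert, so one way to sketch this is an alert that fires for as long as maintenance lasts, plus an inhibit rule keyed on it. This assumes a maintenance_period series that is 1 during maintenance, and it assumes the sloth-generated alerts carry a sloth_slo label; the alert name and severity label are hypothetical:

```yaml
# --- Prometheus rule file: an alert that fires for the whole window ---
groups:
  - name: maintenance-alerts       # hypothetical group name
    rules:
      - alert: MaintenanceWindow   # hypothetical alert name
        expr: maintenance_period == 1
        labels:
          severity: none           # hypothetical; route this alert to a receiver that notifies nobody
---
# --- Alertmanager configuration fragment ---
inhibit_rules:
  # While MaintenanceWindow is firing, mute every alert that has a sloth_slo label.
  - source_matchers:
      - alertname = "MaintenanceWindow"
    target_matchers:
      - sloth_slo =~ ".+"          # assumption: SLO alerts carry this label
```

To add a buffer around the period, the alert expression could use a wider window than the one used for the SLO calculation itself.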
Still, if you really need the calculations to exclude maintenance periods, the approach would likely depend on your query. Assuming a maintenance_period recording rule that reports 1 during maintenance and 0 otherwise, queries that zero out the samples taken during maintenance and then add up the rest with sum_over_time might do the trick.
Basically, if you take the rate over the full window period, you can't tell which errors happened during the maintenance period. Using sum_over_time should still ensure the error ratio is a fairly good approximation for the entire window period. Including > 0 or vector(1) in the total query is quite important, as the error ratios would otherwise have a 0 denominator for any window that lies entirely inside a maintenance period.
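As an illustration of that idea (a sketch, not necessarily the exact queries meant above), a sloth SLO fragment could look roughly like this. The service name, the http_requests_total metric and its labels are hypothetical, a maintenance_period series with values 0 and 1 is assumed to exist, and the 1m rate range assumes a scrape interval of 30s or less:

```yaml
version: "prometheus/v1"
service: "myservice"               # hypothetical service
slos:
  - name: "requests-availability"
    objective: 99.9
    # Other fields (description, alerting, ...) omitted from this fragment.
    sli:
      events:
        # Error rate per 1m slice, multiplied by 0 while maintenance_period is 1,
        # then summed over the SLO window through a subquery.
        error_query: |
          sum_over_time(
            (
              sum(rate(http_requests_total{job="myservice",code=~"5.."}[1m]))
                * on() (1 - maintenance_period)
            )[{{.window}}:1m]
          )
        # Same masking for all requests. If the whole window lies inside
        # maintenance, the sum is 0; "> 0" drops it and "or vector(1)" supplies
        # a denominator of 1, so the ratio becomes 0 instead of dividing by zero.
        total_query: |
          (
            sum_over_time(
              (
                sum(rate(http_requests_total{job="myservice"}[1m]))
                  * on() (1 - maintenance_period)
              )[{{.window}}:1m]
            ) > 0
          )
          or vector(1)
```

Both sums sample the 1m rate once per minute, so each is only an approximation of the true event count, but the same scaling applies to numerator and denominator and largely cancels in the ratio.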