Add capability to disable specific alert rules Loki #11241
Hey @Daniel-Vaz
I appreciate what you're trying to do but I think we need a more generic solution. Can we take a step back and talk about the problem you're trying to solve? Why do you want to disable these specific alerts?
Hi @dannykopping, thank you for the feedback!
I'm currently a consumer of this Loki Chart. Since the clusters where I have Loki deployed also run the Prom Operator, I want to enable the Scrapes\Rules\Alerts that the Chart suggests. The problem I'm facing is that some of my clusters do have some "issues" in the overall architecture that result in Loki reporting quite frequently some We could just silence them. But I believe that giving Chart consumers the flexibility to pick and choose which alerts they want enabled would be a nice feature. For example, the Kube Prometheus Stack Chart supports this capability of disabling specific alerts. Do you have any other suggestion on how this capability could be implemented?
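For readers following the thread, here is a minimal sketch of what this looks like from the consumer side. The `monitoring.rules.disabled.<AlertName>` path is taken from the template snippet quoted further down in this conversation; the surrounding `enabled` and `alerting` keys are assumptions about the chart's existing values layout, so check them against the chart's values.yaml before relying on them.

```yaml
# Values override for the Loki chart (illustrative sketch, not copied from the chart)
monitoring:
  rules:
    enabled: true    # assumption: existing switch that renders the rules at all
    alerting: true   # assumption: existing switch for the alerting rules as a whole
    disabled:
      # Alert names set to true here are skipped when the rules are rendered.
      LokiCanaryLatency: true
```

Alerts that are not listed under `disabled` stay enabled, because the template treats a missing key as `false`.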
Alright, that sounds like a good enough reason to me - and good to see some prior art where this is done the same way. Thanks for that
LGTM! I'll just request a PR review from a colleague who knows Helm before we merge.
@Daniel-Vaz have you tried testing when all alerts are set to false and the rules block under |
Thank you very much @dannykopping !
@shantanualsi Indeed, in that situation the object will fail to be applied because it has no rules under the groups section. Either way, I think that if we find a way to address this corner case it would increase the reliability of the Chart.
in our case, you can technically have I'm also thinking that |
I believe that is not the case with the way I implemented it. Since for the
{{- if not (.Values.monitoring.rules.disabled.LokiCanaryLatency | default false) }}
  - name: "loki_canaries_alerts"
    rules:
      - alert: "LokiCanaryLatency"
        annotations:
          message: |
            {{`{{`}} $labels.job {{`}}`}} is experiencing {{`{{`}} printf "%.2f" $value {{`}}`}}s 99th percentile latency.
        expr: |
          histogram_quantile(0.99, sum(rate(loki_canary_response_latency_seconds_bucket[5m])) by (le, namespace, job)) > 5
        for: "15m"
        labels:
          severity: "warning"
{{- if .Values.monitoring.rules.additionalRuleLabels }}
{{ toYaml .Values.monitoring.rules.additionalRuleLabels | indent 10 }}
{{- end }}
{{- end }}
I did this because that rules group section had only that single alert. If more are added in the future, then that will indeed need to be updated.
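If a second alert were ever added to this group, one way to keep both levels of control would be to guard each alert individually and drop the group only when every alert in it is disabled. The following is only a sketch under that assumption: `LokiCanaryExample` is a hypothetical alert name used purely for illustration, and its expression is a placeholder.

```yaml
{{- $disabled := .Values.monitoring.rules.disabled | default dict }}
# Render the group unless every alert in it is disabled.
{{- if not (and $disabled.LokiCanaryLatency $disabled.LokiCanaryExample) }}
  - name: "loki_canaries_alerts"
    rules:
{{- if not $disabled.LokiCanaryLatency }}
      - alert: "LokiCanaryLatency"
        # ... same body as in the snippet above ...
{{- end }}
{{- if not $disabled.LokiCanaryExample }}
      - alert: "LokiCanaryExample"  # hypothetical second alert
        expr: |
          vector(1)  # placeholder expression
        labels:
          severity: "warning"
{{- end }}
{{- end }}
```

With this shape, disabling one alert leaves the other in place, and the `groups` entry disappears only when both are disabled.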
I believe that hack script is used to sync from an external place where the alerts are defined. They don't write their own alert definitions inside the Kube-Prometheus-Stack Chart; instead they use those scripts to sync the rules from an external place and format them accordingly. The usage for those scripts is written here.
I kind of agree with you on this one.
We could also move the
monitoring:
  rules:
    alerting:
      LokiTooManyCompactorsRunning: true
      LokiRequestLatency: false
      # By default all values should be true
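One thing to watch with an opt-out map like this: in Helm templates, `default true` cannot express "true unless explicitly set to false", because `default` treats `false` as empty and would replace it with `true`. A possible sketch of a guard that handles this, assuming `monitoring.rules.alerting` became a map as proposed here (that is an assumption, not what the chart does in the snippets quoted in this thread), uses Sprig's `dig`, which falls back to the default only when the key is absent. `dig` is available in recent Helm 3 releases.

```yaml
{{- $alerting := .Values.monitoring.rules.alerting | default dict }}
# Missing key -> true (alert rendered); explicit false -> alert skipped.
{{- if dig "LokiRequestLatency" true $alerting }}
      - alert: "LokiRequestLatency"
        # ... alert body ...
{{- end }}
```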
Sorry for the delay.
Apologies, I wasn't clear enough here.. in case if all rules under Can you please add some documentation on the values.yaml for the edge case mentioning to set |
@shantanualsi Thank you :D ! I updated the values.yaml comments; feel free to have a look whenever you want.
Also, please note that when you update the Helm charts, you have to bump the version of the Helm Chart. This is mentioned in the PR checklist: for Helm chart changes, bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
@Daniel-Vaz can you also please increment the helm version and add a changelog as suggested by @JStickler?
Done. Thank you for the warning, and sorry for missing that part previously.
@Daniel-Vaz there seem to be some CI failures (the helm linter and the helm docs checks, specifically). Please address these, and then we can get this merged.
I updated the version reference to be
|
Hmm, I just ran it and it worked fine; did you run this from the root of the project? You can always copy the changes in manually from this commit:
@dannykopping |
Oh man what a nightmare, this CI is failing on whitespace now 💀 🙃
@dannykopping I see the workflow is indeed failing on what seems to be a removed line\whitespace...
$ helm-docs
INFO[2024-01-17T07:53:09Z] Found Chart directories [production/helm/loki]
INFO[2024-01-17T07:53:09Z] Generating README Documentation for chart production/helm/loki
$ make helm-lint
make -BC production/helm/loki lint
make[1]: Entering directory '/home/user/repos/loki/production/helm/loki'
yamllint -c /home/user/repos/loki/production/helm/loki/src/.yamllint.yaml /home/user/repos/loki/production/helm/loki/src
make[1]: Leaving directory '/home/user/repos/loki/production/helm/loki'
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
I think just manually delete that line then, and let's get this merged.
Fingers crossed 🤞
😌 thanks for your patience @Daniel-Vaz! We appreciate the contribution a lot!
What this PR does / why we need it:
Currently using the Loki Chart we can only either enable\disable ALL alert rules.
For specific environments and use-cases sometimes not all alert Rules are useful to have enabled.
With this PR change, we can cleanly and through the Chart values disable specific Alerts.
Special notes for your reviewer:
Checklist
CONTRIBUTING.md guide (required)
values.yaml comments added
Extra Note:
This is my first time doing a PR for this project. I forked the main branch and implemented these changes. I did not change the Chart.yaml version or the Changelog, but if that is needed (or if any other action is missing) just ask and I will do it.
Thank you in advance for all your awesome work.