Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-35095: drop redundant KSM selector from KubeCPUOvercommit #2422

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 10 additions & 4 deletions assets/control-plane/prometheus-rule.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ spec:
summary: StatefulSet update has not been rolled out.
expr: |
(
max without (revision) (
max by(namespace, statefulset, job, cluster) (
kube_statefulset_status_current_revision{namespace=~"(openshift-.*|kube-.*|default)",job="kube-state-metrics"}
unless
kube_statefulset_status_update_revision{namespace=~"(openshift-.*|kube-.*|default)",job="kube-state-metrics"}
Expand Down Expand Up @@ -232,7 +232,7 @@ spec:
description: Cluster {{ $labels.cluster }} has overcommitted CPU resource requests for Pods by {{ $value }} CPU shares and cannot tolerate node failure.
summary: Cluster has overcommitted CPU resource requests.
expr: |
sum(namespace_cpu:kube_pod_container_resource_requests:sum{job="kube-state-metrics",}) by (cluster) - (sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) by (cluster) - max(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) by (cluster)) > 0
sum(namespace_cpu:kube_pod_container_resource_requests:sum{}) by (cluster) - (sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) by (cluster) - max(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) by (cluster)) > 0
and
(sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) by (cluster) - max(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) by (cluster)) > 0
for: 10m
Expand Down Expand Up @@ -336,7 +336,7 @@ spec:
description: The kubernetes apiserver has terminated {{ $value | humanizePercentage }} of its incoming requests.
summary: The kubernetes apiserver has terminated {{ $value | humanizePercentage }} of its incoming requests.
expr: |
sum(rate(apiserver_request_terminations_total{job="apiserver"}[10m])) / ( sum(rate(apiserver_request_total{job="apiserver"}[10m])) + sum(rate(apiserver_request_terminations_total{job="apiserver"}[10m])) ) > 0.20
sum by(cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m])) / ( sum by(cluster) (rate(apiserver_request_total{job="apiserver"}[10m])) + sum by(cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m])) ) > 0.20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this PR but I'd recommend to remove the KubeAPITerminatedRequests alerting rule since we're not owners for the API server. And same goes with KubeAPIDown.

for: 5m
labels:
severity: warning
Expand Down Expand Up @@ -477,7 +477,7 @@ spec:
max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
)
record: node_namespace_pod_container:container_memory_swap
- name: k8s.rules.container_resource
- name: k8s.rules.container_memory_requests
rules:
- expr: |
kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"} * on (namespace, pod, cluster)
Expand All @@ -496,6 +496,8 @@ spec:
)
)
record: namespace_memory:kube_pod_container_resource_requests:sum
- name: k8s.rules.container_cpu_requests
rules:
- expr: |
kube_pod_container_resource_requests{resource="cpu",job="kube-state-metrics"} * on (namespace, pod, cluster)
group_left() max by (namespace, pod, cluster) (
Expand All @@ -513,6 +515,8 @@ spec:
)
)
record: namespace_cpu:kube_pod_container_resource_requests:sum
- name: k8s.rules.container_memory_limits
rules:
- expr: |
kube_pod_container_resource_limits{resource="memory",job="kube-state-metrics"} * on (namespace, pod, cluster)
group_left() max by (namespace, pod, cluster) (
Expand All @@ -530,6 +534,8 @@ spec:
)
)
record: namespace_memory:kube_pod_container_resource_limits:sum
- name: k8s.rules.container_cpu_limits
rules:
- expr: |
kube_pod_container_resource_limits{resource="cpu",job="kube-state-metrics"} * on (namespace, pod, cluster)
group_left() max by (namespace, pod, cluster) (
Expand Down
3 changes: 1 addition & 2 deletions jsonnet/jsonnetfile.json
Original file line number Diff line number Diff line change
Expand Up @@ -58,14 +58,13 @@
"version": "main"
},
{
"name": "kubernetes-mixin is pinned because newer versions are breaking MON-3837",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @simonpasquier @machine424

Since #2482 removed CMO's dependency on k/k-mixin for dashboards, I believe we can unpin and sync with upstream to update the recording rules that depend on the same.

Copy link
Member Author

@rexagod rexagod Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A follow-up PR will unpin the kubernetes-monitoring/kubernetes-mixin version.

Ah, never mind, I just noticed Simon's comment there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we can remove the whole entry for kubernetes-monitoring/kubernetes-mixin and just depend on the version pulled by kube-prometheus.

"source": {
"git": {
"remote": "https://github.com/kubernetes-monitoring/kubernetes-mixin.git",
"subdir": ""
}
},
"version": "b247371d1780f530587a8d9dd04ccb19ea970ba0"
"version": "cb72d737459a655e7575c09f7859815ae3690981"
}
],
"legacyImports": true
Expand Down
23 changes: 11 additions & 12 deletions jsonnet/jsonnetfile.lock.json
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
"subdir": "contrib/mixin"
}
},
"version": "bd8896364a1d9adb554b538232004e5b2d6c850a",
"version": "12b47c2b935b6040a3874c512b2dd7f89dc0b599",
"sum": "IXI3LQIT9NmTPJAk8WLUJd5+qZfcGpeNCyWIK7oEpws="
},
{
Expand Down Expand Up @@ -88,7 +88,7 @@
"subdir": "grafana-builder"
}
},
"version": "ab84b9f67c7a7f61e0c0a311afb47a1af4f5903f",
"version": "a8fc2139d881ae632a8c956eb9dd4b84b24f362e",
"sum": "yxqWcq/N3E/a/XreeU6EuE6X7kYPnG0AspAQFKOjASo="
},
{
Expand Down Expand Up @@ -129,9 +129,8 @@
"subdir": ""
}
},
"version": "b247371d1780f530587a8d9dd04ccb19ea970ba0",
"sum": "7M2QHK3WhOc1xT7T7KhL9iKsCYTfsIXpmcItffAcbL0=",
"name": "kubernetes-mixin is pinned because newer versions are breaking MON-3837"
"version": "cb72d737459a655e7575c09f7859815ae3690981",
"sum": "JaPnO5N/KUBgA9v6qE7CYzp8OWDTpzjM0+l/SPqL4m4="
},
{
"source": {
Expand All @@ -140,7 +139,7 @@
"subdir": "jsonnet/kube-state-metrics"
}
},
"version": "0738de0be2ba1607aac8b58a0d783891664d48a9",
"version": "17151aca659e0659259b5e1f5675acf849281ade",
"sum": "lO7jUSzAIy8Yk9pOWJIWgPRhubkWzVh56W6wtYfbVH4="
},
{
Expand All @@ -150,7 +149,7 @@
"subdir": "jsonnet/kube-state-metrics-mixin"
}
},
"version": "0738de0be2ba1607aac8b58a0d783891664d48a9",
"version": "17151aca659e0659259b5e1f5675acf849281ade",
"sum": "qclI7LwucTjBef3PkGBkKxF0mfZPbHnn4rlNWKGtR4c="
},
{
Expand Down Expand Up @@ -192,7 +191,7 @@
"subdir": "jsonnet/mixin"
}
},
"version": "3c35a6d7baf761cc2e4426d508528e913cc9aab2",
"version": "5f512d21361a300889388c3a174f4349fef3604d",
"sum": "gi+knjdxs2T715iIQIntrimbHRgHnpM8IFBJDD1gYfs=",
"name": "prometheus-operator-mixin"
},
Expand All @@ -213,7 +212,7 @@
"subdir": "doc/alertmanager-mixin"
}
},
"version": "ff8c09d60174f27850830ceed6be38b3cf86974b",
"version": "f6b942cf9b3a503d59192eada300d2ad97cba82f",
"sum": "Mf4h1BYLle2nrgjf/HXrBbl0Zk8N+xaoEM017o0BC+k=",
"name": "alertmanager"
},
Expand All @@ -224,7 +223,7 @@
"subdir": "docs/node-mixin"
}
},
"version": "07ee8efaa4f8e7260eb8611f3f42973cbbf8ce8f",
"version": "0fddfd1ba530c954dc042c5d138de82ecd4e4ff1",
"sum": "cQCW+1N0Xae5yXecCWDK2oAlN0luBS/5GrwBYSlaFms="
},
{
Expand All @@ -234,7 +233,7 @@
"subdir": "documentation/prometheus-mixin"
}
},
"version": "3c5551df68442dd07668987e0685d12d9c3138dd",
"version": "f9057544cb69261acd59628c2e81369adeb584c9",
"sum": "dYLcLzGH4yF3qB7OGC/7z4nqeTNjv42L7Q3BENU8XJI=",
"name": "prometheus"
},
Expand Down Expand Up @@ -266,7 +265,7 @@
"subdir": "mixin"
}
},
"version": "df3df36986e07b21aaa88adefb5fbf0b648129b8",
"version": "bfbabbb89a3119e904d48028fb2534446e31a642",
"sum": "ieCD4eMgGbOlrI8GmckGPHBGQDcLasE1rULYq56W/bs="
}
],
Expand Down