Opencost grafana dashboard partly not working #2
Comments
@mattray any chance you can take a look? |
@andriktr I'll try to recreate and see what I find. I'm moving it over to the opencost-helm-chart repository since that's where it originated. |
@mattray Thanks a lot. |
I am also getting the same partial success with the Grafana dashboard. If it helps, the partial success is on an AWS EKS cluster, and I am not experiencing the issue on an Azure AKS cluster. |
Scrap my last comment: the dashboard is now only partially working on an AKS cluster where it was previously working. |
Hi @mattray, by the looks of things the project is really active, so I know how busy you must be, but have you been able to replicate this issue? The dashboard was super useful when it was working, so any progress on this would be appreciated. |
Sorry, I've been swamped on other projects. A couple of folks brought it up at KubeCon that they'd be interested in working on it, but I haven't heard from anyone else yet. I'm not a Grafana expert by any means, so if someone's interested and wants to take this feel free. I'll try to circle back to this soon. |
@dwbrown2 did you resolve a similar issue here? kubecost/cost-analyzer-helm-chart#303 Could the same logic be applied? |
@sossickd I'd have to dig in to say for sure. I'm unfortunately tied up on other projects right now, and would love extra help if others are able to review. Will do my best to circle back when free. |
@dwbrown2, @mattray OK found the issue, this was caused when running the opencost deployment with more than one replica. Created a PR. opencost/opencost-helm-chart#157 This adds a variable to filter on pod, amended each pane that had the many-to-many error and filtered on the pod label. Not too sure if this is the best solution but fixed it in my case. |
Hmm... that sounds strange, as I'm running opencost with a single replica and still have the same issues. |
@andriktr can you copy and paste one of the errors in a code snippet from one of the broken panes, so I can see if it's the same issue I was experiencing? |
Errors I got are in the very first post of this issue. |
@andriktr would you mind pasting it into a code snippet so I can copy it more easily? |
Sure: Here is the error output for Top 20 by Namespace dashboard:
|
Can you open up prometheus and enter in this query:
Do you get the many-to-many matching not allowed: matching labels must be unique on one side error? |
OK, looking at the error a bit further, it looks like your issue may be slightly different from mine.
The query is returning two matches from what I can see. The instance label looks like it might be being renamed to exported_instance; that's not happening for me. What doesn't look right to me is that the instance IP address is the same on both outputs. Has the node recently been destroyed? |
@andriktr can you run the following query in Prometheus and paste the result in a code snippet?
Also can you show me the output from a |
@andriktr OK this is different from what i am seeing. In my case the instance value matches the node value. Are you using any relabelings of metrics? Can you determine what Also what version of opencost and helm chart are you using? |
Yes, 10.162.208.9:9003 is the opencost pod ip. Version is |
@andriktr are you doing any relabelings of metrics? I'm trying to get my head around why you are getting an exported_instance label and why the instance label is being transformed to 10.162.208.9:9003. If you are using Helm to deploy, could you share your values? It would also be useful to share your kube-prometheus-stack Helm values. |
Hey, we do not do any relabelings of metrics. I have my own Helm chart, but in general it is more or less the same as the official one, plus some add-ons for AAD Pod Identity:
serviceAccount:
create: true
annotations: {}
# eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/eksctl-opencost
automountServiceAccountToken: true
annotations: {}
#azure.workload.identity/inject-proxy-sidecar: "true"
service:
annotations: {}
labels: {}
type: ClusterIP
opencost:
exporter:
# The GCP Pricing API requires a key. This is supplied just for evaluation.
# cloudProviderApiKey: 'asdfasdfasdf'
# Default cluster ID to use if cluster_id is not set in Prometheus metrics.
defaultClusterId: "aks-experimental"
image:
registry: redacted
repository: kubecost-cost-model
tag: prod-1.107.0
resources:
requests:
cpu: '10m'
memory: '55M'
limits:
cpu: '999m'
memory: '1G'
extraEnv:
{}
# FOO: BAR
metrics:
serviceMonitor:
enabled: true
additionalLabels:
release: 'kube-prometheus-stack'
## The label to use to retrieve the job name from.
## jobLabel: "app.kubernetes.io/name"
namespace: 'kube-prometheus-stack'
namespaceSelector: {}
## Default: scrape .Release.Namespace only
## To scrape all, use the following:
## namespaceSelector:
## any: true
scrapeInterval: 30s
# honorLabels: true
targetLabels: []
relabelings: []
metricRelabelings: []
prometheus:
# username:
# password:
external:
enabled: false
url: 'https://mimir-dev-push.infra.alto.com/prometheus'
internal:
enabled: true
serviceName: kube-prometheus-stack-prometheus
namespaceName: kube-prometheus-stack
port: 9090
ui:
enabled: true
image:
registry: redacted
repository: opencost-ui
tag: prod-1.107.0
resources:
requests:
cpu: '10m'
memory: '55M'
limits:
cpu: '999m'
memory: '1G'
tolerations: []
# Baltic IF Custom Values
customAzureConfig:
enabled: true
azureTenantId: "redacted"
azureSubscriptionId: "redacted"
azurePodIdentity:
enabled: true
azureIdentity:
name: opencost-identity
resourceID: "redacted"
clientID: "redacted"
azureIdentityBinding:
name: opencost-identity-binding
selector: opencost-identity
azureWorkloadIdentity:
enabled: false
clientID: ""
ingress:
enabled: true
annotations: {}
labels: {}
ingress-class: internal-nginx
hosts:
- host: opencost-experimental.eu
paths:
- path: /
pathType: ImplementationSpecific
serviceName: opencost
servicePort: 9090
tls:
- hosts:
- opencost-experimental.eu
secretName: "" |
Hey,
I tested with 2 and with 1 replicas. Same result. Any hints on how to solve / work around? Thanks! |
Most probably the main reason here is that opencost duplicates kube-state-metrics (it uses the same names) for its metrics. To check, you can simply search for the kube_node_info metric in the Grafana explorer; you will probably see it doubled, with additional instances related to opencost. P.S. I tried to adjust the setting mentioned in https://www.opencost.io/docs/installation/helm#example-configuration ...
opencost:
  exporter:
    extraEnv:
      EMIT_KSM_V1_METRICS: "false"
      EMIT_KSM_V1_METRICS_ONLY: "true"
However, for some reason this did not work, so I ended up uninstalling opencost and switching to the aks-cost-analysis addon, which is actually also based on opencost :) |
Does anyone have any update here? We are facing a similar issue. |
I was able to fix the issue with these settings:
opencost:
  metrics:
    kubeStateMetrics:
      emitKsmV1Metrics: false
      emitKsmV1MetricsOnly: true
I deployed the changes and waited an hour. Afterwards, I changed the time range on the dashboard to 15 minutes to see whether the changes worked. However, the dashboard uses a large fixed time range for some visualizations, and these take some time until they show the correct data. Some of them should already work, though. |
Is there a clear description of which dashboards don't work somewhere? Would love any community support on this. |
@dlahn there are two different issues here:
For the first issue, I mentioned the fix above. It seems to me that we need to document this somewhere. |
We can put configuration/work-arounds notes in the README |
Sounds good! I will add it to the draft PR. |
@asdfgugus We have had this change for quite some time, but we are still running into this issue.
We have also made sure to drop out the exported_ labels. The instance is unique across all of these metrics. |
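For readers wondering what dropping the exported_ labels at scrape time could look like: the following is only a sketch, assuming the serviceMonitor block from the values pasted earlier in this thread and the standard Prometheus labeldrop relabel action. The thread does not show the configuration that was actually used, so the regex and placement here are illustrative.

```yaml
# Sketch only: key paths follow the values pasted earlier in this thread;
# the rule itself is an assumption, not the commenter's actual config.
opencost:
  metrics:
    serviceMonitor:
      enabled: true
      metricRelabelings:
        # Remove every label whose name starts with "exported_" after the scrape.
        - action: labeldrop
          regex: exported_.*
```

Note that dropping a label can make two previously distinct series identical, so it is worth checking that the remaining labels still identify each series uniquely.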
@asdfgugus Further to the above... Should the instance be unique to the opencost pod? At the moment, we are using k8s-monitoring-helm and it sets the instance to be the same for the opencost scrape. However, looking at the Top 20 Namespaces part of the dashboard as an example:
The 2nd part where it looks up |
Just an update here, our issue was that the |
Thanks for sharing your solution! As we collaborated on debugging via Slack, I'd like to expand on it. It is crucial to honor the labels, as I mentioned earlier. By honoring the labels, I mean ensuring that the scrape job does not append the @dlahn do you re-write them when scraping or querying? |
@asdfgugus I am re-writing at the scrape side. |
For anyone using k8s-monitoring-helm who may run into this issue, a fix has been made in the chart to add the |
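For anyone on the OpenCost Helm chart with a ServiceMonitor, honoring labels as discussed above might be sketched like this. It assumes the same opencost.metrics.serviceMonitor layout as the values pasted earlier in the thread, where honorLabels appears commented out; treat it as a starting point, not a confirmed fix.

```yaml
# Sketch only: mirrors the serviceMonitor block from the values shared above.
opencost:
  metrics:
    serviceMonitor:
      enabled: true
      # Keep the label values exposed by OpenCost (e.g. instance) instead of
      # having Prometheus rename conflicting ones to exported_<label>.
      honorLabels: true
```

With honorLabels left at false, Prometheus keeps its own target labels and renames conflicting scraped labels with an exported_ prefix, which matches the exported_instance label seen earlier in this issue.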
I'm still seeing the duplicate issue, even with
Would it be possible to disable certain duplicated metrics using a value in values.yaml?
This was also mentioned in opencost/opencost#1571 How to identify the duplicated metrics? |
Hello guys, same issue for me: I can't use your dashboard because of duplicate metrics, even with your 2 env vars about ksmV1. |
@Momotoculteur, could you please check which metrics are affected?
|
Yes, you can disable metrics of OpenCost. If I remember correctly, the current dashboard only uses metrics produced by OpenCost. Therefore, I would first identify the duplicated metrics.
Edit the dashboard and check which queries are not working. You can also query the metrics store directly for these metrics: https://docs.kubecost.com/v/1.0x/architecture/user-metrics |
Hello @asdfgugus, thanks for your quick answer. I have this setup:
Metrics endpoints from KSM and opencost are scraped via Vector and sent to a Grafana Mimir TSDB on AWS S3 buckets. I also scrape other services such as cAdvisor, metrics-server, custom Jenkins metrics for betclic company via a Prometheus Pushgateway, node-exporter, nginx exporter, JFrog Artifactory and others, but I think those apps are not relevant here. Everything is deployed via Helm charts through ArgoCD. I have tested 3 different dashboards, but got the same results:
The error is: Status: 422. Message: execution: found duplicate series for the match group. Do you need extra information about the specific metrics which cause the issue on specific dashboards? I tried to set up opencost to expose no KSM metrics, as I already have a KSM instance which exposes the metrics needed for opencost, like this:
I have also tested this setup following some previous tips, but that doesn't fix my problem:
The last idea I have is to leave emitKsmV1MetricsOnly: true and disable, in my own KSM v2, the metrics listed at https://docs.kubecost.com/architecture/ksm-metrics#ksm-metrics-emitted-by-kubecost, but currently that does not seem to work, as OpenCost needs some metrics in V1 format. |
Thanks @Momotoculteur for the details. Careful, this is the configuration for the official OpenCost Helm chart:
opencost:
  metrics:
    kubeStateMetrics:
      emitKsmV1Metrics: false
      emitKsmV1MetricsOnly: true
Btw. you can find the Helm chart on ArtifactHub: https://artifacthub.io/packages/helm/opencost/opencost |
I have exactly this configuration, sorry, my copy/paste was wrong :) Edit: tonight I tried to deactivate the KSM v2 metrics that are already emitted by OpenCost (as described in the kubecost documentation) in order to avoid duplicates, but I still have the issue. I'm clearly lost now as to why I have this problem :( |
The duplicate metrics which cause this issue gave me these values:
[
  {__name__="node_cpu_hourly_cost", arch="amd64", instance="IP_XXX", instance_type="c6a.xlarge", node="IP_XXX", provider_id="aws:///eu-west-1", region="eu-west-1"},
  {__name__="node_cpu_hourly_cost", arch="amd64", instance="IP_XXX", instance_type="c6a.xlarge", node="IP_XXX", provider_id="aws:///eu-west-1", region="eu-west-1"}
]
I use Karpenter for node autoscaling and spot instances from AWS. EDIT: I made some tests to remove the duplicate items from the left join in my PromQL query. I now rely on the min, max or avg function, and it seems to work. Query from your dashboard:
Updated query:
|
@Momotoculteur |
I have only 14 days of data because I only recently installed the opencost Helm chart, but I have no more problems now, even with a large time range in my dashboards. |
Hi,
I'm trying to use the following Grafana dashboard to view opencost data: https://github.com/opencost/opencost-helm-chart/blob/main/examples/dashboard/kube-prometheus-stack-opencost-dashboard.json. In my case it's only partly working, and some panels throw the following errors:
Any thoughts?
Thanks in advance.