Add k8s.container.status.waiting metric to semantic conventions #1672

povilasv · 2024-12-11T06:17:02Z

Area(s)

area:k8s

Is your change request related to a problem? Please describe.

K8s Cluster receiver users would like to monitor the K8s CrashLoopBackOff state. We have had multiple issues about it.

Previously I tried to model status.waiting as a Resource attribute, but because it is mutable it cannot be a Resource attribute. See #997

I would like to propose adding this as a metric to unblock PR open-telemetry/opentelemetry-collector-contrib#35668

Describe the solution you'd like

  # k8s.container.* metrics
  - id: metric.k8s.container.status.waiting
    type: metric
    metric_name: k8s.container.status.waiting
    stability: experimental
    brief: "Whether container is in waiting state. (0 for no, 1 for yes)"
    instrument: gauge
    unit: ""
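To illustrate the 0/1 semantics in the brief above, here is a minimal Python sketch that derives the gauge value from a container status. It assumes the input is shaped like a Kubernetes ContainerStatus serialized to a dict, with a `state` key holding one of `waiting`, `running`, or `terminated`. The helper name `waiting_gauge_value` is hypothetical and is not taken from any receiver.

```python
from typing import Any, Mapping


def waiting_gauge_value(container_status: Mapping[str, Any]) -> int:
    """Return 1 if the container is in the waiting state, else 0."""
    # Absent states may be missing from the dict or set to None; treat both as "not set".
    state = container_status.get("state") or {}
    return 1 if state.get("waiting") is not None else 0


# Sample inputs (made up for illustration).
crash_looping = {"name": "app", "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
running = {"name": "app", "state": {"running": {"startedAt": "2024-12-11T06:17:02Z"}}}

print(waiting_gauge_value(crash_looping))  # 1
print(waiting_gauge_value(running))        # 0
```

Note that this sketch only reports whether the container is waiting; it does not carry the waiting reason.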

Describe alternatives you've considered

No response

Additional context

No response


povilasv · 2024-12-11T07:37:08Z

@TylerHelmuth / @ChrsMark / @dmitryax would appreciate thoughts / feedback 🙇

ChrsMark · 2024-12-11T08:54:33Z

Thanks for filing this @povilasv, modeling this as a metric makes sense!

There is a proposal on how we should model such "state"/"phase"/"status" metrics in SemConv. It is already in use by hw metrics and jmx metrics as mentioned at #1032 (comment) (see #1554).
Process' status will also be modeled like this: #1212 (comment)
I have filed another one for k8s.namespace.phase: #1668 which I think will help us verify the proposal.

My only question here is what the modeling should actually look like. What you propose aligns with kube-state-metrics, but I would like to see how it will be combined with the other statuses, such as running and terminated, and how we reflect the reason part. Can we deal with them all at once, to ensure that the decision scales and covers every case?
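One possible shape for covering all statuses at once, sketched in Python: a single status metric that emits one data point per state, with value 1 for the current state and 0 for the others, and the reason carried as an attribute. This is only an illustration under stated assumptions. The attribute keys `state` and `reason` and the function name `status_data_points` are hypothetical placeholders, not agreed conventions, and the input is assumed to be a Kubernetes ContainerStatus serialized to a dict.

```python
from typing import Any, Dict, List, Mapping

# The three container states defined by Kubernetes.
STATES = ("waiting", "running", "terminated")


def status_data_points(container_status: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Build one data point per state: 1 for the current state, 0 otherwise."""
    state = container_status.get("state") or {}
    points = []
    for name in STATES:
        details = state.get(name)
        attributes = {"state": name}  # hypothetical attribute key
        if details and details.get("reason"):
            attributes["reason"] = details["reason"]  # hypothetical attribute key
        points.append({"attributes": attributes, "value": 1 if details is not None else 0})
    return points


points = status_data_points({"state": {"waiting": {"reason": "CrashLoopBackOff"}}})
for point in points:
    print(point)
```

With this shape a query can filter on the state attribute instead of needing a separate metric per state.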

FWIW, if I'm not mistaken, we can't have both k8s.container.status.waiting and k8s.container.status.waiting.reason (a metric name cannot be a namespace at the same time).
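To make that constraint concrete, here is a small Python sketch that flags metric names which are also used as a namespace, that is, as a dotted prefix of another metric name. It assumes the rule is exactly as described above; the function name `namespace_collisions` is hypothetical and this is not the actual semantic conventions tooling.

```python
from typing import Iterable, List, Tuple


def namespace_collisions(metric_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (name, longer_name) pairs where name is a dotted prefix of longer_name."""
    names = sorted(set(metric_names))
    return [
        (name, other)
        for name in names
        for other in names
        if other.startswith(name + ".")
    ]


proposed = [
    "k8s.container.status.waiting",
    "k8s.container.status.waiting.reason",
]
print(namespace_collisions(proposed))
# [('k8s.container.status.waiting', 'k8s.container.status.waiting.reason')]
```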

povilasv · 2024-12-12T14:57:38Z

Thanks for the review. I'll try to work on this next week and model it differently.

povilasv added the enhancement and triage:needs-triage labels Dec 11, 2024

github-actions bot added the area:k8s label Dec 11, 2024

github-project-automation bot added this to K8s SemConv SIG Dec 11, 2024

povilasv linked a pull request Dec 11, 2024 that will close this issue: Add k8s.container.status.waiting metric #1673 (open)

ChrsMark moved this to In Review in K8s SemConv SIG Dec 11, 2024

ChrsMark removed the triage:needs-triage label Dec 11, 2024
