jmx state metrics #12369

SylvainJuge · 2024-09-30T16:38:29Z

This introduces the ability to capture "state metrics" from a JMX attribute value.

A "state metric" has the following properties:

the value is captured from a single MBean attribute
the MBean attribute providing state value can be mapped to a string
the state is reflected in a single "state attribute"
the state metric value will be either 0 or 1
the state metric uses the "state attribute" for breakdown, so aggregating across all values of the "state attribute" always sums to 1.
in order to limit cardinality, the set of possible states must be known and provided in advance through a "state mapping".

This follows an approach to mapping state similar to the one in the (experimental) hardware semantic conventions, as seen here with the hw.state attribute.

Example from a common MBean in Tomcat:

⚠️ This is just an example of "how to capture state metrics"; the actual definitions that might be added for Tomcat will not match this example.

JMX object name: Catalina:type=Connector,port=*; JMX attribute name: stateName. The value of stateName comes from the LifecycleState enum (12 distinct values).

We can capture this as tomcat.connector.state with two attributes:

state, with value ok, failed or degraded (the attribute does not have to be named state; any name could be used here).
port, which provides a per-port breakdown (from the MBean wildcard).

The YAML configuration used:

```yaml
rules:
  - bean: Catalina:type=Connector,port=*
    mapping:
      stateName:
        type: state
        metric: tomcat.connector.state
        metricAttribute:
          port: param(port)
          state:
            ok: STARTED
            failed: [STOPPED,FAILED]
            degraded: '*'
```

For a given port, say 8080, this configuration captures 3 data points of the metric on every sample. The metric type is always updowncounter and the value is either 0 or 1.

When stateName = STARTED, we have:

tomcat.connector.state value = 1, attributes port = 8080 and state = ok
tomcat.connector.state value = 0, attributes port = 8080 and state = failed
tomcat.connector.state value = 0, attributes port = 8080 and state = degraded

When stateName = STOPPED or FAILED, we have:

tomcat.connector.state value = 0, attributes port = 8080 and state = ok
tomcat.connector.state value = 1, attributes port = 8080 and state = failed
tomcat.connector.state value = 0, attributes port = 8080 and state = degraded

For other values of stateName, we have:

tomcat.connector.state value = 0, attributes port = 8080 and state = ok
tomcat.connector.state value = 0, attributes port = 8080 and state = failed
tomcat.connector.state value = 1, attributes port = 8080 and state = degraded

Limiting cardinality matters here: it reduces the 12 potential values of stateName (most of which are very unlikely to be relevant in practice) to only 3.
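To make the 0/1 expansion above concrete, here is a minimal Java sketch. It is not the jmx-metrics implementation: the class and method names are hypothetical, and it assumes the state mapping has already been parsed into a "state key → raw values" map plus a separately named default state. It shows why cardinality stays fixed: exactly one point is emitted per state key, whatever raw stateName value the MBean reports.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual jmx-metrics code: expands one raw MBean
// attribute value into one 0/1 data point per state key.
final class StateMetricSketch {

  private StateMetricSketch() {}

  /**
   * @param stateToValues state key -> raw values mapped to it, e.g. "failed" -> [STOPPED, FAILED]
   * @param defaultState state key used for any raw value that is not listed, e.g. "degraded"
   * @param rawValue current value of the MBean attribute, e.g. "STARTED"
   * @return state key -> 1 for the matching state and 0 for every other state
   */
  static Map<String, Long> dataPoints(
      Map<String, List<String>> stateToValues, String defaultState, String rawValue) {
    // Resolve the raw value to a state key, falling back to the default state.
    String current = defaultState;
    for (Map.Entry<String, List<String>> entry : stateToValues.entrySet()) {
      if (entry.getValue().contains(rawValue)) {
        current = entry.getKey();
        break;
      }
    }
    // Emit a point for every known state key (default included), so the number of
    // time series stays fixed no matter how many raw values the MBean can report.
    Map<String, Long> points = new LinkedHashMap<>();
    for (String state : stateToValues.keySet()) {
      points.put(state, state.equals(current) ? 1L : 0L);
    }
    points.putIfAbsent(defaultState, defaultState.equals(current) ? 1L : 0L);
    return points;
  }
}
```

With the Tomcat mapping from the description (ok and failed listed, degraded as default), a connector in the STOPPING state falls back to degraded, and the three points still sum to 1.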

SylvainJuge · 2024-10-01T08:16:26Z

...etrics/library/src/test/java/io/opentelemetry/instrumentation/jmx/engine/RuleParserTest.java

```diff
+    Metric m2 = attr.get("ATTRIBUTE2");
+    assertThat(m2).isNotNull();
+    assertThat(m2.getMetric()).isEqualTo("METRIC_NAME2");
+    assertThat(m2.getDesc()).isEqualTo("DESCRIPTION2");
+    assertThat(m2.getUnit()).isEqualTo("UNIT2");
+
+    JmxRule def2 = defs.get(1);
+    assertThat(def2.getBeans()).containsExactly("OBJECT:NAME3=*");
+    assertThat(def2.getMetricAttribute()).isNull();
+
+    assertThat(def2.getMapping()).hasSize(1);
+    Metric m3 = def2.getMapping().get("ATTRIBUTE3");
+    assertThat(m3.getMetric()).isEqualTo("METRIC_NAME3");
+    assertThat(m3.getUnit()).isNull();
```


[for reviewer] this is mostly adding extra test assertions, not related to adding state metrics.

SylvainJuge · 2024-10-01T08:16:36Z

...etrics/library/src/test/java/io/opentelemetry/instrumentation/jmx/engine/RuleParserTest.java

```diff
+    assertThat(attr)
+        .hasSize(5)
+        .containsKeys("ATTRIBUTE31", "ATTRIBUTE32", "ATTRIBUTE33", "ATTRIBUTE34", "ATTRIBUTE35");
+    assertThat(attr.get("ATTRIBUTE32")).isNull();
    assertThat(attr.get("ATTRIBUTE33")).isNull();
-    assertThat(attr.get("ATTRIBUTE34")).isNotNull();
+    Metric attribute34 = attr.get("ATTRIBUTE34");
+    assertThat(attribute34).isNotNull();
+    assertThat(attribute34.getMetric()).isEqualTo("METRIC_NAME34");
+    assertThat(attr.get("ATTRIBUTE35")).isNull();
```


[for reviewer] this is mostly adding extra test assertions, not related to adding state metrics.

SylvainJuge · 2024-10-01T08:16:50Z

...etrics/library/src/test/java/io/opentelemetry/instrumentation/jmx/engine/RuleParserTest.java

```diff
+              assertThat(m.getAttributes())
+                  .hasSize(3)
+                  .extracting("attributeName")
+                  .contains("LABEL_KEY1", "LABEL_KEY2", "LABEL_KEY3");
+
+              MetricInfo metricInfo = m.getInfo();
+              assertThat(metricInfo.getMetricName()).isEqualTo("PREFIX.METRIC_NAME1");
+              assertThat(metricInfo.getDescription()).isEqualTo("DESCRIPTION1");
+              assertThat(metricInfo.getUnit()).isEqualTo("UNIT1");
+              assertThat(metricInfo.getType()).isEqualTo(MetricInfo.Type.COUNTER);
+            })
+        .anySatisfy(
+            m -> {
+              assertThat(m.getMetricValueExtractor().getAttributeName()).isEqualTo("ATTRIBUTE2");
+              assertThat(m.getAttributes())
+                  .hasSize(2)
+                  .extracting("attributeName")
+                  .contains("LABEL_KEY1", "LABEL_KEY2");
+
+              MetricInfo metricInfo = m.getInfo();
+              assertThat(metricInfo.getMetricName()).isEqualTo("PREFIX.METRIC_NAME2");
+              assertThat(metricInfo.getDescription()).isEqualTo("DESCRIPTION2");
+              assertThat(metricInfo.getUnit()).isEqualTo("UNIT2");
            })
        .anySatisfy(
            m -> {
              assertThat(m.getMetricValueExtractor().getAttributeName()).isEqualTo("ATTRIBUTE3");

-              MetricInfo mb3 = m.getInfo();
-              assertThat(mb3.getMetricName()).isEqualTo("PREFIX.ATTRIBUTE3");
+              MetricInfo metricInfo = m.getInfo();
+              assertThat(metricInfo.getMetricName()).isEqualTo("PREFIX.ATTRIBUTE3");
+              assertThat(metricInfo.getDescription()).isNull();
+
              // syntax extension - defining a default unit and type
-              assertThat(mb3.getType()).isEqualTo(MetricInfo.Type.UPDOWNCOUNTER);
-              assertThat(mb3.getUnit()).isEqualTo("DEFAULT_UNIT");
+              assertThat(metricInfo.getType())
+                  .describedAs("default type should match jmx rule definition")
+                  .isEqualTo(jmxDef.getMetricType());
+              assertThat(metricInfo.getUnit())
+                  .describedAs("default unit should match jmx rule definition")
+                  .isEqualTo(jmxDef.getUnit());
```


[for reviewer] this is mostly adding extra test assertions, not related to adding state metrics.

SylvainJuge · 2024-10-01T08:21:25Z

...ion/jmx-metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/JmxRule.java

```diff
+    return new ArrayList<>(set.values());
+  }
+
+  private static StateMapping getEffectiveStateMapping(Metric m, JmxRule rule) {
```


[for reviewer] for state mapping, the resulting mapping is not a combination of the rule-level and metric-level definitions: only one of them is used.
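A minimal Java sketch of the "only one of them is used" rule, with mappings modelled as plain maps. The names are hypothetical, and the precedence (metric-level wins over rule-level) is an assumption: the comment only states that the two definitions are never combined.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: picks one state mapping instead of merging the rule-level and
// metric-level definitions. ASSUMPTION: a metric-level mapping takes precedence.
final class EffectiveStateMappingSketch {

  private EffectiveStateMappingSketch() {}

  static Map<String, List<String>> effective(
      Map<String, List<String>> metricLevel, Map<String, List<String>> ruleLevel) {
    if (metricLevel != null && !metricLevel.isEmpty()) {
      // Used as-is: no state key from the rule-level mapping is added to it.
      return metricLevel;
    }
    // May be null, in which case the metric is not a state metric.
    return ruleLevel;
  }
}
```

Returning one mapping whole keeps the set of state keys predictable: a metric can never end up with a mix of state keys coming from two different mappings.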

SylvainJuge · 2024-10-01T14:39:53Z

Making it ready for review now that it has been validated/tested with a real Tomcat instance.

trask · 2024-10-02T02:49:52Z

cc @PeterF778 in case you have a chance to review

PeterF778 · 2024-10-02T16:57:35Z

The YAML configuration used:

```yaml
rules:
  - bean: Catalina:type=Connector,port=*
    mapping:
      stateName:
        type: updowncounter
        metric: tomcat.connector.count
        stateMapping:
          ok: STARTED
          failed: [STOPPED,FAILED]
          degraded: '*'
        metricAttribute:
          state: statekey()
          state: param(port)
```

I believe you have a typo; it should be "port: param(port)".

Also, for the sake of clarity, shouldn't the metric name be something like tomcat.connector.state rather than tomcat.connector.count? I do not see any actual counts here.

PeterF778 · 2024-10-02T18:10:37Z

Some observations: the OpenTelemetry metric type must always be updowncounter, and the stateMapping can be used only for one metric attribute. Hence, we could use a different syntax (hopefully easier to read and maintain):

```yaml
rules:
  - bean: Catalina:type=Connector,port=*
    mapping:
      stateName:
        type: state    # maps automatically to updowncounter with values 0 or 1
        metric: tomcat.connector.status
        metricAttribute:
          state:
            ok: STARTED
            failed: [STOPPED,FAILED]
            degraded: '*'
          port: param(port)
```

trask · 2024-10-02T20:22:48Z

the simplification seems nice.

would this attribute have to be named state, or would its type be implicitly understood due to the presence of the nested yaml?

```yaml
          state:
            ok: STARTED
            failed: [STOPPED,FAILED]
            degraded: '*'
```

PeterF778 · 2024-10-02T21:39:02Z

would this attribute have to be named state, or would its type be implicitly understood due to the presence of the nested yaml?
```yaml
          state:
            ok: STARTED
            failed: [STOPPED,FAILED]
            degraded: '*'
```

It could be named anything; the nested YAML is what identifies it as the state attribute when mapping it to the OTel metric.
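A minimal Java sketch of detecting the state attribute by its nesting rather than its name. It assumes the YAML has already been parsed into generic Map/List/String values, as a generic YAML parser would produce; the class and method names are hypothetical and this is not the actual MetricStructure code. It also enforces that the state mapping applies to only one metric attribute, as noted in the review.

```java
import java.util.Map;

// Hypothetical sketch, not the actual MetricStructure code: finds the state attribute
// among parsed metricAttribute entries. ASSUMPTION: the YAML was already parsed into
// Map/List/String values.
final class MetricAttributeSketch {

  private MetricAttributeSketch() {}

  /** Returns the key of the single nested-map entry, or null if there is none. */
  static String findStateAttribute(Map<String, Object> metricAttribute) {
    String stateKey = null;
    for (Map.Entry<String, Object> entry : metricAttribute.entrySet()) {
      // Regular attributes are string expressions such as "param(port)";
      // only a state mapping is a nested map, whatever its key is named.
      if (entry.getValue() instanceof Map) {
        if (stateKey != null) {
          throw new IllegalArgumentException("only one state attribute is allowed");
        }
        stateKey = entry.getKey();
      }
    }
    return stateKey;
  }
}
```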

SylvainJuge · 2024-10-03T08:35:08Z

Thanks a lot @PeterF778 for your suggestions, they really make it simpler:

adding a state metric type that automatically maps to a 0/1 updowncounter
using the nested YAML structure to detect the state attribute
naming the metric tomcat.connector.state, even if that's more a matter for "conventions for JMX state metrics"; given this is a new feature, we can start specifying/documenting it right now without having to change other metrics.

So I will update this PR to implement this.

SylvainJuge

@PeterF778 I've added the changes you suggested, let me know if there are further changes/improvements needed here.

SylvainJuge · 2024-10-09T08:11:07Z

...ibrary/src/main/java/io/opentelemetry/instrumentation/jmx/engine/BeanAttributeExtractor.java

```diff
+  protected Object getSampleValue(MBeanServerConnection connection, ObjectName objectName) {
+    return extractAttributeValue(connection, objectName, logger);
+  }
```


[for reviewer] this allows bypassing the sample value check for the "artificial" attribute value that the state metric needs.

SylvainJuge · 2024-10-09T08:12:14Z

...trics/library/src/main/java/io/opentelemetry/instrumentation/jmx/engine/MetricRegistrar.java

```diff
+      case STATE:
+        {
+          // CHECKSTYLE:ON
+          throw new IllegalStateException("state metrics should not be registered");
+        }
```


[for reviewer] this is needed to have the switch statement cover all cases.
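A minimal Java sketch of why the extra STATE branch is needed once that enum value exists. The enum is hypothetical: it reuses the COUNTER and UPDOWNCOUNTER names seen in this PR's tests and adds GAUGE and STATE for illustration; it is not the real MetricInfo.Type. The assumption that state metrics are converted before registration is inferred from the exception message only.

```java
// Hypothetical sketch: a switch over the metric type needs a STATE branch once that
// enum value exists. This enum is illustrative, not the real MetricInfo.Type.
final class MetricTypeSwitchSketch {

  enum Type {
    COUNTER,
    UPDOWNCOUNTER,
    GAUGE,
    STATE
  }

  private MetricTypeSwitchSketch() {}

  static String instrumentFor(Type type) {
    switch (type) {
      case COUNTER:
        return "counter";
      case UPDOWNCOUNTER:
        return "updowncounter";
      case GAUGE:
        return "gauge";
      case STATE:
        // ASSUMPTION: state metrics are turned into 0/1 updowncounter points before
        // registration, so reaching this branch would be a bug.
        throw new IllegalStateException("state metrics should not be registered");
    }
    // javac does not treat a switch statement over an enum as exhaustive.
    throw new AssertionError("unexpected type: " + type);
  }
}
```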

PeterF778 · 2024-10-09T22:34:26Z

...mx-metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/StateMapping.java

```diff
+      }
+
+      if (defaultState == null) {
+        throw new IllegalStateException("missing default state");
```


Is requiring the default state necessary? I understand that if it is missing and the actual state string value is not in the list, the metric value will be zero for all combinations of provided metric attributes.

I agree that this might be a "best practice" rather than a strict requirement, but it would likely apply to all the mappings embedded in the agent anyway, so preventing invalid metrics and making mappings more future-proof is probably the better option. I'm fine with removing this as a strict requirement, though.
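A minimal Java sketch of the validation under discussion: building a "raw value → state key" lookup while requiring exactly one default state. The class name is hypothetical and this is not the actual StateMapping class. The wildcard is "*" (the PR briefly used "_" before restoring "*"), and rejecting a raw value mapped to two state keys is an assumption about how duplicate mappings are handled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual StateMapping class: builds a raw value -> state
// key lookup and enforces the "exactly one default state" rule discussed here.
final class StateMappingSketch {

  // Wildcard marking the default state.
  static final String WILDCARD = "*";

  private final Map<String, String> valueToState = new HashMap<>();
  private final String defaultState;

  StateMappingSketch(Map<String, List<String>> stateToValues) {
    String foundDefault = null;
    for (Map.Entry<String, List<String>> entry : stateToValues.entrySet()) {
      for (String raw : entry.getValue()) {
        if (WILDCARD.equals(raw)) {
          if (foundDefault != null) {
            throw new IllegalStateException("only one default state is allowed");
          }
          foundDefault = entry.getKey();
        } else if (valueToState.putIfAbsent(raw, entry.getKey()) != null) {
          // ASSUMPTION: a raw value mapped to two state keys is rejected.
          throw new IllegalStateException("duplicate mapping for value: " + raw);
        }
      }
    }
    if (foundDefault == null) {
      // Best practice rather than technical necessity: without a default, an unlisted
      // raw value would report 0 for every state key.
      throw new IllegalStateException("missing default state");
    }
    this.defaultState = foundDefault;
  }

  String stateOf(String rawValue) {
    return valueToState.getOrDefault(rawValue, defaultState);
  }
}
```

Failing at configuration time surfaces a missing default when the mapping is loaded, instead of as metrics that silently report 0 for every state.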

SylvainJuge · 2024-10-14T08:27:24Z

Also, I think changing the default wildcard value from * to _ could avoid having to quote the default as '*': _ is a valid plain string in YAML, while * is not. I'll update the PR with this proposal, as it makes mappings simpler to write.

PeterF778 · 2024-10-14T23:04:23Z

Also I think changing the default wildcard value from * to _ could help ...

Yes, this is fine. However, I just realized that the values of the string MBeans can be anything, in particular * and _ (or even empty string). But I do not have any ideas for fixing that.

SylvainJuge · 2024-10-15T08:01:19Z

I just realized that the values of the string MBeans can be anything, in particular * and _ (or even empty string). But I do not have any ideas for fixing that.

I agree it can be anything; however, I think it's unlikely enough that we can wait for feedback before treating it as an issue. The empty string might already be supported, and we would just need a way to escape the _.

SylvainJuge · 2024-10-16T08:18:46Z

@trask do you think this could make it for the 2.9.0 release?

trask · 2024-10-16T03:49:39Z

instrumentation/jmx-metrics/javaagent/README.md

```diff
+          connector_state:
+            ok: STARTED
+            failed: [STOPPED,FAILED]
+            degraded: _
```


I just thought of open-telemetry/opentelemetry-configuration#115

though we're already not doing this in the jmx metrics (preferring YAML brevity), so this comment should not hold up this PR

@jack-berg do you think this approach (using objects for brevity) in jmx metrics module is ok long-term?

If we switched to using an array of objects, for example with name and value keys, then we could add a third default key with value true|false to indicate which entry is the default, instead of using a "magic value" to represent it.

trask · 2024-10-16T03:50:24Z

...etrics/library/src/test/java/io/opentelemetry/instrumentation/jmx/engine/RuleParserTest.java

```diff
@@ -312,25 +361,125 @@ void testConf8() throws Exception {
    assertThat(mb1.getUnit()).isNull();
  }

+  private static final String CONF9 =
+      "---                                   # keep stupid spotlessJava at bay\n"
```


trask · 2024-10-16T15:52:02Z

instrumentation/jmx-metrics/javaagent/README.md

```diff
+- a string literal or a string array
+- a `_` character to provide default option and avoid enumerating all values.
+
+Exactly one `_` value must be present in the mapping to ensure all possible values of the MBean attribute can be mapped to a state key.
```


I don't fully understand why this is a requirement. If there are only certain states that you are interested in, why not allow specifying only those?

This is not a technical requirement; it ensures that state metrics are always captured consistently, so it is more a best practice.

Doing so means that at a given time, the sum of all samples is equal to 1, which translates to "the state of the thing we observe is known at all times". I think it's worth doing that for now unless we have good examples where it can be problematic.

The example I had in mind when adding this restriction is probably biased, again, by the current focus on Tomcat state, so I'll elaborate a bit:

Let's say we monitor 3 Tomcat servers, each of them with a single connector and a state metric capturing the connector state with on|off values. When consuming those metrics to create a dashboard:

Scenario 1:

default set to off to indicate something is not right.

creating a line chart (sum) with breakdown on state will always sum to 3, and the breakdown provides an obvious clue when something is not in the on state.

creating a pie chart of the latest value with breakdown on state also provides a good representation of the 3 Tomcat instances.

the end user does not even need to know how many Tomcat instances exist: the ratio of on|off in the charts provides the information, and they can then investigate what could be wrong if needed.

using an extra dedicated unknown state as the default could gather those states instead, as they might represent a gap in the metrics mapping.

Scenario 2:

state value on when the connector is RUNNING, off when it is STOPPED; other possible values are not mapped.

creating a line chart (sum) with breakdown on state will sum to 3 when every state is either on or off, but will go down whenever an unmapped state is present.

the end-user needs to know that the "usual value is 3", and that "when it goes down to less than 3 something is unknown and maybe we have a gap in our mapping"

it is possible to trigger an alert by monitoring when the value goes down, but this is definitely more complex and depends mostly on the backend and metrics processing.

with some visualizations it's even trickier: for example, when creating a pie chart of the latest reported values to represent the "current state", instances whose state is unknown cannot be represented, so they are ignored. In an extreme case, if only a single instance is in the on state and the two others are in an unknown state, the pie chart would misleadingly show "100% is running and it's fine", when in practice it's "only 1/3 fine".

Given that we usually need to monitor things that are working properly, and even more so the ones that are not working properly (which are not in an expected state), having this as a requirement makes it harder to shoot yourself in the foot by missing a state whose value comes from an implementation detail of the monitored product.
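The arithmetic behind the two scenarios can be sketched in a few lines of Java. This is a hypothetical illustration, not code from the PR: it sums the "1" points reported by several instances per state key, the way a sum-by-state chart would, and reuses the RUNNING/STOPPED values and on|off keys from the comment.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two dashboard scenarios: sums the "1" points reported by
// several instances per state key, as a sum-by-state chart would.
final class StateAggregationSketch {

  private StateAggregationSketch() {}

  /**
   * @param rawValues current raw state of each monitored instance
   * @param mapping raw value -> state key
   * @param defaultState state key for unmapped values, or null when no default exists
   */
  static Map<String, Integer> sumByState(
      List<String> rawValues, Map<String, String> mapping, String defaultState) {
    Map<String, Integer> totals = new TreeMap<>();
    for (String raw : rawValues) {
      String state = mapping.getOrDefault(raw, defaultState);
      // Without a default, an unmapped value sets no state to 1, so the instance vanishes.
      if (state != null) {
        totals.merge(state, 1, Integer::sum);
      }
    }
    return totals;
  }
}
```

With one instance RUNNING and two in unmapped states, scenario 1 (default off) yields on=1 and off=2, still totalling 3, while scenario 2 (no default) yields only on=1, which a pie chart would render as 100% on.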

trask · 2024-10-17T02:16:40Z

@trask do you think this could make it for the 2.9.0 release?

let's discuss in the Thursday SIG meeting; we can hold the release until Friday if we need a bit more time to discuss

trask · 2024-10-17T14:08:09Z

Have you looked at the state metrics in the collector? open-telemetry/semantic-conventions#1032 (comment)

SylvainJuge · 2024-10-17T14:50:18Z

Have you looked at the state metrics in the collector? open-telemetry/semantic-conventions#1032 (comment)

No, I wasn't aware of those "state metrics", but looking at them:

k8sclusterreceiver is the one where the comment's author has challenges, and we can see that its metric modeling is not ideal and far from "state metrics" as defined here:

the k8s.pod.phase metric, where the value encodes the state (link)

some metrics are split into two metrics depending on the state, for example k8s.replicaset.available and k8s.replicaset.desired.

kubeletstatsreceiver (link) does not seem to report metrics that look like "state metrics".

SylvainJuge · 2024-10-17T19:37:40Z

I've restored the original wildcard *:

it must be quoted both when used alone AND in a string array (I initially thought it was fine unquoted in an array, but after checking, it's not)
the case @breedx-splk raised during today's meeting about duplicate mappings is already covered by unit tests

So it's ready for a final review (just to be sure) and should be good to merge.

SylvainJuge added 8 commits September 20, 2024 16:25

minor test refactor

9272bfe

enhance tests

6f20784

add more assertions in tests

d1a195b

add state mapping + tests

3f2f151

impl + test

5973b01

Merge branch 'main' of github.com:open-telemetry/opentelemetry-java-i…

4fd10aa

…nstrumentation into state-jmx-metrics

spotless

6155206

thou shall lint

af37bb4

SylvainJuge commented Oct 1, 2024

SylvainJuge mentioned this pull request Oct 1, 2024

Add ability to capture non-numeric MBean attributes as metrics #12229

Closed

SylvainJuge added 3 commits October 1, 2024 10:58

minor add @nullable

6551200

fix state metric detection

376acc5

lint again

679c460

SylvainJuge marked this pull request as ready for review October 1, 2024 14:39

SylvainJuge requested a review from a team as a code owner October 1, 2024 14:39

PeterF778 reviewed Oct 2, 2024

...metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/MetricStructure.java (outdated)

PeterF778 reviewed Oct 2, 2024

...metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/MetricStructure.java (outdated)

SylvainJuge added 4 commits October 8, 2024 17:28

implement new syntax

e5ff7c5

cleanup

8e73f10

Merge branch 'main' of github.com:open-telemetry/opentelemetry-java-i…

e9e8e83

…nstrumentation into state-jmx-metrics

spotless

550b003

SylvainJuge commented Oct 9, 2024

add documentation

1bb4302

github-actions bot requested a review from theletterf October 9, 2024 08:41

minor code refactor to use lists instead of arrays

ebf58ff

PeterF778 reviewed Oct 9, 2024

instrumentation/jmx-metrics/javaagent/README.md

PeterF778 reviewed Oct 9, 2024

instrumentation/jmx-metrics/javaagent/README.md (outdated)

PeterF778 reviewed Oct 9, 2024

port-review change in docs

6c10e6d

switch to _ + enhance doc for 2 state

652ab32

PeterF778 approved these changes Oct 15, 2024

jaydeluca reviewed Oct 16, 2024

...metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/MetricStructure.java

Update instrumentation/jmx-metrics/library/src/main/java/io/opentelem…

e24b9ae

…etry/instrumentation/jmx/yaml/MetricStructure.java Co-authored-by: Jay DeLuca <[email protected]>

laurit added this to the v2.9.0 milestone Oct 16, 2024

trask reviewed Oct 16, 2024

trask reviewed Oct 17, 2024

instrumentation/jmx-metrics/javaagent/README.md (outdated)

trask mentioned this pull request Oct 17, 2024

Update change log for upcoming release #12452

Merged

SylvainJuge added 4 commits October 17, 2024 20:58

Merge branch 'main' of github.com:open-telemetry/opentelemetry-java-i…

2698be3

…nstrumentation into state-jmx-metrics

restore '*' as wildcard + clarify quoting

e4b9fe5

Merge branch 'state-jmx-metrics' of github.com:SylvainJuge/openteleme…

111c26b

…try-java-instrumentation into state-jmx-metrics

restore '*' again

96bb6cc

trask approved these changes Oct 17, 2024

trask merged commit 238a201 into open-telemetry:main Oct 17, 2024
56 checks passed

SylvainJuge deleted the state-jmx-metrics branch October 18, 2024 08:14
