-
Notifications
You must be signed in to change notification settings - Fork 870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jmx state metrics #12369
jmx state metrics #12369
Conversation
Metric m2 = attr.get("ATTRIBUTE2"); | ||
assertThat(m2).isNotNull(); | ||
assertThat(m2.getMetric()).isEqualTo("METRIC_NAME2"); | ||
assertThat(m2.getDesc()).isEqualTo("DESCRIPTION2"); | ||
assertThat(m2.getUnit()).isEqualTo("UNIT2"); | ||
|
||
JmxRule def2 = defs.get(1); | ||
assertThat(def2.getBeans()).containsExactly("OBJECT:NAME3=*"); | ||
assertThat(def2.getMetricAttribute()).isNull(); | ||
|
||
assertThat(def2.getMapping()).hasSize(1); | ||
Metric m3 = def2.getMapping().get("ATTRIBUTE3"); | ||
assertThat(m3.getMetric()).isEqualTo("METRIC_NAME3"); | ||
assertThat(m3.getUnit()).isNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for reviewer] this is mostly adding extra test assertions, not related to adding state metrics.
assertThat(attr) | ||
.hasSize(5) | ||
.containsKeys("ATTRIBUTE31", "ATTRIBUTE32", "ATTRIBUTE33", "ATTRIBUTE34", "ATTRIBUTE35"); | ||
assertThat(attr.get("ATTRIBUTE32")).isNull(); | ||
assertThat(attr.get("ATTRIBUTE33")).isNull(); | ||
assertThat(attr.get("ATTRIBUTE34")).isNotNull(); | ||
Metric attribute34 = attr.get("ATTRIBUTE34"); | ||
assertThat(attribute34).isNotNull(); | ||
assertThat(attribute34.getMetric()).isEqualTo("METRIC_NAME34"); | ||
assertThat(attr.get("ATTRIBUTE35")).isNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for reviewer] this is mostly adding extra test assertions, not related to adding state metrics.
assertThat(m.getAttributes()) | ||
.hasSize(3) | ||
.extracting("attributeName") | ||
.contains("LABEL_KEY1", "LABEL_KEY2", "LABEL_KEY3"); | ||
|
||
MetricInfo metricInfo = m.getInfo(); | ||
assertThat(metricInfo.getMetricName()).isEqualTo("PREFIX.METRIC_NAME1"); | ||
assertThat(metricInfo.getDescription()).isEqualTo("DESCRIPTION1"); | ||
assertThat(metricInfo.getUnit()).isEqualTo("UNIT1"); | ||
assertThat(metricInfo.getType()).isEqualTo(MetricInfo.Type.COUNTER); | ||
}) | ||
.anySatisfy( | ||
m -> { | ||
assertThat(m.getMetricValueExtractor().getAttributeName()).isEqualTo("ATTRIBUTE2"); | ||
assertThat(m.getAttributes()) | ||
.hasSize(2) | ||
.extracting("attributeName") | ||
.contains("LABEL_KEY1", "LABEL_KEY2"); | ||
|
||
MetricInfo metricInfo = m.getInfo(); | ||
assertThat(metricInfo.getMetricName()).isEqualTo("PREFIX.METRIC_NAME2"); | ||
assertThat(metricInfo.getDescription()).isEqualTo("DESCRIPTION2"); | ||
assertThat(metricInfo.getUnit()).isEqualTo("UNIT2"); | ||
}) | ||
.anySatisfy( | ||
m -> { | ||
assertThat(m.getMetricValueExtractor().getAttributeName()).isEqualTo("ATTRIBUTE3"); | ||
|
||
MetricInfo mb3 = m.getInfo(); | ||
assertThat(mb3.getMetricName()).isEqualTo("PREFIX.ATTRIBUTE3"); | ||
MetricInfo metricInfo = m.getInfo(); | ||
assertThat(metricInfo.getMetricName()).isEqualTo("PREFIX.ATTRIBUTE3"); | ||
assertThat(metricInfo.getDescription()).isNull(); | ||
|
||
// syntax extension - defining a default unit and type | ||
assertThat(mb3.getType()).isEqualTo(MetricInfo.Type.UPDOWNCOUNTER); | ||
assertThat(mb3.getUnit()).isEqualTo("DEFAULT_UNIT"); | ||
assertThat(metricInfo.getType()) | ||
.describedAs("default type should match jmx rule definition") | ||
.isEqualTo(jmxDef.getMetricType()); | ||
assertThat(metricInfo.getUnit()) | ||
.describedAs("default unit should match jmx rule definition") | ||
.isEqualTo(jmxDef.getUnit()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for reviewer] this is mostly adding extra test assertions, not related to adding state metrics.
...etrics/library/src/test/java/io/opentelemetry/instrumentation/jmx/engine/RuleParserTest.java
Outdated
Show resolved
Hide resolved
return new ArrayList<>(set.values()); | ||
} | ||
|
||
private static StateMapping getEffectiveStateMapping(Metric m, JmxRule rule) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for reviewer] for state mapping, the resulting mapping is not a combination of rule level and metric level definitions. Only one of them is used.
Making it ready for review now that it has been validated/tested with a real tomcat instance. |
cc @PeterF778 in case you have a chance to review |
I believe you have a typo, it should be "port: param(port)". Also, for the sake of better understanding, shouldn't the metric name be something like |
...metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/MetricStructure.java
Outdated
Show resolved
Hide resolved
...metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/MetricStructure.java
Outdated
Show resolved
Hide resolved
Some observations: the OpenTelemetry metric type must always be
|
the simplification seems nice. would this attribute have to be named
|
It could be named anything, the nested yaml would provide the way to map it to OTel metric. |
Thanks a lot @PeterF778 for your suggestions, that really makes it simpler:
So I will update this PR to implement this. |
…nstrumentation into state-jmx-metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PeterF778 I've added the changes you suggested, let me know if there are further changes/improvements needed here.
protected Object getSampleValue(MBeanServerConnection connection, ObjectName objectName) { | ||
return extractAttributeValue(connection, objectName, logger); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for reviewer] this allows to bypass the sample value check for the "artificial" attribute value that we need for the state metric.
case STATE: | ||
{ | ||
// CHECKSTYLE:ON | ||
throw new IllegalStateException("state metrics should not be registered"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for reviewer] this is needed to have the switch statement cover all cases.
} | ||
|
||
if (defaultState == null) { | ||
throw new IllegalStateException("missing default state"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is requiring the default state necessary? I understand that if it is missing and the actual state string value is not in the list, the metric value will be zero for all combinations of provided metric attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this might be "best practice" rather than strict requirement, but this would likely be the case for all the mappings embedded in the agent anyway, so preventing invalid metrics and making mappings more future proof is probably a better option, I'm fine to remove this as a strict requirement though.
Also I think changing the default wildcard value from |
Yes, this is fine. However, I just realized that the values of the string MBeans can be anything, in particular |
I agree, it can be anything, however I think it's definitely unlikely enough that we need to wait for any feedback on this to be an issue. The empty string might already be supported and we would just need to have a way to escape the |
@trask do you think this could make it for the 2.9.0 release ? |
...metrics/library/src/main/java/io/opentelemetry/instrumentation/jmx/yaml/MetricStructure.java
Show resolved
Hide resolved
…etry/instrumentation/jmx/yaml/MetricStructure.java Co-authored-by: Jay DeLuca <[email protected]>
connector_state: | ||
ok: STARTED | ||
failed: [STOPPED,FAILED] | ||
degraded: _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just thought of open-telemetry/opentelemetry-configuration#115
though we're already not doing this in the jmx metrics (prefering yaml brevity), so this comment should not hold up this PR
@jack-berg do you think this approach (using objects for brevity) in jmx metrics module is ok long-term?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we would switch to using an array of objects, for example with name
and value
, then we could add a third one with default
attribute key with value true|false
to indicate which entry would be the default instead of using a "magic value" to represent this.
@@ -312,25 +361,125 @@ void testConf8() throws Exception { | |||
assertThat(mb1.getUnit()).isNull(); | |||
} | |||
|
|||
private static final String CONF9 = | |||
"--- # keep stupid spotlessJava at bay\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😄
- a string literal or a string array | ||
- a `_` character to provide default option and avoid enumerating all values. | ||
|
||
Exactly one `_` value must be present in the mapping to ensure all possible values of the MBean attribute can be mapped to a state key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand why this is a requirement. If there are only certain states that you are interested in, why not allow to only specify those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a requirement, but allows to ensure that state metrics are always captured consistently, this is more best practice than a technical requirement.
Doing so means that at a given time, the sum of all samples is equal to 1, which translates to "the state of the thing we observe is known at all times". I think it's worth doing that for now unless we have good examples where it can be problematic.
The example I have in mind when adding this restriction is again probably biased by current focus on Tomcat state, so I'll elaborate a bit:
Let's say we monitor 3 tomcat servers, each of them with a single connector and a state metric to get the connector state with on|off
values. When consuming those metrics to create a dashboard:
Scenario 1:
- default set to
off
to indicate something is not right. - creating a linear chart (sum) with breakdown on state will always sum to 3, and the breakdown provides an obvious clue when something is not in
on
state. - creating a pie-chart with the latest value with breakdown on state also provides a good representation of the 3 tomcat instances.
- the end-user does not even need to know how many tomcat instances exist, you get information from the ratio of
on|off
in the charts and then can investigate what could be wrong if needed. - creating an extra dedicated
unknown
state could be used to gather those states as they might represent a gap in the metrics mapping.
Scenario 2:
- state value
on
when connector isRUNNING
,off
when connector isSTOPPED
, other possible values are not mapped. - creating a linear chart (sum) with breakdown on state will sum to 3 when the state is either
on
oroff
, but will go down whenever an un-mapped state is present - the end-user needs to know that the "usual value is 3", and that "when it goes down to less than 3 something is unknown and maybe we have a gap in our mapping"
- it is possible to trigger an alert by monitoring when the value goes down, but this is definitely more complex and depends mostly on the backend and metrics processing.
- with some visualizations it's even more tricky, for example when creating a pie-chart of the latest reported values to represent the "current state", if not all instances can be represented because their state is unknown then they will be ignored. In an extreme case, if only a single instance is in
on
state and the two others are in an unknown state then the pie-chart could be misleading representing "100% is running and it's fine", when in practice it's "only 1/3 fine".
Given that we usually need to monitor things that are working properly and even more the ones that not working properly (which are not in an expected state), having this as a requirement makes it harder to shoot yourself in the foot by missing a state whose value comes from an implementation detail of the monitored product.
let's discuss in Thu SIG meeting, we can hold release until Fri if we need a bit more time to discuss |
Have you looked at the state metrics in the collector? open-telemetry/semantic-conventions#1032 (comment) |
No, I wasn't aware of those "state metrics", but looking for those For
some metrics split in two metrics depending on the state, for example For |
…nstrumentation into state-jmx-metrics
…try-java-instrumentation into state-jmx-metrics
I've restored the original wildcard
So it's ready for a final review (just to be sure) and it should be good to be merged. |
Fixes #12229
This introduces the ability to capture "state metrics" that are captured from a JMX attribute value.
A "state metric" has the following properties:
1
.This follows a similar way to map state as we have in the (experimental) hardware semantic conventions as seen here with the
hw.state
attribute.Example from a common MBean in Tomcat:
JMX object name:
Catalina:type=Connector,port=*
, JMX attribute namestateName
, the value ofstateName
comes fromLifeCycleState
enum (12 distinct values).We can capture this as
tomcat.connector.state
with two attributesstate
with value:ok
,failed
ordegraded
(name is not limited tostate
and any name could be used here).port
that provides per-port breakdown (from the mbean wildcard)The YAML configuration used:
For a given port, let's say
8080
and this configuration 3 metrics are captured on every sample. The metric type will always beupdowncounter
and the value either0
or1
.When
stateName
=STARTED
, we have:tomcat.connector.state
value =1
, attributesport
=8080
andstate
=ok
tomcat.connector.state
value =0
, attributesport
=8080
andstate
=failed
tomcat.connector.state
value =0
, attributesport
=8080
andstate
=degraded
When
stateName
=STOPPED
orFAILED
, we have:tomcat.connector.state
value =0
, attributesport
=8080
andstate
=ok
tomcat.connector.state
value =1
, attributesport
=8080
andstate
=failed
tomcat.connector.state
value =0
, attributesport
=8080
andstate
=degraded
For other values of
stateName
, we have:tomcat.connector.state
value =0
, attributesport
=8080
andstate
=ok
tomcat.connector.state
value =0
, attributesport
=8080
andstate
=failed
tomcat.connector.state
value =1
, attributesport
=8080
andstate
=degraded
The ability to limit cardinality is relevant here to reduce the cardinality from the 12 potential values to only 3, most of which will very likely never be relevant in practice.