fix(metrics): fix incorrect metric label and add docs (#2180)
Signed-off-by: Derek Wang <[email protected]>
whynowy authored Oct 23, 2024
1 parent f21e75b commit ee27af3
Showing 2 changed files with 20 additions and 17 deletions.
35 changes: 19 additions & 16 deletions docs/operations/metrics/metrics.md
@@ -69,22 +69,25 @@ These metrics can be used to determine the latency of your pipeline.

### Errors

-These metrics can be used to determine if there are any errors in the pipeline
-
-| Metric name | Metric type | Labels | Description |
-| --------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
-| `pipeline_data_processing_health` | Gauge | `pipeline=<pipeline-name>` | Pipeline data processing health status. 1: Healthy, 0: Unknown, -1: Warning, -2: Critical |
-| `forwarder_platform_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` | Indicates any internal errors which could stop pipeline processing |
-| `forwarder_read_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Indicates any errors while reading messages by the forwarder |
-| `forwarder_write_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` `vertex_type=<vertex-type>` <br> <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Indicates any errors while writing messages by the forwarder |
-| `forwarder_ack_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Indicates any errors while acknowledging messages by the forwarder |
-| `kafka_source_offset_ack_errors` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` | Indicates any kafka acknowledgement errors |
-| `kafka_sink_write_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` | Provides the number of errors while writing to the Kafka sink |
-| `kafka_sink_write_timeout_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` | Provides the write timeouts while writing to the Kafka sink |
-| `isb_jetstream_read_error_total` | Counter | `partition_name=<partition-name>` | Indicates any read errors with NATS Jetstream ISB |
-| `isb_jetstream_write_error_total` | Counter | `partition_name=<partition-name>` | Indicates any write errors with NATS Jetstream ISB |
-| `isb_redis_read_error_total` | Counter | `partition_name=<partition-name>` | Indicates any read errors with Redis ISB |
-| `isb_redis_write_error_total` | Counter | `partition_name=<partition-name>` | Indicates any write errors with Redis ISB |
+These metrics can be used to determine if there are any errors in the pipeline.
+
+| Metric name | Metric type | Labels | Description |
+| --------------------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
+| `pipeline_data_processing_health` | Gauge | `pipeline=<pipeline-name>` | Pipeline data processing health status. 1: Healthy, 0: Unknown, -1: Warning, -2: Critical |
+| `controller_isbsvc_health` | Gauge | `ns=<namespace>` <br> `isbsvc=<isbsvc-name>` | A metric to indicate whether the ISB Service is healthy. '1' means healthy, '0' means unhealthy |
+| `controller_pipeline_health` | Gauge | `ns=<namespace>` <br> `pipeline=<pipeline-name>` | A metric to indicate whether the Pipeline is healthy. '1' means healthy, '0' means unhealthy |
+| `controller_monovtx_health` | Gauge | `ns=<namespace>` <br> `mvtx_name=<mvtx-name>` | A metric to indicate whether the MonoVertex is healthy. '1' means healthy, '0' means unhealthy |
+| `forwarder_platform_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` | Indicates any internal errors which could stop pipeline processing |
+| `forwarder_read_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Indicates any errors while reading messages by the forwarder |
+| `forwarder_write_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` `vertex_type=<vertex-type>` <br> <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Indicates any errors while writing messages by the forwarder |
+| `forwarder_ack_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` <br> `vertex_type=<vertex-type>` <br> `replica=<replica-index>` <br> `partition_name=<partition-name>` | Indicates any errors while acknowledging messages by the forwarder |
+| `kafka_source_offset_ack_errors` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` | Indicates any kafka acknowledgement errors |
+| `kafka_sink_write_error_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` | Provides the number of errors while writing to the Kafka sink |
+| `kafka_sink_write_timeout_total` | Counter | `pipeline=<pipeline-name>` <br> `vertex=<vertex-name>` | Provides the write timeouts while writing to the Kafka sink |
+| `isb_jetstream_read_error_total` | Counter | `partition_name=<partition-name>` | Indicates any read errors with NATS Jetstream ISB |
+| `isb_jetstream_write_error_total` | Counter | `partition_name=<partition-name>` | Indicates any write errors with NATS Jetstream ISB |
+| `isb_redis_read_error_total` | Counter | `partition_name=<partition-name>` | Indicates any read errors with Redis ISB |
+| `isb_redis_write_error_total` | Counter | `partition_name=<partition-name>` | Indicates any write errors with Redis ISB |
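
The health gauges in the table encode status as small integers, so a dashboard or alerting helper needs to map them back to names. A minimal Go sketch of that mapping follows. The function names `healthStatus` and `controllerHealthy` are hypothetical and not part of Numaflow, and the `"Invalid"` fallback for unlisted values is an assumption of this sketch.

```go
package main

// healthStatus maps a pipeline_data_processing_health gauge value to the
// status names listed in the docs table:
// 1: Healthy, 0: Unknown, -1: Warning, -2: Critical.
// Any other value is reported as "Invalid" (an assumption of this sketch).
func healthStatus(v float64) string {
	switch v {
	case 1:
		return "Healthy"
	case 0:
		return "Unknown"
	case -1:
		return "Warning"
	case -2:
		return "Critical"
	default:
		return "Invalid"
	}
}

// controllerHealthy interprets the controller_*_health gauges, where
// '1' means healthy and '0' means unhealthy.
func controllerHealthy(v float64) bool {
	return v == 1
}
```

Note that the two families use different encodings: `0` means Unknown for `pipeline_data_processing_health` but unhealthy for the `controller_*_health` gauges, so one decoder should not be reused for both.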

### Saturation

2 changes: 1 addition & 1 deletion pkg/reconciler/metrics.go
@@ -43,7 +43,7 @@ var (
Subsystem: "controller",
Name: "pipeline_health",
Help: "A metric to indicate whether the Pipeline is healthy. '1' means healthy, '0' means unhealthy",
-}, []string{metrics.LabelNamespace, metrics.LabelISBService})
+}, []string{metrics.LabelNamespace, metrics.LabelPipeline})

MonoVertexHealth = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Subsystem: "controller",
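
Why the one-word label change matters: in a Prometheus gauge vector, label values are commonly supplied positionally and paired with the label names declared at construction time, so a wrong declared name silently exports the value under the wrong key. The sketch below models that pairing with the standard library only. The `gaugeVec` type and its `series` method are hypothetical stand-ins, not the Prometheus client API, and the label strings `ns`, `isbsvc`, and `pipeline` are taken from the docs table and assumed to match the `metrics.Label*` constants.

```go
package main

import (
	"fmt"
	"strings"
)

// gaugeVec is a hypothetical, stdlib-only stand-in for a labelled gauge
// vector. It records only the metric name and the declared label names.
type gaugeVec struct {
	name       string
	labelNames []string
}

// series pairs positional label values with the declared label names and
// renders the resulting series identifier. It returns an error when the
// number of values differs from the number of declared names.
func (g gaugeVec) series(values ...string) (string, error) {
	if len(values) != len(g.labelNames) {
		return "", fmt.Errorf("expected %d label values, got %d", len(g.labelNames), len(values))
	}
	pairs := make([]string, len(values))
	for i, v := range values {
		pairs[i] = fmt.Sprintf("%s=%q", g.labelNames[i], v)
	}
	return g.name + "{" + strings.Join(pairs, ",") + "}", nil
}

// Label names before and after the fix (assumed from the docs table).
var (
	before = gaugeVec{"controller_pipeline_health", []string{"ns", "isbsvc"}}
	after  = gaugeVec{"controller_pipeline_health", []string{"ns", "pipeline"}}
)
```

Under these assumptions, `before.series("default", "my-pipeline")` yields a series labelled `isbsvc="my-pipeline"`, while `after.series(...)` with the same arguments yields `pipeline="my-pipeline"`, which is what the updated docs table describes.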