diff --git a/docs/content/maintenance/metrics.md b/docs/content/maintenance/metrics.md
new file mode 100644
index 000000000000..6a8542f29100
--- /dev/null
+++ b/docs/content/maintenance/metrics.md
@@ -0,0 +1,399 @@
---
title: "Metrics"
weight: 9
type: docs
aliases:
- /maintenance/metrics.html
---

# Paimon Metrics

Paimon has a built-in metrics system that measures the behaviour of reads and writes. For example, it records how many manifest files were scanned in the last planning, how long the last commit took, and how many files the last compaction deleted.

In Paimon's metrics system, metrics are updated and reported at different levels of granularity. Currently, the **table** and **bucket** levels are provided, which means you can get metrics per table or per bucket.

The Paimon metrics system provides three types of metrics: `Gauge`, `Counter` and `Histogram`.
- `Gauge`: Provides a value of any type at a point in time.
- `Counter`: Counts values by incrementing and decrementing.
- `Histogram`: Measures the statistical distribution of a set of values, including the min, max, mean, standard deviation and percentiles.

Paimon provides built-in metrics for **commits**, **scans** and **compactions**. These can be bridged to any computing engine that supports them, such as Flink and Spark.

## Metrics List

Below are the Paimon built-in metrics, grouped into three categories: scan metrics, commit metrics and compaction metrics.

### Scan Metrics

| Metrics Name | Level | Type | Description |
|---|---|---|---|
| lastScanDuration | Table | Gauge | The time it took to complete the last scan. |
| scanDuration | Table | Histogram | Distribution of the time taken by the last few scans. |
| lastScannedManifests | Table | Gauge | Number of manifest files scanned in the last scan. |
| lastSkippedByPartitionAndStats | Table | Gauge | Number of table files skipped by the partition filter and value/key statistics in the last scan. |
| lastSkippedByBucketAndLevelFilter | Table | Gauge | Number of table files skipped by the bucket, bucket key and level filters in the last scan. |
| lastSkippedByWholeBucketFilesFilter | Table | Gauge | Number of table files skipped by the bucket-level value filter (primary key tables only) in the last scan. |
| lastScanSkippedTableFiles | Table | Gauge | Total number of table files skipped in the last scan. |
| lastScanResultedTableFiles | Table | Gauge | Number of table files resulting from the last scan. |
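
The duration histograms (`scanDuration` above, and `commitDuration` and `compactionDuration` below) describe only the last few operations, not the whole history. The Python sketch below illustrates that idea with a sliding-window histogram. The class name, the default window of 100 samples and the nearest-rank percentile are illustrative assumptions. They are not taken from Paimon's implementation.

```python
import math
import statistics
from collections import deque


class SlidingWindowHistogram:
    """Hypothetical histogram that keeps only the most recent `window_size` samples."""

    def __init__(self, window_size: int = 100):
        # A bounded deque evicts the oldest sample automatically once full.
        self._values = deque(maxlen=window_size)

    def update(self, value: float) -> None:
        self._values.append(value)

    def count(self) -> int:
        return len(self._values)

    def min(self) -> float:
        return min(self._values)

    def max(self) -> float:
        return max(self._values)

    def mean(self) -> float:
        return statistics.fmean(self._values)

    def stddev(self) -> float:
        # Sample standard deviation; requires at least two retained samples.
        return statistics.stdev(self._values)

    def percentile(self, q: float) -> float:
        # Nearest-rank percentile for q in (0, 100]; one simple choice among many.
        ordered = sorted(self._values)
        rank = max(1, math.ceil(q / 100 * len(ordered)))
        return ordered[rank - 1]
```

Consider a window of 3 fed durations of 120, 80, 100 and 300 ms. The first sample is evicted, so `min()` returns 80, `max()` returns 300, `mean()` returns 160.0 and `percentile(50)` returns 100.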

### Commit Metrics

| Metrics Name | Level | Type | Description |
|---|---|---|---|
| lastCommitDuration | Table | Gauge | The time it took to complete the last commit. |
| commitDuration | Table | Histogram | Distribution of the time taken by the last few commits. |
| lastCommitAttempts | Table | Gauge | Number of attempts the last commit made. |
| lastTableFilesAdded | Table | Gauge | Number of table files added in the last commit, including newly created data files and compacted-after files. |
| lastTableFilesDeleted | Table | Gauge | Number of table files deleted in the last commit, which come from compacted-before files. |
| lastTableFilesAppended | Table | Gauge | Number of table files appended in the last commit, i.e. newly created data files. |
| lastTableFilesCommitCompacted | Table | Gauge | Number of compacted table files in the last commit, including compacted-before and compacted-after files. |
| lastChangelogFilesAppended | Table | Gauge | Number of changelog files appended in the last commit. |
| lastChangelogFileCommitCompacted | Table | Gauge | Number of compacted changelog files in the last commit. |
| lastGeneratedSnapshots | Table | Gauge | Number of snapshot files generated in the last commit, which may be 1 or 2. |
| lastDeltaRecordsAppended | Table | Gauge | Number of delta records in the last commit with the APPEND commit kind. |
| lastChangelogRecordsAppended | Table | Gauge | Number of changelog records in the last commit with the APPEND commit kind. |
| lastDeltaRecordsCommitCompacted | Table | Gauge | Number of delta records in the last commit with the COMPACT commit kind. |
| lastChangelogRecordsCommitCompacted | Table | Gauge | Number of changelog records in the last commit with the COMPACT commit kind. |
| lastPartitionsWritten | Table | Gauge | Number of partitions written in the last commit. |
| lastBucketsWritten | Table | Gauge | Number of buckets written in the last commit. |
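
Several of the file-count gauges above overlap. Read literally, the descriptions give three relationships:

- Added files are the appended files plus the compacted-after files.
- Deleted files are the compacted-before files.
- Commit-compacted files are the compacted-before plus the compacted-after files.

The Python sketch below applies that arithmetic. `CommitFileChanges` and `commit_gauges` are hypothetical names used only for illustration, and the sketch models only the three file origins named in the table.

```python
from dataclasses import dataclass


@dataclass
class CommitFileChanges:
    """Hypothetical summary of the table files touched by one commit."""
    appended: int        # newly created data files
    compact_before: int  # files removed by compaction
    compact_after: int   # files produced by compaction


def commit_gauges(c: CommitFileChanges) -> dict:
    # Relationships follow the descriptions in the Commit Metrics table.
    return {
        "lastTableFilesAppended": c.appended,
        "lastTableFilesAdded": c.appended + c.compact_after,
        "lastTableFilesDeleted": c.compact_before,
        "lastTableFilesCommitCompacted": c.compact_before + c.compact_after,
    }
```

Consider a commit that appends 3 new data files and compacts 4 files into 1. Under this reading it would report 3 appended, 4 added, 4 deleted and 5 commit-compacted table files.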

### Compaction Metrics

| Metrics Name | Level | Type | Description |
|---|---|---|---|
| lastCompactionDuration | Bucket | Gauge | The time it took to complete the last compaction. |
| compactionDuration | Bucket | Histogram | Distribution of the time taken by the last few compactions. |
| lastTableFilesCompactedBefore | Bucket | Gauge | Number of files deleted in the last compaction. |
| lastTableFilesCompactedAfter | Bucket | Gauge | Number of files added in the last compaction. |
| lastChangelogFilesCompacted | Bucket | Gauge | Number of changelog files compacted in the last compaction. |
| lastRewriteInputFileSize | Bucket | Gauge | Size of the files deleted in the last compaction. |
| lastRewriteOutputFileSize | Bucket | Gauge | Size of the files added in the last compaction. |
| lastRewriteChangelogFileSize | Bucket | Gauge | Size of the changelog files compacted in the last compaction. |
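
The compaction gauges pair file counts with file sizes for the same two sets of files. The compacted-before files are the ones deleted, and the compacted-after files are the ones added. The Python sketch below shows that pairing. It assumes a hypothetical `DataFile` type with sizes in bytes, since the unit is not stated on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file: a name and its size in bytes."""
    name: str
    size_bytes: int


def compaction_gauges(before: List[DataFile], after: List[DataFile]) -> dict:
    # 'before' files are deleted by the compaction; 'after' files are added.
    return {
        "lastTableFilesCompactedBefore": len(before),
        "lastTableFilesCompactedAfter": len(after),
        "lastRewriteInputFileSize": sum(f.size_bytes for f in before),
        "lastRewriteOutputFileSize": sum(f.size_bytes for f in after),
    }
```

Compacting three files of 10, 20 and 30 bytes into one file of 55 bytes gives counts of 3 and 1 and rewrite sizes of 60 and 55.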

## Bridging To Flink

Paimon bridges its metrics to Flink's metrics system, so Flink can report them. Flink also manages the lifecycle of the metric groups.

When using Flink to access Paimon, join `<scope>.<infix>.<metric_name>` to get the complete metric identifier. `metric_name` can be found in the [Metrics List]({{< ref "maintenance/metrics#metrics-list" >}}).

For example, the identifier of metric `lastPartitionsWritten` for table `word_count` in the Flink job named `insert_word_count` is:

`localhost.taskmanager.localhost:60340-775a20.insert_word_count.Global Committer : word_count.0.paimon.table.word_count.commit.lastPartitionsWritten`

In the Flink Web UI, under the committer operator's metrics, it is shown as:

`0.Global_Committer___word_count.paimon.table.word_count.commit.lastPartitionsWritten`

{{< hint info >}}
1. Please refer to [System Scope](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-scope) to understand the Flink `scope`.
2. Scan metrics are only supported by Flink versions >= 1.18.
{{< /hint >}}

| Metrics | Scope | Infix |
|---|---|---|
| Scan Metrics | `<host>.jobmanager.<job_name>` | `<source_operator_name>.coordinator.enumerator.paimon.table.<table_name>.scan` |
| Commit Metrics | `<host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index>` | `paimon.table.<table_name>.commit` |
| Compaction Metrics | `<host>.taskmanager.<tm_id>.<job_name>.<writer_operator_name>.<subtask_index>` | `paimon.table.<table_name>.partition.<partition_string>.bucket.<bucket_index>.compaction` |
| Flink Source Metrics | `<host>.taskmanager.<tm_id>.<job_name>.<source_operator_name>.<subtask_index>` | - |
| Flink Sink Metrics | `<host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index>` | - |
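
The identifier rule can be checked against the `lastPartitionsWritten` example above. The Python sketch below fills the commit-metrics scope template from the table, then joins scope, infix and metric name. Some parts of it are assumptions:

- The helper names are hypothetical.
- Treating an infix of `-` as "no infix" is an assumption.
- The Web UI naming rule is inferred from the single example above. It places the subtask index first and replaces every character other than letters, digits and `_` in the operator name with `_`.

```python
import re


def commit_scope(host: str, tm_id: str, job_name: str,
                 committer_operator_name: str, subtask_index: int) -> str:
    # Template from the table:
    # <host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index>
    return ".".join([host, "taskmanager", tm_id, job_name,
                     committer_operator_name, str(subtask_index)])


def metric_identifier(scope: str, infix: str, metric_name: str) -> str:
    # <scope>.<infix>.<metric_name>; assumption: an infix of "-" means no infix.
    parts = [scope] if infix == "-" else [scope, infix]
    return ".".join(parts + [metric_name])


def web_ui_metric_name(subtask_index: int, operator_name: str,
                       infix: str, metric_name: str) -> str:
    # Assumption inferred from one example: other characters become "_".
    sanitized = re.sub(r"[^A-Za-z0-9_]", "_", operator_name)
    return f"{subtask_index}.{sanitized}.{infix}.{metric_name}"


scope = commit_scope("localhost", "localhost:60340-775a20", "insert_word_count",
                     "Global Committer : word_count", 0)
print(metric_identifier(scope, "paimon.table.word_count.commit", "lastPartitionsWritten"))
print(web_ui_metric_name(0, "Global Committer : word_count",
                         "paimon.table.word_count.commit", "lastPartitionsWritten"))
```

Both printed strings match the two identifiers shown in the example above.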

### Flink Connector Standard Metrics

When Flink is used to read and write, Paimon implements some key standard Flink connector metrics to measure source latency and sink output; see [FLIP-33: Standardize Connector Metrics](https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics). The Flink source / sink metrics implemented are listed below.

#### Source Metrics (Flink)

| Metrics Name | Level | Type | Description |
|---|---|---|---|
| currentFetchEventTimeLag | Flink Source Operator | Gauge | Time difference between when a data file is read and when it was created. |
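
As a worked example of the lag, the Python sketch below subtracts a data file's creation time from the time it is read. It rests on three assumptions:

- The file creation time plays the role of the event time.
- Both timestamps come from comparable clocks.
- The lag is reported in milliseconds.

`fetch_event_time_lag_ms` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def fetch_event_time_lag_ms(file_creation_time: datetime, fetch_time: datetime) -> int:
    # Lag = read (fetch) time minus file creation time, in whole milliseconds.
    return (fetch_time - file_creation_time) // timedelta(milliseconds=1)


created = datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc)
fetched = created + timedelta(seconds=3, milliseconds=250)
print(fetch_event_time_lag_ms(created, fetched))  # 3250
```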

#### Sink Metrics (Flink)

| Metrics Name | Level | Type | Description |
|---|---|---|---|
| numBytesOut | Table | Counter | Total number of output bytes. |
| numBytesOutPerSecond | Table | Meter | Number of output bytes per second. |
| numRecordsOut | Table | Counter | Total number of output records. |
| numRecordsOutPerSecond | Table | Meter | Number of output records per second. |
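
A meter reports a per-second rate, so the `...PerSecond` metrics pair naturally with the `numBytesOut` / `numRecordsOut` counters. The Python sketch below shows one way to compute such a rate. It keeps counter snapshots over a fixed time span and divides the increase by the elapsed time. This is a simplified, hypothetical model. The 60-second span and 5-second update interval are illustrative defaults, and Flink's own meter implementation may differ in details such as warm-up behaviour.

```python
from collections import deque


class CounterRateMeter:
    """Hypothetical per-second meter computed from a monotonically increasing counter."""

    def __init__(self, time_span_seconds: int = 60, update_interval_seconds: int = 5):
        if time_span_seconds % update_interval_seconds != 0:
            raise ValueError("time span must be a multiple of the update interval")
        self._interval = update_interval_seconds
        # One snapshot at the start of the span plus one per interval.
        self._snapshots = deque([0], maxlen=time_span_seconds // update_interval_seconds + 1)
        self.count = 0  # the underlying counter, e.g. numRecordsOut

    def mark(self, n: int = 1) -> None:
        """Record n more output records (or bytes)."""
        self.count += n

    def update(self) -> None:
        """Take a counter snapshot; expected to be called once per update interval."""
        self._snapshots.append(self.count)

    def rate(self) -> float:
        """Counter increase over the retained snapshots divided by the time they cover."""
        elapsed = (len(self._snapshots) - 1) * self._interval
        if elapsed == 0:
            return 0.0
        return (self._snapshots[-1] - self._snapshots[0]) / elapsed
```

Take a 10-second span with 5-second updates. Marking 100 records and updating gives a rate of 20.0. Marking 50 more and updating again gives 150 records over 10 seconds, a rate of 15.0.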