From b8d18d4823bf0f7ded2e9963e917695602fb9502 Mon Sep 17 00:00:00 2001 From: Harshini Rangaswamy <108724024+harshini-rangaswamy@users.noreply.github.com> Date: Tue, 26 Nov 2024 10:09:35 +0100 Subject: [PATCH] add: Metrics for OpenSearch via Prometheus (#571) --- .../reference/kafka-metrics-prometheus.md | 279 +----------------- docs/products/opensearch/howto/os-metrics.md | 82 +++++ sidebars.ts | 1 + static/includes/host-metrics.md | 256 ++++++++++++++++ 4 files changed, 353 insertions(+), 265 deletions(-) create mode 100644 docs/products/opensearch/howto/os-metrics.md create mode 100644 static/includes/host-metrics.md diff --git a/docs/products/kafka/reference/kafka-metrics-prometheus.md b/docs/products/kafka/reference/kafka-metrics-prometheus.md index c1ebedbc9..571cd1656 100644 --- a/docs/products/kafka/reference/kafka-metrics-prometheus.md +++ b/docs/products/kafka/reference/kafka-metrics-prometheus.md @@ -1,15 +1,15 @@ --- title: Aiven for Apache Kafka® metrics available via Prometheus --- +import HostMetrics from "@site/static/includes/host-metrics.md"; Explore common metrics available via Prometheus for your Aiven for Apache Kafka® service. ## How to retrieve metrics +You can retrieve a complete list of metrics from your service by querying the Prometheus +endpoint. To do this: -To get the complete list of available metrics for your service, query the -Prometheus endpoint. To get started: - -1. Ensure you have the following information: +1. Gather the necessary details: - Aiven project certificate: `ca.pem`. To download the CA certificate, see [Download CA certificates](/docs/platform/concepts/tls-ssl-certificates#download-ca-certificates). @@ -17,7 +17,7 @@ Prometheus endpoint. To get started: - Aiven for Apache Kafka hostname: `` - Prometheus port: `` -1. Use the `curl` command to query the Prometheus endpoint: +1. Run the following `curl` command to query the Prometheus endpoint: ```bash curl --cacert ca.pem \ @@ -25,68 +25,15 @@ Prometheus endpoint. To get started: 'https://:/metrics' ``` -For detailed instructions on setting up Prometheus integration with -Aiven, see [Use Prometheus with Aiven](/docs/platform/howto/integrations/prometheus-metrics) - -## CPU utilization - -CPU utilization metrics offer insights into CPU usage. These metrics -include time spent on different processes, system load, and overall uptime. - -| Metric | Description | -|-------------------------|-------------------------------------------------------------------------------------------------------| -| `cpu_usage_guest` | CPU time spent running a virtual CPU for guest operating systems | -| `cpu_usage_guest_nice` | CPU time spent running low-priority virtual CPUs for guest operating systems. These processes can be interrupted by higher-priority tasks. Metric is measured in hundredths of a second | -| `cpu_usage_idle` | Time the CPU spends doing nothing | -| `cpu_usage_iowait` | Time waiting for I/O to complete | -| `cpu_usage_irq` | Time servicing interrupts | -| `cpu_usage_nice` | Time running user-niced processes | -| `cpu_usage_softirq` | Time servicing softirqs | -| `cpu_usage_steal` | Time spent in other operating systems when running in a virtualized environment | -| `cpu_usage_system` | Time spent running system processes | -| `cpu_usage_user` | Time spent running user processes | -| `system_load1` | System load average for the last minute | -| `system_load15` | System load average for the last 15 minutes | -| `system_load5` | System load average for the last 5 minutes | -| `system_n_cpus` | Number of CPU cores available | -| `system_n_users` | Number of users logged in | -| `system_uptime` | Time for which the system has been up and running | - -## Disk space utilization - -Disk space utilization metrics provide a snapshot of disk usage. These metrics include -information about free and used disk space, as well as `inode` usage and -total disk capacity. - -| Metric | Description | -|----------------------|-------------------------------| -| `disk_free` | Amount of free disk space | -| `disk_inodes_free` | Number of free inodes | -| `disk_inodes_total` | Total number of inodes | -| `disk_inodes_used` | Number of used inodes | -| `disk_total` | Total disk space | -| `disk_used` | Amount of used disk space | -| `disk_used_percent` | Percentage of disk space used| - -## Disk input and output - -Metrics such as `diskio_io_time` and `diskio_iops_in_progress` provide insights into -disk I/O operations. These metrics cover read/write operations, the duration of these -operations, and the number of bytes read/written. - -| Metric | Description | -|--------------------------|-------------------------------------------------------------------------------------------------------| -| `diskio_io_time` | Total time spent on I/O operations | -| `diskio_iops_in_progress`| Number of I/O operations currently in progress | -| `diskio_merged_reads` | Number of read operations that were merged | -| `diskio_merged_writes` | Number of write operations that were merged | -| `diskio_read_bytes` | Total bytes read from disk | -| `diskio_read_time` | Total time spent on read operations | -| `diskio_reads` | Total number of read operations | -| `diskio_weighted_io_time`| Weighted time spent on I/O operations, considering their duration and intensity | -| `diskio_write_bytes` | Total bytes written to disk | -| `diskio_write_time` | Total time spent on write operations | -| `diskio_writes` | Total number of write operations | +For more information about setting up Prometheus integration, +see [Use Prometheus with Aiven](/docs/platform/howto/integrations/prometheus-metrics) + + + +## Aiven for Apache Kafka-specific metrics + +Metrics specific to Apache Kafka provide detailed insights into the health and +performance of your Kafka clusters, including broker, controller, and topic-level metrics. ## Garbage collector `MXBean` @@ -349,201 +296,3 @@ data copying, fetching, deleting, and their associated lags and errors. | `kafka_server_BrokerTopicMetrics_RemoteDeleteErrorsPerSec_Count` | Number of errors per second encountered during remote delete | | `kafka_server_BrokerTopicMetrics_RemoteDeleteLagBytes_Value` | Number of bytes in non-active segments marked for deletion but not yet deleted from remote storage | | `kafka_server_BrokerTopicMetrics_RemoteDeleteLagSegments_Value` | Number of non-active segments marked for deletion but not yet deleted from remote storage | - -### Kernel - -The metrics listed below, such as `kernel_boot_time` and `kernel_context_switches`, provide -insights into the operations of your system's kernel. - - -| Metric | Description | -|----------------------------|--------------------------------------------------| -| `kernel_boot_time` | Time at which the system was last booted | -| `kernel_context_switches` | Number of context switches that have occurred in the kernel | -| `kernel_entropy_avail` | Amount of available entropy in the kernel's entropy pool | -| `kernel_interrupts` | Number of interrupts that have occurred | -| `kernel_processes_forked` | Number of processes that have been forked | - -### Generic memory - -The following metrics, including `mem_active` and `mem_available`, provide insights into -your system's memory usage. - -| Metric | Description | -|--------------------------|----------------------------------------------------------------| -| `mem_active` | Amount of actively used memory | -| `mem_available` | Amount of available memory | -| `mem_available_percent` | Percentage of available memory | -| `mem_buffered` | Amount of memory used for buffering I/O | -| `mem_cached` | Amount of memory used for caching | -| `mem_commit_limit` | Maximum amount of memory that can be committed | -| `mem_committed_as` | Total amount of committed memory | -| `mem_dirty` | Amount of memory waiting to be written to disk | -| `mem_free` | Amount of free memory | -| `mem_high_free` | Amount of free memory in the high memory zone | -| `mem_high_total` | Total amount of memory in the high memory zone | -| `mem_huge_pages_free` | Number of free huge pages | -| `mem_huge_page_size` | Size of huge pages | -| `mem_huge_pages_total` | Total number of huge pages | -| `mem_inactive` | Amount of inactive memory | -| `mem_low_free` | Amount of free memory in the low memory zone | -| `mem_low_total` | Total amount of memory in the low memory zone | -| `mem_mapped` | Amount of memory mapped into the process's address space | -| `mem_page_tables` | Amount of memory used by page tables | -| `mem_shared` | Amount of memory shared between processes | -| `mem_slab` | Amount of memory used by the kernel for data structure caches | -| `mem_swap_cached` | Amount of swap memory cached | -| `mem_swap_free` | Amount of free swap memory | -| `mem_swap_total` | Total amount of swap memory | -| `mem_total` | Total amount of memory | -| `mem_used` | Amount of used memory | -| `mem_used_percent` | Percentage of used memory | -| `mem_vmalloc_chunk` | Largest contiguous block of vmalloc memory available | -| `mem_vmalloc_total` | Total amount of vmalloc memory | -| `mem_vmalloc_used` | Amount of used vmalloc memory | -| `mem_wired` | Amount of wired memory | -| `mem_write_back` | Amount of memory being written back to disk | -| `mem_write_back_tmp` | Amount of temporary memory being written back to disk | - -### Network - -The following metrics, including `net_bytes_recv` and `net_packets_sent`, provide insights -into your system's network operations. - -| Metric | Description | -|------------------------------|---------------------------------------------------------------------------------| -| `net_bytes_recv` | Total bytes received on the network interfaces | -| `net_bytes_sent` | Total bytes sent on the network interfaces | -| `net_drop_in` | Incoming packets dropped | -| `net_drop_out` | Outgoing packets dropped | -| `net_err_in` | Incoming packets with errors | -| `net_err_out` | Outgoing packets with errors | -| `net_icmp_inaddrmaskreps` | Number of ICMP address mask replies received | -| `net_icmp_inaddrmasks` | Number of ICMP address mask requests received | -| `net_icmp_incsumerrors` | Number of ICMP checksum errors | -| `net_icmp_indestunreachs` | Number of ICMP destination unreachable messages received | -| `net_icmp_inechoreps` | Number of ICMP echo replies received | -| `net_icmp_inechos` | Number of ICMP echo requests received | -| `net_icmp_inerrors` | Number of ICMP messages received with errors | -| `net_icmp_inmsgs` | Total number of ICMP messages received | -| `net_icmp_inparmprobs` | Number of ICMP parameter problem messages received | -| `net_icmp_inredirects` | Number of ICMP redirect messages received | -| `net_icmp_insrcquenchs` | Number of ICMP source quench messages received | -| `net_icmp_intimeexcds` | Number of ICMP time exceeded messages received | -| `net_icmp_intimestampreps` | Number of ICMP timestamp reply messages received | -| `net_icmp_intimestamps` | Number of ICMP timestamp request messages received | -| `net_icmpmsg_intype3` | Number of ICMP type 3 (destination unreachable) messages received | -| `net_icmpmsg_intype8` | Number of ICMP type 8 (echo request) messages received | -| `net_icmpmsg_outtype0` | Number of ICMP type 0 (echo reply) messages sent | -| `net_icmpmsg_outtype3` | Number of ICMP type 3 (destination unreachable) messages sent | -| `net_icmp_outaddrmaskreps` | Number of ICMP address mask reply messages sent | -| `net_icmp_outaddrmasks` | Number of ICMP address mask request messages sent | -| `net_icmp_outdestunreachs` | Number of ICMP destination unreachable messages sent | -| `net_icmp_outechoreps` | Number of ICMP echo reply messages sent | -| `net_icmp_outechos` | Number of ICMP echo request messages sent | -| `net_icmp_outerrors` | Number of ICMP messages sent with errors | -| `net_icmp_outmsgs` | Total number of ICMP messages sent | -| `net_icmp_outparmprobs` | Number of ICMP parameter problem messages sent | -| `net_icmp_outredirects` | Number of ICMP redirect messages sent | -| `net_icmp_outsrcquenchs` | Number of ICMP source quench messages sent | -| `net_icmp_outtimeexcds` | Number of ICMP time exceeded messages sent | -| `net_icmp_outtimestampreps` | Number of ICMP timestamp reply messages sent | -| `net_icmp_outtimestamps` | Number of ICMP timestamp request messages sent | -| `net_ip_defaultttl` | Default time-to-live for IP packets | -| `net_ip_forwarding` | Indicates if IP forwarding is enabled | -| `net_ip_forwdatagrams` | Number of forwarded IP datagrams | -| `net_ip_fragcreates` | Number of IP fragments created | -| `net_ip_fragfails` | Number of failed IP fragmentations | -| `net_ip_fragoks` | Number of successful IP fragmentations | -| `net_ip_inaddrerrors` | Number of incoming IP packets with address errors | -| `net_ip_indelivers` | Number of incoming IP packets delivered to higher layers | -| `net_ip_indiscards` | Number of incoming IP packets discarded | -| `net_ip_inhdrerrors` | Number of incoming IP packets with header errors | -| `net_ip_inreceives` | Total number of incoming IP packets received | -| `net_ip_inunknownprotos` | Number of incoming IP packets with unknown protocols | -| `net_ip_outdiscards` | Number of outgoing IP packets discarded | -| `net_ip_outnoroutes` | Number of outgoing IP packets with no route available | -| `net_ip_outrequests` | Total number of outgoing IP packets requested to be sent | -| `net_ip_reasmfails` | Number of failed IP reassembly attempts | -| `net_ip_reasmoks` | Number of successful IP reassembly attempts | -| `net_ip_reasmreqds` | Number of IP fragments received needing reassembly | -| `net_ip_reasmtimeout` | Number of IP reassembly timeouts | -| `net_packets_recv` | Total number of packets received on the network interfaces | -| `net_packets_sent` | Total number of packets sent on the network interfaces | -| `netstat_tcp_close` | Number of TCP connections in the CLOSE state | -| `netstat_tcp_close_wait` | Number of TCP connections in the CLOSE_WAIT state | -| `netstat_tcp_closing` | Number of TCP connections in the CLOSING state | -| `netstat_tcp_established` | Number of TCP connections in the ESTABLISHED state | -| `netstat_tcp_fin_wait1` | Number of TCP connections in the FIN_WAIT_1 state | -| `netstat_tcp_fin_wait2` | Number of TCP connections in the FIN_WAIT_2 state | -| `netstat_tcp_last_ack` | Number of TCP connections in the LAST_ACK state | -| `netstat_tcp_listen` | Number of TCP connections in the LISTEN state | -| `netstat_tcp_none` | Number of TCP connections in the NONE state | -| `netstat_tcp_syn_recv` | Number of TCP connections in the SYN_RECV state | -| `netstat_tcp_syn_sent` | Number of TCP connections in the SYN_SENT state | -| `netstat_tcp_time_wait` | Number of TCP connections in the TIME_WAIT state | -| `netstat_udp_socket` | Number of UDP sockets | -| `net_tcp_activeopens` | Number of active TCP open connections | -| `net_tcp_attemptfails` | Number of failed TCP connection attempts | -| `net_tcp_currestab` | Number of currently established TCP connections | -| `net_tcp_estabresets` | Number of established TCP connections reset | -| `net_tcp_incsumerrors` | Number of TCP checksum errors in incoming packets | -| `net_tcp_inerrs` | Number of incoming TCP packets with errors | -| `net_tcp_insegs` | Number of TCP segments received | -| `net_tcp_maxconn` | Maximum number of TCP connections supported | -| `net_tcp_outrsts` | Number of TCP reset packets sent | -| `net_tcp_outsegs` | Number of TCP segments sent | -| `net_tcp_passiveopens` | Number of passive TCP open connections | -| `net_tcp_retranssegs` | Number of TCP segments retransmitted | -| `net_tcp_rtoalgorithm` | TCP retransmission timeout algorithm | -| `net_tcp_rtomax` | Maximum TCP retransmission timeout | -| `net_tcp_rtomin` | Minimum TCP retransmission timeout | -| `net_udp_ignoredmulti` | Number of UDP multicast packets ignored | -| `net_udp_incsumerrors` | Number of UDP checksum errors in incoming packets | -| `net_udp_indatagrams` | Number of UDP datagrams received | -| `net_udp_inerrors` | Number of incoming UDP packets with errors | -| `net_udplite_ignoredmulti` | Number of UDP-Lite multicast packets ignored | -| `net_udplite_incsumerrors` | Number of UDP-Lite checksum errors in incoming packets | -| `net_udplite_indatagrams` | Number of UDP-Lite datagrams received | -| `net_udplite_inerrors` | Number of incoming UDP-Lite packets with errors | -| `net_udplite_noports` | Number of UDP-Lite packets received with no port available | -| `net_udplite_outdatagrams` | Number of UDP-Lite datagrams sent | -| `net_udplite_rcvbuferrors` | Number of UDP-Lite receive buffer errors | -| `net_udplite_sndbuferrors` | Number of UDP-Lite send buffer errors | -| `net_udp_noports` | Number of UDP packets received with no port available | -| `net_udp_outdatagrams` | Number of UDP datagrams sent | -| `net_udp_rcvbuferrors` | Number of UDP receive buffer errors | -| `net_udp_sndbuferrors` | Number of UDP send buffer errors | - -### Process - -Metrics such as `processes_running` and `processes_zombies` provide insights into the -management of the system's processes. - -| Metric | Description | -|----------------------------|-----------------------------------------------------------------------------| -| `processes_blocked` | Number of processes that are blocked | -| `processes_dead` | Number of processes that have terminated | -| `processes_idle` | Number of processes that are idle | -| `processes_paging` | Number of processes that are paging | -| `processes_running` | Number of processes currently running | -| `processes_sleeping` | Number of processes that are sleeping | -| `processes_stopped` | Number of processes that are stopped | -| `processes_total` | Total number of processes | -| `processes_total_threads` | Total number of threads across all processes | -| `processes_unknown` | Number of processes in an unknown state | -| `processes_zombies` | Number of zombie processes (terminated but not reaped by parent process) | - -### Swap usage - -Metrics such as `swap_free` and `swap_used` provide insights into the usage of the -system's swap memory. - -| Metric | Description | -|--------------------|--------------------------------------------------| -| `swap_free` | Amount of free swap memory | -| `swap_in` | Amount of data swapped in from disk | -| `swap_out` | Amount of data swapped out to disk | -| `swap_total` | Total amount of swap memory | -| `swap_used` | Amount of used swap memory | -| `swap_used_percent`| Percentage of swap memory used | diff --git a/docs/products/opensearch/howto/os-metrics.md b/docs/products/opensearch/howto/os-metrics.md new file mode 100644 index 000000000..8a76b3aee --- /dev/null +++ b/docs/products/opensearch/howto/os-metrics.md @@ -0,0 +1,82 @@ +--- +title: Aiven for OpenSearch® metrics available via Prometheus +sidebar_label: OpenSearch metrics with Prometheus +--- + +import Tabs from "@theme/Tabs"; +import TabItem from "@theme/TabItem"; +import HostMetrics from "@site/static/includes/host-metrics.md"; +import ConsoleLabel from "@site/src/components/ConsoleIcons" +import LimitedBadge from "@site/src/components/Badges/LimitedBadge"; + +Monitor and optimize your Aiven for OpenSearch service with metrics available via Prometheus. +These metrics help track cluster health, replication status, and overall performance. + +## Prerequisites + +- [Enable Prometheus integration](/docs/platform/howto/integrations/prometheus-metrics). +- Note the Prometheus **username** and **password** in the **Integration endpoints** + section of the [Aiven Console](https://console.aiven.io/). + +## Access Prometheus metrics + + + + +1. Open your service's page in the + [Aiven Console](https://console.aiven.io/). +1. In the **Connection information** section, click the **Prometheus** tab. +1. Copy the **Service URI**. +1. Paste the Service URI into your browser's address bar. +1. When prompted, enter your Prometheus credentials. +1. Click **Login**. + + + + +To retrieve metrics, run the following `curl` command: + +```bash +curl --user 'USERNAME:PASSWORD' PROMETHEUS_URL/metrics +``` + +Replace `USERNAME:PASSWORD` with your Prometheus credentials and `PROMETHEUS_URL` +with the Service URI from the **Connection information** section. + + + + + + +## OpenSearch-specific metrics + +These metrics provide insights into the performance and health of your Aiven for +OpenSearch service. + +### Node statistics + +Track node metrics such as CPU and memory usage, disk I/O, and JVM statistics. For more +information, see the +[node stats API](https://opensearch.org/docs/latest/api-reference/nodes-apis/nodes-stats/). + +### Cluster statistics + +Track cluster-level metrics, including the number of indices, shard distribution, and +memory usage. For more information, see the +[cluster stats API](https://opensearch.org/docs/latest/api-reference/cluster-api/cluster-stats/). + +### Cluster health + +Monitor cluster health with granular metrics available at the index level. Use `local` +with `level=index` to view index-specific health status. +For more information, see +the [cluster health API](https://opensearch.org/docs/latest/api-reference/cluster-api/cluster-health/). + +### Cross-cluster replication (CCR) stats + +- **Leader stats:** Monitor replication metrics from the leader cluster, including + replication lag and synchronization status. For more information, see + the [leader cluster stats API](https://opensearch.org/docs/latest/tuning-your-cluster/replication-plugin/api/#get-leader-cluster-stats). +- **Follower stats:** Monitor follower cluster metrics, including replication delays and + error rates, to maintain data consistency. For more information, see the + [follower cluster stats API](https://opensearch.org/docs/latest/tuning-your-cluster/replication-plugin/api/#get-follower-cluster-stats). diff --git a/sidebars.ts b/sidebars.ts index ae924099e..72d615081 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -1709,6 +1709,7 @@ const sidebars: SidebarsConfig = { 'products/opensearch/reference/restapi-limited-access', 'products/opensearch/reference/low-space-watermarks', + 'products/opensearch/howto/os-metrics', ], }, { diff --git a/static/includes/host-metrics.md b/static/includes/host-metrics.md new file mode 100644 index 000000000..a2cb332ec --- /dev/null +++ b/static/includes/host-metrics.md @@ -0,0 +1,256 @@ +## Host metrics +Host metrics provide insights into system-level performance, including CPU, memory, disk, and network usage. + +### CPU utilization + +CPU utilization metrics offer insights into CPU usage. These metrics +include time spent on different processes, system load, and overall uptime. + +| Metric | Description | +|-------------------------|-------------------------------------------------------------------------------------------------------| +| `cpu_usage_guest` | CPU time spent running a virtual CPU for guest operating systems | +| `cpu_usage_guest_nice` | CPU time running low-priority virtual CPUs for guest operating systems; interrupted by higher-priority tasks and measured in hundredths of a second| +| `cpu_usage_idle` | Time the CPU spends doing nothing | +| `cpu_usage_iowait` | Time waiting for I/O to complete | +| `cpu_usage_irq` | Time servicing interrupts | +| `cpu_usage_nice` | Time running user-niced processes | +| `cpu_usage_softirq` | Time servicing softirqs | +| `cpu_usage_steal` | Time spent in other operating systems when running in a virtualized environment | +| `cpu_usage_system` | Time spent running system processes | +| `cpu_usage_user` | Time spent running user processes | +| `system_load1` | System load average for the last minute | +| `system_load15` | System load average for the last 15 minutes | +| `system_load5` | System load average for the last 5 minutes | +| `system_n_cpus` | Number of CPU cores available | +| `system_n_users` | Number of users logged in | +| `system_uptime` | Time for which the system has been up and running | + +### Disk space utilization + +Disk space utilization metrics provide a snapshot of disk usage. These metrics include +information about free and used disk space, as well as `inode` usage and +total disk capacity. + +| Metric | Description | +|----------------------|-------------------------------| +| `disk_free` | Amount of free disk space | +| `disk_inodes_free` | Number of free inodes | +| `disk_inodes_total` | Total number of inodes | +| `disk_inodes_used` | Number of used inodes | +| `disk_total` | Total disk space | +| `disk_used` | Amount of used disk space | +| `disk_used_percent` | Percentage of disk space used| + +### Disk input and output + +Metrics such as `diskio_io_time` and `diskio_iops_in_progress` provide insights into +disk I/O operations. These metrics cover read/write operations, the duration of these +operations, and the number of bytes read/written. + +| Metric | Description | +|--------------------------|-------------------------------------------------------------------------------------------------------| +| `diskio_io_time` | Total time spent on I/O operations | +| `diskio_iops_in_progress`| Number of I/O operations currently in progress | +| `diskio_merged_reads` | Number of read operations that were merged | +| `diskio_merged_writes` | Number of write operations that were merged | +| `diskio_read_bytes` | Total bytes read from disk | +| `diskio_read_time` | Total time spent on read operations | +| `diskio_reads` | Total number of read operations | +| `diskio_weighted_io_time`| Weighted time spent on I/O operations, considering their duration and intensity | +| `diskio_write_bytes` | Total bytes written to disk | +| `diskio_write_time` | Total time spent on write operations | +| `diskio_writes` | Total number of write operations | + +### Generic memory + +The following metrics, including `mem_active` and `mem_available`, provide insights into +your system's memory usage. + +| Metric | Description | +|--------------------------|----------------------------------------------------------------| +| `mem_active` | Amount of actively used memory | +| `mem_available` | Amount of available memory | +| `mem_available_percent` | Percentage of available memory | +| `mem_buffered` | Amount of memory used for buffering I/O | +| `mem_cached` | Amount of memory used for caching | +| `mem_commit_limit` | Maximum amount of memory that can be committed | +| `mem_committed_as` | Total amount of committed memory | +| `mem_dirty` | Amount of memory waiting to be written to disk | +| `mem_free` | Amount of free memory | +| `mem_high_free` | Amount of free memory in the high memory zone | +| `mem_high_total` | Total amount of memory in the high memory zone | +| `mem_huge_pages_free` | Number of free huge pages | +| `mem_huge_page_size` | Size of huge pages | +| `mem_huge_pages_total` | Total number of huge pages | +| `mem_inactive` | Amount of inactive memory | +| `mem_low_free` | Amount of free memory in the low memory zone | +| `mem_low_total` | Total amount of memory in the low memory zone | +| `mem_mapped` | Amount of memory mapped into the process's address space | +| `mem_page_tables` | Amount of memory used by page tables | +| `mem_shared` | Amount of memory shared between processes | +| `mem_slab` | Amount of memory used by the kernel for data structure caches | +| `mem_swap_cached` | Amount of swap memory cached | +| `mem_swap_free` | Amount of free swap memory | +| `mem_swap_total` | Total amount of swap memory | +| `mem_total` | Total amount of memory | +| `mem_used` | Amount of used memory | +| `mem_used_percent` | Percentage of used memory | +| `mem_vmalloc_chunk` | Largest contiguous block of vmalloc memory available | +| `mem_vmalloc_total` | Total amount of vmalloc memory | +| `mem_vmalloc_used` | Amount of used vmalloc memory | +| `mem_wired` | Amount of wired memory | +| `mem_write_back` | Amount of memory being written back to disk | +| `mem_write_back_tmp` | Amount of temporary memory being written back to disk | + +### Network + +The following metrics, including `net_bytes_recv` and `net_packets_sent`, provide insights +into your system's network operations. + +| Metric | Description | +|------------------------------|---------------------------------------------------------------------------------| +| `net_bytes_recv` | Total bytes received on the network interfaces | +| `net_bytes_sent` | Total bytes sent on the network interfaces | +| `net_drop_in` | Incoming packets dropped | +| `net_drop_out` | Outgoing packets dropped | +| `net_err_in` | Incoming packets with errors | +| `net_err_out` | Outgoing packets with errors | +| `net_icmp_inaddrmaskreps` | Number of ICMP address mask replies received | +| `net_icmp_inaddrmasks` | Number of ICMP address mask requests received | +| `net_icmp_incsumerrors` | Number of ICMP checksum errors | +| `net_icmp_indestunreachs` | Number of ICMP destination unreachable messages received | +| `net_icmp_inechoreps` | Number of ICMP echo replies received | +| `net_icmp_inechos` | Number of ICMP echo requests received | +| `net_icmp_inerrors` | Number of ICMP messages received with errors | +| `net_icmp_inmsgs` | Total number of ICMP messages received | +| `net_icmp_inparmprobs` | Number of ICMP parameter problem messages received | +| `net_icmp_inredirects` | Number of ICMP redirect messages received | +| `net_icmp_insrcquenchs` | Number of ICMP source quench messages received | +| `net_icmp_intimeexcds` | Number of ICMP time exceeded messages received | +| `net_icmp_intimestampreps` | Number of ICMP timestamp reply messages received | +| `net_icmp_intimestamps` | Number of ICMP timestamp request messages received | +| `net_icmpmsg_intype3` | Number of ICMP type 3 (destination unreachable) messages received | +| `net_icmpmsg_intype8` | Number of ICMP type 8 (echo request) messages received | +| `net_icmpmsg_outtype0` | Number of ICMP type 0 (echo reply) messages sent | +| `net_icmpmsg_outtype3` | Number of ICMP type 3 (destination unreachable) messages sent | +| `net_icmp_outaddrmaskreps` | Number of ICMP address mask reply messages sent | +| `net_icmp_outaddrmasks` | Number of ICMP address mask request messages sent | +| `net_icmp_outdestunreachs` | Number of ICMP destination unreachable messages sent | +| `net_icmp_outechoreps` | Number of ICMP echo reply messages sent | +| `net_icmp_outechos` | Number of ICMP echo request messages sent | +| `net_icmp_outerrors` | Number of ICMP messages sent with errors | +| `net_icmp_outmsgs` | Total number of ICMP messages sent | +| `net_icmp_outparmprobs` | Number of ICMP parameter problem messages sent | +| `net_icmp_outredirects` | Number of ICMP redirect messages sent | +| `net_icmp_outsrcquenchs` | Number of ICMP source quench messages sent | +| `net_icmp_outtimeexcds` | Number of ICMP time exceeded messages sent | +| `net_icmp_outtimestampreps` | Number of ICMP timestamp reply messages sent | +| `net_icmp_outtimestamps` | Number of ICMP timestamp request messages sent | +| `net_icmp_outratelimitglobal`| Number of globally rate-limited ICMP messages sent | +| `net_icmp_outratelimithost` | Number of ICMP messages rate-limited per host | +| `net_ip_defaultttl` | Default time-to-live for IP packets | +| `net_ip_forwarding` | Indicates if IP forwarding is enabled | +| `net_ip_forwdatagrams` | Number of forwarded IP datagrams | +| `net_ip_fragcreates` | Number of IP fragments created | +| `net_ip_fragfails` | Number of failed IP fragmentations | +| `net_ip_fragoks` | Number of successful IP fragmentations | +| `net_ip_inaddrerrors` | Number of incoming IP packets with address errors | +| `net_ip_indelivers` | Number of incoming IP packets delivered to higher layers | +| `net_ip_indiscards` | Number of incoming IP packets discarded | +| `net_ip_inhdrerrors` | Number of incoming IP packets with header errors | +| `net_ip_inreceives` | Total number of incoming IP packets received | +| `net_ip_inunknownprotos` | Number of incoming IP packets with unknown protocols | +| `net_ip_outdiscards` | Number of outgoing IP packets discarded | +| `net_ip_outnoroutes` | Number of outgoing IP packets with no route available | +| `net_ip_outrequests` | Total number of outgoing IP packets requested to be sent | +| `net_ip_outtransmits` | Number of IP packets transmitted successfully | +| `net_ip_reasmfails` | Number of failed IP reassembly attempts | +| `net_ip_reasmoks` | Number of successful IP reassembly attempts | +| `net_ip_reasmreqds` | Number of IP fragments received needing reassembly | +| `net_ip_reasmtimeout` | Number of IP reassembly timeouts | +| `net_packets_recv` | Total number of packets received on the network interfaces | +| `net_packets_sent` | Total number of packets sent on the network interfaces | +| `netstat_tcp_close` | Number of TCP connections in the CLOSE state | +| `netstat_tcp_close_wait` | Number of TCP connections in the CLOSE_WAIT state | +| `netstat_tcp_closing` | Number of TCP connections in the CLOSING state | +| `netstat_tcp_established` | Number of TCP connections in the ESTABLISHED state | +| `netstat_tcp_fin_wait1` | Number of TCP connections in the FIN_WAIT_1 state | +| `netstat_tcp_fin_wait2` | Number of TCP connections in the FIN_WAIT_2 state | +| `netstat_tcp_last_ack` | Number of TCP connections in the LAST_ACK state | +| `netstat_tcp_listen` | Number of TCP connections in the LISTEN state | +| `netstat_tcp_none` | Number of TCP connections in the NONE state | +| `netstat_tcp_syn_recv` | Number of TCP connections in the SYN_RECV state | +| `netstat_tcp_syn_sent` | Number of TCP connections in the SYN_SENT state | +| `netstat_tcp_time_wait` | Number of TCP connections in the TIME_WAIT state | +| `netstat_udp_socket` | Number of UDP sockets | +| `net_tcp_activeopens` | Number of active TCP open connections | +| `net_tcp_attemptfails` | Number of failed TCP connection attempts | +| `net_tcp_currestab` | Number of currently established TCP connections | +| `net_tcp_estabresets` | Number of established TCP connections reset | +| `net_tcp_incsumerrors` | Number of TCP checksum errors in incoming packets | +| `net_tcp_inerrs` | Number of incoming TCP packets with errors | +| `net_tcp_insegs` | Number of TCP segments received | +| `net_tcp_maxconn` | Maximum number of TCP connections supported | +| `net_tcp_outrsts` | Number of TCP reset packets sent | +| `net_tcp_outsegs` | Number of TCP segments sent | +| `net_tcp_passiveopens` | Number of passive TCP open connections | +| `net_tcp_retranssegs` | Number of TCP segments retransmitted | +| `net_tcp_rtoalgorithm` | TCP retransmission timeout algorithm | +| `net_tcp_rtomax` | Maximum TCP retransmission timeout | +| `net_tcp_rtomin` | Minimum TCP retransmission timeout | +| `net_udp_ignoredmulti` | Number of UDP multicast packets ignored | +| `net_udp_incsumerrors` | Number of UDP checksum errors in incoming packets | +| `net_udp_indatagrams` | Number of UDP datagrams received | +| `net_udp_inerrors` | Number of incoming UDP packets with errors | +| `net_udp_memerrors` | Number of UDP packets dropped due to memory errors | +| `net_udplite_ignoredmulti` | Number of UDP-Lite multicast packets ignored | +| `net_udplite_incsumerrors` | Number of UDP-Lite checksum errors in incoming packets | +| `net_udplite_indatagrams` | Number of UDP-Lite datagrams received | +| `net_udplite_inerrors` | Number of incoming UDP-Lite packets with errors | +| `net_udplite_memerrors` | Number of UDP-L + +### Kernel + +The metrics listed below, such as `kernel_boot_time` and `kernel_context_switches`, provide +insights into the operations of your system's kernel. + +| Metric | Description | +|----------------------------|--------------------------------------------------| +| `kernel_boot_time` | Time at which the system was last booted | +| `kernel_context_switches` | Number of context switches that have occurred in the kernel | +| `kernel_entropy_avail` | Amount of available entropy in the kernel's entropy pool | +| `kernel_interrupts` | Number of interrupts that have occurred | +| `kernel_processes_forked` | Number of processes that have been forked | + +### Process + +Metrics such as `processes_running` and `processes_zombies` provide insights into the +management of the system's processes. + +| Metric | Description | +|----------------------------|-----------------------------------------------------------------------------| +| `processes_blocked` | Number of processes that are blocked | +| `processes_dead` | Number of processes that have terminated | +| `processes_idle` | Number of processes that are idle | +| `processes_paging` | Number of processes that are paging | +| `processes_running` | Number of processes currently running | +| `processes_sleeping` | Number of processes that are sleeping | +| `processes_stopped` | Number of processes that are stopped | +| `processes_total` | Total number of processes | +| `processes_total_threads` | Total number of threads across all processes | +| `processes_unknown` | Number of processes in an unknown state | +| `processes_zombies` | Number of zombie processes (terminated but not reaped by parent process) + +### Swap usage + +Metrics such as `swap_free` and `swap_used` provide insights into the usage of the +system's swap memory. + +| Metric | Description | +|--------------------|--------------------------------------------------| +| `swap_free` | Amount of free swap memory | +| `swap_in` | Amount of data swapped in from disk | +| `swap_out` | Amount of data swapped out to disk | +| `swap_total` | Total amount of swap memory | +| `swap_used` | Amount of used swap memory | +| `swap_used_percent`| Percentage of swap memory used |