
cgroups

Monitor Type: cgroups (Source)

Accepts Endpoints: No

Multiple Instances Allowed: Yes

Overview

Reports statistics about cgroups on Linux. This monitor supports only cgroups v1, not the newer v2 unified hierarchy.
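Because the monitor reads only cgroups v1 hierarchies, it can help to confirm which cgroup versions a host has mounted. The following Python sketch parses text in the /proc/mounts format. In that format, v1 hierarchies have the filesystem type cgroup and the v2 unified hierarchy has the type cgroup2. The function name cgroup_versions is hypothetical and is not part of the Smart Agent.

```python
from typing import Set


def cgroup_versions(mounts_text: str) -> Set[str]:
    """Return which cgroup versions appear in /proc/mounts-style text."""
    versions: Set[str] = set()
    for line in mounts_text.splitlines():
        # Fields: device, mount point, filesystem type, options, dump, pass.
        fields = line.split()
        if len(fields) < 3:
            continue
        fs_type = fields[2]
        if fs_type == "cgroup":  # a cgroups v1 hierarchy
            versions.add("v1")
        elif fs_type == "cgroup2":  # the v2 unified hierarchy
            versions.add("v2")
    return versions


sample = (
    "cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0\n"
    "cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
)
print(sorted(cgroup_versions(sample)))  # ['v1', 'v2']
```

On a real host you would pass the contents of /proc/mounts. A host that reports both versions is running in a hybrid layout, where the v1 controller hierarchies are still mounted alongside the unified one.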

For general information on cgroups, see http://man7.org/linux/man-pages/man7/cgroups.7.html.

For detailed information on CPU cgroup metrics, see Red Hat's guide to CPU management. Many of the metric descriptions come from that document. Note that the cpuacct controller is primarily informational: it reports how much CPU time the processes in a cgroup have used.

For detailed information on memory cgroup metrics, see Red Hat's guide to the Memory cgroup. Many of the metric descriptions come from that document. Also refer to the Linux kernel's memory cgroup documentation.

Filtering

You can limit the cgroups for which metrics are generated by using the monitor's cgroups config option.

For example, the following configuration monitors only Docker-generated cgroups:

monitors:
 - type: cgroups
   cgroups:
    - "/docker/*"

Configuration

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: cgroups
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| cgroups | no | list of strings | The cgroup names to include/exclude, based on the full hierarchy path. This is an overridable set. If not provided, this defaults to all cgroups. E.g. to monitor all Docker container cgroups, you could use a value of ["/docker/*"]. |
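The agent defines the exact matching rules for the cgroups option. The sketch below is only a rough illustration, and it rests on two assumptions. First, patterns are treated as shell-style globs using Python's fnmatch, where * also matches /. Second, a pattern prefixed with ! is treated as an exclusion that overrides any inclusion. Check the agent's filtering documentation before relying on either behavior. The function cgroup_included is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def cgroup_included(path: str, patterns: List[str]) -> bool:
    """Decide whether a cgroup path passes a hypothetical include/exclude list."""
    # No patterns configured: every cgroup is included (documented default).
    if not patterns:
        return True
    positives = [p for p in patterns if not p.startswith("!")]
    negatives = [p[1:] for p in patterns if p.startswith("!")]
    # Assumption: an exclusion ("!pattern") overrides any inclusion.
    if any(fnmatchcase(path, p) for p in negatives):
        return False
    # Assumption: if only exclusions are given, everything else is included.
    if not positives:
        return True
    return any(fnmatchcase(path, p) for p in positives)


patterns = ["/docker/*", "!/docker/build*"]
print(cgroup_included("/docker/3f2a", patterns))  # True
print(cgroup_included("/docker/build-42", patterns))  # False
print(cgroup_included("/system.slice/sshd.service", patterns))  # False
```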

Metrics

These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.

Group cpu

All of the following metrics are part of the cpu metric group. All of the non-default metrics below can be turned on by adding cpu to the monitor config option extraGroups:

  • cgroup.cpu_cfs_period_us (gauge)
    The period of time in microseconds for how regularly a cgroup's access to CPU resources should be reallocated

  • cgroup.cpu_cfs_quota_us (gauge)
    The total amount of time in microseconds for which all tasks in a cgroup can run during one period. The period is in the metric cgroup.cpu_cfs_period_us.

  • cgroup.cpu_shares (gauge)
    The relative share of CPU that this cgroup gets. This value is divided by the sum of all CPU share values to determine the share of CPU time that an individual cgroup is entitled to.

  • cgroup.cpu_stat_nr_periods (cumulative)
    Number of period intervals that have elapsed (the period length is in the metric cgroup.cpu_cfs_period_us)

  • cgroup.cpu_stat_nr_throttled (cumulative)
    Number of times tasks in a cgroup have been throttled

  • cgroup.cpu_stat_throttled_time (cumulative)
    The total time in nanoseconds for which tasks in a cgroup have been throttled
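The CFS metrics above are often combined. This is a minimal sketch under stated assumptions, and the function names are hypothetical. It makes three calculations:

- The quota divided by the period gives how many CPUs' worth of time a cgroup may use per period.
- Deltas of the two cumulative counters give the fraction of periods in which the cgroup was throttled.
- A cgroup's shares divided by the sum of the shares of the cgroups it competes with gives its relative entitlement. The competing cgroups are assumed here to be its siblings.

```python
from typing import Dict, Optional


def cpu_limit_cores(cfs_quota_us: int, cfs_period_us: int) -> Optional[float]:
    """CPU cores a cgroup may use per period, or None when no quota is set."""
    # A quota of -1 means CFS bandwidth control is not limiting this cgroup.
    if cfs_quota_us < 0:
        return None
    return cfs_quota_us / cfs_period_us


def throttled_fraction(delta_nr_throttled: int, delta_nr_periods: int) -> float:
    """Share of elapsed periods in which the cgroup was throttled.

    Both counters are cumulative, so pass the difference between two samples.
    """
    if delta_nr_periods <= 0:
        return 0.0
    return delta_nr_throttled / delta_nr_periods


def share_fraction(shares: Dict[str, int], name: str) -> float:
    """Proportion of CPU a cgroup is entitled to among competing cgroups."""
    # cpu.shares is relative: divide one cgroup's value by the sum of all values.
    return shares[name] / sum(shares.values())


print(cpu_limit_cores(50_000, 100_000))  # 0.5
print(cpu_limit_cores(-1, 100_000))  # None
print(throttled_fraction(12, 60))  # 0.2
print(share_fraction({"/a": 1024, "/b": 512, "/c": 512}, "/a"))  # 0.5
```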

Group cpuacct

All of the following metrics are part of the cpuacct metric group. All of the non-default metrics below can be turned on by adding cpuacct to the monitor config option extraGroups:

  • cgroup.cpuacct_usage_ns (cumulative)
    Total time in nanoseconds spent using any CPU by tasks in this cgroup

  • cgroup.cpuacct_usage_system_ns (cumulative)
    Total time in nanoseconds spent in system (kernel) mode on any CPU by tasks in this cgroup

  • cgroup.cpuacct_usage_user_ns (cumulative)
    Total time in nanoseconds spent in user mode on any CPU by tasks in this cgroup
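The cpuacct counters are cumulative nanosecond totals. A usage rate therefore comes from the difference between two samples, divided by the elapsed wall-clock time. Here is a minimal Python sketch, where cpu_cores_used is a hypothetical helper:

```python
from typing import Optional


def cpu_cores_used(prev_usage_ns: int, curr_usage_ns: int, interval_s: float) -> Optional[float]:
    """Average number of CPU cores used between two samples of cpuacct_usage_ns."""
    delta_ns = curr_usage_ns - prev_usage_ns
    # A negative delta means the counter was reset (for example, the cgroup
    # was removed and recreated), so no rate can be computed for this interval.
    if delta_ns < 0 or interval_s <= 0:
        return None
    return delta_ns / (interval_s * 1e9)


# 15 s of CPU time consumed during a 10 s window: 1.5 cores on average.
print(cpu_cores_used(40_000_000_000, 55_000_000_000, 10.0))  # 1.5
```

Multiplying the result by 100 gives usage as a percentage of one CPU core. The same calculation applies to cgroup.cpuacct_usage_user_ns and cgroup.cpuacct_usage_system_ns.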

Group cpuacct-per-cpu

All of the following metrics are part of the cpuacct-per-cpu metric group. All of the non-default metrics below can be turned on by adding cpuacct-per-cpu to the monitor config option extraGroups:

  • cgroup.cpuacct_usage_ns_per_cpu (cumulative)
    Total time in nanoseconds spent using a specific CPU (core) by tasks in this cgroup. This metric has a cpu dimension that identifies the CPU core.

  • cgroup.cpuacct_usage_system_ns_per_cpu (cumulative)
    Total time in nanoseconds spent in system (kernel) mode on a specific CPU (core) by tasks in this cgroup. This metric has a cpu dimension that identifies the CPU core.

  • cgroup.cpuacct_usage_user_ns_per_cpu (cumulative)
    Total time in nanoseconds spent in user mode on a specific CPU (core) by tasks in this cgroup. This metric has a cpu dimension that identifies the CPU core.
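In cgroups v1, per-CPU values come from the cpuacct.usage_percpu file. That file holds one space-separated nanosecond counter per CPU, in CPU index order. The sketch below parses that format and keys each value by CPU index. Using the index as the cpu dimension value is an assumption, and parse_usage_percpu is a hypothetical name.

```python
from typing import Dict


def parse_usage_percpu(text: str) -> Dict[str, int]:
    """Map CPU index (as a string) to cumulative nanoseconds for that CPU."""
    # cpuacct.usage_percpu holds one space-separated counter per CPU, in index order.
    return {str(i): int(v) for i, v in enumerate(text.split())}


sample = "1500000000 2500000000 0 1000000000\n"
per_cpu = parse_usage_percpu(sample)
print(per_cpu["1"])  # 2500000000
print(sum(per_cpu.values()))  # 5000000000
```

Summing the per-CPU values gives a cgroup-wide total comparable to cgroup.cpuacct_usage_ns, although the two files are read at slightly different moments.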

Group memory

All of the following metrics are part of the memory metric group. All of the non-default metrics below can be turned on by adding memory to the monitor config option extraGroups:

  • cgroup.memory_failcnt (cumulative)
    The number of times that the cgroup's memory usage has reached the limit set in limit_in_bytes (reported in metric cgroup.memory_limit_in_bytes).

  • cgroup.memory_limit_in_bytes (gauge)
    The maximum amount of user memory (including file cache). A value of 9223372036854771712 (the largest signed 64-bit integer, rounded down to a multiple of the memory page size) indicates no limit and is the default.

  • cgroup.memory_max_usage_in_bytes (gauge)
    The maximum memory used by processes in the cgroup (in bytes)

  • cgroup.memory_stat_active_anon (gauge)
    Bytes of anonymous and swap cache memory on the active LRU list

  • cgroup.memory_stat_active_file (gauge)
    Bytes of file-backed memory on the active LRU list

  • cgroup.memory_stat_cache (gauge)
    Page cache, including tmpfs (shmem), in bytes

  • cgroup.memory_stat_dirty (gauge)
    Bytes that are waiting to get written back to the disk

  • cgroup.memory_stat_hierarchical_memory_limit (gauge)
    The effective memory limit in bytes for this cgroup, taking into account the limits of the cgroups above it in the hierarchy

  • cgroup.memory_stat_inactive_anon (gauge)
    Bytes of anonymous and swap cache memory on the inactive LRU list

  • cgroup.memory_stat_inactive_file (gauge)
    Bytes of file-backed memory on the inactive LRU list

  • cgroup.memory_stat_mapped_file (gauge)
    Bytes of mapped file (includes tmpfs/shmem)

  • cgroup.memory_stat_pgfault (cumulative)
    Total number of page faults incurred

  • cgroup.memory_stat_pgmajfault (cumulative)
    Number of major page faults incurred

  • cgroup.memory_stat_pgpgin (cumulative)
    Number of charging events to the memory cgroup. A charging event happens each time a page is accounted to the cgroup as either a mapped anonymous page (RSS) or a cache page (page cache).

  • cgroup.memory_stat_pgpgout (cumulative)
    Number of uncharging events to the memory cgroup. An uncharging event happens each time a page is unaccounted from the cgroup.

  • cgroup.memory_stat_rss (gauge)
    Anonymous and swap cache, not including tmpfs (shmem), in bytes

  • cgroup.memory_stat_rss_huge (gauge)
    Bytes of anonymous transparent hugepages

  • cgroup.memory_stat_shmem (gauge)
    Bytes of shared memory

  • cgroup.memory_stat_total_active_anon (gauge)
    The equivalent of cgroup.memory_stat_active_anon that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_active_file (gauge)
    The equivalent of cgroup.memory_stat_active_file that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_cache (gauge)
    The equivalent of cgroup.memory_stat_cache that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_dirty (gauge)
    The equivalent of cgroup.memory_stat_dirty that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_inactive_anon (gauge)
    The equivalent of cgroup.memory_stat_inactive_anon that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_inactive_file (gauge)
    The equivalent of cgroup.memory_stat_inactive_file that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_mapped_file (gauge)
    The equivalent of cgroup.memory_stat_mapped_file that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_pgfault (cumulative)
    The equivalent of cgroup.memory_stat_pgfault that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_pgmajfault (cumulative)
    The equivalent of cgroup.memory_stat_pgmajfault that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_pgpgin (cumulative)
    The equivalent of cgroup.memory_stat_pgpgin that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_pgpgout (cumulative)
    The equivalent of cgroup.memory_stat_pgpgout that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_rss (gauge)
    The equivalent of cgroup.memory_stat_rss that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_rss_huge (gauge)
    The equivalent of cgroup.memory_stat_rss_huge that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_shmem (gauge)
    The equivalent of cgroup.memory_stat_shmem that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_unevictable (gauge)
    The equivalent of cgroup.memory_stat_unevictable that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_total_writeback (gauge)
    The equivalent of cgroup.memory_stat_writeback that also includes the sum total of that metric for all descendant cgroups

  • cgroup.memory_stat_unevictable (gauge)
    Bytes of memory that cannot be reclaimed (mlocked, etc.).

  • cgroup.memory_stat_writeback (gauge)
    Bytes of file/anon cache that are queued for syncing to disk
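The cgroup.memory_stat_* metrics appear to correspond to keys in the cgroup's memory.stat file, which lists one key and value per line. The sketch below does two things. It parses that format into names that follow the pattern used in the list above, and it checks whether a cgroup.memory_limit_in_bytes value is the "no limit" default. The name mapping and both function names are assumptions, not the monitor's code. The default page size of 4096 bytes is also an assumption: it is common on x86-64 but not universal.

```python
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Turn memory.stat contents ("<key> <value>" per line) into metric values."""
    metrics: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        # Hypothetical naming: mirrors the cgroup.memory_stat_<key> pattern above.
        metrics["cgroup.memory_stat_" + key] = int(value)
    return metrics


def is_unlimited(limit_in_bytes: int, page_size: int = 4096) -> bool:
    """True when memory.limit_in_bytes holds the kernel's "no limit" value."""
    # Largest signed 64-bit integer, rounded down to a multiple of the page size.
    no_limit = (2**63 - 1) // page_size * page_size
    return limit_in_bytes == no_limit


stat = "cache 8192\nrss 4096\ntotal_rss 12288\n"
stats = parse_memory_stat(stat)
print(stats["cgroup.memory_stat_rss"])  # 4096
print(is_unlimited(9223372036854771712))  # True
print(is_unlimited(536_870_912))  # False
```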

Non-default metrics (version 4.7.0+)

To emit metrics that are not default, add them to the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to extraMetrics.

To see a list of the metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.

Dimensions

The following dimensions may occur on metrics emitted by this monitor. Some dimensions may be specific to certain metrics.

| Name | Description |
| --- | --- |
| cgroup | The name of the cgroup being described. A cgroup's name is its full path relative to the root directory of the cgroup controller's hierarchy. |
| cpu | For metrics that end with _per_cpu, this dimension indicates which CPU (core) the time series refers to. |
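cgroups(7) describes /proc/[pid]/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. In that file, the path is relative to the hierarchy's mount point, so it should match the cgroup dimension value. The following sketch maps each v1 controller to that path, which can help predict the cgroup dimension for a given process. It skips the v2 entry, whose controller list is empty. The helper cgroup_paths is hypothetical.

```python
from typing import Dict


def cgroup_paths(proc_cgroup_text: str) -> Dict[str, str]:
    """Map each v1 controller to the cgroup path from /proc/<pid>/cgroup text."""
    paths: Dict[str, str] = {}
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        # Split at most twice: the path itself could contain ':' characters.
        _hierarchy_id, controllers, path = line.split(":", 2)
        # An empty controller list marks the v2 unified hierarchy; skip it.
        if not controllers:
            continue
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


sample = (
    "12:memory:/docker/3f2a\n"
    "4:cpu,cpuacct:/docker/3f2a\n"
    "1:name=systemd:/docker/3f2a\n"
    "0::/system.slice/docker.service\n"
)
paths = cgroup_paths(sample)
print(paths["cpuacct"])  # /docker/3f2a
print("" in paths)  # False
```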