
collectd/hadoop

Monitor Type: collectd/hadoop (Source)

Accepts Endpoints: Yes

Multiple Instances Allowed: Yes

Overview

Collects metrics about a Hadoop 2.0+ cluster using the collectd Hadoop Python plugin. If a remote JMX port is exposed in the Hadoop cluster, you can also configure the collectd/hadoopjmx monitor to collect additional metrics about the cluster.

The collectd/hadoop monitor will collect metrics from the Resource Manager REST API for the following:

  • Cluster Metrics
  • Cluster Scheduler
  • Cluster Applications
  • Cluster Nodes
  • MapReduce Jobs

Metric Endpoints in Hadoop

See the following links for more information about specific metric endpoints:

https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/Metrics.html

https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html

https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html

Sample Config

Sample YAML configuration:

monitors:
- type: collectd/hadoop
  host: 127.0.0.1
  port: 8088
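
Because this monitor accepts endpoints, the host and port can also be supplied by service discovery instead of being hard-coded. The following is a minimal sketch, assuming the Resource Manager runs in a Docker container whose image name contains "hadoop"; both the observer choice and the rule itself are illustrative assumptions, not values taken from this page.

```yaml
observers:
  - type: docker  # assumption: the Resource Manager runs under Docker

monitors:
  - type: collectd/hadoop
    # Hypothetical rule: match containers whose image name contains "hadoop"
    # and that expose the Resource Manager port used in the sample above.
    discoveryRule: container_image =~ "hadoop" && port == 8088
```

With a discovery rule in place, host and port are filled in from the discovered endpoint, so they are left out of the monitor block.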

Configuration

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: collectd/hadoop
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| pythonBinary | no | string | Path to a python binary that should be used to execute the Python code. If not set, a built-in runtime will be used. Can include arguments to the binary as well. |
| host | yes | string | Resource Manager hostname |
| port | yes | integer | Resource Manager port |
| verbose | no | bool | Log verbose information about the plugin (default: false) |
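
As an illustration of how the optional settings combine with the required ones, here is a sketch that sets all four options. The interpreter path is an assumption for a typical Linux host; adjust it to wherever your Python binary lives, or omit pythonBinary to use the built-in runtime.

```yaml
monitors:
  - type: collectd/hadoop
    host: 127.0.0.1                  # Resource Manager hostname (required)
    port: 8088                       # Resource Manager port (required)
    pythonBinary: /usr/bin/python3   # assumed path; optional
    verbose: true                    # optional; defaults to false
```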

Metrics

These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.

  • counter.hadoop.cluster.metrics.total_mb (cumulative)
  • counter.hadoop.cluster.metrics.total_nodes (cumulative)
  • counter.hadoop.cluster.metrics.total_virtual_cores (cumulative)
  • gauge.hadoop.cluster.metrics.active_nodes (gauge)
  • gauge.hadoop.cluster.metrics.allocated_mb (gauge)
  • gauge.hadoop.cluster.metrics.allocated_virtual_cores (gauge)
  • gauge.hadoop.cluster.metrics.apps_completed (gauge)
  • gauge.hadoop.cluster.metrics.apps_failed (gauge)
  • gauge.hadoop.cluster.metrics.apps_killed (gauge)
  • gauge.hadoop.cluster.metrics.apps_pending (gauge)
  • gauge.hadoop.cluster.metrics.apps_running (gauge)
  • gauge.hadoop.cluster.metrics.apps_submitted (gauge)
  • gauge.hadoop.cluster.metrics.available_mb (gauge)
  • gauge.hadoop.cluster.metrics.available_virtual_cores (gauge)
  • gauge.hadoop.cluster.metrics.containers_allocated (gauge)
  • gauge.hadoop.cluster.metrics.containers_pending (gauge)
  • gauge.hadoop.cluster.metrics.containers_reserved (gauge)
  • gauge.hadoop.cluster.metrics.decommissioned_nodes (gauge)
  • gauge.hadoop.cluster.metrics.lost_nodes (gauge)
  • gauge.hadoop.cluster.metrics.rebooted_nodes (gauge)
  • gauge.hadoop.cluster.metrics.reserved_mb (gauge)
  • gauge.hadoop.cluster.metrics.reserved_virtual_cores (gauge)
  • gauge.hadoop.cluster.metrics.total_mb (gauge)
  • gauge.hadoop.cluster.metrics.total_virtual_cores (gauge)
  • gauge.hadoop.cluster.metrics.unhealthy_nodes (gauge)
  • gauge.hadoop.mapreduce.job.elapsedTime (gauge)
  • gauge.hadoop.mapreduce.job.failedMapAttempts (gauge)
  • gauge.hadoop.mapreduce.job.failedReduceAttempts (gauge)
  • gauge.hadoop.mapreduce.job.mapsTotal (gauge)
  • gauge.hadoop.mapreduce.job.successfulMapAttempts (gauge)
  • gauge.hadoop.mapreduce.job.successfulReduceAttempts (gauge)
  • gauge.hadoop.resource.manager.apps.allocatedMB (gauge)
  • gauge.hadoop.resource.manager.apps.allocatedVCores (gauge)
  • gauge.hadoop.resource.manager.apps.clusterUsagePercentage (gauge)
  • gauge.hadoop.resource.manager.apps.memorySeconds (gauge)
  • gauge.hadoop.resource.manager.apps.priority (gauge)
  • gauge.hadoop.resource.manager.apps.progress (gauge)
  • gauge.hadoop.resource.manager.apps.queueUsagePercentage (gauge)
  • gauge.hadoop.resource.manager.apps.runningContainers (gauge)
  • gauge.hadoop.resource.manager.apps.vcoreSeconds (gauge)
  • gauge.hadoop.resource.manager.nodes.availMemoryMB (gauge)
  • gauge.hadoop.resource.manager.nodes.availableVirtualCores (gauge)
  • gauge.hadoop.resource.manager.nodes.numContainers (gauge)
  • gauge.hadoop.resource.manager.nodes.usedMemoryMB (gauge)
  • gauge.hadoop.resource.manager.nodes.usedVirtualCores (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.absoluteCapacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.absoluteMaxCapacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.absoluteUsedCapacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.allocatedContainers (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.capacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.maxApplications (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.maxApplicationsPerUser (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.maxCapacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.numActiveApplications (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.numApplications (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.numContainers (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.numPendingApplications (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.pendingContainers (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.reservedContainers (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.usedCapacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.userLimit (gauge)
  • gauge.hadoop.resource.manager.scheduler.leaf.queue.userLimitFactor (gauge)
  • gauge.hadoop.resource.manager.scheduler.root.queue.capacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.root.queue.maxCapacity (gauge)
  • gauge.hadoop.resource.manager.scheduler.root.queue.usedCapacity (gauge)

Group applications

All of the following metrics are part of the applications metric group. All of the non-default metrics below can be turned on by adding applications to the monitor config option extraGroups:

  • hadoop.resource.manager.apps.allocatedMB (gauge)
  • hadoop.resource.manager.apps.allocatedVCores (gauge)
  • hadoop.resource.manager.apps.clusterUsagePercentage (gauge)
  • hadoop.resource.manager.apps.memorySeconds (gauge)
  • hadoop.resource.manager.apps.numAMContainerPreempted (gauge)
  • hadoop.resource.manager.apps.numNonAMContainerPreempted (gauge)
  • hadoop.resource.manager.apps.preemptedResourceMB (gauge)
  • hadoop.resource.manager.apps.preemptedResourceVCores (gauge)
  • hadoop.resource.manager.apps.priority (gauge)
  • hadoop.resource.manager.apps.progress (gauge)
  • hadoop.resource.manager.apps.queueUsagePercentage (gauge)
  • hadoop.resource.manager.apps.runningContainers (gauge)
  • hadoop.resource.manager.apps.vcoreSeconds (gauge)
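
For example, a config along these lines should enable the non-default metrics in the applications group. This is a sketch that reuses the host and port values from the sample config; further group names from the sections below can be appended to the same list.

```yaml
monitors:
  - type: collectd/hadoop
    host: 127.0.0.1
    port: 8088
    extraGroups:
      - applications  # group name as listed in this section
```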

Group cluster

All of the following metrics are part of the cluster metric group. All of the non-default metrics below can be turned on by adding cluster to the monitor config option extraGroups:

  • hadoop.cluster.metrics.active_nodes (gauge)
  • hadoop.cluster.metrics.allocated_mb (gauge)
  • hadoop.cluster.metrics.allocated_virtual_cores (gauge)
  • hadoop.cluster.metrics.apps_completed (gauge)
  • hadoop.cluster.metrics.apps_failed (gauge)
  • hadoop.cluster.metrics.apps_killed (gauge)
  • hadoop.cluster.metrics.apps_pending (gauge)
  • hadoop.cluster.metrics.apps_running (gauge)
  • hadoop.cluster.metrics.apps_submitted (gauge)
  • hadoop.cluster.metrics.available_mb (gauge)
  • hadoop.cluster.metrics.available_virtual_cores (gauge)
  • hadoop.cluster.metrics.containers_allocated (gauge)
  • hadoop.cluster.metrics.containers_pending (gauge)
  • hadoop.cluster.metrics.containers_reserved (gauge)
  • hadoop.cluster.metrics.decommissioned_nodes (gauge)
  • hadoop.cluster.metrics.lost_nodes (gauge)
  • hadoop.cluster.metrics.rebooted_nodes (gauge)
  • hadoop.cluster.metrics.reserved_mb (gauge)
  • hadoop.cluster.metrics.reserved_virtual_cores (gauge)
  • hadoop.cluster.metrics.total_mb (counter)
  • hadoop.cluster.metrics.total_nodes (counter)
  • hadoop.cluster.metrics.total_virtual_cores (counter)
  • hadoop.cluster.metrics.unhealthy_nodes (gauge)

Group fifo-scheduler

All of the following metrics are part of the fifo-scheduler metric group. All of the non-default metrics below can be turned on by adding fifo-scheduler to the monitor config option extraGroups:

  • hadoop.resource.manager.scheduler.fifo.availNodeCapacity (gauge)
  • hadoop.resource.manager.scheduler.fifo.capacity (gauge)
  • hadoop.resource.manager.scheduler.fifo.maxQueueMemoryCapacity (gauge)
  • hadoop.resource.manager.scheduler.fifo.minQueueMemoryCapacity (gauge)
  • hadoop.resource.manager.scheduler.fifo.numContainers (gauge)
  • hadoop.resource.manager.scheduler.fifo.numNodes (gauge)
  • hadoop.resource.manager.scheduler.fifo.totalNodeCapacity (gauge)
  • hadoop.resource.manager.scheduler.fifo.usedCapacity (gauge)
  • hadoop.resource.manager.scheduler.fifo.usedNodeCapacity (gauge)

Group leaf-queue

All of the following metrics are part of the leaf-queue metric group. All of the non-default metrics below can be turned on by adding leaf-queue to the monitor config option extraGroups:

  • hadoop.resource.manager.scheduler.leaf.queue.absoluteCapacity (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.absoluteMaxCapacity (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.absoluteUsedCapacity (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.allocatedContainers (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.capacity (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.maxActiveApplications (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.maxActiveApplicationsPerUser (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.maxApplications (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.maxApplicationsPerUser (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.maxCapacity (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.numActiveApplications (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.numApplications (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.numContainers (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.numPendingApplications (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.pendingContainers (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.reservedContainers (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.usedCapacity (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.userLimit (gauge)
  • hadoop.resource.manager.scheduler.leaf.queue.userLimitFactor (gauge)

Group mapreduce-jobs

All of the following metrics are part of the mapreduce-jobs metric group. All of the non-default metrics below can be turned on by adding mapreduce-jobs to the monitor config option extraGroups:

  • hadoop.mapreduce.job.elapsedTime (gauge)
  • hadoop.mapreduce.job.failedMapAttempts (gauge)
  • hadoop.mapreduce.job.failedReduceAttempts (gauge)
  • hadoop.mapreduce.job.killedMapAttempts (gauge)
  • hadoop.mapreduce.job.killedReduceAttempts (gauge)
  • hadoop.mapreduce.job.mapsCompleted (gauge)
  • hadoop.mapreduce.job.mapsPending (gauge)
  • hadoop.mapreduce.job.mapsRunning (gauge)
  • hadoop.mapreduce.job.mapsTotal (gauge)
  • hadoop.mapreduce.job.newMapAttempts (gauge)
  • hadoop.mapreduce.job.newReduceAttempts (gauge)
  • hadoop.mapreduce.job.reducesCompleted (gauge)
  • hadoop.mapreduce.job.reducesPending (gauge)
  • hadoop.mapreduce.job.reducesTotal (gauge)
  • hadoop.mapreduce.job.runningMapAttempts (gauge)
  • hadoop.mapreduce.job.runningReduceAttempts (gauge)
  • hadoop.mapreduce.job.successfulMapAttempts (gauge)
  • hadoop.mapreduce.job.successfulReduceAttempts (gauge)

Group node-resources

All of the following metrics are part of the node-resources metric group. All of the non-default metrics below can be turned on by adding node-resources to the monitor config option extraGroups:

  • hadoop.resource.manager.node.nodeCPUUsage (gauge)
  • hadoop.resource.manager.node.nodePhysicalMemoryMB (gauge)
  • hadoop.resource.manager.node.nodeVirtualMemoryMB (gauge)

Group nodes

All of the following metrics are part of the nodes metric group. All of the non-default metrics below can be turned on by adding nodes to the monitor config option extraGroups:

  • hadoop.resource.manager.nodes.availMemoryMB (gauge)
  • hadoop.resource.manager.nodes.availableVirtualCores (gauge)
  • hadoop.resource.manager.nodes.numContainers (gauge)
  • hadoop.resource.manager.nodes.usedMemoryMB (gauge)
  • hadoop.resource.manager.nodes.usedVirtualCores (gauge)

Group queue-users

All of the following metrics are part of the queue-users metric group. All of the non-default metrics below can be turned on by adding queue-users to the monitor config option extraGroups:

  • hadoop.resource.manager.scheduler.queue.users.numActiveApplications (gauge)
  • hadoop.resource.manager.scheduler.queue.users.numPendingApplications (gauge)

Group resource-objects

All of the following metrics are part of the resource-objects metric group. All of the non-default metrics below can be turned on by adding resource-objects to the monitor config option extraGroups:

  • hadoop.resource.manager.scheduler.queue.resource.memory (gauge)
  • hadoop.resource.manager.scheduler.queue.resource.vCores (gauge)

Group root-queue

All of the following metrics are part of the root-queue metric group. All of the non-default metrics below can be turned on by adding root-queue to the monitor config option extraGroups:

  • hadoop.resource.manager.scheduler.root.queue.capacity (gauge)
  • hadoop.resource.manager.scheduler.root.queue.maxCapacity (gauge)
  • hadoop.resource.manager.scheduler.root.queue.usedCapacity (gauge)

Non-default metrics (version 4.7.0+)

The following information applies to agent version 4.7.0 and later with enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not default, you can add those metrics in the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to extraMetrics.
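
A minimal sketch of the extraMetrics option follows. The two metric names are taken from the group lists above purely for illustration; this page does not say whether they are default in your agent version.

```yaml
enableBuiltInFiltering: true  # top-level agent setting, agent 4.7.0+

monitors:
  - type: collectd/hadoop
    host: 127.0.0.1
    port: 8088
    extraMetrics:
      # Illustrative choices from the metric groups above
      - hadoop.mapreduce.job.mapsRunning
      - hadoop.resource.manager.node.nodeCPUUsage
```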

To see a list of metrics that will be emitted you can run agent-status monitors after configuring this monitor in a running agent instance.

Legacy non-default metrics (version < 4.7.0)

The following information applies only to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style whitelist filtering.

If your agent's top-level metricsToExclude config option references whitelist.json, and you want to emit metrics that are not in that whitelist, add an item to the top-level metricsToInclude config option to override the whitelist (see Inclusion filtering). Alternatively, copy whitelist.json, modify it, and reference the modified copy in metricsToExclude.
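
An override of this kind might look like the sketch below. The filter fields (monitorType, metricNames) are assumed to follow the agent's usual filter format, and the metric name is a placeholder taken from the list above, so substitute the metrics you actually need.

```yaml
# Top level of the agent config (legacy filtering, agent < 4.7.0)
metricsToInclude:
  - monitorType: collectd/hadoop
    metricNames:
      # Placeholder name for illustration only
      - gauge.hadoop.mapreduce.job.elapsedTime
```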