Monitor Type: collectd/hadoop
Accepts Endpoints: Yes
Multiple Instances Allowed: Yes
Collects metrics about a Hadoop 2.0+ cluster using the collectd Hadoop Python plugin. If a remote JMX port is exposed in the Hadoop cluster, you can also configure the collectd/hadoopjmx monitor to collect additional metrics about the cluster.
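If you do expose a remote JMX port, running both monitors side by side might look like the following minimal sketch. The collectd/hadoopjmx options shown here (nodeType in particular) and the JMX port 5677 are assumptions for illustration. Check the collectd/hadoopjmx monitor's own documentation for its actual options.

```yaml
monitors:
  # REST-based monitor described on this page
  - type: collectd/hadoop
    host: 127.0.0.1
    port: 8088
  # Optional JMX-based monitor; the option names and port below are
  # assumptions, verify them against the collectd/hadoopjmx docs
  - type: collectd/hadoopjmx
    host: 127.0.0.1
    port: 5677
    nodeType: resourceManager
```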
The collectd/hadoop monitor collects metrics from the Resource Manager REST API for the following:
- Cluster Metrics
- Cluster Scheduler
- Cluster Applications
- Cluster Nodes
- MapReduce Jobs
See the following links for more information about specific metric endpoints:
https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/Metrics.html
https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html
Sample YAML configuration:
monitors:
  - type: collectd/hadoop
    host: 127.0.0.1
    port: 8088
To activate this monitor in the Smart Agent, add the following to your agent config:
monitors:  # All monitor config goes under this key
  - type: collectd/hadoop
    ...  # Additional config
For a list of monitor options that are common to all monitors, see Common Configuration.
Config option | Required | Type | Description |
---|---|---|---|
pythonBinary | no | string | Path to a Python binary that should be used to execute the Python code. If not set, a built-in runtime will be used. Can include arguments to the binary as well. |
host | yes | string | Resource Manager hostname |
port | yes | integer | Resource Manager port |
verbose | no | bool | Log verbose information about the plugin (default: false) |
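The following sketch sets every option from the table above. The hostname and the Python path are hypothetical examples. Omit pythonBinary to use the agent's built-in runtime.

```yaml
monitors:
  - type: collectd/hadoop
    # Required: Resource Manager hostname and REST port (hypothetical host)
    host: resourcemanager.example.com
    port: 8088
    # Optional: run the plugin with an external interpreter
    # (hypothetical path; omit to use the built-in runtime)
    pythonBinary: /usr/bin/python3
    # Optional: log verbose plugin output (defaults to false)
    verbose: true
```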
These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.
- counter.hadoop.cluster.metrics.total_mb (cumulative)
- counter.hadoop.cluster.metrics.total_nodes (cumulative)
- counter.hadoop.cluster.metrics.total_virtual_cores (cumulative)
- gauge.hadoop.cluster.metrics.active_nodes (gauge)
- gauge.hadoop.cluster.metrics.allocated_mb (gauge)
- gauge.hadoop.cluster.metrics.allocated_virtual_cores (gauge)
- gauge.hadoop.cluster.metrics.apps_completed (gauge)
- gauge.hadoop.cluster.metrics.apps_failed (gauge)
- gauge.hadoop.cluster.metrics.apps_killed (gauge)
- gauge.hadoop.cluster.metrics.apps_pending (gauge)
- gauge.hadoop.cluster.metrics.apps_running (gauge)
- gauge.hadoop.cluster.metrics.apps_submitted (gauge)
- gauge.hadoop.cluster.metrics.available_mb (gauge)
- gauge.hadoop.cluster.metrics.available_virtual_cores (gauge)
- gauge.hadoop.cluster.metrics.containers_allocated (gauge)
- gauge.hadoop.cluster.metrics.containers_pending (gauge)
- gauge.hadoop.cluster.metrics.containers_reserved (gauge)
- gauge.hadoop.cluster.metrics.decommissioned_nodes (gauge)
- gauge.hadoop.cluster.metrics.lost_nodes (gauge)
- gauge.hadoop.cluster.metrics.rebooted_nodes (gauge)
- gauge.hadoop.cluster.metrics.reserved_mb (gauge)
- gauge.hadoop.cluster.metrics.reserved_virtual_cores (gauge)
- gauge.hadoop.cluster.metrics.total_mb (gauge)
- gauge.hadoop.cluster.metrics.total_virtual_cores (gauge)
- gauge.hadoop.cluster.metrics.unhealthy_nodes (gauge)
- gauge.hadoop.mapreduce.job.elapsedTime (gauge)
- gauge.hadoop.mapreduce.job.failedMapAttempts (gauge)
- gauge.hadoop.mapreduce.job.failedReduceAttempts (gauge)
- gauge.hadoop.mapreduce.job.mapsTotal (gauge)
- gauge.hadoop.mapreduce.job.successfulMapAttempts (gauge)
- gauge.hadoop.mapreduce.job.successfulReduceAttempts (gauge)
- gauge.hadoop.resource.manager.apps.allocatedMB (gauge)
- gauge.hadoop.resource.manager.apps.allocatedVCores (gauge)
- gauge.hadoop.resource.manager.apps.clusterUsagePercentage (gauge)
- gauge.hadoop.resource.manager.apps.memorySeconds (gauge)
- gauge.hadoop.resource.manager.apps.priority (gauge)
- gauge.hadoop.resource.manager.apps.progress (gauge)
- gauge.hadoop.resource.manager.apps.queueUsagePercentage (gauge)
- gauge.hadoop.resource.manager.apps.runningContainers (gauge)
- gauge.hadoop.resource.manager.apps.vcoreSeconds (gauge)
- gauge.hadoop.resource.manager.nodes.availMemoryMB (gauge)
- gauge.hadoop.resource.manager.nodes.availableVirtualCores (gauge)
- gauge.hadoop.resource.manager.nodes.numContainers (gauge)
- gauge.hadoop.resource.manager.nodes.usedMemoryMB (gauge)
- gauge.hadoop.resource.manager.nodes.usedVirtualCores (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.absoluteCapacity (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.absoluteMaxCapacity (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.absoluteUsedCapacity (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.allocatedContainers (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.capacity (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.maxApplications (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.maxApplicationsPerUser (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.maxCapacity (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.numActiveApplications (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.numApplications (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.numContainers (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.numPendingApplications (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.pendingContainers (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.reservedContainers (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.usedCapacity (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.userLimit (gauge)
- gauge.hadoop.resource.manager.scheduler.leaf.queue.userLimitFactor (gauge)
- gauge.hadoop.resource.manager.scheduler.root.queue.capacity (gauge)
- gauge.hadoop.resource.manager.scheduler.root.queue.maxCapacity (gauge)
- gauge.hadoop.resource.manager.scheduler.root.queue.usedCapacity (gauge)
All of the following metrics are part of the applications metric group. All of the non-default metrics below can be turned on by adding applications to the extraGroups monitor config option:
- hadoop.resource.manager.apps.allocatedMB (gauge)
- hadoop.resource.manager.apps.allocatedVCores (gauge)
- hadoop.resource.manager.apps.clusterUsagePercentage (gauge)
- hadoop.resource.manager.apps.memorySeconds (gauge)
- hadoop.resource.manager.apps.numAMContainerPreempted (gauge)
- hadoop.resource.manager.apps.numNonAMContainerPreempted (gauge)
- hadoop.resource.manager.apps.preemptedResourceMB (gauge)
- hadoop.resource.manager.apps.preemptedResourceVCores (gauge)
- hadoop.resource.manager.apps.priority (gauge)
- hadoop.resource.manager.apps.progress (gauge)
- hadoop.resource.manager.apps.queueUsagePercentage (gauge)
- hadoop.resource.manager.apps.runningContainers (gauge)
- hadoop.resource.manager.apps.vcoreSeconds (gauge)
All of the following metrics are part of the cluster metric group. All of the non-default metrics below can be turned on by adding cluster to the extraGroups monitor config option:
- hadoop.cluster.metrics.active_nodes (gauge)
- hadoop.cluster.metrics.allocated_mb (gauge)
- hadoop.cluster.metrics.allocated_virtual_cores (gauge)
- hadoop.cluster.metrics.apps_completed (gauge)
- hadoop.cluster.metrics.apps_failed (gauge)
- hadoop.cluster.metrics.apps_killed (gauge)
- hadoop.cluster.metrics.apps_pending (gauge)
- hadoop.cluster.metrics.apps_running (gauge)
- hadoop.cluster.metrics.apps_submitted (gauge)
- hadoop.cluster.metrics.available_mb (gauge)
- hadoop.cluster.metrics.available_virtual_cores (gauge)
- hadoop.cluster.metrics.containers_allocated (gauge)
- hadoop.cluster.metrics.containers_pending (gauge)
- hadoop.cluster.metrics.containers_reserved (gauge)
- hadoop.cluster.metrics.decommissioned_nodes (gauge)
- hadoop.cluster.metrics.lost_nodes (gauge)
- hadoop.cluster.metrics.rebooted_nodes (gauge)
- hadoop.cluster.metrics.reserved_mb (gauge)
- hadoop.cluster.metrics.reserved_virtual_cores (gauge)
- hadoop.cluster.metrics.total_mb (counter)
- hadoop.cluster.metrics.total_nodes (counter)
- hadoop.cluster.metrics.total_virtual_cores (counter)
- hadoop.cluster.metrics.unhealthy_nodes (gauge)
All of the following metrics are part of the fifo-scheduler metric group. All of the non-default metrics below can be turned on by adding fifo-scheduler to the extraGroups monitor config option:
- hadoop.resource.manager.scheduler.fifo.availNodeCapacity (gauge)
- hadoop.resource.manager.scheduler.fifo.capacity (gauge)
- hadoop.resource.manager.scheduler.fifo.maxQueueMemoryCapacity (gauge)
- hadoop.resource.manager.scheduler.fifo.minQueueMemoryCapacity (gauge)
- hadoop.resource.manager.scheduler.fifo.numContainers (gauge)
- hadoop.resource.manager.scheduler.fifo.numNodes (gauge)
- hadoop.resource.manager.scheduler.fifo.totalNodeCapacity (gauge)
- hadoop.resource.manager.scheduler.fifo.usedCapacity (gauge)
- hadoop.resource.manager.scheduler.fifo.usedNodeCapacity (gauge)
All of the following metrics are part of the leaf-queue metric group. All of the non-default metrics below can be turned on by adding leaf-queue to the extraGroups monitor config option:
- hadoop.resource.manager.scheduler.leaf.queue.absoluteCapacity (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.absoluteMaxCapacity (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.absoluteUsedCapacity (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.allocatedContainers (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.capacity (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.maxActiveApplications (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.maxActiveApplicationsPerUser (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.maxApplications (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.maxApplicationsPerUser (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.maxCapacity (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.numActiveApplications (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.numApplications (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.numContainers (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.numPendingApplications (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.pendingContainers (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.reservedContainers (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.usedCapacity (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.userLimit (gauge)
- hadoop.resource.manager.scheduler.leaf.queue.userLimitFactor (gauge)
All of the following metrics are part of the mapreduce-jobs metric group. All of the non-default metrics below can be turned on by adding mapreduce-jobs to the extraGroups monitor config option:
- hadoop.mapreduce.job.elapsedTime (gauge)
- hadoop.mapreduce.job.failedMapAttempts (gauge)
- hadoop.mapreduce.job.failedReduceAttempts (gauge)
- hadoop.mapreduce.job.killedMapAttempts (gauge)
- hadoop.mapreduce.job.killedReduceAttempts (gauge)
- hadoop.mapreduce.job.mapsCompleted (gauge)
- hadoop.mapreduce.job.mapsPending (gauge)
- hadoop.mapreduce.job.mapsRunning (gauge)
- hadoop.mapreduce.job.mapsTotal (gauge)
- hadoop.mapreduce.job.newMapAttempts (gauge)
- hadoop.mapreduce.job.newReduceAttempts (gauge)
- hadoop.mapreduce.job.reducesCompleted (gauge)
- hadoop.mapreduce.job.reducesPending (gauge)
- hadoop.mapreduce.job.reducesTotal (gauge)
- hadoop.mapreduce.job.runningMapAttempts (gauge)
- hadoop.mapreduce.job.runningReduceAttempts (gauge)
- hadoop.mapreduce.job.successfulMapAttempts (gauge)
- hadoop.mapreduce.job.successfulReduceAttempts (gauge)
All of the following metrics are part of the node-resources metric group. All of the non-default metrics below can be turned on by adding node-resources to the extraGroups monitor config option:
- hadoop.resource.manager.node.nodeCPUUsage (gauge)
- hadoop.resource.manager.node.nodePhysicalMemoryMB (gauge)
- hadoop.resource.manager.node.nodeVirtualMemoryMB (gauge)
All of the following metrics are part of the nodes metric group. All of the non-default metrics below can be turned on by adding nodes to the extraGroups monitor config option:
- hadoop.resource.manager.nodes.availMemoryMB (gauge)
- hadoop.resource.manager.nodes.availableVirtualCores (gauge)
- hadoop.resource.manager.nodes.numContainers (gauge)
- hadoop.resource.manager.nodes.usedMemoryMB (gauge)
- hadoop.resource.manager.nodes.usedVirtualCores (gauge)
All of the following metrics are part of the queue-users metric group. All of the non-default metrics below can be turned on by adding queue-users to the extraGroups monitor config option:
- hadoop.resource.manager.scheduler.queue.users.numActiveApplications (gauge)
- hadoop.resource.manager.scheduler.queue.users.numPendingApplications (gauge)
All of the following metrics are part of the resource-objects metric group. All of the non-default metrics below can be turned on by adding resource-objects to the extraGroups monitor config option:
- hadoop.resource.manager.scheduler.queue.resource.memory (gauge)
- hadoop.resource.manager.scheduler.queue.resource.vCores (gauge)
All of the following metrics are part of the root-queue metric group. All of the non-default metrics below can be turned on by adding root-queue to the extraGroups monitor config option:
- hadoop.resource.manager.scheduler.root.queue.capacity (gauge)
- hadoop.resource.manager.scheduler.root.queue.maxCapacity (gauge)
- hadoop.resource.manager.scheduler.root.queue.usedCapacity (gauge)
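For example, the following sketch turns on two of the groups listed above. The group names must match the names used in these lists. The two groups chosen here are arbitrary examples.

```yaml
monitors:
  - type: collectd/hadoop
    host: 127.0.0.1
    port: 8088
    # Enable every non-default metric in these metric groups
    extraGroups:
      - mapreduce-jobs
      - leaf-queue
```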
The following information applies to agent version 4.7.0+ with enableBuiltInFiltering: true set at the top level of the agent config.
To emit metrics that are not on by default, add them to the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options and do not appear in the above list of metrics do not need to be added to extraMetrics.
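A minimal sketch of this setup follows. It picks two metrics from the mapreduce-jobs group as examples, and host and port reuse the sample configuration. This lets you emit individual metrics without enabling a whole group.

```yaml
# Top level of the agent config (agent 4.7.0+)
enableBuiltInFiltering: true

monitors:
  - type: collectd/hadoop
    host: 127.0.0.1
    port: 8088
    # Emit these individual metrics without enabling
    # the whole mapreduce-jobs group
    extraMetrics:
      - hadoop.mapreduce.job.mapsRunning
      - hadoop.mapreduce.job.reducesPending
```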
To see a list of metrics that will be emitted, run agent-status monitors after configuring this monitor in a running agent instance.
The following information only applies to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. For upgrade instructions, see Old-style whitelist filtering.
If your agent's top-level metricsToExclude config option references whitelist.json and you want to emit metrics that are not in that whitelist, add an item to the top-level metricsToInclude config option to override the whitelist (see Inclusion filtering). Alternatively, copy whitelist.json, modify it, and reference your copy in metricsToExclude.
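The following sketch shows the override approach for older agents. It assumes your metricsToExclude pulls in the whitelist with a "#from" reference like the one below. The path shown is an assumption, so keep whatever your config already has. The metricsToInclude item uses monitorType and metricNames filter keys to scope the override to this monitor. Verify those keys against the Inclusion filtering documentation.

```yaml
# Existing whitelist reference (path is an assumption)
metricsToExclude:
  - "#from": /lib/whitelist.json
    flatten: true

# Re-include a metric that the whitelist would otherwise drop
metricsToInclude:
  - monitorType: collectd/hadoop
    metricNames:
      - hadoop.mapreduce.job.mapsRunning
```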