openshift-cluster

Monitor Type: openshift-cluster (Source)

Accepts Endpoints: No

Multiple Instances Allowed: Yes

Overview

This monitor is for use with an OpenShift cluster. It includes all metrics from the kubernetes-cluster monitor with additional OpenShift-specific metrics. You only need to use one monitor or the other.

It collects cluster-level metrics from the Kubernetes API server. It uses the watch functionality of the K8s API to listen for updates about the cluster and maintains a cache of metrics that is sent at a regular interval.

Because the agent generally runs in multiple places in a K8s cluster, and because it is generally more convenient to share the same configuration across all agent instances, this monitor by default uses a leader election process to ensure that only one agent sends this monitor's metrics for the cluster. All of the agents running in the same namespace that have this monitor configured decide among themselves which one should send metrics for this monitor; the rest stand by, ready to take over if the leader agent dies. You can override leader election by setting the config option alwaysClusterReporter to true, which makes the monitor always report metrics.
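As a minimal sketch of the override described above, the fragment below sets alwaysClusterReporter on the monitor. It assumes a deployment where a single, dedicated agent instance is meant to report cluster metrics; if every agent in a cluster carried this setting, each of them would presumably report the same metrics.

```yaml
monitors:
  - type: openshift-cluster
    # Skip leader election: this agent instance always reports cluster metrics.
    # Assumption: only one agent in the cluster is deployed with this setting.
    alwaysClusterReporter: true
```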

This monitor is similar to kube-state-metrics, and sends many of the same metrics, but in a way that is less verbose and better fitted for the SignalFx backend.

Configuration

To activate this monitor in the Smart Agent, add the following to your agent config:

monitors:  # All monitor config goes under this key
 - type: openshift-cluster
   ...  # Additional config

For a list of monitor options that are common to all monitors, see Common Configuration.

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| `alwaysClusterReporter` | no | bool | If true, leader election is skipped and metrics are always reported. (default: `false`) |
| `namespace` | no | string | If specified, only resources within the given namespace will be monitored. If omitted (blank), all supported resources across all namespaces will be monitored. |
| `useNodeName` | no | bool | If set to true, the Kubernetes node name will be used as the dimension to which to sync properties about each respective node. This is necessary if your cluster's machines do not have unique machine-id values, as can happen when machine images are improperly cloned. (default: `false`) |
| `kubernetesAPI` | no | object (see below) | Config for the K8s API client |
| `nodeConditionTypesToReport` | no | list of strings | A list of node status condition types to report as metrics. The metrics will be reported as datapoints of the form `kubernetes.node_<type_snake_cased>` with a value of 0 corresponding to "False", 1 to "True", and -1 to "Unknown". (default: `[Ready]`) |
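The options above could be combined as in the following sketch. MemoryPressure is a standard Kubernetes node condition type chosen here only for illustration; following the naming rule in the option description, it would presumably be reported as kubernetes.node_memory_pressure.

```yaml
monitors:
  - type: openshift-cluster
    # Sync node properties to the kubernetes_node dimension instead of machine_id,
    # for clusters whose machines do not have unique machine-id values.
    useNodeName: true
    # Report these node conditions as metrics (0 = "False", 1 = "True", -1 = "Unknown").
    nodeConditionTypesToReport:
      - Ready
      - MemoryPressure  # illustrative choice, not taken from the original docs
```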

The nested kubernetesAPI config object has the following fields:

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| `authType` | no | string | How to authenticate to the K8s API server. This can be one of `none` (for no auth), `tls` (to use manually specified TLS client certs, not recommended), `serviceAccount` (to use the standard service account token provided to the agent pod), or `kubeConfig` to use credentials from `~/.kube/config`. (default: `serviceAccount`) |
| `skipVerify` | no | bool | Whether to skip verifying the TLS cert from the API server. Almost never needed. (default: `false`) |
| `clientCertPath` | no | string | The path to the TLS client cert on the pod's filesystem, if using `tls` auth. |
| `clientKeyPath` | no | string | The path to the TLS client key on the pod's filesystem, if using `tls` auth. |
| `caCertPath` | no | string | Path to a CA certificate to use when verifying the API server's TLS cert. Generally this is provided by K8s alongside the service account token, which will be picked up automatically, so this should rarely be necessary to specify. |
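To show how the nested object fits into the monitor config, here is a sketch that uses tls auth, since that mode involves the most fields. All file paths are hypothetical placeholders, and the table above notes that tls auth is not recommended; with the default serviceAccount auth the kubernetesAPI block can usually be omitted entirely.

```yaml
monitors:
  - type: openshift-cluster
    kubernetesAPI:
      authType: tls
      # Hypothetical paths on the pod's filesystem; substitute your own.
      clientCertPath: /path/to/client.crt
      clientKeyPath: /path/to/client.key
      # Only needed if the CA cert is not picked up automatically.
      caCertPath: /path/to/ca.crt
```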

Metrics

These are the metrics available for this monitor. Metrics that are categorized as container/host (default) are in bold and italics in the list below.

  • kubernetes.container_ready (gauge)
    Whether a container has passed its readiness probe (0 for no, 1 for yes)
  • kubernetes.container_restart_count (gauge)
    How many times the container has restarted in the recent past. This value is pulled directly from the K8s API; it can go indefinitely high and be reset to 0 at any time, depending on how your kubelet is configured to prune dead containers. It is best not to depend on the exact value. Instead, treat it as either == 0, in which case you can conclude there were no restarts in the recent past, or > 0, in which case you can conclude there were restarts in the recent past, and do not try to analyze the value beyond that.
  • kubernetes.daemon_set.current_scheduled (gauge)
    The number of nodes that are running at least 1 daemon pod and are supposed to run the daemon pod
  • kubernetes.daemon_set.desired_scheduled (gauge)
    The total number of nodes that should be running the daemon pod (including nodes currently running the daemon pod)
  • kubernetes.daemon_set.misscheduled (gauge)
    The number of nodes that are running the daemon pod, but are not supposed to run the daemon pod
  • kubernetes.daemon_set.ready (gauge)
    The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready
  • kubernetes.deployment.available (gauge)
    Total number of available pods (ready for at least minReadySeconds) targeted by this deployment.
  • kubernetes.deployment.desired (gauge)
    Number of desired pods in this deployment
  • kubernetes.namespace_phase (gauge)
    The current phase of namespaces (1 for active and 0 for terminating)
  • kubernetes.node_ready (gauge)
    Whether this node is ready (1), not ready (0) or in an unknown state (-1)
  • kubernetes.pod_phase (gauge)
    Current phase of the pod (1 - Pending, 2 - Running, 3 - Succeeded, 4 - Failed, 5 - Unknown)
  • kubernetes.replica_set.available (gauge)
    Total number of available pods (ready for at least minReadySeconds) targeted by this replica set
  • kubernetes.replica_set.desired (gauge)
    Number of desired pods in this replica set
  • kubernetes.replication_controller.available (gauge)
    Total number of available pods (ready for at least minReadySeconds) targeted by this replication controller.
  • kubernetes.replication_controller.desired (gauge)
    Number of desired pods
  • kubernetes.resource_quota_hard (gauge)
    The upper limit for a particular resource in a specific namespace. Will only be sent if a quota is specified. CPU requests/limits will be sent as millicores.
  • kubernetes.resource_quota_used (gauge)
    The usage for a particular resource in a specific namespace. Will only be sent if a quota is specified. CPU requests/limits will be sent as millicores.
  • openshift.appliedclusterquota.cpu.hard (gauge)
    Hard limit for number of cpu by namespace
  • openshift.appliedclusterquota.cpu.used (gauge)
    Consumed number of cpu by namespace
  • openshift.appliedclusterquota.memory.hard (gauge)
    Hard limit for amount of memory by namespace
  • openshift.appliedclusterquota.memory.used (gauge)
    Consumed amount of memory by namespace
  • openshift.appliedclusterquota.persistentvolumeclaims.hard (gauge)
    Hard limit for number of persistentvolumeclaims by namespace
  • openshift.appliedclusterquota.persistentvolumeclaims.used (gauge)
    Consumed number of persistentvolumeclaims by namespace
  • openshift.appliedclusterquota.pods.hard (gauge)
    Hard limit for number of pods by namespace
  • openshift.appliedclusterquota.pods.used (gauge)
    Consumed number of pods by namespace
  • openshift.appliedclusterquota.services.hard (gauge)
    Hard limit for number of services by namespace
  • openshift.appliedclusterquota.services.loadbalancers.hard (gauge)
    Hard limit for number of services.loadbalancers by namespace
  • openshift.appliedclusterquota.services.loadbalancers.used (gauge)
    Consumed number of services.loadbalancers by namespace
  • openshift.appliedclusterquota.services.nodeports.hard (gauge)
    Hard limit for number of services.nodeports by namespace
  • openshift.appliedclusterquota.services.nodeports.used (gauge)
    Consumed number of services.nodeports by namespace
  • openshift.appliedclusterquota.services.used (gauge)
    Consumed number of services by namespace
  • openshift.clusterquota.cpu.hard (gauge)
    Hard limit for number of cpu across all namespaces
  • openshift.clusterquota.cpu.used (gauge)
    Consumed number of cpu across all namespaces
  • openshift.clusterquota.memory.hard (gauge)
    Hard limit for amount of memory across all namespaces
  • openshift.clusterquota.memory.used (gauge)
    Consumed amount of memory across all namespaces
  • openshift.clusterquota.persistentvolumeclaims.hard (gauge)
    Hard limit for number of persistentvolumeclaims across all namespaces
  • openshift.clusterquota.persistentvolumeclaims.used (gauge)
    Consumed number of persistentvolumeclaims across all namespaces
  • openshift.clusterquota.pods.hard (gauge)
    Hard limit for number of pods across all namespaces
  • openshift.clusterquota.pods.used (gauge)
    Consumed number of pods across all namespaces
  • openshift.clusterquota.services.hard (gauge)
    Hard limit for number of services across all namespaces
  • openshift.clusterquota.services.loadbalancers.hard (gauge)
    Hard limit for number of services.loadbalancers across all namespaces
  • openshift.clusterquota.services.loadbalancers.used (gauge)
    Consumed number of services.loadbalancers across all namespaces
  • openshift.clusterquota.services.nodeports.hard (gauge)
    Hard limit for number of services.nodeports across all namespaces
  • openshift.clusterquota.services.nodeports.used (gauge)
    Consumed number of services.nodeports across all namespaces
  • openshift.clusterquota.services.used (gauge)
    Consumed number of services across all namespaces

Non-default metrics (version 4.7.0+)

The following information applies to agent version 4.7.0 and later with enableBuiltInFiltering: true set at the top level of the agent config.

To emit metrics that are not default, you can add those metrics in the generic monitor-level extraMetrics config option. Metrics that are derived from specific configuration options that do not appear in the above list of metrics do not need to be added to extraMetrics.

To see a list of metrics that will be emitted you can run agent-status monitors after configuring this monitor in a running agent instance.
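A sketch of the extraMetrics approach follows. The metric name is taken from the list above purely for illustration; whether it is actually outside the default set is an assumption, since the default/non-default marking is not visible here.

```yaml
enableBuiltInFiltering: true  # top-level agent setting, as described above
monitors:
  - type: openshift-cluster
    extraMetrics:
      # Assumed to be a non-default metric, for illustration only.
      - openshift.clusterquota.pods.used
```

After the agent reloads this config, running agent-status monitors should show whether the metric is being emitted.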

Legacy non-default metrics (version < 4.7.0)

The following information applies only to agent versions older than 4.7.0. If you have a newer agent and have set enableBuiltInFiltering: true at the top level of your agent config, see the section above. See upgrade instructions in Old-style whitelist filtering.

If your agent's top-level metricsToExclude config option references whitelist.json and you want to emit metrics that are not in that whitelist, add an item to the top-level metricsToInclude config option to override the whitelist (see Inclusion filtering). Alternatively, copy whitelist.json, modify it, and reference the modified copy in metricsToExclude.
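As a rough sketch of the first option, an inclusion item might look like the fragment below. The metricNames and monitorType field names are assumed to follow the agent's filter item format and should be checked against Inclusion filtering; the metric name is again only illustrative.

```yaml
# The existing metricsToExclude entry that references whitelist.json stays as-is.
metricsToInclude:
  - metricNames:
      # Illustrative metric, assumed not to be in the whitelist.
      - openshift.clusterquota.pods.used
    monitorType: openshift-cluster
```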

Dimensions

The following dimensions may occur on metrics emitted by this monitor. Some dimensions may be specific to certain metrics.

| Name | Description |
| --- | --- |
| `kubernetes_name` | The name of the resource that the metric describes |
| `kubernetes_namespace` | The namespace of the resource that the metric describes |
| `kubernetes_node` | The name of the node, as defined by the name field of the node resource. |
| `kubernetes_pod_uid` | The UID of the pod that the metric describes |
| `machine_id` | The machine ID from /etc/machine-id. This should be unique across all nodes in your cluster, but some cluster deployment tools don't guarantee this. This will not be sent if the useNodeName config option is set to true. |
| `metric_source` | This is always set to openshift |
| `quota_name` | The name of the k8s ResourceQuota object that the quota is part of |
| `resource` | The k8s resource that the quota applies to |

Properties

The following properties are set on the dimension values of the dimension specified.

| Name | Dimension | Description |
| --- | --- | --- |
| `<node label>` | `machine_id`/`kubernetes_node` | All labels with non-blank values on a given node will be synced as properties to the machine_id or kubernetes_node dimension value for that node. Which dimension gets the properties is determined by the useNodeName config option. Any labels with blank values will be synced as tags on that same dimension. |
| `<pod label>` | `kubernetes_pod_uid` | Any labels with non-blank values on the pod will be synced as properties to the kubernetes_pod_uid dimension. Any labels with blank values will be synced as tags on that same dimension. |