Chaos Controller Features Guide

Dry-run mode

Dry-run mode fakes the injection while still going through the process of selecting targets, creating chaos pods, and simulating the disruption as closely as possible. Put differently, all "read" operations (such as determining which network interface to disrupt) are executed, while "write" operations (such as creating what's needed to drop packets) are not. Check out this example.
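The read/write split can be illustrated with a small sketch. This is not the controller's code: the `Injector` class, its method names, and the `eth0` interface are hypothetical and only model the idea that lookups always run while changes are skipped in dry-run mode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Injector:
    """Hypothetical injector used only to illustrate dry-run semantics."""
    dry_run: bool
    log: List[str] = field(default_factory=list)

    def read_interface(self) -> str:
        # "Read" operation: executed in both normal and dry-run mode.
        self.log.append("read: looked up network interface")
        return "eth0"  # assumed interface name, for illustration only

    def write_drop_rule(self, interface: str) -> None:
        # "Write" operation: skipped (only reported) in dry-run mode.
        if self.dry_run:
            self.log.append(f"dry-run: would drop packets on {interface}")
            return
        self.log.append(f"write: dropping packets on {interface}")

    def inject(self) -> List[str]:
        self.write_drop_rule(self.read_interface())
        return self.log


print(Injector(dry_run=True).inject())
```

In both modes the interface lookup is recorded; only the final step differs.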

Level

A disruption can be applied either at the pod level or at the node level:

  • When applied at the pod level, the controller will target pods and will affect only the targeted pods. Other pods running on the same node as those targeted may still be affected depending on the injected disruption.
  • When applied at the node level, the controller will target nodes and will potentially affect everything running on the node, including processes that do not belong to any pod.

Example

Imagine a node running two pods, foo and bar, and a disruption that drops all outgoing network packets:

  • Applying this disruption at the pod level with a selector targeting the foo pod will result in the foo pod being unable to send any packets, while the bar pod and the other processes on the node can still send packets.
  • Applying this disruption at the node level with a selector targeting the node itself will result in neither foo nor bar being able to send network packets, along with all the other processes running on the node.
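The example above can be expressed as a toy model. The function name and the shape of the `node` dictionary are assumptions made for illustration; the model covers only the packet-drop scenario described here, not the side effects other disruptions may have on neighboring pods.

```python
from typing import Dict, List, Set


def affected_workloads(level: str, target: str, node: Dict[str, List[str]]) -> Set[str]:
    """Return what can no longer send packets in this simplified model."""
    if level == "pod":
        # Pod level: only the targeted pod is affected.
        return {target}
    if level == "node":
        # Node level: every pod and every other process on the node is affected.
        return set(node["pods"]) | set(node["processes"])
    raise ValueError(f"unknown level: {level}")


node = {"pods": ["foo", "bar"], "processes": ["other-process"]}
print(affected_workloads("pod", "foo", node))              # {'foo'}
print(sorted(affected_workloads("node", "node-1", node)))  # ['bar', 'foo', 'other-process']
```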

Targeting

The Disruption resource uses label selectors to target pods and nodes. The controller will retrieve all pods or nodes matching the given label selector and will randomly select a number (defined in the count field) of matching targets. It's possible to specify multiple label selectors, in which case the controller will select from targets that match all of them. Once applied, you can see the targeted pods/nodes by describing the Disruption resource.
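The selection logic described above can be sketched as follows. This is a simplified model rather than the controller's implementation: `select_targets` is a hypothetical helper, and capping the result at the number of matches when `count` is larger is an assumption.

```python
import random
from typing import Dict, List, Optional


def select_targets(
    candidates: Dict[str, Dict[str, str]],
    selector: Dict[str, str],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick `count` random targets whose labels match every selector entry."""
    rng = rng or random.Random()
    # A candidate matches only if it carries all requested labels with equal values.
    matching = sorted(
        name
        for name, labels in candidates.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )
    # Assumption: never return more targets than there are matches.
    return rng.sample(matching, min(count, len(matching)))


pods = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web", "tier": "frontend"},
    "api-1": {"app": "api", "tier": "backend"},
}
print(select_targets(pods, {"app": "web", "tier": "frontend"}, count=1))
```

Only `web-1` or `web-2` can be returned here, because `api-1` does not match both labels.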

NOTE: If you are targeting pods, the disruption must be created in the same namespace as the targeted pods.

Targeting a specific pod

How can you target a specific pod by name if it has no unique label you can select on? The Disruption spec doesn't support field selectors at this time, so selecting by name isn't possible. However, you can use the kubectl label pods command, e.g., kubectl label pods $podname unique-label-for-this-disruption=target-me, to dynamically add a unique label to the pod, which you can then use as your label selector in the Disruption spec.

Targeting a specific container within a pod

By default, a disruption affects all containers within the pod. You can restrict the scope of the disruption to a single container or to only some containers like this.

Applying a disruption on pod initialization

📝 This mode has some restrictions:

  • it only works for network-related (network and dns) disruptions
  • it only works at the pod level
  • it does not support container scoping (applying a disruption to only some containers)

It can be handy to disrupt packets on pod initialization, meaning before the containers are actually created and started, to test startup dependencies or init containers. You can do this in two steps:

  • redeploy your pod with the specific label chaos.datadoghq.com/disrupt-on-init to hold it in the initialization state
    • the chaos-controller will inject an init container named chaos-handler as the first init container in your pod
    • this init container is lightweight and does nothing but wait for a SIGUSR1 signal to complete successfully
  • apply your disruption with the init mode on
    • the chaos pod will inject the disruption and release your pod from the pending state

Note that in this mode, only pending pods that have a running chaos-handler init container and that match both your labels and the special label specified above will be targeted. The chaos-handler init container will automatically exit and fail if no signal is received within the specified timeout (the default is 1 minute).
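The wait-for-signal behavior of the init container can be sketched in a few lines. This is an illustration under assumptions, not the actual chaos-handler: `wait_for_release` is a hypothetical name, and `signal.sigtimedwait` is assumed to be available, which in practice means Linux.

```python
import signal


def wait_for_release(timeout_seconds: float) -> bool:
    """Wait for SIGUSR1; return True if it arrives before the timeout."""
    # Block SIGUSR1 so it is queued instead of triggering its default
    # action, which would terminate the process.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    # sigtimedwait returns signal info on delivery, or None on timeout.
    info = signal.sigtimedwait({signal.SIGUSR1}, timeout_seconds)
    return info is not None


# A handler modeled on the description above would exit with success when
# wait_for_release(60) returns True, and with a failure code otherwise.
```

Returning a boolean keeps the sketch easy to test; the mapping to exit codes follows the description above, where the init container completes successfully on signal and fails on timeout.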

Examples

Please take a look at the documentation for the different disruptions, linked in the table of contents, for more information about what they can do and how to use them.

Here is a full example of the disruption resource with comments. You can also have a look at the following use cases with examples of disruptions you can adapt and apply as you wish: