
Development Guide

Testing Locally

demo.yaml contains testing resources you can apply directly to your cluster in whatever namespace you choose (chaos-demo by default), by running:

  • kubectl apply -f examples/demo.yaml

Applying your Manifest

Once you define your test manifest, run:

  • kubectl apply -f examples/<manifest>.yaml
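For reference, here is a sketch of what such a manifest can look like, reconstructed from the network-drop example whose fields appear in the error output later in this guide (adapt the values to your own test):

```yaml
# Sketch of a Disruption manifest (values taken from the network-drop
# example referenced in this guide; adjust selector, count, etc. to your test)
apiVersion: chaos.datadoghq.com/v1beta1
kind: Disruption
metadata:
  name: network-drop
  namespace: chaos-demo
spec:
  level: pod
  selector:
    app: demo-curl
  count: 10
  network:
    drop: 100
    hosts:
      - host: 10.0.0.0/8
      - port: 80
```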

Applying the manifest triggers the admission controller, which validates your YAML file.

Successful validation

The resource request is passed to the reconcile loop, which creates one chaos pod per disruption kind included in your manifest. The controller also adds a finalizer to active disruptions so that Kubernetes does not clean them up before the disruption itself is cleaned up. Each chaos pod applies its specific disruption kind (network_disruption, cpu_pressure, node_failure, etc.) to your target resources.
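You can check the finalizer the controller added by inspecting the live resource (a sketch; the exact finalizer name depends on your controller version, so check your own output):

```shell
# Show the finalizers set on the disruption (raw jsonpath output)
kubectl -n chaos-demo get disruption <disruption name> -o jsonpath='{.metadata.finalizers}'
```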

A successful apply prints:

  • disruption.chaos.datadoghq.com/"<manifest name>" created

Unsuccessful validation

An explanatory error message like the following is printed:

  • Error from server (count must be a positive integer or a valid percentage value): error when creating "examples/network_drop.yaml": admission webhook "chaos-controller-admission-webhook.chaos-engineering.svc" denied the request: count must be a positive integer or a valid percentage value

Reapplying a manifest

Because Chaos Controller disruptions are immutable, the correct way to reapply an edited manifest is to delete the resource entirely and then apply it again. Modifying an already-applied manifest file and simply re-running kubectl apply -f examples/<manifest>.yaml prints an error message like the following:

Error from server (a disruption spec can't be edited, please delete and recreate it if needed): error when applying patch: {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"chaos.datadoghq.com/v1beta1\",\"kind\":\"Disruption\",\"metadata\":{\"annotations\":{},\"name\":\"network-drop\",\"namespace\":\"chaos-demo\"},\"spec\":{\"count\":10,\"level\":\"pod\",\"network\":{\"drop\":100,\"hosts\":[{\"host\":\"10.0.0.0/8\"},{\"port\":80}]},\"selector\":{\"app\":\"demo-curl\"}}}\n"}},"spec":{"count":10}}
to:
Resource: "chaos.datadoghq.com/v1beta1, Resource=disruptions", GroupVersionKind: "chaos.datadoghq.com/v1beta1, Kind=Disruption"
Name: "network-drop", Namespace: "chaos-demo"
for: "examples/network_drop.yaml": admission webhook "chaos-controller-admission-webhook.chaos-engineering.svc" denied the request: a disruption spec can't be edited, please delete and recreate it if needed
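In other words, replace the resource rather than patching it; for example:

```shell
# Disruptions are immutable: delete the old resource, then apply the edited manifest
kubectl delete -f examples/<manifest>.yaml
kubectl apply -f examples/<manifest>.yaml
```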

Deleting your Manifest

Once you are done testing, you can remove the disruption by running:

  • kubectl delete -f examples/<manifest>.yaml

Deleting the manifest triggers the reconcile loop, which then tries to delete all chaos pods the controller is aware of having created. When Kubernetes attempts to kill each chaos pod, the kill signal triggers the PersistentPostRun hook configured in Cobra. For each disruption kind, the cleanAndExit function specified in PersistentPostRun executes the Clean functionality required by the Injector interface. Once this is done, the pod is removed, and when the reconcile loop sees that all pods are cleaned, it removes the finalizer on the Disruption resource, allowing Kubernetes to clean it up.

A successful delete prints:

  • disruption.chaos.datadoghq.com "<manifest name>" deleted

If your pod gets stuck in the Terminating state, there may have been an issue with the disruption. Try manually removing the finalizer:

  • kubectl patch pod <pod> -p '{"metadata":{"finalizers":null}}'

Basic Troubleshooting

See the existing disruptions (corresponding to metadata.name):

  • kubectl get disruptions

Get a detailed overview of the live disruption (spec, finalizer, major events, etc.):

  • kubectl describe disruption <disruption name>

See the chaos pods (with names like chaos-network-delay-llczv, chaos-network-drop-qlqnw):

  • kubectl -n chaos-engineering get pods

Check the logs of the resource:

  • kubectl logs <pod name>

Get a detailed overview of the resource (finalizers, major events, allocated IP address, containers, etc):

  • kubectl describe pod <pod name>

For more complex troubleshooting, see the faq.md page.

Helper scripts

For verification on minikube, we created some helper scripts:

  • List pod interfaces:
    • ./scripts/list_links.sh <pod_name>
  • List traffic control filters of the given pod:
    • ./scripts/list_tc_filters.sh <pod_name>
  • List traffic control qdiscs of the given pod:
    • ./scripts/list_tc_qdiscs.sh <pod_name>