CKAD/Labs/Ch08 at master · johandry/CKAD

Chapter 8: Troubleshooting

Documentation

kubernetes.io > Concepts > Cluster Administration > Logging Architecture

kubernetes.io > Tasks > Monitoring, Logging, and Debugging > Logging Using Elasticsearch and Kibana

kubernetes.io > Tasks > Monitoring, Logging, and Debugging > Troubleshooting

kubernetes.io > Tasks > Monitoring, Logging, and Debugging > Troubleshoot Applications

kubernetes.io > Tasks > Monitoring, Logging, and Debugging > Troubleshoot Clusters

kubernetes.io > Tasks > Monitoring, Logging, and Debugging > Debug Pods and ReplicationControllers

kubernetes.io > Tasks > Monitoring, Logging, and Debugging > Debug Services

Notes from the Training

Linux tools

Shell into the failing Pod/container
Deploy similar Pod/container with busybox
DNS: dig
tcpdump
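
As a minimal sketch of the "deploy a similar Pod with busybox" idea, the function below starts a throwaway busybox Pod and resolves a name from inside the cluster. The Pod name `dns-debug`, the `busybox:1.28` tag and the `KUBECTL` override variable are assumptions made for illustration; `nslookup` is used instead of dig because busybox ships it.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command
# instead of running it against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# dns_check NAME
# Start a temporary busybox Pod, resolve NAME from inside the cluster,
# and delete the Pod when the command exits (--rm).
# "dns-debug" and the busybox:1.28 tag are illustrative choices.
dns_check() {
  "$KUBECTL" run dns-debug --image=busybox:1.28 --rm -i --restart=Never \
    -- nslookup "$1"
}
```

Calling `dns_check kubernetes.default` should resolve the API server Service if cluster DNS is healthy.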

Monitoring & Logging tools

Prometheus for monitoring
Grafana for visualization of collected metrics from Prometheus.
Fluentd for collecting logs and feeding the aggregated logs to Elasticsearch.
ELK stack of Elasticsearch, ~~Logstash~~ and Kibana. Elasticsearch receives the aggregated logs from Fluentd, and Kibana is used to visualize them.
OpenTracing propagates transactions across all services, code and packages.
Jaeger is a tracing system focused on distributed context propagation, transaction monitoring and root cause analysis. It's an implementation of OpenTracing.

Basic Steps

Assuming a problematic pod:

kubectl create deployment problem --image=nginx

In the following flow, <tab> means pressing the Tab key, which is used to autocomplete the pod name.

Investigate errors from command line

kubectl exec -it problem-<tab> -- /bin/bash

If the pod is running, check the logs:
```
kubectl logs problem-<tab>
```
Consider deploying a *sidecar* container in the pod to generate and handle logs. Sidecars can be configured to stream logs or to run a logging agent.
Check networking, including DNS, firewalls and general connectivity, using Linux commands and tools such as dig.
Check RBAC, SELinux and AppArmor for security settings. These may cause problems with networking.
Check the node logs for errors. Make sure the nodes have enough resources allocated.
API calls to and from controllers to kube-apiserver
Inter-node network issues, DNS & Firewall
Master server controllers.
1. Control Pods state
2. Errors in log files
3. Sufficient resources
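
The RBAC and node resource checks above can be sketched with two small helpers. This is only an illustration: the function names, the default namespace and service account, and the number of lines kept by `grep -A 8` are assumptions, not part of the training material.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# can_sa VERB RESOURCE [NAMESPACE] [SERVICEACCOUNT]
# Ask the API server whether a service account may perform an action.
# kubectl prints "yes" or "no". The defaults (namespace "default",
# service account "default") are assumptions for illustration.
can_sa() {
  ns="${3:-default}"
  sa="${4:-default}"
  "$KUBECTL" auth can-i "$1" "$2" -n "$ns" \
    --as="system:serviceaccount:${ns}:${sa}"
}

# node_allocation NODE
# Show the "Allocated resources" part of the node description to see how
# much CPU and memory is already requested. The 8 trailing lines are a
# rough guess at the size of that section.
node_allocation() {
  "$KUBECTL" describe node "$1" | grep -A 8 "Allocated resources"
}
```

For example, `can_sa list pods` checks whether the default service account in the default namespace may list Pods.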

Basic Flow: Pods

From the basic steps, execute steps #1 to #3
Is the containerized application working as expected?

Confirm the app is working correctly, and check whether this is an intermittent issue or related to slow performance.
If the app is not the culprit, make sure the Pods are in *Running* status:
```
kubectl get pods
```
The Pending status usually means that something the Pod needs is not available in the cluster. Examples: a properly tainted node, the expected storage or enough resources.
Look at the logs and events of the container
```
kubectl logs problem-<tab>
kubectl describe pod problem-<tab>
kubectl get events
```
Check the number of restarts. If the restarts are not caused by the container command simply finishing, the application may be having issues and failing.
If there is no information in the events, check the container logs:
```
kubectl logs problem-<tab> -c <container_name>
```
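
To make the status and restart checks in this flow quicker to scan, the following sketch filters the output of `kubectl get pods --no-headers`. The function name `flag_pods` is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE). Pods that finished normally (for example *Completed*) are also listed because they are not Running.

```shell
# flag_pods
# Read `kubectl get pods --no-headers` output on stdin and print the Pods
# that are not Running or that have restarted at least once.
# Assumed columns: NAME READY STATUS RESTARTS AGE (the default layout).
flag_pods() {
  awk '$3 != "Running" || $4 + 0 > 0 { print $1, $3, "restarts=" $4 }'
}
```

Usage: `kubectl get pods --no-headers | flag_pods`. Each Pod it prints is a candidate for `kubectl describe pod` and `kubectl logs`.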

Basic Flow: Node & Security

Disable security for testing: temporarily disable RBAC, SELinux and AppArmor to check whether one of them is the root cause of the issue.
Check system and agent logs.
1. If they use systemd: logs go to the systemd journal. View them with journalctl -a; they may also be stored in /var/log/journal/
2. Without systemd: logs are created in /var/log/<agent>.log
In both cases the logs may already be rotated; if they are not, it's advisable to set up rotation.
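
The two log locations above can be sketched as a helper that prints the command to use for a given agent. The function name `agent_log_cmd` is hypothetical, and it assumes the agent runs as a systemd unit with the same name (for example kubelet). Checking for the /run/systemd/system directory is a common way to detect a systemd host.

```shell
# agent_log_cmd AGENT [yes|no]
# Print the command that shows logs for a node agent such as kubelet.
# The optional second argument forces the systemd answer; when omitted,
# systemd is detected by the presence of /run/systemd/system.
agent_log_cmd() {
  agent="$1"
  systemd="${2:-}"
  if [ -z "$systemd" ]; then
    if [ -d /run/systemd/system ]; then systemd=yes; else systemd=no; fi
  fi
  if [ "$systemd" = yes ]; then
    # Assumes the unit is named after the agent, e.g. kubelet.service.
    echo "journalctl -a -u ${agent} --no-pager"
  else
    echo "tail -n 100 /var/log/${agent}.log"
  fi
}
```

The function only prints the command, so it can be reviewed before it is run on the node.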

Container components:

kube-scheduler
kube-proxy

Non-container components:

kubelet
Docker
Others ...

Certified Kubernetes Conformance Program

A CNCF program to certify distributions that meet essential requirements and adhere to complete API functionality.

Read more about it on GitHub cncf/k8s-conformance and the instructions.

More resources

GitHub website for issues and bug tracking
