fixup! docs: add initial version of troubleshooting guide
yorugac committed Jan 25, 2024
1 parent 18f5431 commit 2ef435a
Showing 1 changed file with 14 additions and 11 deletions.
25 changes: 14 additions & 11 deletions docs/troublehooting.md
@@ -1,10 +1,13 @@
# Troubleshooting

Just as any Kubernetes application, k6-operator can get into error scenarios which are sometimes a result of misconfigured test or setup. This document is meant to help troubleshoot such scenarios quicker.
Just as any Kubernetes application, k6-operator can get into error scenarios which are sometimes a result of a misconfigured test or setup. This document is meant to help troubleshoot such scenarios quicker.

## Common tricks

## :warning: Highly recommended! :warning:
### Preparation

> [!IMPORTANT]
> Run your script with `k6 run` first.
Before trying to run a script with k6-operator, be it via `TestRun` or via `PrivateLoadZone`, always run it locally:

@@ -17,13 +20,13 @@ If there are going to be environment variables or CLI options, pass them in as w
MY_ENV_VAR=foo k6 run script.js --tag my_tag=bar
```

This ensures that the script has correct syntax and can be parsed with k6 in the first place. Additionally, local run will make it obvious if the configured options are doing what is expected. If there are any errors or unexpected results in the output of `k6 run`, make sure to fix those prior to deploying the script elsewhere.
This ensures that the script has correct syntax and can be parsed with k6 in the first place. Additionally, running locally will make it obvious if the configured options are doing what is expected. If there are any errors or unexpected results in the output of `k6 run`, make sure to fix those prior to deploying the script elsewhere.

### `TestRun` deployment

#### The pods

In case of one `TestRun` CR creation with `parallelism: n`, there are certain repeating patterns:
In case of one `TestRun` Custom Resource (CR) creation with `parallelism: n`, there are certain repeating patterns:

1. There will be `n + 2` Jobs (with corresponding Pods) created: initializer, starter, `n` runners.
1. If any of these Jobs did not result in a Pod being deployed, there must be an issue with that Job. Some commands that can help here:
@@ -36,7 +39,7 @@ In case of one `TestRun` CR creation with `parallelism: n`, there are certain re
kubectl logs mytest-initializer-xxxxx
```
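The `n + 2` pattern described above can be checked mechanically. The following is a minimal sketch, not part of k6-operator: the helper names `expected_jobs` and `count_testrun_jobs` are hypothetical, and it assumes that every Job of a `TestRun` named `mytest` has a name starting with `mytest-` (as in `mytest-initializer`).

```bash
#!/usr/bin/env bash

# Expected number of Jobs for one TestRun: initializer + starter + n runners.
expected_jobs() {
  local parallelism="$1"
  echo $(( parallelism + 2 ))
}

# Count Jobs whose names start with "<testrun-name>-".
# Reads lines in `kubectl get jobs -o name` format (e.g. "job.batch/mytest-1") from stdin.
# Assumption: the TestRun name is used as the prefix of every Job name.
count_testrun_jobs() {
  local name="$1"
  # grep -c exits non-zero when nothing matches, but still prints 0.
  grep -c "^job\.batch/${name}-" || true
}

# Usage against a live cluster (TestRun "mytest" with parallelism 4 is an example):
#   kubectl get jobs -o name | count_testrun_jobs mytest
#   expected_jobs 4
```

If the two numbers differ, `kubectl describe job` on the missing or stuck Job is usually the next step.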

If the Pods seem to be working but not producing an expected result and there's not enough information in the logs of the Pods, it might make sense to turn on k6 [verbose option](https://k6.io/docs/using-k6/k6-options/reference/#options) in `TestRun` spec:
If the Pods seem to be working but not producing an expected result and there's not enough information in the logs of the Pods, it might make sense to turn on k6 [verbose option](https://grafana.com/docs/k6/latest/using-k6/k6-options/#options) in `TestRun` spec:
```yaml
apiVersion: k6.io/v1alpha1
@@ -57,7 +60,7 @@ spec:
Another source of info is k6-operator itself. It is deployed as a Kubernetes `Deployment`, with `replicas: 1` by default, and its logs together with observations about the Pods from [previous subsection](#the-pods) usually contain enough information to glean correct diagnosis. With the standard deployment, the logs of k6-operator can be checked with:
```bash
kubectl -n k6-operator-system -c manager logs k6-operator-controller-manager-9f8469df-trtw5
kubectl -n k6-operator-system -c manager logs k6-operator-controller-manager-xxxxxxxx-xxxxx
```
#### Inspect `TestRun` resource
@@ -102,7 +105,7 @@ Conditions can be used as a source of info as well, but it is a more advanced tr
### `PrivateLoadZone` deployment
If `PrivateLoadZone` CR was successfully created in Kubernetes, it should become visible in your account in GCk6 interface soon afterwards. If it doesnt appear in UI, then likely there is a problem to troubleshoot.
If `PrivateLoadZone` CR was successfully created in Kubernetes, it should become visible in your account in Grafana Cloud k6 (GCk6) interface soon afterwards. If it doesn't appear in the UI, then there is likely a problem to troubleshoot.

Firstly, go over the [guide](https://grafana.com/docs/grafana-cloud/k6/author-run/private-load-zone-v2/) to double-check if all the steps have been done correctly and successfully.

@@ -140,7 +143,7 @@ This is a standard problem with escaping the characters, and there's even an [is
### Initializer logs an error but it's not about tags

Often, this happens because of lack of attention to the [highly recommended](#⚠️-highly-recommended-⚠️) step. One more command that can be tried here is to run the following:
Often, this happens because of lack of attention to the [preparation](#preparation) step. One more command that can be tried here is to run the following:

```bash
k6 inspect --execution-requirements script.js
@@ -158,10 +161,10 @@ If standalone `k6 inspect --execution-requirements` executes successfully, then
ServiceAccount can be defined as `serviceAccountName` and `runner.serviceAccountName` in PrivateLoadZone and TestRun CRD respectively. If the specified ServiceAccount does not exist, k6-operator will successfully create Jobs but corresponding Pods will fail to be deployed, and k6-operator will wait indefinitely for Pods to be `Ready`. This error can be best seen in the events of the Job:

```bash
kubectl describe job plz-test-154546-1
kubectl describe job plz-test-xxxxxx-initializer
...
Events:
Warning FailedCreate 57s (x4 over 2m7s) job-controller Error creating: pods "plz-test-154546-1-" is forbidden: error looking up service account plz-ns/plz-sa: serviceaccount "plz-sa" not found
Warning FailedCreate 57s (x4 over 2m7s) job-controller Error creating: pods "plz-test-xxxxxx-initializer-" is forbidden: error looking up service account plz-ns/plz-sa: serviceaccount "plz-sa" not found
```

Currently, k6-operator does not try to analyze such scenarios on its own but we have an [issue](https://github.com/grafana/k6-operator/issues/260) for improvement.
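Until such analysis exists, the ServiceAccount can be checked by hand before the CR is applied. A minimal sketch, assuming `kubectl` is configured for the target cluster; the helper name `check_service_account` and the manifest name `plz.yaml` are hypothetical, while `plz-ns` and `plz-sa` are taken from the event message above:

```bash
#!/usr/bin/env bash

# Print whether the ServiceAccount exists; return non-zero if it does not.
# Note: any kubectl failure (e.g. no cluster access) is reported as "not found".
check_service_account() {
  local namespace="$1" name="$2"
  if kubectl get serviceaccount "$name" -n "$namespace" >/dev/null 2>&1; then
    echo "serviceaccount ${namespace}/${name} found"
  else
    echo "serviceaccount ${namespace}/${name} not found" >&2
    return 1
  fi
}

# Usage: apply the manifest only if the ServiceAccount is present.
#   check_service_account plz-ns plz-sa && kubectl apply -f plz.yaml
```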
@@ -175,7 +178,7 @@ How to fix: incorrect `serviceAccountName` must be corrected and TestRun or Priv
This case is very similar to [ServiceAccount one](#non-existent-serviceaccount): the Pod creation will fail, only the error would be somewhat different:

```bash
kubectl describe job plz-test-154546-1
kubectl describe pod plz-test-xxxxxx-initializer-xxxxx
...
Events:
Warning FailedScheduling 48s (x5 over 4m6s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.