From af234a2a1b2f7451fe131dec926b35e47753ffe5 Mon Sep 17 00:00:00 2001
From: Yuliia Horbenko <31223054+yuliiiah@users.noreply.github.com>
Date: Fri, 5 Jul 2024 19:47:22 +0200
Subject: [PATCH] Clean up markup and alignment in the Troubleshooting section
(#3255)
* chore: Fix markup and alignment in the Troubleshooting section
* chore: Fix code snippets in cluster-deployment
---
.../troubleshooting/cluster-deployment.md | 34 ++++-----
docs/docs-content/troubleshooting/edge.md | 28 ++++----
docs/docs-content/troubleshooting/nodes.md | 72 ++++++++-----------
.../troubleshooting/palette-dev-engine.md | 8 ---
docs/docs-content/troubleshooting/pcg.md | 2 +-
.../troubleshooting/troubleshooting.md | 6 --
6 files changed, 64 insertions(+), 86 deletions(-)
diff --git a/docs/docs-content/troubleshooting/cluster-deployment.md b/docs/docs-content/troubleshooting/cluster-deployment.md
index 30221a7eeb..5ee1cfe18a 100644
--- a/docs/docs-content/troubleshooting/cluster-deployment.md
+++ b/docs/docs-content/troubleshooting/cluster-deployment.md
@@ -17,8 +17,6 @@ The following steps will help you troubleshoot errors in the event issues arise
An instance is launched and then terminated every 30 minutes before its deployment completes, and the **Events Tab**
lists errors with the following message:
-
-
```hideClipboard bash
Failed to update kubeadmControlPlane Connection timeout connecting to Kubernetes Endpoint
```
@@ -35,28 +33,30 @@ why a service may fail are:
user `spectro`. If you are initiating an SSH session into an installer instance, log in as user `ubuntu`.
```shell
- ssh --identity_file <_pathToYourSSHkey_> spectro@X.X.X.X
+   ssh -i <_pathToYourSSHkey_> spectro@X.X.X.X
```
2. Elevate the user access.
```shell
- sudo -i
+ sudo -i
```
3. Verify the Kubelet service is operational.
```shell
- systemctl status kubelet.service
+ systemctl status kubelet.service
```
4. If the Kubelet service does not work as expected, do the following. If the service operates correctly, you can skip
this step.
1. Navigate to the **/var/log/** folder.
+
```shell
cd /var/log/
```
+
2. Scan the **cloud-init-output** file for errors. Take note of any you find and address them.
```shell
cat cloud-init-output.log
@@ -66,34 +66,36 @@ why a service may fail are:
- Export the kubeconfig file.
- ```shell
- export KUBECONFIG=/etc/kubernetes/admin.conf
- ```
+ ```shell
+ export KUBECONFIG=/etc/kubernetes/admin.conf
+ ```
- Connect with the cluster's Kubernetes API.
- ```shell
- kubectl get pods --all-namespaces
- ```
+ ```shell
+ kubectl get pods --all-namespaces
+ ```
- When the connection is established, verify the pods are in a _Running_ state. Take note of any pods that are not in
_Running_ state.
- ```shell
- kubectl get pods -o wide
- ```
+ ```shell
+ kubectl get pods -o wide
+ ```
- If all the pods are operating correctly, verify their connection with the Palette API.
- For clusters using Gateway, verify the connection between the Installer and Gateway instance:
+
```shell
- curl -k https://:6443
+   curl -k https://<_gatewayInstanceIP_>:6443
```
+
- For Public Clouds that do not use Gateway, verify the connection between the public Internet and the Kube
endpoint:
```shell
- curl -k https://:6443
+   curl -k https://<_kubeEndpointIP_>:6443
```
:::info
diff --git a/docs/docs-content/troubleshooting/edge.md b/docs/docs-content/troubleshooting/edge.md
index ce04332b4c..2a63aac898 100644
--- a/docs/docs-content/troubleshooting/edge.md
+++ b/docs/docs-content/troubleshooting/edge.md
@@ -42,23 +42,23 @@ adjust the values of related environment variables in the KubeVip DaemonSet with
2. Issue the following command:
-```shell
-kubectl edit ds kube-vip-ds --namespace kube-system
-```
+ ```shell
+ kubectl edit ds kube-vip-ds --namespace kube-system
+ ```
3. In the `env` section of the KubeVip service, modify the environment variables to have the following values.
-```yaml {4-9}
-env:
- - name: vip_leaderelection
- value: "true"
- - name: vip_leaseduration
- value: "30"
- - name: vip_renewdeadline
- value: "20"
- - name: vip_retryperiod
- value: "4"
-```
+ ```yaml {4-9}
+ env:
+ - name: vip_leaderelection
+ value: "true"
+ - name: vip_leaseduration
+ value: "30"
+ - name: vip_renewdeadline
+ value: "20"
+ - name: vip_retryperiod
+ value: "4"
+ ```
4. Within a minute, the old Pods in an unknown state will be terminated, and new Pods will come up with the updated values.
diff --git a/docs/docs-content/troubleshooting/nodes.md b/docs/docs-content/troubleshooting/nodes.md
index e076b657fc..b2675f9d8b 100644
--- a/docs/docs-content/troubleshooting/nodes.md
+++ b/docs/docs-content/troubleshooting/nodes.md
@@ -48,8 +48,6 @@ resulted in a node repave. The API payload is incomplete for brevity.
For detailed information, review the cluster upgrades [page](../clusters/clusters.md).
-
-
## Clusters
## Scenario - vSphere Cluster and Stale ARP Table
@@ -64,8 +62,6 @@ This is done automatically without any user action.
You can verify the cleaning process by issuing the following command on non-VIP nodes and observing that the ARP cache
is never older than 300 seconds.
-
-
```shell
watch ip -statistics neighbour
```
@@ -77,8 +73,6 @@ Amazon EKS
[Runbook](https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html)
for troubleshooting guidance.
-
-
## Palette Agents Workload Payload Size Issue
A cluster comprised of many nodes can create a situation where the workload report data the agent sends to Palette
@@ -89,8 +83,6 @@ If you encounter this scenario, you can configure the cluster to stop sending wo
the workload report feature, create a _configMap_ with the following configuration. Use a cluster profile manifest layer
to create the configMap.
-
-
```yaml
apiVersion: v1
kind: ConfigMap
@@ -101,8 +93,6 @@ data:
feature.workloads: disable
```
-
-
## OS Patch Fails
When conducting [OS Patching](../clusters/cluster-management/os-patching.md), sometimes the patching process can time
@@ -128,39 +118,39 @@ To resolve this issue, use the following steps:
7. SSH into one of the cluster nodes and issue the following command.
-```shell
-rm /var/cache/debconf/config.dat && \
-dpkg --configure -a
-```
+ ```shell
+ rm /var/cache/debconf/config.dat && \
+ dpkg --configure -a
+ ```
8. A prompt may appear asking you to select the boot device. Select the appropriate boot device and press **Enter**.
-:::tip
-
-If you are unsure of the boot device, use a disk utility such as `lsblk` or `fdisk` to identify the boot device. Below
-is an example of using `lsblk` to identify the boot device. The output is abbreviated for brevity.
-
-```shell
-lsblk --output NAME,TYPE,MOUNTPOINT
-```
-
-```shell {10} hideClipboard
-NAME TYPE MOUNTPOINT
-fd0 disk
-loop0 loop /snap/core20/1974
-...
-loop10 loop /snap/snapd/20092
-loop11 loop /snap/snapd/20290
-sda disk
-├─sda1 part /
-├─sda14 part
-└─sda15 part /boot/efi
-sr0 rom
-```
-
-The highlighted line displays the boot device. In this example, the boot device is `sda15`, mounted at `/boot/efi`. The
-boot device may be different for your node.
-
-:::
+ :::tip
+
+   If you are unsure of the boot device, use a disk utility such as `lsblk` or `fdisk` to identify it. Below is an
+   example of using `lsblk`. The output is abbreviated.
+
+ ```shell
+ lsblk --output NAME,TYPE,MOUNTPOINT
+ ```
+
+ ```shell {10} hideClipboard
+ NAME TYPE MOUNTPOINT
+ fd0 disk
+ loop0 loop /snap/core20/1974
+ ...
+ loop10 loop /snap/snapd/20092
+ loop11 loop /snap/snapd/20290
+ sda disk
+ ├─sda1 part /
+ ├─sda14 part
+ └─sda15 part /boot/efi
+ sr0 rom
+ ```
+
+ The highlighted line displays the boot device. In this example, the boot device is `sda15`, mounted at `/boot/efi`.
+ The boot device may be different for your node.
+
+ :::
9. Repeat the previous step for all nodes in the cluster.
diff --git a/docs/docs-content/troubleshooting/palette-dev-engine.md b/docs/docs-content/troubleshooting/palette-dev-engine.md
index 04ccbf1e1b..124972afb8 100644
--- a/docs/docs-content/troubleshooting/palette-dev-engine.md
+++ b/docs/docs-content/troubleshooting/palette-dev-engine.md
@@ -12,8 +12,6 @@ tags: ["troubleshooting", "pde", "app mode"]
Use the following content to help you troubleshoot issues you may encounter when using Palette Dev Engine (PDE).
-
-
## Resource Requests
All [Cluster Groups](../clusters/cluster-groups/cluster-groups.md) are configured with a default
@@ -25,18 +23,12 @@ to let the system manage the resources.
If you specify `requests` but not `limits`, the default limits imposed by the LimitRange will likely be lower than the
requests, causing the following error.
-
-
```shell hideClipboard
Invalid value: "300m": must be less than or equal to CPU limit spec.containers[0].resources.requests: Invalid value: "512Mi": must be less than or equal to memory limit
```
-
-
The workaround is to define both the `requests` and `limits`.
-
-
## Scenario - Controller Manager Pod Not Upgraded
If the `palette-controller-manager` pod for a virtual cluster is not upgraded after a Palette platform upgrade, use the
diff --git a/docs/docs-content/troubleshooting/pcg.md b/docs/docs-content/troubleshooting/pcg.md
index f4dab562a0..08741110c4 100644
--- a/docs/docs-content/troubleshooting/pcg.md
+++ b/docs/docs-content/troubleshooting/pcg.md
@@ -90,7 +90,7 @@ unavailable IP addresses for the worker nodes, or the inability to perform a Net
9. If the problem persists, download the cluster logs from Palette. The screenshot below will help you locate the button
to download logs from the cluster details page.
-![A screenshot highlighting how to download the cluster logs from Palette.](/troubleshooting-pcg-download_logs.webp)
+ ![A screenshot highlighting how to download the cluster logs from Palette.](/troubleshooting-pcg-download_logs.webp)
10. Share the logs with our support team at [support@spectrocloud.com](mailto:support@spectrocloud.com).
diff --git a/docs/docs-content/troubleshooting/troubleshooting.md b/docs/docs-content/troubleshooting/troubleshooting.md
index 0b4fc030f2..fe13c40c8f 100644
--- a/docs/docs-content/troubleshooting/troubleshooting.md
+++ b/docs/docs-content/troubleshooting/troubleshooting.md
@@ -11,8 +11,6 @@ tags: ["troubleshooting"]
Use the following troubleshooting resources to help you address issues that may arise. You can also reach out to our
support team by opening up a ticket through our [support page](http://support.spectrocloud.io/).
-
-
- [Cluster Deployment](cluster-deployment.md)
- [Edge](edge.md)
@@ -53,8 +51,6 @@ Follow the link for more details: [Download Cluster Logs](../clusters/clusters.m
Spectro Cloud maintains an event stream with low-level details of the various orchestration tasks being performed. This
event stream is a good source for identifying issues when an operation does not complete for a long time.
-
-
:::warning
Due to Spectro Cloud’s reconciliation logic, intermittent errors show up in the event stream. As an example, after
@@ -83,5 +79,3 @@ made to perform the task. Failed conditions are a great source of troubleshootin
For example, failure to create a virtual machine in AWS because the vCPU limit was exceeded would cause this error to
be shown to the end users. They could choose to bring down some workloads in the AWS cloud to free up capacity. The
next time a VM creation task is attempted, it would succeed, and the condition would be marked as a success.
-
-