diff --git a/docs/docs-content/clusters/cluster-management/backup-restore/create-cluster-backup.md b/docs/docs-content/clusters/cluster-management/backup-restore/create-cluster-backup.md
index e72d6baafb..152cda03c7 100644
--- a/docs/docs-content/clusters/cluster-management/backup-restore/create-cluster-backup.md
+++ b/docs/docs-content/clusters/cluster-management/backup-restore/create-cluster-backup.md
@@ -29,6 +29,10 @@ specify the backup expiry period, meaning the duration after which Palette will
 example, you can schedule a backup for every week on Sunday at midnight and automatically expire the backup after three
 months. Additionally, you can initiate a backup on demand for an existing cluster.
 
+## Limitations
+
+- Nodes in [Maintenance Mode](../maintenance-mode.md) are not included in the backup process.
+
 ## Schedule a Backup
 
 ### Prerequisites
diff --git a/docs/docs-content/clusters/cluster-management/cluster-management.md b/docs/docs-content/clusters/cluster-management/cluster-management.md
index fc98656c8f..02770acc78 100644
--- a/docs/docs-content/clusters/cluster-management/cluster-management.md
+++ b/docs/docs-content/clusters/cluster-management/cluster-management.md
@@ -85,3 +85,6 @@ The following sections describe these capabilities in detail:
   individual users and clusters.
 
 - [Image Swap](image-swap.md) - Learn how to use image swap capabilities with Palette.
+
+- [Maintenance Mode](./maintenance-mode.md) - Turn off scheduling (cordon) and drain nodes, migrating workloads to other
+  healthy nodes in the cluster without service disruptions.
diff --git a/docs/docs-content/clusters/cluster-management/compliance-scan.md b/docs/docs-content/clusters/cluster-management/compliance-scan.md
index 0efbcf3c46..f14b131b5f 100644
--- a/docs/docs-content/clusters/cluster-management/compliance-scan.md
+++ b/docs/docs-content/clusters/cluster-management/compliance-scan.md
@@ -17,10 +17,10 @@ purposes.
 To learn more about each scan type, refer to the following sections.
 
 :::info
 
-Scans may not work as expected when a node is in maintenance mode. Before scheduling a scan, we recommend you turn off
-maintenance mode if enabled. To verify if a node is in maintenance mode, navigate to **Clusters** > **Nodes** and check
-the **Health** column for a **Maintenance mode** icon. To turn off maintenance mode, click on the **three-dot Menu** in
-the row of the node you want to scan, and select **Turn off maintenance mode**.
+Scans cannot be performed when a node is in [maintenance mode](./maintenance-mode.md). To verify if a node is in
+maintenance mode, navigate to **Clusters** > **Nodes** and check the **Health** column for a **Maintenance mode** icon.
+To turn off maintenance mode, click on the **three-dot Menu** in the row of the node you want to scan, and select **Turn
+off maintenance mode**.
 
 :::
diff --git a/docs/docs-content/clusters/cluster-management/maintenance-mode.md b/docs/docs-content/clusters/cluster-management/maintenance-mode.md
new file mode 100644
index 0000000000..47f01aa394
--- /dev/null
+++ b/docs/docs-content/clusters/cluster-management/maintenance-mode.md
@@ -0,0 +1,136 @@
+---
+sidebar_label: "Maintenance Mode"
+title: "Maintenance Mode"
+description: "Learn how to enable and use maintenance mode to cordon and drain nodes."
+hide_table_of_contents: false
+sidebar_position: 240
+tags: ["clusters", "cluster management"]
+---
+
+Similar to `kubectl` commands `cordon` and `drain`, maintenance mode allows you to temporarily disable scheduling for an
+active control plane or worker node. When a node is placed in maintenance mode, workloads are migrated automatically to
+other healthy nodes in the cluster without services being disrupted. Using maintenance mode makes it easier to perform
+necessary maintenance tasks, address node issues, and optimize workload distribution while maintaining the desired level
+of performance and availability.
+
+## Prerequisites
+
+- An active Palette host cluster with more than one control plane node and worker node.
+
+- Alternate nodes with sufficient resources available where processes from maintenance nodes can be provisioned.
+
+## Limitations
+
+- Static pods and DaemonSets are not evicted from the node when activating maintenance mode.
+
+- Scans cannot be performed on the cluster when any node in the cluster is in maintenance mode.
+
+- Nodes in maintenance mode are not included in the backup process, which also means they cannot be restored.
+
+- Changes to add-on profiles are not applied to nodes in maintenance mode.
+
+- Certain changes to infrastructure profiles, such as Kubernetes version upgrades, require nodes to be recreated,
+  removing maintenance nodes in the process.
+
+## Activate Maintenance Mode
+
+1. Log in to [Palette](https://console.spectrocloud.com).
+
+2. Navigate to the left **Main Menu** and select **Clusters**.
+
+3. Select the desired cluster and navigate to the **Nodes** tab of the cluster.
+
+4. Beside the node that needs maintenance, select the **three-dot Menu** and **Turn on maintenance mode**.
+
+5. When maintenance mode is activated, the **Health** icon changes to a set of tools, and the tooltip states
+   **Maintenance Mode: Initiated**. When maintenance mode is finished, the tooltip changes to **Maintenance Mode:
+   Complete**.
+
+Palette reminds you in several locations that you have a node in maintenance mode:
+
+- Beside the **Settings** drop-down while viewing your cluster.
+
+- On the cluster’s **Overview** tab beneath **Health** status.
+
+- On the cluster’s **Nodes** tab in the node’s **Health** column.
+
+![Node in maintenance mode](/clusters_cluster-management_maintenance_mode.webp)
+
+### Validate
+
+1. Log in to [Palette](https://console.spectrocloud.com).
+
+2. Navigate to the left **Main Menu** and select **Clusters**.
+
+3. 
Select the cluster with maintenance mode active and download the [kubeconfig](./palette-webctl.md) file.
+
+![The cluster details page with the two kubeconfig files elements highlighted](/clusters_cluster--management_kubeconfig_cluster-details-kubeconfig-files.webp)
+
+4. Open a terminal window and set the environment variable `KUBECONFIG` to point to the kubeconfig file you downloaded.
+
+   ```bash
+   export KUBECONFIG=~/Downloads/admin.aws-maintenance-test.kubeconfig
+   ```
+
+5. Confirm that the node is in a maintenance state, indicated by a `STATUS` that includes `SchedulingDisabled`.
+
+   ```bash
+   kubectl get nodes
+   ```
+
+   ```bash hideClipboard {4}
+   NAME                         STATUS                     ROLES           AGE    VERSION
+   ip-10-0-1-174.ec2.internal   Ready                      control-plane   177m   v1.30.6
+   ip-10-0-1-26.ec2.internal    Ready                      <none>          174m   v1.30.6
+   ip-10-0-1-235.ec2.internal   Ready,SchedulingDisabled   <none>          174m   v1.30.6
+   ```
+
+## Disable Maintenance Mode
+
+1. Log in to [Palette](https://console.spectrocloud.com).
+
+2. Navigate to the left **Main Menu** and select **Clusters**.
+
+3. Select the desired cluster and navigate to the **Nodes** tab of the cluster.
+
+4. Select the **three-dot Menu** beside the maintenance node and **Turn off maintenance mode**.
+
+5. When maintenance mode is disabled, the **Health** icon reverts to a checkmark.
+
+:::warning
+
+Taking a node out of maintenance mode does not automatically rebalance workloads.
+
+:::
+
+### Validate
+
+1. Log in to [Palette](https://console.spectrocloud.com).
+
+2. Navigate to the left **Main Menu** and select **Clusters**.
+
+3. Select the desired cluster and download the [kubeconfig](./palette-webctl.md) file.
+
+![The cluster details page with the two kubeconfig files elements highlighted](/clusters_cluster--management_kubeconfig_cluster-details-kubeconfig-files.webp)
+
+4. Open a terminal window and set the environment variable `KUBECONFIG` to point to the kubeconfig file you downloaded.
+
+   ```bash
+   export KUBECONFIG=~/Downloads/admin.aws-maintenance-test.kubeconfig
+   ```
+
+5. Confirm that scheduling is no longer disabled for the node, indicated by a `STATUS` of `Ready`.
+
+   ```bash
+   kubectl get nodes
+   ```
+
+   ```bash hideClipboard
+   NAME                         STATUS   ROLES           AGE    VERSION
+   ip-10-0-1-174.ec2.internal   Ready    control-plane   177m   v1.30.6
+   ip-10-0-1-26.ec2.internal    Ready    <none>          174m   v1.30.6
+   ip-10-0-1-235.ec2.internal   Ready    <none>          174m   v1.30.6
+   ```
diff --git a/docs/docs-content/vm-management/architecture.md b/docs/docs-content/vm-management/architecture.md
index 984b792a6b..7b6853964e 100644
--- a/docs/docs-content/vm-management/architecture.md
+++ b/docs/docs-content/vm-management/architecture.md
@@ -23,7 +23,7 @@ For more detailed information about the technical architecture of VMO, refer to
 By default, Palette VMO includes the following components:
 
 - **Descheduler**. Enables VM live migration to different nodes in the node pool when the original node is in
-  maintenance mode.
+  [maintenance mode](../clusters/cluster-management/maintenance-mode.md).
 
 - **Snapshot Controller**. Enables you to create VM snapshots. This component is automatically installed when you
   initiate or schedule cluster backups.
diff --git a/docs/docs-content/vm-management/create-manage-vm/migrate-vm-to-different-node.md b/docs/docs-content/vm-management/create-manage-vm/migrate-vm-to-different-node.md
index e6318d8ce8..82ea7cdae6 100644
--- a/docs/docs-content/vm-management/create-manage-vm/migrate-vm-to-different-node.md
+++ b/docs/docs-content/vm-management/create-manage-vm/migrate-vm-to-different-node.md
@@ -61,11 +61,11 @@ Follow the instructions below to migrate VMs to a different node.
 
 ## Evacuate a Host
 
-Compute nodes can be placed into maintenance mode using Palette or manually using the `cordon` and `drain` commands. The
-`cordon` command marks the node as un-schedulable and the `drain`command evacuates all the VMs and pods from it.
This
-process is useful in case you need to perform hardware maintenance on the node - for example to replace a disk or
-network interface card (NIC) card, perform memory maintenance, or if there are any issues with a particular node that
-need to be resolved. To learn more, check out the
+Compute nodes can be placed into [maintenance mode](../../clusters/cluster-management/maintenance-mode.md) using Palette
+or manually using the `cordon` and `drain` commands. The `cordon` command marks the node as unschedulable, and the
+`drain` command evacuates all the VMs and pods from it. This process is useful when you need to perform hardware
+maintenance on the node, for example, to replace a disk or network interface card (NIC), perform memory maintenance,
+or resolve issues with a particular node. To learn more, check out the
 [Safely Drain a Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#use-kubectl-drain-to-remove-a-node-from-service)
 Kubernetes resource.
@@ -173,3 +173,5 @@ You can validate evacuation completed by following the steps below.
 
 - [Persistent Volume Access Modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)
 
 - [Safely Drain a Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#use-kubectl-drain-to-remove-a-node-from-service)
+
+- [Maintenance Mode](../../clusters/cluster-management/maintenance-mode.md)
diff --git a/static/assets/docs/images/clusters_cluster-management_maintenance_mode.webp b/static/assets/docs/images/clusters_cluster-management_maintenance_mode.webp
new file mode 100644
index 0000000000..ba1101e0e7
Binary files /dev/null and b/static/assets/docs/images/clusters_cluster-management_maintenance_mode.webp differ
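For anyone reviewing or trying this change locally, the manual cordon/drain workflow that maintenance mode automates can be sketched as below. This is an illustrative aside, not part of the patch: the node name is borrowed from the sample output in the new validation steps, and the `kubectl get nodes` output is captured in a variable so the `SchedulingDisabled` filter can be run without a live cluster.

```shell
# Manual equivalent of turning on maintenance mode (node name is illustrative):
#   kubectl cordon ip-10-0-1-235.ec2.internal
#   kubectl drain ip-10-0-1-235.ec2.internal --ignore-daemonsets --delete-emptydir-data
# Note: like maintenance mode, "kubectl drain" does not evict DaemonSet pods.

# Sample "kubectl get nodes" output, captured so the filter below is runnable here;
# in practice, pipe the live command output instead.
nodes='ip-10-0-1-174.ec2.internal   Ready                      control-plane   177m   v1.30.6
ip-10-0-1-26.ec2.internal    Ready                      <none>          174m   v1.30.6
ip-10-0-1-235.ec2.internal   Ready,SchedulingDisabled   <none>          174m   v1.30.6'

# Print only nodes whose STATUS column includes SchedulingDisabled.
echo "$nodes" | awk '$2 ~ /SchedulingDisabled/ {print $1}'
# → ip-10-0-1-235.ec2.internal
```

Turning maintenance mode off corresponds to `kubectl uncordon <node>`; as the new warning in the page notes, existing workloads are not rebalanced automatically afterward.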