From 4bb433081a892e77a868b435a0d7b63cf47171f0 Mon Sep 17 00:00:00 2001 From: Charlie McBride <33269602+charliedmcb@users.noreply.github.com> Date: Sun, 10 Nov 2024 12:05:31 -0800 Subject: [PATCH] docs(workshop): add documentation for kubecon optional steps (#568) * add documentation for kubecon optional steps * add additional notes on cleanup * add aks-node-viewer install * switch to getting latest * change to v0.0.2-alpha * tested through 12_scheduling_constraints * single areas clearly * updating docs removing termgraceseconds * modify comments * update comment on logs * update notes * remove , * cleanup wording * use correct karpenter-kct name * update note * minor rewording * add note * add a cd * add note about cleanup * update comment * reword * cleanup wording * reword tooling header * reword * add a * reword * rewording * reword --------- Co-authored-by: Charlie McBride --- docs/workshops/10_multi_node_consolidation.md | 2 +- docs/workshops/12_scheduling_constraints.md | 238 ++++++++++++++++++ docs/workshops/13_disruption_controls.md | 71 ++++++ ...cluster_creation_and_install_karpenter.md} | 31 ++- docs/workshops/9_single_node_consolidation.md | 2 +- docs/workshops/kubecon_azure_track.md | 78 +++++- docs/workshops/reestablish_env.md | 5 +- 7 files changed, 409 insertions(+), 18 deletions(-) create mode 100644 docs/workshops/12_scheduling_constraints.md create mode 100644 docs/workshops/13_disruption_controls.md rename docs/workshops/{1_install_karpenter.md => 1_aks_cluster_creation_and_install_karpenter.md} (82%) diff --git a/docs/workshops/10_multi_node_consolidation.md b/docs/workshops/10_multi_node_consolidation.md index 5ddef6e2c..7b0025863 100644 --- a/docs/workshops/10_multi_node_consolidation.md +++ b/docs/workshops/10_multi_node_consolidation.md @@ -1,7 +1,7 @@ ## Deploy NodePool: -Use the following command to deploy a `NodePool`, and `AKSNodeClass` for Multi Node Consolidation, where we've enabled consolidation for when nodes are empty or 
underutilized, immediately after `0s`.
+Use the following command to deploy a `NodePool` and `AKSNodeClass` for `Multi Node Consolidation`, where we've enabled consolidation when nodes are empty or underutilized, taking effect immediately (`0s`).
 
 ```bash
 cd ~/environment/karpenter
diff --git a/docs/workshops/12_scheduling_constraints.md b/docs/workshops/12_scheduling_constraints.md
new file mode 100644
index 000000000..1bafba142
--- /dev/null
+++ b/docs/workshops/12_scheduling_constraints.md
@@ -0,0 +1,238 @@
+
+## Deploy NodePool:
+
+### 2. Percentage-Based Disruption
+
+Use the following command instead of the NodePool deployment listed under `2. Percentage-Based Disruption` of `Scheduling Constraints`. This will deploy a `NodePool` and `AKSNodeClass` where we've set a disruption budget of `40%`.
+
+```bash
+cd ~/environment/karpenter
+cat > ndb-nodepool.yaml << EOF
+# This example NodePool will provision general purpose instances
+---
+apiVersion: karpenter.sh/v1
+kind: NodePool
+metadata:
+  name: default
+  annotations:
+    kubernetes.io/description: "Basic NodePool for generic workloads"
+spec:
+  disruption:
+    consolidationPolicy: WhenEmptyOrUnderutilized
+    consolidateAfter: 30s
+    budgets:
+    - nodes: "40%"
+  limits:
+    cpu: "20"
+  template:
+    metadata:
+      labels:
+        # required for Karpenter to predict overhead from cilium DaemonSet
+        kubernetes.azure.com/ebpf-dataplane: cilium
+        eks-immersion-team: my-team
+    spec:
+      expireAfter: 720h # 30 days
+      startupTaints:
+        # https://karpenter.sh/docs/concepts/nodepools/#cilium-startup-taint
+        - key: node.cilium.io/agent-not-ready
+          effect: NoExecute
+          value: "true"
+      requirements:
+        - key: karpenter.azure.com/sku-family
+          operator: In
+          values: [D]
+        - key: karpenter.azure.com/sku-cpu
+          operator: Lt
+          values: ["3"]
+        - key: kubernetes.io/arch
+          operator: In
+          values: ["amd64"]
+        - key: kubernetes.io/os
+          operator: In
+          values: ["linux"]
+        - key: karpenter.sh/capacity-type
+          operator: In
+          values: ["on-demand"]
+      nodeClassRef:
+
group: karpenter.azure.com + kind: AKSNodeClass + name: default +--- +apiVersion: karpenter.azure.com/v1alpha2 +kind: AKSNodeClass +metadata: + name: default + annotations: + kubernetes.io/description: "Basic AKSNodeClass for running Ubuntu2204 nodes" +spec: + imageFamily: Ubuntu2204 +EOF + +kubectl apply -f ndb-nodepool.yaml +``` + +``` +nodepool.karpenter.sh/default created +aksnodeclass.karpenter.azure.com/default created +``` + +### 3. Multiple Budget Policies + +Use the following command, instead of the first NodePool deployment listed under `3. Multiple Budget Policies` of `Scheduling Constraints`. This will update the `NodePool` deployment to add a max disruption budget of `2`, and define a schedule for 3 hours currently set to start at 21:00 UTC (2:00PM PT) of `0` which when active will not allow for any disruption. + +> Note: modify the schedule to the current UTC time, to see it take effect while completing this workshop + +```bash +cd ~/environment/karpenter +cat > ndb-nodepool.yaml << EOF +# This example NodePool will provision general purpose instances +--- +apiVersion: karpenter.sh/v1 +kind: NodePool +metadata: + name: default + annotations: + kubernetes.io/description: "Basic NodePool for generic workloads" +spec: + disruption: + consolidationPolicy: WhenEmptyOrUnderutilized + consolidateAfter: 30s + budgets: + - nodes: "40%" + reasons: + - "Empty" + - "Drifted" + - nodes: "2" + - nodes: "0" + schedule: "0 21 * * *" # modify this line to the current UTC time + duration: 3h + limits: + cpu: "40" + template: + metadata: + labels: + # required for Karpenter to predict overhead from cilium DaemonSet + kubernetes.azure.com/ebpf-dataplane: cilium + eks-immersion-team: my-team + spec: + expireAfter: 720h # 30 days + startupTaints: + # https://karpenter.sh/docs/concepts/nodepools/#cilium-startup-taint + - key: node.cilium.io/agent-not-ready + effect: NoExecute + value: "true" + requirements: + - key: karpenter.azure.com/sku-family + operator: In + values: 
[D] + - key: karpenter.azure.com/sku-cpu + operator: Lt + values: ["3"] + - key: kubernetes.io/arch + operator: In + values: ["amd64"] + - key: kubernetes.io/os + operator: In + values: ["linux"] + - key: karpenter.sh/capacity-type + operator: In + values: ["on-demand"] + nodeClassRef: + group: karpenter.azure.com + kind: AKSNodeClass + name: default +--- +apiVersion: karpenter.azure.com/v1alpha2 +kind: AKSNodeClass +metadata: + name: default + annotations: + kubernetes.io/description: "Basic AKSNodeClass for running Ubuntu2204 nodes" +spec: + imageFamily: Ubuntu2204 +EOF + +kubectl apply -f ndb-nodepool.yaml +``` + +``` +nodepool.karpenter.sh/default configured +aksnodeclass.karpenter.azure.com/default unchanged +``` + +Use the following command, instead of the second NodePool deployment listed under `3. Multiple Budget Policies` of `Scheduling Constraints`. This will remove the disruption schedule which is not allowing for any disruptions to occur. + +```bash +cd ~/environment/karpenter +cat > ndb-nodepool.yaml << EOF +# This example NodePool will provision general purpose instances +--- +apiVersion: karpenter.sh/v1 +kind: NodePool +metadata: + name: default + annotations: + kubernetes.io/description: "Basic NodePool for generic workloads" +spec: + disruption: + consolidationPolicy: WhenEmptyOrUnderutilized + consolidateAfter: 30s + budgets: + - nodes: "40%" + reasons: + - "Empty" + - "Drifted" + - nodes: "2" + limits: + cpu: "10" + template: + metadata: + labels: + # required for Karpenter to predict overhead from cilium DaemonSet + kubernetes.azure.com/ebpf-dataplane: cilium + eks-immersion-team: my-team + spec: + expireAfter: 720h # 30 days + startupTaints: + # https://karpenter.sh/docs/concepts/nodepools/#cilium-startup-taint + - key: node.cilium.io/agent-not-ready + effect: NoExecute + value: "true" + requirements: + - key: karpenter.azure.com/sku-family + operator: In + values: [D] + - key: karpenter.azure.com/sku-cpu + operator: Lt + values: ["3"] + - key: 
kubernetes.io/arch
+          operator: In
+          values: ["amd64"]
+        - key: kubernetes.io/os
+          operator: In
+          values: ["linux"]
+        - key: karpenter.sh/capacity-type
+          operator: In
+          values: ["on-demand"]
+      nodeClassRef:
+        group: karpenter.azure.com
+        kind: AKSNodeClass
+        name: default
+---
+apiVersion: karpenter.azure.com/v1alpha2
+kind: AKSNodeClass
+metadata:
+  name: default
+  annotations:
+    kubernetes.io/description: "Basic AKSNodeClass for running Ubuntu2204 nodes"
+spec:
+  imageFamily: Ubuntu2204
+EOF
+
+kubectl apply -f ndb-nodepool.yaml
+```
+
+```
+nodepool.karpenter.sh/default configured
+aksnodeclass.karpenter.azure.com/default unchanged
+```
\ No newline at end of file
diff --git a/docs/workshops/13_disruption_controls.md b/docs/workshops/13_disruption_controls.md
new file mode 100644
index 000000000..471145678
--- /dev/null
+++ b/docs/workshops/13_disruption_controls.md
@@ -0,0 +1,71 @@
+## Deploy NodePool:
+
+Use the following command to deploy a `NodePool` and `AKSNodeClass` for `Disruption Controls`, where we've set the nodes' `expireAfter` to 2 minutes, so the NodePool will try to remove each node 2 minutes after it is created.
+
+> Note: setting `terminationGracePeriod` in addition to `expireAfter` is a good way to define an absolute maximum lifetime for a node. The node is disrupted at `expireAfter` and must finish draining within the `terminationGracePeriod` thereafter. However, setting `terminationGracePeriod` causes Karpenter to ignore `karpenter.sh/do-not-disrupt: "true"` and to take precedence over a pod's own `terminationGracePeriod` and over mechanisms that block eviction, such as PDBs, so be careful using it.
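As a rough back-of-the-envelope check (a sketch of the idea, not Karpenter's implementation), the absolute maximum lifetime described in the note above is `expireAfter` plus `terminationGracePeriod`:

```shell
# Hypothetical values: expireAfter=2m (as in this NodePool),
# terminationGracePeriod=5m (not actually set in this workshop)
expire_after_s=$((2 * 60))
termination_grace_period_s=$((5 * 60))

# A node is disrupted at expireAfter and must finish draining within
# terminationGracePeriod, so the worst-case lifetime is their sum.
max_lifetime_s=$((expire_after_s + termination_grace_period_s))
echo "max node lifetime: ${max_lifetime_s}s"
```

With these hypothetical values a node would live at most 7 minutes, no matter what its pods request.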
+ +```bash +cd ~/environment/karpenter +cat > eviction.yaml << EOF +# This example NodePool will provision general purpose instances +--- +apiVersion: karpenter.sh/v1 +kind: NodePool +metadata: + name: default + annotations: + kubernetes.io/description: "Basic NodePool for generic workloads" +spec: + disruption: + consolidationPolicy: WhenEmpty + consolidateAfter: 30s + limits: + cpu: "10" + template: + metadata: + labels: + # required for Karpenter to predict overhead from cilium DaemonSet + kubernetes.azure.com/ebpf-dataplane: cilium + eks-immersion-team: my-team + spec: + expireAfter: 2m0s + startupTaints: + # https://karpenter.sh/docs/concepts/nodepools/#cilium-startup-taint + - key: node.cilium.io/agent-not-ready + effect: NoExecute + value: "true" + requirements: + - key: karpenter.azure.com/sku-family + operator: In + values: [D] + - key: kubernetes.io/arch + operator: In + values: ["amd64"] + - key: kubernetes.io/os + operator: In + values: ["linux"] + - key: karpenter.sh/capacity-type + operator: In + values: ["on-demand"] + nodeClassRef: + group: karpenter.azure.com + kind: AKSNodeClass + name: default +--- +apiVersion: karpenter.azure.com/v1alpha2 +kind: AKSNodeClass +metadata: + name: default + annotations: + kubernetes.io/description: "Basic AKSNodeClass for running Ubuntu2204 nodes" +spec: + imageFamily: Ubuntu2204 +EOF + +kubectl apply -f eviction.yaml +``` + +``` +nodepool.karpenter.sh/default created +aksnodeclass.karpenter.azure.com/default created +``` \ No newline at end of file diff --git a/docs/workshops/1_install_karpenter.md b/docs/workshops/1_aks_cluster_creation_and_install_karpenter.md similarity index 82% rename from docs/workshops/1_install_karpenter.md rename to docs/workshops/1_aks_cluster_creation_and_install_karpenter.md index 4ea51e62b..3232924bb 100644 --- a/docs/workshops/1_install_karpenter.md +++ b/docs/workshops/1_aks_cluster_creation_and_install_karpenter.md @@ -4,16 +4,22 @@ Table of contents: - [Create a 
cluster](#create-a-cluster)
   - [Configure Helm chart values](#configure-helm-chart-values)
   - [Install Karpenter](#install-karpenter)
-  - [Create workshop namespace](#create-a-workshop-namespace)
+  - [Create our workshop namespace](#create-our-workshop-namespace)
 
 ## Environment Setup
 
+### Prerequisites
+
+You must have an Azure account and a personal Azure subscription.
+
+> Note: any costs associated with the workshop will be billed to your chosen subscription. At the end of the workshop, see the [Cleanup](https://github.com/Azure/karpenter-provider-azure/blob/main/docs/workshops/kubecon_azure_track.md#cleanup) step to ensure all the resources are properly cleaned up, so no additional costs accrue.
+
 ### Launch the Cloud Shell Terminal
 
 Open [https://shell.azure.com/](https://shell.azure.com/) in a new tab.
 
 > Note:
-> \- If you do get disconnected from the Cloud Shell, and find your setup is not working, you can use the following document's quick and easy steps to reestablish it: [reestablish_env.md](https://github.com/Azure/karpenter-provider-azure/tree/main/docs/workshops/reestablish_env.md). (this will only work if you have already completed all the steps of the installtion in this current doc)
+> \- If you get disconnected from the Cloud Shell and find your setup is not working, you can use the quick and easy steps in [reestablish_env.md](https://github.com/Azure/karpenter-provider-azure/tree/main/docs/workshops/reestablish_env.md) to reestablish it. (This will only work if you have already completed all the installation steps in this doc.)
 
 ### Create a Directory for the Workshop
@@ -26,7 +32,7 @@ export PATH=$PATH:~/environment/karpenter/bin
 
 ### Install Utilities
 
-Use the below command to install `yq` and `k9s`, both used for this workshop:
+Use the command below to install `yq`, `k9s`, and `aks-node-viewer`, all used in this workshop:
 
 ```bash
 cd ~/environment/karpenter/bin
@@ -38,22 +44,23 @@ chmod +x ~/environment/karpenter/bin/yq
 
 # k9s - terminal UI to interact with the Kubernetes clusters
 wget https://github.com/derailed/k9s/releases/download/v0.32.5/k9s_Linux_amd64.tar.gz -O ~/environment/karpenter/bin/k9s.tar.gz
 tar -xf k9s.tar.gz
-```
 
-Optional Tools:
-* [aks-node-viewer](https://github.com/azure/aks-node-viewer) - used for tracking price, and other metrics of nodes
+# aks-node-viewer - used for tracking price and other metrics of nodes
+wget https://github.com/Azure/aks-node-viewer/releases/download/v0.0.2-alpha/aks-node-viewer_Linux_x86_64 -O ~/environment/karpenter/bin/aks-node-viewer
+chmod +x ~/environment/karpenter/bin/aks-node-viewer
+```
 
 ## Installation
 
-This guide shows how to get started with Karpenter by creating an AKS cluster and installing self-hosted Karpenter on it.
+In these next steps, we'll create an AKS cluster and install self-hosted Karpenter on it. > Note: there is a managed version of Karpenter within AKS, called NAP (Node Autoprovisioning), with some more opinionated defaults and base scaling configurations. However, we'll be exploring the self-hosted approach today. ### Create a cluster -Create a new AKS cluster with the required configuration, and ready to run Karpenter using workload identity. +Create a new AKS cluster with the required configuration, and ready to run Karpenter using a workload identity. -Select the subscription to use (replace ``): +Select the subscription to use (replace `` with your azure subscription guid): ```bash export AZURE_SUBSCRIPTION_ID= @@ -81,7 +88,7 @@ Create the workload MSI that backs the karpenter pod auth: KMSI_JSON=$(az identity create --name karpentermsi --resource-group "${RG}" --location "${LOCATION}") ``` -Create the AKS cluster compatible with Karpenter, with workload identity enabled: +Create the AKS cluster compatible with Karpenter, where workload identity is enabled: ```bash AKS_JSON=$(az aks create \ @@ -168,7 +175,7 @@ Check karpenter version by using `helm list` command. 
helm list -n "${KARPENTER_NAMESPACE}" ``` -Expected to see `aks-managed-workload-identity` and `cilium` here as well, but if things worked correctly you should see a karpenter line like the following: +It's expected to see `aks-managed-workload-identity` and `cilium` here as well, but if things worked correctly you should see a karpenter line like the following: ``` NAME NAMESPACE REVISION UPDATED STATUS CHART @@ -192,7 +199,7 @@ You can also check the karpenter pod logs with the following: kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller ``` -### Create workshop namespace +### Create our workshop namespace Now let's create a namespace which we'll be using for all our work in this workshop moving forward: diff --git a/docs/workshops/9_single_node_consolidation.md b/docs/workshops/9_single_node_consolidation.md index 2d51e9f33..5eed6f20a 100644 --- a/docs/workshops/9_single_node_consolidation.md +++ b/docs/workshops/9_single_node_consolidation.md @@ -1,7 +1,7 @@ ## Deploy NodePool: -Use the following command to deploy a `NodePool`, and `AKSNodeClass` for Single Node Consolidation, where we've enabled consolidation for when nodes are empty or underutilized, but only after `1m`. +Use the following command to deploy a `NodePool`, and `AKSNodeClass` for `Single Node Consolidation`, where we've enabled consolidation for when nodes are empty or underutilized, but only after `1m`. 
```bash cd ~/environment/karpenter diff --git a/docs/workshops/kubecon_azure_track.md b/docs/workshops/kubecon_azure_track.md index 1792e611d..df679bd52 100644 --- a/docs/workshops/kubecon_azure_track.md +++ b/docs/workshops/kubecon_azure_track.md @@ -1,7 +1,7 @@ Table of contents: - [Overview](#overview) - [Basic Cheet Sheet](#basic-cheet-sheet) -- [Adjustments](#adjustments) +- [Main Topics](#main-topics) - [Step: Install Karpenter](#step-install-karpenter) - [Step: Basic NodePool](#step-basic-nodepool) - [Step: Scaling Application](#step-scaling-application) @@ -12,6 +12,10 @@ Table of contents: - [Step: Consolidation](#step-consolidation) - [Step: Single Node Consolidation](#step-single-node-consolidation) - [Step: Multi Node Consolidation](#step-multi-node-consolidation) +- [Bonus Content (optional)](#bonus-content-optional) + - [Step: Scheduling Constraints](#step-scheduling-constraints) + - [Step: Disruption Control](#step-disruption-control) +- [Cleanup](#cleanup) ## Overview @@ -23,11 +27,16 @@ To follow along using this workshop, simply go through the steps detailed in thi When you see `eks-node-viewer` use `aks-node-viewer` instead. 
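For example, you can keep a viewer running in a second Cloud Shell tab while working through the steps below (the `--resources` flag mirrors eks-node-viewer's flag of the same name; treat it as an assumption for this aks-node-viewer build):

```shell
# Shows nodes as they are provisioned/removed, with per-node price;
# run from a separate terminal so it stays visible during the workshop
aks-node-viewer --resources cpu,memory
```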
-## Adjusted Instructions
+> Note: if you ever need to use the extended log command to look back over a longer period of time, make sure it's using the `kube-system` namespace, as follows:
+> ```bash
+> kubectl -n kube-system logs -f deployment/karpenter --all-containers=true --since=20m
+> ```
+
+## Main Topics
 
 ### Step: [Install Karpenter](https://catalog.workshops.aws/karpenter/en-US/install-karpenter)
 
-- Instead follow [1_install_karpenter.md](https://github.com/Azure/karpenter-provider-azure/tree/main/docs/workshops/1_install_karpenter.md)
+- Instead follow [1_aks_cluster_creation_and_install_karpenter.md](https://github.com/Azure/karpenter-provider-azure/tree/main/docs/workshops/1_aks_cluster_creation_and_install_karpenter.md)
 
 ### Step: [Basic NodePool](https://catalog.workshops.aws/karpenter/en-US/basic-nodepool)
@@ -107,3 +116,66 @@ When you see `eks-node-viewer` use `aks-node-viewer` instead.
 kubectl delete aksnodeclass default
 ```
 - The same concepts within the workshop generally translate to AKS, but with different instances/pricing. However, for the deployment step of the NodePool, use a new deployment command with consolidation enabled, found in [10_multi_node_consolidation.md](https://github.com/Azure/karpenter-provider-azure/tree/main/docs/workshops/10_multi_node_consolidation.md)
+
+## Bonus Content (optional)
+
+Everything beyond this point is optional. Even if you skip these steps, you will still want to [Cleanup](#cleanup) your resources.
+
+### Step: [Scheduling Constraints](https://catalog.workshops.aws/karpenter/en-US/scheduling-constraints#how-does-it-work)
+
+> Concepts translate to Azure.
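To build intuition before the disruption-budget step below, here is a small sketch of how multiple budgets combine: when several budgets are active at once, Karpenter honors the most restrictive (minimum) of them. The fleet size and budget values here are hypothetical:

```shell
# Hypothetical fleet of 10 nodes, with a "40%" budget and a "2" budget active
total_nodes=10

percent_budget=$(( total_nodes * 40 / 100 ))  # "40%" of 10 nodes -> 4
absolute_budget=2                             # "2" -> at most 2 nodes

# The effective allowance is the minimum of all currently active budgets
allowed=$percent_budget
if [ "$absolute_budget" -lt "$allowed" ]; then
  allowed=$absolute_budget
fi

echo "nodes disruptable right now: $allowed"
```

A schedule-based `nodes: "0"` budget, while its window is active, would drive this minimum to zero and block all voluntary disruption.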
+
+### Step: [NodePool Disruption Budgets](https://catalog.workshops.aws/karpenter/en-US/scheduling-constraints/nodepool-disruption-budgets)
+
+- Adjustments:
+  - In initial cleanup, replace the command to clean up the `ec2nodeclass` with:
+    > Note: it might pause for a few seconds on this command
+    ```bash
+    kubectl delete aksnodeclass default
+    ```
+  - The same concepts within the workshop generally translate to AKS. However, for the three NodePool deployment commands, use the replacement deployment commands listed in [12_scheduling_constraints.md](https://github.com/Azure/karpenter-provider-azure/tree/main/docs/workshops/12_scheduling_constraints.md)
+
+### Step: [Disruption Control](https://catalog.workshops.aws/karpenter/en-US/scheduling-constraints/disable-eviction)
+
+- Adjustments:
+  - In initial cleanup, replace the command to clean up the `ec2nodeclass` with:
+    > Note: it might pause for a few seconds on this command
+    ```bash
+    kubectl delete aksnodeclass default
+    ```
+  - The same concepts within the workshop generally translate to AKS. However, for the deployment step of the NodePool, use the deployment command found in [13_disruption_controls.md](https://github.com/Azure/karpenter-provider-azure/tree/main/docs/workshops/13_disruption_controls.md)
+  - > Note: don't be surprised if, after the `expireAfter` of `2m` has elapsed, new instances are created and removed. This is expected.
+  - > Note: you may see a log for selecting the instance type and resolving the image after nodeclaim creation.
+  - > Note: `triggering termination for expired node after TTL` and `deprovisioning via expiration` are not expected to show up in the logs.
+
+## Cleanup
+
+Once you've completed the workshop, ensure you clean up all the resources to prevent any additional costs.
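Before deleting anything, you can optionally review what the workshop created (this assumes `${RG}` is still set from the install step):

```shell
# List every resource in the workshop resource group before deleting it
az resource list --resource-group ${RG} --output table
```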
+
+> Note: if you've had any disconnects from the Cloud Shell, ensure your subscription is set:
+> ```bash
+> env | grep AZURE_SUBSCRIPTION_ID
+> ```
+> If you see no output from the above command, then re-select your subscription to use (replace `` with your Azure subscription GUID):
+>
+> ```bash
+> export AZURE_SUBSCRIPTION_ID=
+> az account set --subscription ${AZURE_SUBSCRIPTION_ID}
+> ```
+
+To clean up all the Azure resources, simply delete the resource group:
+
+> Confirm `y` to delete the resource group and proceed with the operation.
+
+> Note: this will take a couple of minutes
+
+```bash
+az group delete --name ${RG}
+```
+
+The Cloud Shell should automatically clean itself up. However, if you want to pre-emptively remove all the files we created within the workshop, simply delete them with the following command:
+
+```bash
+cd ~/
+rm -rf ~/environment
+```
\ No newline at end of file
diff --git a/docs/workshops/reestablish_env.md b/docs/workshops/reestablish_env.md
index 154849a67..ff7f21b93 100644
--- a/docs/workshops/reestablish_env.md
+++ b/docs/workshops/reestablish_env.md
@@ -29,7 +29,7 @@ Otherwise, continue the steps in this doc.
 
 ## Scripts
 
-Re-select your subscription to use (replace ``):
+Re-select your subscription to use (replace `` with your Azure subscription GUID):
 
 ```bash
 export AZURE_SUBSCRIPTION_ID=
@@ -55,6 +55,9 @@ chmod +x ~/environment/karpenter/bin/yq
 wget https://github.com/derailed/k9s/releases/download/v0.32.5/k9s_Linux_amd64.tar.gz -O ~/environment/karpenter/bin/k9s.tar.gz
 tar -xf k9s.tar.gz
 
+# aks-node-viewer - used for tracking price and other metrics of nodes
+wget https://github.com/Azure/aks-node-viewer/releases/download/v0.0.2-alpha/aks-node-viewer_Linux_x86_64 -O ~/environment/karpenter/bin/aks-node-viewer
+chmod +x ~/environment/karpenter/bin/aks-node-viewer
 
 # Setup env vars
 export CLUSTER_NAME=karpenter