Upgrade to Kubernetes 1.29 #4161

Closed
12 tasks done
smerle33 opened this issue Jul 4, 2024 · 11 comments

smerle33 commented Jul 4, 2024

Per https://learn.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar, Microsoft stops supporting Kubernetes 1.28 in November 2024.

As such, we must upgrade our Kubernetes clusters to 1.29.

Last issue for 1.28 was: #4144


Task list:

@smerle33 smerle33 added the triage Incoming issues that need review label Jul 4, 2024

github-actions bot commented Jul 4, 2024

Take a look at these similar issues to see if there isn't already a response to your problem:

  1. 92% Upgrade to Kubernetes 1.27 #3948
  2. 92% Upgrade to Kubernetes 1.26 #3683
  3. 92% Upgrade to Kubernetes 1.25 #3582
  4. 92% Upgrade to Kubernetes 1.24 #3387
  5. 92% Upgrade to Kubernetes 1.23 #3053

@github-actions github-actions bot mentioned this issue Jul 4, 2024
13 tasks
@dduportal dduportal added kubernetes azure and removed triage Incoming issues that need review labels Jul 9, 2024
@dduportal
Contributor

No milestone yet: we should schedule this in August 2024

@dduportal dduportal added the triage Incoming issues that need review label Aug 19, 2024
@dduportal dduportal added this to the infra-team-sync-2024-08-27 milestone Aug 20, 2024
@dduportal dduportal removed the triage Incoming issues that need review label Aug 20, 2024
@dduportal
Contributor

Ping @jayfranco999: as discussed during the team meeting, you can start working on this topic with the following tasks:

  • Bump the kubectl version line (start with the updatecli manifest, and then let's use the automated PRs) from 1.28.* to 1.29.*
  • Prepare the migration of the infra.ci.jenkins.io-agents1 AKS cluster by opening draft PRs in the same way as we did for 1.28
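
To illustrate what a version line such as `1.29.*` means in the first task, here is a minimal Python sketch. It is not the updatecli manifest itself; `in_version_line` is a hypothetical helper name, and the matching rule (exact match on every component except `*`) is an assumption about how the line is meant to be read.

```python
def in_version_line(version: str, line: str) -> bool:
    """Return True if `version` (e.g. "1.29.7") belongs to `line` (e.g. "1.29.*").

    Hypothetical helper: each dot-separated component must match exactly,
    except where the line uses the "*" wildcard.
    """
    want = line.split(".")
    have = version.lstrip("v").split(".")
    if len(have) < len(want):
        # "1.29" is too short to be checked against "1.29.*"
        return False
    return all(w == "*" or w == h for w, h in zip(want, have))


if __name__ == "__main__":
    for candidate in ("1.28.9", "1.29.0", "v1.29.7", "1.30.1"):
        print(candidate, in_version_line(candidate, "1.29.*"))
```

Comparing components rather than string prefixes avoids treating something like `1.290.0` as part of the `1.29.*` line.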

@dduportal
Contributor

Check of the Kubernetes 1.29 and AKS changelogs:

Nothing for us on

From https://github.com/Azure/AKS/blob/master/CHANGELOG.md:

Starting Kubernetes 1.29, the default cgroups implementation on Azure Linux AKS nodes will be cgroupsv2. Older versions of Java, .NET and NodeJS do not support memory querying v2 memory constraints and this will lead to out of memory (OOM) issues for workloads. Please test your applications for cgroupsv2 compliance, and read the FAQ for cgroupsv2.

✅ We have already been using cgroups v2 on Ubuntu nodes since #2982
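
For anyone who wants to double-check a node, here is a small Python sketch of one common way to tell cgroups v1 from v2. It assumes the cgroup filesystem is mounted at `/sys/fs/cgroup` and relies on the `cgroup.controllers` file, which exists at the root of a unified (v2) hierarchy only. `cgroup_version` is a hypothetical helper name, not something from our tooling.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a cgroups v2 (unified) mount, else 1.

    Assumption: the presence of `cgroup.controllers` at the mount root
    is used as the v2 marker; anything else is reported as v1.
    """
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print("cgroups version:", cgroup_version())
```

This would have to run on the node itself (or in a pod that sees the host cgroup mount) to say anything about the node.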

Changes to kube-reserved memory reservations are now in effect in AKS 1.29. The optimized reservation logic reduces kube-reserved memory by up to 20% depending on the node configuration. For existing 1.29 node pools created prior to 2/26, please perform a node pool update or recreate to see these changes.

✅ Optimization, we don't really care

Workload Identity is now supported as a setting for static PVs on Managed Blob/File CSI drivers in 1.29.

✅ Could be interesting for our statically defined PVs, but no changes are required

Clusters running Kubernetes 1.29 or later will have kubernetes.azure.com/managedby=aks label to tigera-operator deployment in Calico clusters

✅ We don't use Calico

Effective starting with Kubernetes version 1.29, when you deploy Azure Kubernetes Service (AKS) clusters across multiple availability zones, AKS now utilizes zone-redundant storage (ZRS) to create managed disks within built-in storage classes. ZRS ensures synchronous replication of your Azure managed disk across multiple Azure availability zones in your chosen region. This redundancy strategy enhances the resilience of your applications and safeguards your data against datacenter failures. Refer to Storage concept for more information.

✅ We don't use zone-redundant clusters

Advanced Container Networking Services can be enabled on Cilium-enabled clusters with Kubernetes v1.29.0 or greater, and on Retina-enabled clusters with Kubernetes v1.21.0 or greater for Advanced Network Observability.

✅ We don't use Cilium

Other changes are the usual component version bumps (azure-csi, cni, etc.), CVE fixes and bugfixes in the 1.29.x line

=> we can proceed!

dduportal commented Aug 22, 2024

infracijio-agents-1

Warning

The Terraform upgrade must be performed manually; otherwise it will upgrade the cluster while agents are still running builds on it, leading to build failures
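
As a rough illustration of the kind of check one could do before starting the manual upgrade, the Python sketch below lists pods that are still active, based on the JSON printed by `kubectl get pods --output json`. The helper name `active_pods` and the choice of treating `Pending` and `Running` as "active" are assumptions for this example, not part of our actual procedure.

```python
import json
from typing import List

# Assumption: pods in these phases may still belong to an ongoing build.
ACTIVE_PHASES = {"Pending", "Running"}


def active_pods(pod_list_json: str) -> List[str]:
    """Return the names of pods whose phase is Pending or Running.

    `pod_list_json` is expected to be the output of
    `kubectl get pods --output json` (a list object with an `items` array).
    """
    doc = json.loads(pod_list_json)
    return [
        item["metadata"]["name"]
        for item in doc.get("items", [])
        if item.get("status", {}).get("phase") in ACTIVE_PHASES
    ]
```

An empty result for the agents namespace would suggest that no build is currently using the cluster; a non-empty one means the upgrade should wait.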


Edit: upgrade finished with success!

dduportal pushed a commit to jenkins-infra/azure that referenced this issue Aug 22, 2024
As per
jenkins-infra/helpdesk#4161 (comment):

Upgrading the infracijenkinsio_agents_1 kubernetes cluster and node
pools to kub 1.28.9.

We can proceed to check the terraform plan

Signed-off-by: jayfranco999 <[email protected]>

dduportal commented Aug 22, 2024

cijioagents1

We plan to handle this upgrade Tuesday 22 August around 14:00 UTC

dduportal commented Aug 23, 2024

Update: privatek8s cluster upgrade:

dduportal commented Aug 23, 2024

Update: publick8s cluster upgrade:

@dduportal
Contributor

Next upgrade: #4258
