Enable high availability (HA) configuration for ASO #4445
base: main
How will things behave when upgrading ASO to the next version?
I'm worried about the scenario where version N+1 of ASO deploys new CRD versions that the still-running version N doesn't understand.
In this scenario, since we use the newest resource version as the storage (hub) version of the resource, any resource that has been touched by the version N+1 will be unintelligible to version N, resulting in a panic/crash.
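One possible way to keep version N and version N+1 from running at the same time during an upgrade is the Deployment `Recreate` strategy, which terminates all existing pods before creating any new ones. This is a hedged sketch only. It assumes a kustomize-style patch against the `controller-manager` Deployment in the `system` namespace, and it is not necessarily the rollout change discussed later in this thread.

```yaml
# Hypothetical kustomize patch: not taken from the PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  strategy:
    # Recreate deletes every old pod (version N) before any new pod
    # (version N+1) is created, so the two versions never overlap
    # within a single Deployment rollout.
    type: Recreate
```

The trade-off is a short window during each upgrade in which no operator pod is running, so reconciliation pauses until the new pods are ready.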
...zure-service-operator/templates/policy_v1_poddisruptionbudget_azureserviceoperator-pdb.yaml
@matthchr I've made the requested changes. When you have a moment, could you kindly review the PR?
@bingikarthik, thanks! I think we need to make a deployment rollout change to make this safe (as @theunrepentantgeek called out). I'll send a separate PR for that and link it here. Once that merges, we can make sure this change is safe with it and, assuming it is, merge this too. Thanks for your patience.
Does ASO support a "leader" approach? If multiple replicas are running (and reconciling independently), can the same request be processed by multiple instances of ASO, resulting in duplicate calls to Azure?
Yes, it does. Please check: https://github.com/Azure/azure-service-operator/blob/main/v2/charts/azure-service-operator/templates/apps_v1_deployment_azureserviceoperator-controller-manager.yaml#L59
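With leader election enabled, all replicas start, but only the replica holding the leader-election lease runs the reconcilers, so additional replicas do not issue duplicate calls to Azure. The fragment below is a sketch of how such a flag typically appears in the Deployment. The container name `manager` and the flag spelling `--enable-leader-election` are assumptions; the linked line in the Helm template is the authoritative setting.

```yaml
# Hypothetical Deployment fragment: container name and flag are assumptions.
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            # Only the replica that holds the leader-election lease
            # reconciles; the others wait to take over if it fails.
            - --enable-leader-election
```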
```yaml
metadata:
  name: controller-manager
  namespace: system
  labels:
```
This needs the `app.kubernetes.io/name: azure-service-operator` label too, to match Helm.
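A sketch of the metadata block with the requested label added. Any labels already present in the diff are not visible in the excerpt, so they are omitted here rather than guessed.

```yaml
metadata:
  name: controller-manager
  namespace: system
  labels:
    # Requested in review so the kustomize output matches the Helm chart.
    app.kubernetes.io/name: azure-service-operator
```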
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: controller-manager
```
Should be `pdb`, not `controller-manager`.
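A sketch of the PodDisruptionBudget with the suggested name. The `minAvailable` value and the pod selector label `control-plane: controller-manager` are assumptions; the selector must match whatever labels the controller-manager pods actually carry.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  # Renamed per review: "pdb" rather than "controller-manager".
  name: pdb
  namespace: system
  labels:
    app.kubernetes.io/name: azure-service-operator
spec:
  # Assumption: keep at least one controller pod up during voluntary
  # disruptions such as node drains in a cluster upgrade.
  minAvailable: 1
  selector:
    matchLabels:
      # Assumption: must match the controller-manager pod template labels.
      control-plane: controller-manager
```

If the kustomize configuration applies a `namePrefix` such as `azureserviceoperator-` (an assumption here), the rendered name becomes `azureserviceoperator-pdb`, which lines up with the Helm template file name referenced earlier in this thread.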
I had a few comments; once they're fixed, I'll kick off CI and we can merge this.
What this PR does
Closes #4215
As we adopt Azure Service Operator (ASO), it's essential to run multiple replicas for production workloads, especially during cluster upgrade operations. This PR enables high availability (HA) for ASO by increasing the replica count and implementing a Pod Disruption Budget (PDB) to maintain consistent uptime and resilience during potential disruptions.
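The replica-count part of this change could look like the kustomize-style patch below. The value `2` is an assumption, since the description does not state the exact count.

```yaml
# Hypothetical patch: the replica count is an assumption.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  # More than one replica lets a PodDisruptionBudget with minAvailable: 1
  # allow one pod to be evicted at a time during node drains.
  replicas: 2
```

The two pieces depend on each other: with a single replica, a PodDisruptionBudget requiring `minAvailable: 1` would block voluntary evictions of that pod entirely, whereas with two replicas a drain can proceed one pod at a time.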