feat(infra.ci/agent): new cluster in azure sponsored for infra.ci agents #715

Merged: 8 commits, merged on Jun 10, 2024

@smerle33 (Collaborator) commented Jun 4, 2024:

As per jenkins-infra/helpdesk#3923 (comment): a Kubernetes cluster within the sponsored Azure subscription.

Split into 3 PRs:
- creation of the cluster (this one)
- creation of the nodes
- creation of kubernetes-admin-sa with the module

Depends on jenkins-infra/azure-net#249 for the network definition.

@smerle33 force-pushed the new/aks/sponsored/infra branch from abd7dbf to 3160315 June 10, 2024 12:14
@smerle33 marked this pull request as ready for review June 10, 2024 12:16
@smerle33 requested a review from a team June 10, 2024 12:16
@smerle33 changed the title WIP feat(infra.ci/agent): new cluster in azure sponsored Jun 10, 2024
@smerle33 changed the title from "feat(infra.ci/agent): new cluster in azure sponsored" to "feat(infra.ci/agent): new cluster in azure sponsored for infra.ci agents" Jun 10, 2024
@lemeurherve previously approved these changes Jun 10, 2024
locals.tf (outdated), comment on lines 53 to 57:
publick8s_compute_zones = [3]
cijenkinsio_agents_1_compute_zones = [1]
infracijenkinsio_agents_1_compute_zones = [1]

infraci_jenkins_io_agents_1_pod_cidr = "10.100.0.0/14" # 10.100.0.1 - 10.103.255.255

@lemeurherve (Member), Jun 10, 2024:

Suggested change, replacing:
publick8s_compute_zones = [3]
cijenkinsio_agents_1_compute_zones = [1]
infracijenkinsio_agents_1_compute_zones = [1]
infraci_jenkins_io_agents_1_pod_cidr = "10.100.0.0/14" # 10.100.0.1 - 10.103.255.255
with:
infraci_jenkins_io_agents_1_compute_zones = [1]
infraci_jenkins_io_agents_1_pod_cidr = "10.100.0.0/14" # 10.100.0.1 - 10.103.255.255
publick8s_compute_zones = [3]

Nit: reorder and regroup the values, moving cijenkinsio_agents_1_compute_zones above, and keep the same variable name format.

locals.tf (outdated):
}
ci_jenkins_io_fqdn = "ci.jenkins.io"
ci_jenkins_io_agents_1_pod_cidr = "10.100.0.0/14"
ci_jenkins_io_agents_1_pod_cidr = "10.100.0.0/14" # 10.100.0.1 - 10.103.255.255

@lemeurherve (Member), Jun 10, 2024:

Suggested change, replacing:
ci_jenkins_io_agents_1_pod_cidr = "10.100.0.0/14" # 10.100.0.1 - 10.103.255.255
with:
ci_jenkins_io_agents_1_compute_zones = [1]
ci_jenkins_io_agents_1_pod_cidr = "10.100.0.0/14" # 10.100.0.1 - 10.103.255.255

Nit: moved from below, keeping the same variable name format.

node_count = 3 # 3 nodes for HA as per AKS best practises
vnet_subnet_id = data.azurerm_subnet.infraci_jenkins_io_kubernetes_agent_sponsorship.id
tags = local.default_tags
zones = local.cijenkinsio_agents_1_compute_zones

Review comment (Member):

Suggested change, replacing:
zones = local.cijenkinsio_agents_1_compute_zones
with:
zones = local.ci_jenkins_io_agents_1_compute_zones

(Nit: keep the same variable name format.)

Reply (Contributor):

The idea is to keep the local value named the same as the cluster's Terraform resource ID.

Reply (Member):

Yes, I also suggested the same change in locals so that all (or most) resources share a common naming strategy. (It might need more work; is that too much?)

Reply (Contributor):

We misunderstood each other: I am not in favor of the proposal. I prefer the local values to be the same as the Terraform resource ID, which is « cijenkinsio_agents_1_… ».

If you want to change the convention, no problem, but it is more than a nit: it involves migrating resources (using a moved block) and would derail this PR.
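For illustration of what that migration involves: Terraform (1.1 and later) can rename a resource address without destroying the underlying object through a `moved` block. A minimal sketch, assuming the cluster is currently declared as `azurerm_kubernetes_cluster.infracijenkinsio_agents_1` and would be renamed to a hypothetical `infraci_jenkins_io_agents_1` address matching the proposed naming format:

```hcl
# Hypothetical rename: both addresses are assumptions for illustration.
# The resource block itself must be renamed to the "to" address in the same change;
# Terraform then plans a move of the existing cluster instead of a destroy/create.
moved {
  from = azurerm_kubernetes_cluster.infracijenkinsio_agents_1
  to   = azurerm_kubernetes_cluster.infraci_jenkins_io_agents_1
}
```

Every reference to the old address (node pools, outputs, data lookups) also has to be updated in that change, which is why a convention change reaches well beyond `locals.tf`.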

Reply (Contributor):

> Yes, I also suggested the same change in locals so all/most resources have a common naming strategy. (Might need more work, too much?)

With the latest changes, I can see the lack of naming consistency. I would prefer to handle it in a separate PR, though; otherwise the review here becomes too hard (it sidetracks from the main goal). Could you open a draft PR so we can have the discussion properly?

Reply (Member):

Sure 🙂

@smerle33 requested a review from lemeurherve June 10, 2024 15:09

@dduportal (Contributor) left a review:

Plan: 2 to add, 0 to change, 0 to destroy. => a new resource group (RG) and the new cluster.

LGTM @smerle33 you can proceed!
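For readers unfamiliar with the change, the two added resources are a resource group and an AKS cluster. A minimal sketch of such a pair with the `azurerm` provider, not the PR's actual code: the resource group and cluster names come from the deployment error reported below, `systempool1` is the pool name reported by AKS, and the `default_node_pool` arguments mirror the snippet reviewed above; the resource addresses, `location`, `dns_prefix` and `vm_size` are assumptions.

```hcl
# Sketch only: values marked "assumed" are not taken from the PR.
resource "azurerm_resource_group" "infracijenkinsio_agents_1" {
  name     = "infra-ci-jenkins-io-kubernetes-agents" # from the deployment error log
  location = "East US 2"                             # assumed
  tags     = local.default_tags
}

resource "azurerm_kubernetes_cluster" "infracijenkinsio_agents_1" {
  name                = "infracijenkinsio-agents-1" # from the deployment error log
  location            = azurerm_resource_group.infracijenkinsio_agents_1.location
  resource_group_name = azurerm_resource_group.infracijenkinsio_agents_1.name
  dns_prefix          = "infracijenkinsioagents1" # assumed
  tags                = local.default_tags

  # System pool: the only pool created here, application and agent pools come later.
  default_node_pool {
    name           = "systempool1"      # pool name reported in the AKS error
    vm_size        = "Standard_D2as_v5" # assumed
    node_count     = 3                  # 3 nodes for HA, as in the reviewed snippet
    vnet_subnet_id = data.azurerm_subnet.infraci_jenkins_io_kubernetes_agent_sponsorship.id
    zones          = local.infracijenkinsio_agents_1_compute_zones
    tags           = local.default_tags
  }

  # AKS requires either a managed identity or a service principal.
  identity {
    type = "SystemAssigned"
  }
}
```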


@dduportal (Contributor):

This PR failed to deploy with the following error:

17:43:25  │ Error: creating Kubernetes Cluster (Subscription: "1311c09f-aee0-4d6c-99a4-392c2b543204"
17:43:25  │ Resource Group Name: "infra-ci-jenkins-io-kubernetes-agents"
17:43:25  │ Kubernetes Cluster Name: "infracijenkinsio-agents-1"): performing CreateOrUpdate: unexpected status 400 (400 Bad Request) with response: {
17:43:25  │   "code": "SubnetNotAssociatedWithNATGateway",
17:43:25  │   "details": null,
17:43:25  │   "message": "Subnet '/subscriptions/1311c09f-aee0-4d6c-99a4-392c2b543204/resourceGroups/infra-ci-jenkins-io-sponsorship/providers/Microsoft.Network/virtualNetworks/infra-ci-jenkins-io-sponsorship-vnet/subnets/infra-ci-jenkins-io-sponsorship-vnet-infraci_jenkins_io_kubernetes-agent' must have a NAT gateway associated for outbound connection.",
17:43:25  │   "subcode": ""
17:43:25  │  }

#719 was created by @smerle33 to revert this PR if we cannot fix the error easily.

Error fixed by jenkins-infra/azure-net#250. Let's retry.
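`SubnetNotAssociatedWithNATGateway` typically appears when the cluster is configured to send outbound traffic through a user-assigned NAT gateway but the node subnet has none attached. The real fix is jenkins-infra/azure-net#250 and is not reproduced here; as a sketch only, attaching an existing NAT gateway to a subnet with the `azurerm` provider looks like this (both referenced resource addresses are hypothetical):

```hcl
# Hypothetical addresses: the real subnet and NAT gateway are defined in jenkins-infra/azure-net.
# The association makes the subnet's outbound traffic leave through the NAT gateway,
# which is what AKS checks before creating the node pool.
resource "azurerm_subnet_nat_gateway_association" "infraci_jenkins_io_kubernetes_agent_sponsorship" {
  subnet_id      = azurerm_subnet.infraci_jenkins_io_kubernetes_agent_sponsorship.id
  nat_gateway_id = azurerm_nat_gateway.infra_ci_jenkins_io_sponsorship_outbound.id
}
```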

dduportal added a commit that referenced this pull request Jun 10, 2024
… (shorter) name (#720)

This PR aims to fix the error mentioned in
#715 (comment)

It updates the subnet name used for this cluster as a follow up of
jenkins-infra/azure-net#252

Signed-off-by: Damien Duportal <[email protected]>
@dduportal (Contributor):

New error:

"message": "The name infra-ci-jenkins-io-sponsorship-vnet-infraci_jenkins_io_kubernetes-agent used in subnet ID (/subscriptions/1311c09f-aee0-4d6c-99a4-392c2b543204/resourceGroups/infra-ci-jenkins-io-sponsorship/providers/Microsoft.Network/virtualNetworks/infra-ci-jenkins-io-sponsorship-vnet/subnets/infra-ci-jenkins-io-sponsorship-vnet-infraci_jenkins_io_kubernetes-agent) in agent pool systempool1 is too long for AKS to use. This name will be used as a label value on the Kubernetes Node(s) for this agent pool. There is a limit of 63 characters for Node labels. Please shorten the name so the length is less than or equal to 63. If the agent pool has already been created then it must be deleted first before making any more changes to this cluster.",
18:18:49  │   "subcode": "NetworkNameTooLong"

Should be fixed by:

* [Revert "feat(infraci/agents-sponsored/cluster): add new subnet to infra ci outbound ips in sponsored" azure-net#251](https://github.com/jenkins-infra/azure-net/pull/251)

* [hotfix(vnets) shorten name for the infra.ci sponsored subnet used for kubernetes agents azure-net#252](https://github.com/jenkins-infra/azure-net/pull/252)

* [Reapply "feat(infraci/agents-sponsored/cluster): add new subnet to infra ci ou…" (#251) azure-net#253](https://github.com/jenkins-infra/azure-net/pull/253)

=> the new subnet name is 53 characters long, within the 63-character limit.

Next step: #720
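Because AKS reuses the subnet name as a Node label value, and Kubernetes caps label values at 63 characters, a guard in Terraform can catch this at plan time instead of at apply time. For reference, the rejected name `infra-ci-jenkins-io-sponsorship-vnet-infraci_jenkins_io_kubernetes-agent` is 72 characters long. A minimal sketch using a variable validation block; the variable name is hypothetical, and azure-net may build its subnet names differently:

```hcl
# Hypothetical variable: AKS copies the node subnet name into a Node label value,
# and Kubernetes rejects label values longer than 63 characters.
variable "infraci_kubernetes_agents_subnet_name" {
  type        = string
  description = "Name of the subnet used by the infra.ci sponsored AKS node pools."

  validation {
    condition     = length(var.infraci_kubernetes_agents_subnet_name) <= 63
    error_message = "The subnet name must be 63 characters or fewer because AKS uses it as a Kubernetes Node label value."
  }
}
```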

smerle33 added a commit that referenced this pull request Jun 11, 2024
as per jenkins-infra/helpdesk#3923
and following #715

this PR creates 3 node pools:
- one for applications, in arm64
- one for agents, in x86-64
- one for agents, in arm64

---------

Co-authored-by: Damien Duportal <[email protected]>
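As a rough illustration of one of these pools, here is a sketch of an arm64 agents pool using `azurerm_kubernetes_cluster_node_pool`. The resource addresses, pool name, VM size, node count, label and taint are all assumptions rather than the follow-up PR's values; on Azure, arm64 nodes come from Ampere-based sizes such as the Dpsv5 series.

```hcl
# Sketch of an arm64 agents pool: every literal value here is an assumption.
resource "azurerm_kubernetes_cluster_node_pool" "arm64_agents" {
  name                  = "arm64agents" # Linux pool names: lowercase alphanumeric, at most 12 chars
  kubernetes_cluster_id = azurerm_kubernetes_cluster.infracijenkinsio_agents_1.id
  vm_size               = "Standard_D4ps_v5" # Ampere-based (arm64) size, assumed
  os_type               = "Linux"
  node_count            = 1 # assumed
  vnet_subnet_id        = data.azurerm_subnet.infraci_jenkins_io_kubernetes_agent_sponsorship.id
  zones                 = local.infracijenkinsio_agents_1_compute_zones

  # Only pods that tolerate this taint (e.g. Jenkins agent pods) get scheduled here.
  node_taints = ["jenkins=infra.ci.jenkins.io:NoSchedule"]
  node_labels = {
    "jenkins" = "infra.ci.jenkins.io"
  }

  tags = local.default_tags
}
```

An x86-64 agents pool would differ mainly in `vm_size`; the taint keeps regular workloads off the agent nodes.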
smerle33 added a commit that referenced this pull request Jun 12, 2024