azurerm_kubernetes_cluster_node_pool: Adds support for temporary_name_for_rotation #27791

Open · wants to merge 14 commits into main
Conversation

@CorrenSoft (Contributor) commented on Oct 28, 2024

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritize for review
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for PR followers and do not help prioritize for review

Description

  • Added the temporary_name_for_rotation property (optional, not persisted).
  • Added functionality to rotate the node pool instead of recreating it when any of the following properties changes (see the sketch after this list):
    • fips_enabled
    • host_encryption_enabled
    • kubelet_config
    • linux_os_config
    • max_pods
    • node_public_ip_enabled
    • os_disk_size_gb
    • os_disk_type
    • pod_subnet_id
    • snapshot_id
    • ultra_ssd_enabled
    • vm_size
    • vnet_subnet_id
    • zones
  • Removed the ForceNew flag and added code to update the values of the properties listed above.
  • Updated TestAccKubernetesClusterNodePool_manualScaleVMSku and TestAccKubernetesClusterNodePool_ultraSSD test cases.
  • Removed schemaNodePoolSysctlConfigForceNew, schemaNodePoolKubeletConfigForceNew and schemaNodePoolLinuxOSConfigForceNew as they are no longer used.
  • Renamed retrySystemNodePoolCreation to retryNodePoolCreation, as it is now used for both cases.
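
For context, the change presumably follows broadly the same cycle that azurerm_kubernetes_cluster already uses for its default node pool: bring up a pool under the temporary name with the existing settings, delete the original pool, recreate it with the updated settings, then remove the temporary pool. The sketch below is illustrative only, not the PR's actual code; the function name is made up, and it only reuses the agentpools SDK calls that also appear in the test snippets later in this thread (imports omitted).

// Illustrative sketch only: rotate an agent pool through the name supplied in
// temporary_name_for_rotation instead of forcing a destroy-and-recreate in place.
func cycleNodePoolSketch(ctx context.Context, client *agentpools.AgentPoolsClient, id agentpools.AgentPoolId, tempName string, updated agentpools.AgentPool) error {
    existing, err := client.Get(ctx, id)
    if err != nil {
        return fmt.Errorf("retrieving %s: %+v", id, err)
    }
    if existing.Model == nil {
        return fmt.Errorf("retrieving %s: model was nil", id)
    }

    // 1. Stand up a temporary pool that mirrors the current configuration.
    tempId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, tempName)
    tempProfile := *existing.Model
    tempProfile.Name = &tempName
    if err := client.CreateOrUpdateThenPoll(ctx, tempId, tempProfile); err != nil {
        return fmt.Errorf("creating temporary %s: %+v", tempId, err)
    }

    // 2. Delete the original pool and recreate it with the updated (otherwise ForceNew) settings.
    if err := client.DeleteThenPoll(ctx, id); err != nil {
        return fmt.Errorf("deleting %s: %+v", id, err)
    }
    if err := client.CreateOrUpdateThenPoll(ctx, id, updated); err != nil {
        return fmt.Errorf("recreating %s: %+v", id, err)
    }

    // 3. Remove the temporary pool once the recreated pool is up.
    if err := client.DeleteThenPoll(ctx, tempId); err != nil {
        return fmt.Errorf("deleting temporary %s: %+v", tempId, err)
    }
    return nil
}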

PR Checklist

  • I have followed the guidelines in our Contributing Documentation.
  • I have checked to ensure there aren't other open Pull Requests for the same update/change.
  • I have checked if my changes close any open issues. If so please include appropriate closing keywords below.
  • I have updated/added Documentation as required written in a helpful and kind way to assist users that may be unfamiliar with the resource / data source.
  • I have used a meaningful PR title to help maintainers and other users understand this change and help prevent duplicate work.

Changes to existing Resource / Data Source

  • I have added an explanation of what my changes do and why I'd like you to include them (This may be covered by linking to an issue above, but may benefit from additional explanation).
  • I have written new tests for my resource or datasource changes & updated any relevant documentation.
  • I have successfully run tests with my changes locally. If not, please provide details on testing challenges that prevented you running the tests.

Testing

  • My submission includes Test coverage as described in the Contribution Guide and the tests pass. (if this is not possible for any reason, please include details of why you did or could not add test coverage)

Change Log

Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.

This is a (please select all that apply):

  • Bug Fix
  • New Feature (ie adding a service, resource, or data source)
  • Enhancement
  • Breaking Change

Related Issue(s)

Fixes #22265

Note

If this PR changes meaningfully during the course of review please update the title and description as required.

Updates the NodePoolUpdate function to rotate the node pool.
Removes the ForceNew flag on properties.
Restoring name as ForceNew.
Deleting obsolete methods.
Renaming `retrySystemNodePoolCreation` to `retryNodePoolCreation`.
@CorrenSoft (Contributor, Author):

Test Logs

make acctests SERVICE='containers' TESTARGS='-run=TestAccKubernetesClusterNodePool_'
==> Checking that code complies with gofmt requirements...
==> Checking that Custom Timeouts are used...
==> Checking that acceptance test packages are used...
TF_ACC=1 go test -v ./internal/services/containers -run=TestAccKubernetesClusterNodePool_ -timeout 180m -ldflags="-X=github.com/hashicorp/terraform-provider-azurerm/version.ProviderVersion=acc"
=== RUN TestAccKubernetesClusterNodePool_autoScale
=== PAUSE TestAccKubernetesClusterNodePool_autoScale
=== RUN TestAccKubernetesClusterNodePool_autoScaleUpdate
=== PAUSE TestAccKubernetesClusterNodePool_autoScaleUpdate
=== RUN TestAccKubernetesClusterNodePool_availabilityZones
=== PAUSE TestAccKubernetesClusterNodePool_availabilityZones
=== RUN TestAccKubernetesClusterNodePool_capacityReservationGroup
=== PAUSE TestAccKubernetesClusterNodePool_capacityReservationGroup
=== RUN TestAccKubernetesClusterNodePool_errorForAvailabilitySet
kubernetes_cluster_node_pool_resource_test.go:125: AvailabilitySet not supported as an option for default_node_pool in 4.0
--- SKIP: TestAccKubernetesClusterNodePool_errorForAvailabilitySet (0.00s)
=== RUN TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig
=== PAUSE TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig
=== RUN TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial
=== PAUSE TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial
=== RUN TestAccKubernetesClusterNodePool_other
=== PAUSE TestAccKubernetesClusterNodePool_other
=== RUN TestAccKubernetesClusterNodePool_multiplePools
=== PAUSE TestAccKubernetesClusterNodePool_multiplePools
=== RUN TestAccKubernetesClusterNodePool_manualScale
=== PAUSE TestAccKubernetesClusterNodePool_manualScale
=== RUN TestAccKubernetesClusterNodePool_manualScaleMultiplePools
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleMultiplePools
=== RUN TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate
=== RUN TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges
=== RUN TestAccKubernetesClusterNodePool_manualScaleUpdate
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleUpdate
=== RUN TestAccKubernetesClusterNodePool_manualScaleVMSku
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleVMSku
=== RUN TestAccKubernetesClusterNodePool_modeSystem
=== PAUSE TestAccKubernetesClusterNodePool_modeSystem
=== RUN TestAccKubernetesClusterNodePool_modeUpdate
=== PAUSE TestAccKubernetesClusterNodePool_modeUpdate
=== RUN TestAccKubernetesClusterNodePool_nodeTaints
=== PAUSE TestAccKubernetesClusterNodePool_nodeTaints
=== RUN TestAccKubernetesClusterNodePool_nodeLabels
=== PAUSE TestAccKubernetesClusterNodePool_nodeLabels
=== RUN TestAccKubernetesClusterNodePool_nodePublicIP
=== PAUSE TestAccKubernetesClusterNodePool_nodePublicIP
=== RUN TestAccKubernetesClusterNodePool_podSubnet
=== PAUSE TestAccKubernetesClusterNodePool_podSubnet
=== RUN TestAccKubernetesClusterNodePool_osDiskSizeGB
=== PAUSE TestAccKubernetesClusterNodePool_osDiskSizeGB
=== RUN TestAccKubernetesClusterNodePool_proximityPlacementGroupId
=== PAUSE TestAccKubernetesClusterNodePool_proximityPlacementGroupId
=== RUN TestAccKubernetesClusterNodePool_osDiskType
=== PAUSE TestAccKubernetesClusterNodePool_osDiskType
=== RUN TestAccKubernetesClusterNodePool_requiresImport
=== PAUSE TestAccKubernetesClusterNodePool_requiresImport
=== RUN TestAccKubernetesClusterNodePool_spot
=== PAUSE TestAccKubernetesClusterNodePool_spot
=== RUN TestAccKubernetesClusterNodePool_upgradeSettings
=== PAUSE TestAccKubernetesClusterNodePool_upgradeSettings
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkAutomatic
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkAutomatic
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkManual
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkManual
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet
=== RUN TestAccKubernetesClusterNodePool_windows
=== PAUSE TestAccKubernetesClusterNodePool_windows
=== RUN TestAccKubernetesClusterNodePool_windows2019
=== PAUSE TestAccKubernetesClusterNodePool_windows2019
=== RUN TestAccKubernetesClusterNodePool_windows2022
=== PAUSE TestAccKubernetesClusterNodePool_windows2022
=== RUN TestAccKubernetesClusterNodePool_windowsAndLinux
=== PAUSE TestAccKubernetesClusterNodePool_windowsAndLinux
=== RUN TestAccKubernetesClusterNodePool_zeroSize
=== PAUSE TestAccKubernetesClusterNodePool_zeroSize
=== RUN TestAccKubernetesClusterNodePool_hostEncryption
=== PAUSE TestAccKubernetesClusterNodePool_hostEncryption
=== RUN TestAccKubernetesClusterNodePool_maxSize
=== PAUSE TestAccKubernetesClusterNodePool_maxSize
=== RUN TestAccKubernetesClusterNodePool_sameSize
=== PAUSE TestAccKubernetesClusterNodePool_sameSize
=== RUN TestAccKubernetesClusterNodePool_ultraSSD
=== PAUSE TestAccKubernetesClusterNodePool_ultraSSD
=== RUN TestAccKubernetesClusterNodePool_osSkuUbuntu
=== PAUSE TestAccKubernetesClusterNodePool_osSkuUbuntu
=== RUN TestAccKubernetesClusterNodePool_osSkuAzureLinux
=== PAUSE TestAccKubernetesClusterNodePool_osSkuAzureLinux
=== RUN TestAccKubernetesClusterNodePool_osSkuCBLMariner
kubernetes_cluster_node_pool_resource_test.go:820: CBLMariner is an invalid os_sku in 4.0
--- SKIP: TestAccKubernetesClusterNodePool_osSkuCBLMariner (0.00s)
=== RUN TestAccKubernetesClusterNodePool_osSkuMariner
kubernetes_cluster_node_pool_resource_test.go:838: Mariner is an invalid os_sku in 4.0
--- SKIP: TestAccKubernetesClusterNodePool_osSkuMariner (0.00s)
=== RUN TestAccKubernetesClusterNodePool_osSkuMigration
=== PAUSE TestAccKubernetesClusterNodePool_osSkuMigration
=== RUN TestAccKubernetesClusterNodePool_dedicatedHost
=== PAUSE TestAccKubernetesClusterNodePool_dedicatedHost
=== RUN TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings
=== PAUSE TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings
=== RUN TestAccKubernetesClusterNodePool_scaleDownMode
=== PAUSE TestAccKubernetesClusterNodePool_scaleDownMode
=== RUN TestAccKubernetesClusterNodePool_workloadRuntime
=== PAUSE TestAccKubernetesClusterNodePool_workloadRuntime
=== RUN TestAccKubernetesClusterNodePool_customCATrustEnabled
kubernetes_cluster_node_pool_resource_test.go:980: Skipping this test in 4.0 beta as it is not supported
--- SKIP: TestAccKubernetesClusterNodePool_customCATrustEnabled (0.00s)
=== RUN TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled
=== PAUSE TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled
=== RUN TestAccKubernetesClusterNodePool_nodeIPTags
=== PAUSE TestAccKubernetesClusterNodePool_nodeIPTags
=== RUN TestAccKubernetesClusterNodePool_networkProfileComplete
=== PAUSE TestAccKubernetesClusterNodePool_networkProfileComplete
=== RUN TestAccKubernetesClusterNodePool_networkProfileUpdate
=== PAUSE TestAccKubernetesClusterNodePool_networkProfileUpdate
=== RUN TestAccKubernetesClusterNodePool_snapshotId
=== PAUSE TestAccKubernetesClusterNodePool_snapshotId
=== RUN TestAccKubernetesClusterNodePool_gpuInstance
=== PAUSE TestAccKubernetesClusterNodePool_gpuInstance
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition
=== CONT TestAccKubernetesClusterNodePool_autoScale
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkAutomatic
=== CONT TestAccKubernetesClusterNodePool_requiresImport
=== CONT TestAccKubernetesClusterNodePool_manualScaleVMSku
=== CONT TestAccKubernetesClusterNodePool_multiplePools
=== CONT TestAccKubernetesClusterNodePool_upgradeSettings
=== CONT TestAccKubernetesClusterNodePool_spot
=== CONT TestAccKubernetesClusterNodePool_manualScaleUpdate
--- PASS: TestAccKubernetesClusterNodePool_spot (965.01s)
=== CONT TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges
--- PASS: TestAccKubernetesClusterNodePool_requiresImport (1067.78s)
=== CONT TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkAutomatic (1158.41s)
=== CONT TestAccKubernetesClusterNodePool_manualScaleMultiplePools
--- PASS: TestAccKubernetesClusterNodePool_multiplePools (1161.29s)
=== CONT TestAccKubernetesClusterNodePool_manualScale
--- PASS: TestAccKubernetesClusterNodePool_upgradeSettings (1285.21s)
=== CONT TestAccKubernetesClusterNodePool_osSkuAzureLinux
--- PASS: TestAccKubernetesClusterNodePool_manualScaleVMSku (1329.16s)
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition
--- PASS: TestAccKubernetesClusterNodePool_manualScaleUpdate (1417.96s)
=== CONT TestAccKubernetesClusterNodePool_gpuInstance
--- PASS: TestAccKubernetesClusterNodePool_autoScale (1529.06s)
=== CONT TestAccKubernetesClusterNodePool_snapshotId
testcase.go:173: Step 1/5 error: Pre-apply plan check(s) failed:
azurerm_kubernetes_cluster_node_pool.test - Resource not found in plan ResourceChanges
--- FAIL: TestAccKubernetesClusterNodePool_snapshotId (87.46s)
=== CONT TestAccKubernetesClusterNodePool_networkProfileUpdate
--- PASS: TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges (870.69s)
=== CONT TestAccKubernetesClusterNodePool_networkProfileComplete
--- PASS: TestAccKubernetesClusterNodePool_osSkuAzureLinux (844.04s)
=== CONT TestAccKubernetesClusterNodePool_nodeIPTags
--- PASS: TestAccKubernetesClusterNodePool_manualScale (1012.77s)
=== CONT TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled
--- PASS: TestAccKubernetesClusterNodePool_manualScaleMultiplePools (1094.87s)
=== CONT TestAccKubernetesClusterNodePool_workloadRuntime
--- PASS: TestAccKubernetesClusterNodePool_gpuInstance (862.54s)
=== CONT TestAccKubernetesClusterNodePool_scaleDownMode
--- PASS: TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate (1500.80s)
=== CONT TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition (1346.04s)
=== CONT TestAccKubernetesClusterNodePool_dedicatedHost
--- PASS: TestAccKubernetesClusterNodePool_networkProfileComplete (947.44s)
=== CONT TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig
--- PASS: TestAccKubernetesClusterNodePool_nodeIPTags (885.03s)
=== CONT TestAccKubernetesClusterNodePool_other
--- PASS: TestAccKubernetesClusterNodePool_networkProfileUpdate (1429.04s)
=== CONT TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial
--- PASS: TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled (887.71s)
=== CONT TestAccKubernetesClusterNodePool_osSkuMigration
--- PASS: TestAccKubernetesClusterNodePool_workloadRuntime (1001.13s)
=== CONT TestAccKubernetesClusterNodePool_availabilityZones
--- PASS: TestAccKubernetesClusterNodePool_scaleDownMode (1254.85s)
=== CONT TestAccKubernetesClusterNodePool_capacityReservationGroup
--- PASS: TestAccKubernetesClusterNodePool_dedicatedHost (1020.72s)
=== CONT TestAccKubernetesClusterNodePool_nodePublicIP
--- PASS: TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig (1023.48s)
=== CONT TestAccKubernetesClusterNodePool_osDiskType
--- PASS: TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings (1257.96s)
=== CONT TestAccKubernetesClusterNodePool_nodeTaints
--- PASS: TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial (949.30s)
=== CONT TestAccKubernetesClusterNodePool_nodeLabels
--- PASS: TestAccKubernetesClusterNodePool_other (1109.40s)
=== CONT TestAccKubernetesClusterNodePool_proximityPlacementGroupId
--- PASS: TestAccKubernetesClusterNodePool_availabilityZones (929.24s)
=== CONT TestAccKubernetesClusterNodePool_podSubnet
--- PASS: TestAccKubernetesClusterNodePool_osSkuMigration (1342.22s)
=== CONT TestAccKubernetesClusterNodePool_zeroSize
--- PASS: TestAccKubernetesClusterNodePool_capacityReservationGroup (980.51s)
=== CONT TestAccKubernetesClusterNodePool_autoScaleUpdate
--- PASS: TestAccKubernetesClusterNodePool_nodePublicIP (946.35s)
=== CONT TestAccKubernetesClusterNodePool_osDiskSizeGB
--- PASS: TestAccKubernetesClusterNodePool_nodeTaints (1026.63s)
=== CONT TestAccKubernetesClusterNodePool_osSkuUbuntu
--- PASS: TestAccKubernetesClusterNodePool_osDiskType (1054.57s)
=== CONT TestAccKubernetesClusterNodePool_windows2019
--- PASS: TestAccKubernetesClusterNodePool_nodeLabels (866.63s)
=== CONT TestAccKubernetesClusterNodePool_ultraSSD
--- PASS: TestAccKubernetesClusterNodePool_podSubnet (1122.30s)
=== CONT TestAccKubernetesClusterNodePool_windowsAndLinux
--- PASS: TestAccKubernetesClusterNodePool_zeroSize (915.50s)
=== CONT TestAccKubernetesClusterNodePool_sameSize
--- PASS: TestAccKubernetesClusterNodePool_proximityPlacementGroupId (1222.90s)
=== CONT TestAccKubernetesClusterNodePool_windows2022
=== NAME TestAccKubernetesClusterNodePool_windowsAndLinux
testcase.go:173: Step 1/5 error: Pre-apply plan check(s) failed:
azurerm_kubernetes_cluster_node_pool.test - Resource not found in plan ResourceChanges
--- FAIL: TestAccKubernetesClusterNodePool_windowsAndLinux (109.76s)
=== CONT TestAccKubernetesClusterNodePool_maxSize
--- PASS: TestAccKubernetesClusterNodePool_osDiskSizeGB (933.40s)
=== CONT TestAccKubernetesClusterNodePool_hostEncryption
--- PASS: TestAccKubernetesClusterNodePool_osSkuUbuntu (935.00s)
=== CONT TestAccKubernetesClusterNodePool_modeUpdate
--- PASS: TestAccKubernetesClusterNodePool_windows2019 (1082.78s)
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet
--- PASS: TestAccKubernetesClusterNodePool_autoScaleUpdate (1567.26s)
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkManual
--- PASS: TestAccKubernetesClusterNodePool_ultraSSD (1245.12s)
=== CONT TestAccKubernetesClusterNodePool_windows
--- PASS: TestAccKubernetesClusterNodePool_hostEncryption (776.35s)
=== CONT TestAccKubernetesClusterNodePool_modeSystem
--- PASS: TestAccKubernetesClusterNodePool_sameSize (1048.09s)
--- PASS: TestAccKubernetesClusterNodePool_windows2022 (1096.77s)
--- PASS: TestAccKubernetesClusterNodePool_maxSize (1149.64s)
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet (933.70s)
--- PASS: TestAccKubernetesClusterNodePool_modeUpdate (1290.64s)
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkManual (1021.69s)
--- PASS: TestAccKubernetesClusterNodePool_windows (1030.36s)
--- PASS: TestAccKubernetesClusterNodePool_modeSystem (1041.87s)
FAIL
FAIL github.com/hashicorp/terraform-provider-azurerm/internal/services/containers 7394.253s
FAIL
make: *** [GNUmakefile:99: acctests] Error 1

@CorrenSoft (Contributor, Author):

Note on test cases

The TestAccKubernetesClusterNodePool_manualScaleVMSku and TestAccKubernetesClusterNodePool_ultraSSD tests were previously failing with an error stating that the resource couldn't be replaced. After adding a temporary name for the rotation they pass, but I'm not sure whether they are still worth keeping after this change (I also saw another PR removing at least one of them), or whether they should be rewritten as a single, consolidated test for this specific use case.

@CorrenSoft changed the title from "azurerm_kubernetes_cluster_node_pool: Adds support for temporary_name_for_rotation`" to "azurerm_kubernetes_cluster_node_pool: Adds support for temporary_name_for_rotation" on Oct 28, 2024
@stephybun (Member) left a comment:

Thanks for this PR @CorrenSoft!

In addition to the comments and suggestions in-line, we also need to consider the behaviour when rotation of the node pool fails, e.g. what happens if we fail to spin up the node pool with the new configuration and are left with the temporary node pool? It would be good if failures here are recoverable for the user.

The azurerm_kubernetes_cluster handles this by falling back on the temporary_name_for_rotation system node pool when we perform a read.

We then have special logic in the CustomizeDiff to prevent the node pool name from triggering a ForceNew and to allow the rotation logic to continue on from where it failed.
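
For illustration, such a recovery check for this resource could look roughly like the fragment below. This is a sketch only, not the cluster resource's actual code: it assumes the ForceNew behaviour for name would be enforced from CustomizeDiff rather than via the schema flag, and uses this resource's attribute names.

// Sketch: if the name currently in state equals temporary_name_for_rotation, a previous
// rotation failed part-way through, so the pending name change should not force a new
// resource and the rotation can be resumed on the next apply.
pluginsdk.CustomizeDiffShim(func(ctx context.Context, diff *pluginsdk.ResourceDiff, _ interface{}) error {
    if diff.HasChange("name") {
        oldName, _ := diff.GetChange("name")
        tempName := diff.Get("temporary_name_for_rotation").(string)
        if tempName != "" && oldName.(string) == tempName {
            // resume the interrupted rotation instead of replacing the pool
            return nil
        }
        return diff.ForceNew("name")
    }
    return nil
}),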

Test cases that simulate these failure scenarios should be added for this as well. I've linked two tests we wrote for the AKS resource below that can help you with those:

func TestAccKubernetesCluster_updateVmSizeAfterFailureWithTempWithoutDefault(t *testing.T) {
    data := acceptance.BuildTestData(t, "azurerm_kubernetes_cluster", "test")
    r := KubernetesClusterResource{}
    data.ResourceTest(t, r, []acceptance.TestStep{
        {
            Config: r.basicWithTempName(data),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
                // create the temporary node pool and delete the old default node pool to simulate the case where resizing fails when trying to bring up the new node pool
                data.CheckWithClientForResource(func(ctx context.Context, clients *clients.Client, state *terraform.InstanceState) error {
                    if _, ok := ctx.Deadline(); !ok {
                        var cancel context.CancelFunc
                        ctx, cancel = context.WithTimeout(ctx, 1*time.Hour)
                        defer cancel()
                    }
                    client := clients.Containers.AgentPoolsClient
                    id, err := commonids.ParseKubernetesClusterID(state.Attributes["id"])
                    if err != nil {
                        return err
                    }
                    defaultNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, state.Attributes["default_node_pool.0.name"])
                    resp, err := client.Get(ctx, defaultNodePoolId)
                    if err != nil {
                        return fmt.Errorf("retrieving %s: %+v", defaultNodePoolId, err)
                    }
                    if resp.Model == nil {
                        return fmt.Errorf("retrieving %s: model was nil", defaultNodePoolId)
                    }
                    tempNodePoolName := "temp"
                    profile := resp.Model
                    profile.Name = &tempNodePoolName
                    profile.Properties.VMSize = pointer.To("Standard_DS3_v2")
                    tempNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, tempNodePoolName)
                    if err := client.CreateOrUpdateThenPoll(ctx, tempNodePoolId, *profile); err != nil {
                        return fmt.Errorf("creating %s: %+v", tempNodePoolId, err)
                    }
                    if err := client.DeleteThenPoll(ctx, defaultNodePoolId); err != nil {
                        return fmt.Errorf("deleting default %s: %+v", defaultNodePoolId, err)
                    }
                    return nil
                }, data.ResourceName),
            ),
            // the plan will show that the default node pool name has been set to "temp" and we're trying to set it back to "default"
            ExpectNonEmptyPlan: true,
        },
        {
            Config: r.updateVmSize(data, "Standard_DS3_v2"),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
            ),
        },
        data.ImportStep("default_node_pool.0.temporary_name_for_rotation"),
    })
}

func TestAccKubernetesCluster_updateVmSizeAfterFailureWithTempAndDefault(t *testing.T) {
    data := acceptance.BuildTestData(t, "azurerm_kubernetes_cluster", "test")
    r := KubernetesClusterResource{}
    data.ResourceTest(t, r, []acceptance.TestStep{
        {
            Config: r.basic(data),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
                // create the temporary node pool to simulate the case where both old default node pool and temp node pool exist
                data.CheckWithClientForResource(func(ctx context.Context, clients *clients.Client, state *terraform.InstanceState) error {
                    if _, ok := ctx.Deadline(); !ok {
                        var cancel context.CancelFunc
                        ctx, cancel = context.WithTimeout(ctx, 1*time.Hour)
                        defer cancel()
                    }
                    client := clients.Containers.AgentPoolsClient
                    id, err := commonids.ParseKubernetesClusterID(state.Attributes["id"])
                    if err != nil {
                        return err
                    }
                    defaultNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, state.Attributes["default_node_pool.0.name"])
                    resp, err := client.Get(ctx, defaultNodePoolId)
                    if err != nil {
                        return fmt.Errorf("retrieving %s: %+v", defaultNodePoolId, err)
                    }
                    if resp.Model == nil {
                        return fmt.Errorf("retrieving %s: model was nil", defaultNodePoolId)
                    }
                    tempNodePoolName := "temp"
                    profile := resp.Model
                    profile.Name = &tempNodePoolName
                    profile.Properties.VMSize = pointer.To("Standard_DS3_v2")
                    tempNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, tempNodePoolName)
                    if err := client.CreateOrUpdateThenPoll(ctx, tempNodePoolId, *profile); err != nil {
                        return fmt.Errorf("creating %s: %+v", tempNodePoolId, err)
                    }
                    return nil
                }, data.ResourceName),
            ),
        },
        {
            Config: r.updateVmSize(data, "Standard_DS3_v2"),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
            ),
        },
        data.ImportStep("default_node_pool.0.temporary_name_for_rotation"),
    })
}

I hope that makes sense, let me know if you have any questions!

}

if d.HasChange("kubelet_config") {
if kubeletConfig := d.Get("kubelet_config").([]interface{}); len(kubeletConfig) > 0 {
@stephybun (Member):

What would happen here if a user had kubelet_config defined in their configuration and then removed it completely?

@CorrenSoft (Contributor, Author):

It wouldn't be updated.
Fixed!
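
For reference, the fix presumably needs to expand whatever is currently in the configuration (including an empty block) instead of only acting when the list is non-empty, roughly as below. This is a sketch: expandNodePoolKubeletConfig stands in for whichever expand helper the resource actually uses, and is assumed to return nil for an empty list. The same pattern applies to the linux_os_config comment below.

if d.HasChange("kubelet_config") {
    // expand unconditionally so that removing the block clears the setting on the agent pool
    kubeletConfigRaw := d.Get("kubelet_config").([]interface{})
    props.KubeletConfig = expandNodePoolKubeletConfig(kubeletConfigRaw)
}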

}

if d.HasChange("linux_os_config") {
if linuxOSConfig := d.Get("linux_os_config").([]interface{}); len(linuxOSConfig) > 0 {
@stephybun (Member):

Similarly here, what happens if a user had linux_os_config defined in their configuration and then removed it?

@CorrenSoft (Contributor, Author):

Same answer ;)

@@ -723,10 +718,41 @@ func resourceKubernetesClusterNodePoolUpdate(d *pluginsdk.ResourceData, meta int
props.EnableAutoScaling = utils.Bool(enableAutoScaling)
}

if d.HasChange("fips_enabled") {
props.EnableFIPS = utils.Bool(d.Get("fips_enabled").(bool))
@stephybun (Member):

I know we're using the utils.Bool/utils.String etc. elsewhere in this resource, but these functions are actually deprecated and can all be replaced by pointer.To. Could you update your changes to use pointer.To?

Suggested change:
-    props.EnableFIPS = utils.Bool(d.Get("fips_enabled").(bool))
+    props.EnableFIPS = pointer.To(d.Get("fips_enabled").(bool))

@CorrenSoft (Contributor, Author):

Applied!

Comment on lines 842 to 851
var subnetID *commonids.SubnetId
if subnetIDValue, ok := d.GetOk("vnet_subnet_id"); ok {
    subnetID, err = commonids.ParseSubnetID(subnetIDValue.(string))
    if err != nil {
        return err
    }
    if subnetID != nil {
        props.VnetSubnetID = utils.String(subnetID.ID())
    }
}
@stephybun (Member):

We can safely assume that subnetID will not be nil if it has been parsed correctly, i.e. err == nil

Suggested change:
-    var subnetID *commonids.SubnetId
-    if subnetIDValue, ok := d.GetOk("vnet_subnet_id"); ok {
-        subnetID, err = commonids.ParseSubnetID(subnetIDValue.(string))
-        if err != nil {
-            return err
-        }
-        if subnetID != nil {
-            props.VnetSubnetID = utils.String(subnetID.ID())
-        }
-    }
+    if subnetIDValue, ok := d.GetOk("vnet_subnet_id"); ok {
+        subnetID, err := commonids.ParseSubnetID(subnetIDValue.(string))
+        if err != nil {
+            return err
+        }
+        props.VnetSubnetID = pointer.To(subnetID.ID())
+    }

@CorrenSoft (Contributor, Author):

Updated! Thanks :)

@@ -798,6 +868,13 @@ func resourceKubernetesClusterNodePoolUpdate(d *pluginsdk.ResourceData, meta int
props.NetworkProfile = expandAgentPoolNetworkProfile(d.Get("node_network_profile").([]interface{}))
}

if d.HasChange("zones") {
zones := zones.ExpandUntyped(d.Get("zones").(*schema.Set).List())
if len(zones) > 0 {
@stephybun (Member):

If a user chooses to no longer specify zones in their config, this check would prevent that change from propagating.

@CorrenSoft (Contributor, Author):

Updated!
As a note, for this property (and the similar cases mentioned above) I replicated the implementation used for the default_node_pool scenario; I'm not sure whether it has the same problem or behaves differently for some reason, so that may be worth verifying.
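
As a concrete illustration of the suggested behaviour (a sketch only, mirroring the diff context shown above), assigning the expanded value unconditionally lets an empty configuration clear the zones:

if d.HasChange("zones") {
    // assign even when the list is empty so that removing zones from the config propagates
    zonesRaw := zones.ExpandUntyped(d.Get("zones").(*schema.Set).List())
    props.AvailabilityZones = &zonesRaw
}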

@stephybun (Member):

Hey @CorrenSoft, we have some customers requesting this feature. Would you be able to let me know whether you're planning to work through the review feedback that was left? I'm happy to take this over to get it in if you don't find yourself with the time/energy at the moment to get back to this, just checking before I step on your toes here 🙂

@CorrenSoft (Contributor, Author):

Quoting @stephybun's review comment above (trimmed):

Thanks for this PR @CorrenSoft! In addition to the comments and suggestions in-line, we also need to consider the behaviour when rotation of the node pool fails [...] I hope that makes sense, let me know if you have any questions!

I see the point. I will do further testing to evaluate how it behaves and what can be done about it.

Successfully merging this pull request may close these issues:

Support for temporary_name_for_rotation for other nodepools

3 participants