[Bug]: Throttling errors after migrating services to `aws-sdk-go-v2` #34669

mlynch1985 · 2023-11-30T22:21:57Z

Terraform Core Version

1.6.4

AWS Provider Version

5.28.0

Affected Resource(s)

aws_controltower_control

Expected Behavior

The Terraform plan should complete the refresh process successfully without error and allow for the apply stage to execute.

Actual Behavior

The refresh was interrupted by throttling errors, which prevented the plan/apply from completing.

Relevant Error/Panic Output Snippet

Error: reading ControlTower Control (arn:aws:organizations::000000000000:ou/o-abcdefghijk/ou-abcd-efghijklmno,arn:aws:controltower:us-east-1::control/BKEEVLXJOIZI): operation error ControlTower: ListEnabledControls, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested
│
│   with module.ct_managed_controls.aws_controltower_control.vpc["BKEEVLXJOIZI/ou-abcd-efghijklmno"],
│   on modules\ct_managed_controls\main.tf line 122, in resource "aws_controltower_control" "api_gateway":
│  122: resource "aws_controltower_control" "api_gateway" {

Terraform Configuration Files

terraform {
  required_version = ">= 1.6.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

data "aws_region" "current" {}
data "aws_organizations_organization" "this" {}

data "aws_organizations_organizational_units" "level_one" {
  parent_id = data.aws_organizations_organization.this.roots[0].id
}

data "aws_organizations_organizational_units" "level_two" {
  for_each  = local.level_one_ous
  parent_id = each.value.id
}

#  ...  #

locals {
  level_one_ous = { for ou in data.aws_organizations_organizational_units.level_one.children : ou.name => ou }

  level_two_ous = merge([
    for parent_name, ou in data.aws_organizations_organizational_units.level_two :
    { for child in ou.children : "${parent_name}/${child.name}" => child }
  ]...)

  #  ...  #

  all_ous = merge(local.level_one_ous, local.level_two_ous, local.level_three_ous, local.level_four_ous, local.level_five_ous)
}

locals {
  api_gateway = {
    # [SH.APIGateway.1] API Gateway REST and WebSocket API execution logging should be enabled
    "OOTDCUSIKIZZ" = {
      "${local.all_ous["Deployments"].id}"    = local.all_ous["Deployments"].arn,
      "${local.all_ous["Infrastructure"].id}" = local.all_ous["Infrastructure"].arn,
      "${local.all_ous["Sandbox"].id}"        = local.all_ous["Sandbox"].arn,
      "${local.all_ous["Workloads"].id}"      = local.all_ous["Workloads"].arn
    }

    #  ...  #
  }

  #  ...  #
}

resource "aws_controltower_control" "api_gateway" {
  for_each = merge([for control, ou_map in local.api_gateway :
    { for ou_id, ou_arn in ou_map : "${control}/${ou_id}" => { "control" = control, "ou_arn" = ou_arn } }
  ]...)

  control_identifier = "arn:aws:controltower:${data.aws_region.current.name}::control/${each.value.control}"
  target_identifier  = each.value.ou_arn
}

Steps to Reproduce

Set up AWS Control Tower and copy the above code into main.tf. You will need to create the OU structure and enable the Control Tower control-to-OU associations, as the throttling seems to start only after the initial apply.

Debug Output

No response

Panic Output

No response

Important Factoids

After upgrading to AWS provider v5.28.0 and running a plan/apply containing 10+ instances of the "aws_controltower_control" resource, we received throttling errors. Adding a version constraint that downgrades the AWS provider to < 5.28.0 resolves the issue. Alternatively, passing the -refresh=false flag lets the apply complete successfully.
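A minimal sketch of the version-pin workaround, assuming the constraint goes in the root module's required_providers block (Terraform reads provider version constraints there, not in the provider block itself):

```hcl
terraform {
  required_providers {
    aws = {
      # Stay below 5.28.0, the first release in which these throttling errors appeared for the reporter.
      source  = "hashicorp/aws"
      version = ">= 5.0, < 5.28.0"
    }
  }
}
```

The other workaround needs no configuration change: running terraform apply -refresh=false skips the refresh that issues the ListEnabledControls calls, at the cost of not detecting drift during that run.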

References

[Enhancement]: Migrate controltower service to aws-sdk-go-v2

Would you like to implement a fix?

None


github-actions · 2023-11-30T22:22:09Z

Community Note

Voting for Prioritization

Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
Please see our prioritization guide for information on how we prioritize.
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

If you are interested in working on this issue, please leave a comment.
If this would be your first contribution, please review the contribution guide.

RobbertDM · 2023-12-06T15:17:00Z

This seems broader than controltower. I also have this for Athena:

│ Error: reading Athena WorkGroup (...): operation error Athena: GetWorkGroup, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested

But also

│ Error: listing tags for Athena WorkGroup (arn:aws:athena:...): operation error Athena: ListTagsForResource, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested

There seems to be a related open issue at the aws-sdk-go-v2 repo:
aws/aws-sdk-go-v2#1665

I'm on AWS provider 5.29.0.

ewbankkit · 2024-01-16T13:36:29Z

Relates #34409.

terhirissa · 2024-01-22T07:23:13Z

We get a similar error with GetInlinePolicyForPermissionSet.

Error: reading SSO Permission Set Inline Policy (...): operation error SSO Admin: GetInlinePolicyForPermissionSet, failed to get rate limit token, retry quota exceeded, 3 available, 5 requested

The error exists on aws provider version 5.29.0 and above.

neogibson · 2024-01-23T20:52:40Z

We are also seeing this with CodePipeline Webhook resources on any version above 5.31.0. If we pin our provider version to 5.31.0 it's fine but 5.32.1 and 5.33.0 result in plan failures:

Error: reading CodePipeline Webhook (arn:aws:codepipeline:ca-central-1::webhook:example): operation error CodePipeline: ListWebhooks, failed to get rate limit token, retry quota exceeded, 3 available, 5 requested

ewbankkit · 2024-01-25T21:09:56Z

https://github.com/aws/aws-sdk-go-v2/blob/e6eb2ad83b1dad3e9ff7cb22914a5cb70db2c797/aws/retry/standard.go#L231-L246

shawnl-kb4 · 2024-01-30T20:50:38Z

Same here: we cannot use this for managing controls because of a "ThrottlingException" returned by the "ListEnabledControls" API call.

I just got off the phone with AWS Control Tower folks, who suggested updating the retry logic. It would be great to see a fix for this.

ewbankkit · 2024-01-30T22:10:37Z

My thinking on this is to add new provider configuration attribute(s) that customize the AWS SDK for Go v2 retryer (https://github.com/aws/aws-sdk-go-v2/blob/4fce0fdec6c41822255f4c3ec17aa46a9b6e2ac3/aws/retry/standard.go#L160-L171), in particular a RateLimiter whose token bucket size is configurable rather than fixed at the default of 500.

miguelaferreira · 2024-01-31T12:50:43Z

We are also facing crippling throttling on method ListTagsForResource for aws_config_config_rule resources.

ewbankkit · 2024-02-05T12:45:27Z

@mlynch1985 (et al.) Could you please try setting retry_mode = "adaptive" in your provider configuration and see if this helps?
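A minimal provider block for this suggestion, assuming the us-east-1 region from the original report; retry_mode accepts "standard" or "adaptive":

```hcl
provider "aws" {
  region = "us-east-1"

  # "adaptive" layers client-side rate limiting on top of the standard retry behaviour.
  retry_mode = "adaptive"
}
```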

neogibson · 2024-02-05T18:03:52Z

@ewbankkit Thanks for the suggestion, setting that on the provider did work in my case, a plan was generated without those rate limit errors. However, on one of our workspaces that consistently plans in ~3 minutes on provider version 5.31.0, this setting seems to have increased the plan time to around 9 minutes on the latest provider version 5.35.0.

ewbankkit · 2024-02-05T19:43:43Z

@neogibson Thanks for looking into this.
My guess is that we could fine-tune some of the options to get the behavior closer to AWS SDK for Go v1.
The maintainers have this on the agenda to discuss for this week's tech debt review.

mlynch1985 · 2024-02-12T21:34:00Z

@mlynch1985 (et al.) Could you please try setting retry_mode = "adaptive" in your provider configuration and see if this helps?

I tested with this option and unfortunately the error is still present.

Error: reading ControlTower Control (arn:aws:organizations::012345678912:ou/o-abcdefghij/ou-abcd-abcdefgh,arn:aws:controltower:us-west-2::control/PBGUIXCOFNGC): operation error ControlTower: ListEnabledControls, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested

ewbankkit · 2024-02-14T20:42:25Z

hashicorp/aws-sdk-go-base#918, incorporated into the Terraform AWS Provider via #35817, should address the "failed to get rate limit token, retry quota exceeded" errors.
As we have not been able to reproduce the throttling errors in our testing we cannot guarantee that all error cases have been dealt with, so I will leave this issue open for comments.
The fix will be available in Terraform AWS Provider v5.37.0, likely released tomorrow.

jwh-exerp · 2024-02-16T10:29:35Z

Unfortunately, we are still seeing this issue with AWS provider v5.37.0 in our project, which manages controls and their mappings across our organization.

Terraform configuration:

terraform version
Terraform v1.6.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v5.37.0
+ provider registry.terraform.io/hashicorp/local v2.4.1

terraform plan

...

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: reading ControlTower Control (arn:aws:organizations::1234567890123:ou/o-fsdiovxxxx/ou-o2bx-xxxxxxxxx,arn:aws:controltower:eu-central-1::control/AWS-GR_EC2_VOLUME_INUSE_CHECK): operation error ControlTower: ListEnabledControls, failed to get rate limit token, retry quota exceeded, 2 available, 5 requested
│ 
│   with aws_controltower_control.detective["Build_AWS-GR_EC2_VOLUME_INUSE_CHECK"],
│   on main.tf line 66, in resource "aws_controltower_control" "detective":
│   66: resource "aws_controltower_control" "detective" {
│ 
╵
╷
│ Error: reading ControlTower Control (arn:aws:organizations::1234567890123:ou/o-fsdiovxxxx/ou-o2bx-xxxxxxxxx,arn:aws:controltower:eu-central-1::control/AWS-GR_RDS_STORAGE_ENCRYPTED): operation error ControlTower: ListEnabledControls, failed to get rate limit token, retry quota exceeded, 4 available, 5 requested
│ 
│   with aws_controltower_control.detective["Workloads/Shared_AWS-GR_RDS_STORAGE_ENCRYPTED"],
│   on main.tf line 66, in resource "aws_controltower_control" "detective":
│   66: resource "aws_controltower_control" "detective" {
│ 
╵
╷
│ Error: reading ControlTower Control (arn:aws:organizations::1234567890123:ou/o-fsdiovxxxx/ou-o2bx-xxxxxxxxx,arn:aws:controltower:eu-central-1::control/AWS-GR_DETECT_CLOUDTRAIL_ENABLED_ON_MEMBER_ACCOUNTS): operation error ControlTower: ListEnabledControls, failed to get rate limit token, retry quota exceeded, 4 available, 5 requested
│ 
│   with aws_controltower_control.detective["DR_AWS-GR_DETECT_CLOUDTRAIL_ENABLED_ON_MEMBER_ACCOUNTS"],
│   on main.tf line 66, in resource "aws_controltower_control" "detective":
│   66: resource "aws_controltower_control" "detective" {
│ 

... and many more similar quota exceeded examples

An earlier comment reported that v5.31.0 did not have this issue, but for our project it does. We are pinned to v5.26.0 until a solution can be found.

mlynch1985 · 2024-02-16T14:10:42Z

I retested today with Terraform v1.7.3 and AWS provider v5.37.0 but still encountered the same errors. Reverting to v5.27.0 remains the workaround.

sixdaysandy · 2024-02-19T10:25:54Z

I retested with the 5.37.0 update today after experiencing errors with the 5.36.0 provider, and have reverted to the 5.35.0 provider, which throws no errors.
We're seeing it in CloudWatch: ListTagsForResource and CloudWatch: DescribeAlarms, but only on very large states.

kieran-lowe · 2024-02-19T12:06:29Z

Yeah we're also experiencing this for CodeBuild.

Edit: pinning to 5.27.0 as suggested by @mlynch1985 worked for us. Will test with setting retry_mode.

dthvt · 2024-02-19T20:00:37Z

FYI, I had some luck adding retry_mode = "adaptive" to the provider configuration after the update to SDK v2. This resolved the throttling issues I was encountering with the WorkSpaces API.

ewbankkit · 2024-02-20T19:40:03Z

For the next pass at a solution, we will add the ability to configure the token bucket capacity for the retry throttling rate limiter (e.g. aws/aws-sdk-go-v2#1665 (comment)). This configured value will be used to initialize the capacity of every API client's token bucket.

ewbankkit · 2024-02-22T22:24:22Z

With the soon-to-be-released v5.38.0 of the Terraform AWS provider, we have added a new provider-level configuration parameter, token_bucket_rate_limiter_capacity:

provider "aws" {
  token_bucket_rate_limiter_capacity = 5000
}

which allows the capacity of the rate limiter token bucket to be set.
The default is 500 tokens, so if you are experiencing throttling errors then please configure a larger value.
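A back-of-the-envelope sketch of why the capacity matters, assuming (as the "5 requested" in every error above suggests) that each retry attempt draws 5 tokens, and ignoring tokens returned by successful requests: a bucket of capacity C can fund at most C/5 retry attempts before it runs dry.

```latex
N_{\max} = \left\lfloor \frac{C}{c_{\text{retry}}} \right\rfloor, \qquad c_{\text{retry}} = 5
\\
C = 500 \Rightarrow N_{\max} = 100, \quad
C = 5\,000 \Rightarrow N_{\max} = 1\,000, \quad
C = 50\,000 \Rightarrow N_{\max} = 10\,000
```

Because each API client gets its own bucket, a plan that sends many throttled ListEnabledControls calls through one client can plausibly drain even a few thousand tokens, which is consistent with later reports in this thread that 5,000 was not always enough.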

mlynch1985 · 2024-02-23T20:42:06Z

I tested with the suggested value of 5000 and still encountered the error. What is the downside to increasing this value? I don't want to set a ridiculously high number without understanding the potential risks. If it helps, I can set up a code dump so you can test the same code as me.

ewbankkit · 2024-02-26T16:33:10Z

@mlynch1985 No additional resources are consumed by increasing the value.

mlynch1985 · 2024-02-28T22:33:25Z

@ewbankkit I had to set token_bucket_rate_limiter_capacity to 50,000 before it worked, but with that change I was able to complete the plan/apply. I will close this issue now. Thank you!
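A sketch of the configuration that worked in this report, assuming the provider requirement is raised to v5.38.0, the first release with token_bucket_rate_limiter_capacity, and reusing the us-east-1 region from the original report:

```hcl
terraform {
  required_providers {
    aws = {
      # token_bucket_rate_limiter_capacity is only available from v5.38.0 onward.
      source  = "hashicorp/aws"
      version = ">= 5.38.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"

  # The default is 500 tokens; 50,000 is the value that worked for the reporter.
  token_bucket_rate_limiter_capacity = 50000
}
```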

richgreen-moj · 2024-03-15T10:07:31Z

We are also facing crippling throttling on method ListTagsForResource for aws_config_config_rule resources.

We had issues with this over the last few weeks, but today it started working again, which seems to coincide with the provider update to v5.41.0.

The last provider version it worked with was v5.38.0. Since then I've been trying some of the suggested workarounds, e.g. setting retry_mode to "adaptive" and token_bucket_rate_limiter_capacity to a very large number, but neither helped. We'll keep an eye on it.

dandelo · 2024-03-28T16:52:03Z

Fixed for us in v5.42.0, specifically looks like this fix:

provider: Change the default AWS SDK for Go v2 API client RateLimiter to ratelimit.None so that services migrated to AWS SDK for Go v2 maintain behavioral compatibility with AWS SDK for Go v1 (#36467)
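A minimal sketch of requiring the release that contains this change, assuming nothing else in your configuration caps the provider below it; with it in place, the earlier workarounds (retry_mode = "adaptive" or a very large token_bucket_rate_limiter_capacity) may no longer be needed for the affected services:

```hcl
terraform {
  required_providers {
    aws = {
      # v5.42.0 changed the default SDK v2 RateLimiter to ratelimit.None (#36467).
      source  = "hashicorp/aws"
      version = ">= 5.42.0"
    }
  }
}
```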

AbAvramidis · 2024-04-04T10:34:57Z

We are still facing some issues related to this. We noticed strange behavior where terraform plan freezes while refreshing the state of several resources, times out after 40 minutes, and leaves the state locked.
Is anyone else facing something similar, even with the latest version?
We notice this behavior on any version higher than 5.32.

neogibson · 2024-04-10T17:02:47Z

Thanks for fixing this!

ewbankkit · 2024-04-10T18:22:17Z

@AbAvramidis Do you know which services are exhibiting this behavior?

github-actions · 2024-05-11T02:03:02Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

mlynch1985 added the bug Addresses a defect in current functionality. label Nov 30, 2023

github-actions bot added service/controltower Issues and PRs that pertain to the controltower service. service/organizations Issues and PRs that pertain to the organizations service. labels Nov 30, 2023

terraform-aws-provider bot added the needs-triage Waiting for first response or review from a maintainer. label Nov 30, 2023

ewbankkit removed service/organizations Issues and PRs that pertain to the organizations service. needs-triage Waiting for first response or review from a maintainer. labels Jan 16, 2024

ewbankkit changed the title ~~[Bug]: Throttling error after migrating controltower service to aws-sdk-go-v2~~ [Bug]: Throttling errors after migrating services to aws-sdk-go-v2 Feb 12, 2024

github-actions bot added the service/organizations Issues and PRs that pertain to the organizations service. label Feb 12, 2024

This was referenced Feb 21, 2024:

Add ability to configure aws-sdk-go-v2 token bucket rate limiter capacity hashicorp/aws-sdk-go-base#932 (Closed)

Add provider token_bucket_rate_limiter_capacity parameter #35926 (Merged)

mlynch1985 closed this as completed Feb 28, 2024

This was referenced Feb 29, 2024:

Investigate AWS SDK for Go v2 default retry configuration #36024 (Closed)

[Bug]: DescribeOrganizationConfigRules, failed to get rate limit token, retry quota exceeded #36094 (Closed)

richgreen-moj mentioned this issue Mar 15, 2024:

Bug: Error listing tags for Config Rule in Secure-baselines ministryofjustice/modernisation-platform#6486 (Closed)

ewbankkit mentioned this issue Apr 10, 2024:

MAX_BACKOFF changed between AWS SDK for Go v1 and v2 #36837 (Closed)

github-actions bot locked as resolved and limited conversation to collaborators May 11, 2024
