S3 backend doesn't respect region argument for DynamoDB anymore #36129

Closed
mwacc2 opened this issue Nov 28, 2024 · 7 comments
Labels: backend/s3, bug, new (new issue not yet triaged), waiting-response (an issue/pull request is waiting for a response from the community)

Comments

@mwacc2

mwacc2 commented Nov 28, 2024

Terraform Version

Terraform v1.10.0
on darwin_amd64
...
+ provider registry.terraform.io/hashicorp/aws v5.78.0

Terraform Configuration Files

# NOTE: environment variable AWS_REGION = "REGION-2"

terraform {
  # Terraform will store infrastructure state using the following AWS resources
  backend "s3" {
    # This is NOT the region where resources are created.
    # This is the region where information about created resources is stored.
    region = "REGION-1"

    bucket       = "BUCKET"
    key          = "PREFIX"
    encrypt      = true
    use_lockfile = true # experimental feature of 1.10, uses both DynamoDB and S3 conditional write on this file

    dynamodb_table = "TerraformLock"
    endpoints = {
      # DynamoDB doesn't respect the region parameter above anymore. For REGION-2-based infra it tries to use REGION-2
      # DynamoDB instead of REGION-1's. So we force through explicit endpoint.
      # (uncomment the next line to fix the issue)
      # dynamodb = "https://dynamodb.REGION-1.amazonaws.com/"
    }
  }
}

Debug Output

Acquiring state lock. This may take a few moments...

│ Error: Error acquiring the state lock

│ Error message: operation error DynamoDB: PutItem, failed to resolve service endpoint, endpoint rule error, Credentials-sourced account ID parameter is invalid
│ Unable to retrieve item from DynamoDB table "TerraformLock": operation error DynamoDB: GetItem, failed to resolve service endpoint, endpoint rule error, Credentials-sourced account ID parameter is invalid

│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.

Expected Behavior

When environment variable AWS_REGION = "REGION-2"

The S3 backend should have tried to reach DynamoDB in region REGION-1 as defined by the region argument.

Actual Behavior

When environment variable AWS_REGION = "REGION-2"

The S3 backend seems to respect the AWS_REGION environment variable instead of the region argument, and attempts to read the DynamoDB table in REGION-2.

Steps to Reproduce

  1. terraform init -reconfigure
  2. AWS_REGION="REGION-2" terraform apply

Additional Context

Using the explicit endpoint argument with the appropriate (but repeated) REGION-1 endpoint URL will fix this issue as commented in the source above.

But according to documentation this should not be necessary:

region - (Required) AWS Region of the S3 Bucket and DynamoDB Table (if used). This can also be sourced from the AWS_DEFAULT_REGION and AWS_REGION environment variables.

References

No response

@mwacc2 mwacc2 added bug new new issue not yet triaged labels Nov 28, 2024
@mwacc2 mwacc2 changed the title S3 backend doesn't respect region attribute for DynamoDB anymore S3 backend doesn't respect region argument for DynamoDB anymore Nov 28, 2024
@bschaatsbergen
Member

Hey @mwacc2,

Thank you for reporting this! The S3 backend is managed by the AWS Provider team at HashiCorp, and this issue has been added to their triage queue. Thanks again!

@bschaatsbergen
Member

bschaatsbergen commented Nov 28, 2024

Hi @mwacc2,

Thanks again for filing this issue. Unfortunately I wasn’t able to replicate the problem on my end. Could you verify if my attempt is correct? I’ve set the AWS_REGION environment variable to "us-east-1" and used the following configuration:

terraform {
  backend "s3" {
    region         = "us-west-2"
    bucket         = "somebucket" # this is a resource in us-west-2
    key            = "36129.tfstate"
    encrypt        = true
    use_lockfile   = true
    dynamodb_table = "sometable" # this is a resource in us-west-2
  }
}

Here's an attached debug log: https://gist.github.com/bschaatsbergen/5169e72bde3ace9488fd3bb6bc748305

Let me know if this matches your setup or if there’s anything different on your end. Thanks!

@bschaatsbergen bschaatsbergen added the waiting-response An issue/pull request is waiting for a response from the community label Nov 28, 2024
@mwacc2
Author

mwacc2 commented Nov 29, 2024

Hi,

I was able to reproduce, both right away and after deleting the .terraform local directory/lock file and re-initialising. I was also able to fix it again by uncommenting the custom endpoint.

Maybe a factor is that I use aws-vault to assume a role (with MFA enforced). So the actual commands look like this:

# Required after changing the backend configuration (i.e. uncommenting the custom DynamoDB endpoint)
aws-vault exec Profile -- terraform init -reconfigure
aws-vault exec Profile -- terraform apply -var-file=some-var-file.tfvars

AWS Vault sets the following environment variables:

AWS_REGION=REGION-2
AWS_DEFAULT_REGION=REGION-2
AWS_ACCESS_KEY_ID=***
AWS_SECRET_ACCESS_KEY=***
AWS_SESSION_TOKEN=***

When the custom DynamoDB endpoint argument is commented, and using DEBUG logs, I get the following:

[WARN]  backend-s3: failed to fetch state MD5: tf_backend.operation=Get tf_backend.req_id=UUID tf_backend.s3.bucket=BUCKET tf_backend.s3.path=KEY error="Unable to retrieve item from DynamoDB table \"TerraformLock\": operation error DynamoDB: GetItem, failed to resolve service endpoint, endpoint rule error, Credentials-sourced account ID parameter is invalid"

Note that, according to the Terraform logs, no HTTP requests are sent to DynamoDB (unlike the S3 calls, or the DynamoDB calls in the Gist you pasted). I did not find entries like yours in my logs (backend-s3: HTTP Request Sent: aws.dynamodb.table_names ... db.system=dynamodb). Searching my logs for different parts of that entry yields no results.

Thus the WARN entry above does not seem to be the consequence of a forbidden/unauthorised/not found API response (unless HTTP requests/responses are not logged in some failure cases). I suspect that the DynamoDB client fails to resolve its configuration and thus never gets as far as sending an API call.

After uncommenting the custom DynamoDB endpoint and reconfiguring the backend with aws-vault ... terraform init -reconfigure as described above, the Terraform plan/apply command is successful, and then I can indeed find log entries for HTTP requests/responses to DynamoDB (here at unlocking):

[DEBUG] backend-s3: HTTP Request Sent: aws.dynamodb.table_names=["TerraformLock"] aws.region=REGION-1 db.system=dynamodb rpc.method=DeleteItem ...
[DEBUG] backend-s3: HTTP Response Received: aws.dynamodb.table_names=["TerraformLock"] aws.region=REGION-1  db.system=dynamodb rpc.method=DeleteItem rpc.service=DynamoDB ...  http.status_code=200
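
The log check described above could be scripted roughly as follows. This is only a sketch: the helper name count_ddb_requests and the log file path are made up for illustration, and the search pattern is simply copied from the "HTTP Request Sent" entries quoted in this thread. TF_LOG and TF_LOG_PATH are Terraform's standard logging environment variables.

```shell
#!/bin/sh
# Hypothetical helper: count the DynamoDB requests recorded in a Terraform debug log.
# The pattern mirrors the "HTTP Request Sent ... db.system=dynamodb" entries quoted above.
count_ddb_requests() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # "|| true" keeps the helper usable under "set -e".
  grep -c 'HTTP Request Sent:.*db\.system=dynamodb' "$1" || true
}

# Example (not run here): capture a debug log, then count the DynamoDB requests.
#   TF_LOG=DEBUG TF_LOG_PATH=./terraform-debug.log terraform apply
#   count_ddb_requests ./terraform-debug.log
```

A count of 0 alongside the endpoint resolution error would be consistent with the client failing before any request is sent.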

I tried to use the assume_role argument instead of AWS Vault but this doesn't seem to be compatible with an MFA-enforced assume role/trust policy.

@bschaatsbergen
Member

bschaatsbergen commented Dec 1, 2024

Hey @mwacc2,

Thank you for sharing more details about the environment and tooling used to fetch and inject dynamic credentials. I believe the issue comes from how one of the underlying libraries is sourcing and setting the Account ID from the provided credentials or profile, which leads to an error when the DynamoDB client is instantiated and attempts to resolve its endpoint config.

Specifically, the error is triggered when the AccountId parameter is not a valid host label in a non-FIPS, non-DualStack AWS partition configuration. See: https://github.com/aws/aws-sdk-go-v2/blob/main/service/dynamodb/endpoints.go#L423-L435

Could you confirm if you’re only experiencing this issue on Terraform 1.10.0? And, are you aware if there's an Account ID in the profile configuration you’re targeting with aws-vault? Thanks!

@mwacc2
Author

mwacc2 commented Dec 2, 2024

> Could you confirm if you’re only experiencing this issue on Terraform 1.10.0?

I have no issues with Terraform v1.9.8.

(In that working production configuration, the backend block does not declare the use_lockfile argument, as it's new, and there is no endpoints customisation, as it's not needed.)

> And, are you aware if there's an Account ID in the profile configuration you’re targeting with aws-vault?

In the AWS profile file I use with aws-vault, the account ID is only there as part of the assumed role ARN and the MFA device ARN (properties mfa_serial and role_arn).

@bschaatsbergen
Member

bschaatsbergen commented Dec 6, 2024

Hey @mwacc2,

A change we inherited in Terraform 1.10:

This release contains an upstream AWS SDK for Go v2 change to DynamoDB service endpoints. We now connect to a DynamoDB endpoint in the format (account-id).ddb.(region).amazonaws.com instead of dynamodb.(region).amazonaws.com. If your network configuration blocks outgoing traffic to DynamoDB based on DNS names or endpoint URLs, you must adjust your configuration, because the service’s DNS name will change. You may instead disable account-based endpoints for DynamoDB by setting account_id_endpoint_mode = disabled in a shared config file or setting the AWS_ACCOUNT_ID_ENDPOINT_MODE environment variable to disabled (#39505)

Perhaps another approach to troubleshoot is setting AWS_ACCOUNT_ID_ENDPOINT_MODE=disabled? It seems the account ID is now required for endpoint resolution, whereas it wasn’t before. Alternatively, you could set AWS_ACCOUNT_ID to your account ID, which should allow account-based endpoint resolution to function as expected. That might be the better first option, given that it’s the AWS recommendation for the best performance and scalability.

I still suspect that the underlying AWS library responsible for sourcing and setting the AccountID is retrieving an incompatible value for the account ID, which is triggering the error: a value that was previously compatible with how endpoints were constructed, but now conflicts with the way the AWS library builds the endpoint for DynamoDB.

Could you try either of these options and share your findings to help with troubleshooting?
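
To make the two hostname formats from the changelog entry quoted above concrete, here is a small illustrative sketch. The function name ddb_host, its mode argument, and the account ID 123456789012 are hypothetical; the script only formats strings the way the changelog describes and does not reproduce the SDK's actual endpoint resolution rules.

```shell
#!/bin/sh
# Hypothetical helper: print the DynamoDB hostname format described in the
# changelog entry, depending on whether account-based endpoints are disabled.
ddb_host() {
  region="$1"
  account_id="$2"
  mode="${3:-enabled}"   # assumption: anything other than "disabled" means account-based

  if [ "$mode" = "disabled" ]; then
    # Previous format: dynamodb.(region).amazonaws.com
    printf 'dynamodb.%s.amazonaws.com\n' "$region"
  else
    # New format: (account-id).ddb.(region).amazonaws.com
    printf '%s.ddb.%s.amazonaws.com\n' "$account_id" "$region"
  fi
}

ddb_host us-west-2 123456789012            # account-based hostname
ddb_host us-west-2 123456789012 disabled   # regional hostname
```

The account-based form has the account ID as its first DNS label, which fits the suspicion that an unusable account ID value breaks endpoint resolution before any request is made.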

@mwacc2
Author

mwacc2 commented Dec 9, 2024

> could set AWS_ACCOUNT_ID to your account ID

So I ran the following:

AWS_ACCOUNT_ID="MY_ACCOUNT_ID" aws-vault exec MYPROFILE -- terraform init -upgrade -reconfigure
AWS_ACCOUNT_ID="MY_ACCOUNT_ID" aws-vault exec MYPROFILE -- terraform apply -var-file=...

And the lock was successfully locked and released 🥳
Without the account ID environment variable I was able to reproduce again.

I also tried with AWS_ACCOUNT_ID_ENDPOINT_MODE=disabled instead of the account ID, and it also works.
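
For anyone wanting to apply the second workaround without exporting the variable globally, a minimal sketch could look like the following. The wrapper name with_regional_ddb_endpoint is made up for illustration; only the AWS_ACCOUNT_ID_ENDPOINT_MODE variable and its disabled value come from the changelog entry quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run a single command with account-based DynamoDB
# endpoints disabled, leaving the rest of the shell session untouched.
with_regional_ddb_endpoint() {
  AWS_ACCOUNT_ID_ENDPOINT_MODE=disabled "$@"
}

# Example (not run here), reusing the commands from this comment:
#   with_regional_ddb_endpoint aws-vault exec MYPROFILE -- terraform init -upgrade -reconfigure
#   with_regional_ddb_endpoint aws-vault exec MYPROFILE -- terraform apply -var-file=...
```

Scoping the variable to one command keeps the default behaviour for other AWS tooling in the same session.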

So I guess it's not a bug, and I need to squint my eyes harder while reading the CHANGELOGs 😳

Although the important information was perhaps a bit hidden/not obvious? Because the linked changelog entry is part of the AWS provider repo, and not part of the Tfstate backend docs/main Terraform repo changelog? How do the backend and provider relate?

Thanks for your help @bschaatsbergen 🙏

@mwacc2 mwacc2 closed this as completed Dec 9, 2024