
Unable Connect do SQS if using a VPC #1900

Closed
victorsantosdevops opened this issue Mar 7, 2019 · 22 comments
Labels
breaking-change Issue requires a breaking change to remediate. bug This issue is a confirmed bug. p2 This is a standard priority issue sqs

Comments

@victorsantosdevops

victorsantosdevops commented Mar 7, 2019

When I try to send an SQS message from a Lambda function in a VPC, I get a timeout. I tried using a VPC Link, but it doesn't work.
{
"errorMessage": "2019-03-07T13:45:11.739Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Task timed out after 15.01 seconds"
}

The security group allows all outbound traffic, and so does the NACL.
I have already created the VPC Link.

Function Logs
[INFO] 2019-03-07T13:44:56.744Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Start with Hash: 1111114502ff8532d063b9d988e2406a
[INFO] 2019-03-07T13:44:56.744Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 msgData: {'msgBody': 'Howdy @ 2019-03-07 13:44:56', 'msgAttributes': {'hash': {'StringValue': '1111114502ff8532d063b9d988e2406a', 'DataType': 'String'}}}
2019-03-07 13:45:11.739 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Task timed out after 15.01 seconds

If I remove the VPC, everything works fine.
But I need this function working inside a VPC.
Can anyone help me, please? T_T

@SteveByerly

I'm having the same problem. I can access KMS and SSM properly, just not SQS.

@SteveByerly

I finally figured this out.

In order for the routes to work properly, you need to use a specific URL for the API calls, as noted in the docs. The SQS metadata hasn't been updated in a long time, so it does not have this updated URL scheme.

The solution was not clear to me originally, since the argument to the send_message method uses a URL, which I verified was in the proper format. The URL in question is the one the API call is sent to; the queue URL is just part of the API call's params.

So the fix is to override the endpoint_url when making your client/resource.

import json

import boto3

session = boto3.Session()

sqs_client = session.client(
    service_name='sqs',
    endpoint_url='https://sqs.us-east-1.amazonaws.com',
)

sqs_client.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/...',
    MessageBody=json.dumps('my payload'),
)

@JordonPhillips JordonPhillips added bug This issue is a confirmed bug. breaking-change Issue requires a breaking change to remediate. labels Mar 11, 2019
@JordonPhillips
Contributor

So the reason we use the alternate endpoint style is to support Python 2.6, which does not support SNI; SNI is required for the new endpoints. We would need to drop support for Python 2.6-2.7.8. Even then it would still be a breaking change, because people have whitelists for particular URLs, so changing what we use would break them.

One possibility in the short term is to add a configuration setting to switch over to the new endpoints.

@SteveByerly

That makes sense. I don't necessarily think configuration would be better since the user would still need to know about the configuration options.

A warning in the docs would be a good start, perhaps at the top of the page and each relevant section. I looked at the docs several times for a clue when I was working through this - that would have likely resolved it quickly.

Another idea would be to log warnings if the user is on py2.7.8+, is using a new-style URL for the queue_url, and has not set the endpoint_url.
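A rough sketch of that warning heuristic (the function name and URL pattern are hypothetical, not boto3 API; "SNI support" here means CPython 2.7.9+, where the ssl module gained SNI):

```python
import re
import sys

def should_warn_about_endpoint(queue_url, endpoint_url=None):
    """Hypothetical heuristic: warn when the interpreter supports SNI,
    the queue URL uses the new-style sqs.<region>.amazonaws.com host,
    and the user has not overridden endpoint_url."""
    sni_ok = sys.version_info >= (2, 7, 9)
    new_style = re.match(
        r'https://sqs\.[\w-]+\.amazonaws\.com/', queue_url) is not None
    return sni_ok and new_style and endpoint_url is None
```

On Python 3 this flags new-style queue URLs used without an explicit endpoint_url, and stays quiet once the override is set.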

Thanks for following up!

@dt-kylecrayne

Any updates or plans for tackling this issue? We're stuck on older versions of boto3 so that we can keep working with SQS inside our VPCs.

@michaelwills

@SteveByerly thanks much for #1900 (comment)

And I think a warning in the docs/logs would be good.

@Jon-AtAWS

@SteveByerly, you're my hero.

Second that. The docs absolutely do not cover this (it seems to apply to SQS only), and I burned 8 hours trying to figure it out.

@oleksii-donoha

To add to the observation: it seems this isn't even consistent across regions. I had the same code with the same setup working in one region but failing in another, which sent me off investigating networking problems.

Overriding the endpoint URL works in both regions, but the default sqs_client = boto3.client('sqs') works in only one. A real head-scratcher, I'm telling you.

@christophevg

christophevg commented Jan 14, 2020

The proposed solution with the additional endpoint_url doesn't seem to solve the problem in our case. Just to be sure: is it the same hostname as the queue URL, without the path, etc.?
So given QueueUrl https://sqs.eu-central-1.amazonaws.com/1234567/queue-name, would the endpoint_url be https://sqs.eu-central-1.amazonaws.com?
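For reference, yes: the endpoint is just the scheme and host of the queue URL. It can be derived like this (an illustrative snippet using the queue URL from the question):

```python
from urllib.parse import urlparse

queue_url = 'https://sqs.eu-central-1.amazonaws.com/1234567/queue-name'
parsed = urlparse(queue_url)

# Keep only the scheme and host; the account id and queue name are
# part of the API call's params, not of the endpoint.
endpoint_url = f'{parsed.scheme}://{parsed.netloc}'
# endpoint_url == 'https://sqs.eu-central-1.amazonaws.com'
```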

@christophevg

To avoid confusion a quick follow-up: our problem was related to the lambda not having access rights to the public SQS endpoint. After fixing that, simply using sqs_client = boto3.client('sqs') worked as expected.

@marianobrc

Any updates on this one?
I'm trying to run SQS and Celery in AWS with a VPC endpoint (no NAT gateways). Celery initializes the boto3 client with default parameters, and it's not possible to modify the boto3 client initialization code to set the endpoint_url parameter to the right URL.
I verified that sending a message directly with boto3 while setting endpoint_url works, but with Celery the connection times out, because it tries to connect using the default (legacy) endpoint, which is not supported with VPC endpoints.
AWS ref: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-sending-messages-from-vpc.html

@marianobrc

@dt-kylecrayne I'm having the same issue, which boto3 version is working for you with SQS inside your VPCs?
Thanks

@marianobrc

I found the following workaround, overriding boto settings in endpoints.json:

  1. Copy .venv/lib/python3.8/site-packages/botocore/data/endpoints.json to a known path inside a directory/ (your path may differ depending on where boto is installed).
  2. Edit the file and replace any reference to "queue.{dnsSuffix}" with "sqs.{region}.{dnsSuffix}". This will modify the endpoint URL format.
  3. Also edit "protocols" : [ "http", "https" ], removing "http". SQS VPC endpoints only work over HTTPS.
  4. Set the env var AWS_DATA_PATH=/directory/containing/your/file/ to tell boto to read settings from there first.

I hope this helps someone else until this gets fixed
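Steps 2-3 above can be scripted rather than edited by hand. A minimal sketch, assuming you have already loaded a copy of endpoints.json into a dict (the function names are illustrative):

```python
import json

def _drop_http(obj):
    # Recursively remove "http" from any "protocols" list (step 3).
    if isinstance(obj, dict):
        if isinstance(obj.get('protocols'), list):
            obj['protocols'] = [p for p in obj['protocols'] if p != 'http']
        for value in obj.values():
            _drop_http(value)
    elif isinstance(obj, list):
        for item in obj:
            _drop_http(item)

def patch_sqs_endpoints(data):
    """Apply steps 2-3 to a parsed endpoints.json structure: rewrite
    legacy hostname templates and drop plain-http protocol entries."""
    # Round-trip through text to replace the template everywhere (step 2).
    text = json.dumps(data).replace('queue.{dnsSuffix}',
                                    'sqs.{region}.{dnsSuffix}')
    patched = json.loads(text)
    _drop_http(patched)
    return patched
```

Write the patched structure back out into the directory you point AWS_DATA_PATH at (step 4).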

@joseph-wortmann

This would be quite simple to fix within botocore. The offending line is 467 in client.py. A simple check of the Python version, or of ssl.HAS_SNI, to choose either the sslCommonName or the hostname should do it. Currently this line simply chooses sslCommonName if it exists, and hostname otherwise. For SQS and a couple of other services, the sslCommonName always exists in current botocore.
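The check described above might look roughly like this (a sketch of the idea, not actual botocore code; the dict keys mirror entries in endpoints.json):

```python
import ssl

def resolve_hostname(endpoint_entry):
    """Prefer the modern hostname when the runtime supports SNI;
    fall back to sslCommonName (the current legacy behavior) otherwise."""
    if ssl.HAS_SNI:
        return endpoint_entry.get('hostname')
    return endpoint_entry.get('sslCommonName') or endpoint_entry.get('hostname')
```

On any Python 3 interpreter ssl.HAS_SNI is true, so the new-style hostname would win and the legacy sslCommonName would only be used on ancient runtimes.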

Until this gets fixed (as I said, should be simple), I've created a microlibrary that implements a variation of the solution that @marianobrc indicated directly above. You can find this here - https://pypi.org/project/awsserviceendpoints/

@willronchetti

Any updates on a fix for this?

@kapilt

kapilt commented Nov 17, 2021

This also results in mismatched data between the CLI and boto API usage: the CLI somehow knows to use the correct endpoint (sqs.<region>), but the boto API usage doesn't and uses the legacy endpoint. When querying the queue URL, the service returns it based on the host that was accessed, so now we have data inconsistencies as well because of this.

❯ aws sqs list-queues
{
    "QueueUrls": [
        "https://sqs.us-east-2.amazonaws.com/123456785098/assetdb-ftest-cvKP",
        "https://sqs.us-east-2.amazonaws.com/123456785098/dev_policy_deploys",
        "https://sqs.us-east-2.amazonaws.com/123456785098/dev_policy_deploys_dlq",
        "https://sqs.us-east-2.amazonaws.com/123456785098/local-assetdb",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test2",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test3",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test4",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test5"
    ]
}

❯ python
Python 3.10.0 (default, Oct  5 2021, 06:12:41) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
>>> import pprint
>>> pprint.pprint(boto3.client('sqs').list_queues())
{'QueueUrls': ['https://us-east-2.queue.amazonaws.com/123456785098/assetdb-ftest-cvKP',
               'https://us-east-2.queue.amazonaws.com/123456785098/dev_policy_deploys',
               'https://us-east-2.queue.amazonaws.com/123456785098/dev_policy_deploys_dlq',
               'https://us-east-2.queue.amazonaws.com/123456785098/local-assetdb',
               'https://us-east-2.queue.amazonaws.com/123456785098/test',
               'https://us-east-2.queue.amazonaws.com/123456785098/test2',
               'https://us-east-2.queue.amazonaws.com/123456785098/test3',
               'https://us-east-2.queue.amazonaws.com/123456785098/test4',
               'https://us-east-2.queue.amazonaws.com/123456785098/test5'],
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '989',
                                      'content-type': 'text/xml',
                                      'date': 'Thu, 18 Nov 2021 13:04:54 GMT',
                                      'x-amzn-requestid': '554b37b9-02bd-5e12-ad5a-6da9530bfb45'},
                      'HTTPStatusCode': 200,
                      'RequestId': '554b37b9-02bd-5e12-ad5a-6da9530bfb45',
                      'RetryAttempts': 0}}

It feels like madness to me that the SDK is forcing all its users to work around it.

Is there a sane default configuration without having to manually pass in the endpoint, i.e. how is the awscli doing the right thing?

Can we get an environment flag similar to the STS regional endpoints one?

@AbdulBasitKhaleeq

Resolved the issue by putting the Lambda function in a private subnet and allowing internet access through a NAT gateway.

VPC -> create private subnets -> create NAT Gateway in a public subnet -> attach private subnets to the NAT Gateway -> update the VPC setting in the Lambda configuration.

import boto3

session = boto3.Session(region_name="ca-central-1")
sqs = session.client(
    service_name='sqs',
    endpoint_url='https://sqs.ca-central-1.amazonaws.com',
)

@aBurmeseDev aBurmeseDev added the p2 This is a standard priority issue label Nov 8, 2022
@sejr1996

I have a Lambda function sending messages to an SQS queue configured with a VPC. It worked normally for several months, but now, out of nowhere, no messages are sent and the function times out. The Lambda function is in a private subnet.

@sejr1996

Changing the security group ingress rules to allow all traffic works.
Previously the configuration allowed access through ports 22 and 2049. Which port should be added for the SQS queues to function correctly?

@dfloresxyon

dfloresxyon commented Dec 21, 2023

Changing the security group ingress rules to allow all traffic works. Previously the configuration allowed access through ports 22 and 2049. Which port should be added for the SQS queues to function correctly?

The same thing happened to me. Lambda running with the VPC set up, and an endpoint created so the resources within the VPC can access SQS endpoints. All working fine for years. Suddenly the Lambdas started to time out and couldn't resolve SQS endpoints. I opened the doors as @sejr1996 mentioned as a last resort, and it worked for now.

@tim-finnigan
Contributor

This issue has been addressed — you can test by running:

import boto3
session = boto3.Session()
boto3.set_stream_logger('')

sqs_client = session.client(
    service_name='sqs',
    region_name='us-east-1'
)

response = sqs_client.list_queues()
print(response)

And see in the logs that it resolves to the correct:

Endpoint provider result: https://sqs.us-east-1.amazonaws.com

Please update to a newer version of Boto3 for access to the latest functionality. The most recent version is 1.34.125 per the CHANGELOG. Note that Python 3.8+ is required.

SQS endpoints for reference: https://docs.aws.amazon.com/general/latest/gr/sqs-service.html. If you want to use a custom or legacy endpoint you could set the service-specific endpoint AWS_ENDPOINT_URL_SQS to the value you need.
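As a sketch of that service-specific override (the endpoint value shown is illustrative; substitute whatever custom or legacy endpoint you need):

```python
import os

# AWS_ENDPOINT_URL_SQS must be set before the SQS client is created;
# boto3/botocore read it at client-construction time.
os.environ['AWS_ENDPOINT_URL_SQS'] = 'https://sqs.us-east-1.amazonaws.com'

# import boto3
# sqs = boto3.client('sqs')  # would now resolve to the URL set above
```

The same override can of course be exported in the shell or Lambda environment configuration instead of being set in code.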


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
