
Error when pulling image from private ECR #303

Closed
riupie opened this issue Mar 26, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@riupie

riupie commented Mar 26, 2024

Hi, I have some deployments that pull from a private ECR registry, and I use a service account for the credentials. I read that kuik has supported this since version 1.5.0, and I'm now on 1.7.1, but I still get an error.

➜  ~ k get cachedimages   (trident-staging/default)
NAME                                                                                         CACHED   RETAIN   EXPIRES AT   PODS COUNT   AGE
…                                                                                                                           1            13m
xxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl                                  1            3h9m

Error log from controller:

2024-03-26T07:30:07.285Z        ERROR   failed to cache image   {"controller": "cachedimage", "controllerGroup": "kuik.enix.io", "controllerKind": "CachedImage", "CachedImage": {"name":"xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl"}, "namespace": "", "name": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl", "reconcileID": "d9432bcf-f245-4bdb-b942-8ffe8a2e872e", "sourceImage": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/crunchydata/crunchy-postgres:ubi8-15.5-0-tsl", "error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n", "errorCauses": [{"error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n"}]}
2024-03-26T07:30:07.286Z        ERROR   Reconciler error        {"controller": "cachedimage", "controllerGroup": "kuik.enix.io", "controllerKind": "CachedImage", "CachedImage": {"name":"xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl"}, "namespace": "", "name": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl", "reconcileID": "d9432bcf-f245-4bdb-b942-8ffe8a2e872e", "error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n", "errorCauses": [{"error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n"}]}

Do I need to add some credential somewhere?

Nicolasgouze added the bug label (Something isn't working) on Mar 29, 2024
@Nicolasgouze
Contributor

Hi @riupie,

Where is your k8s cluster deployed: is it EKS, or something else?
* If EKS, the credentials should be managed automatically without any manipulation on our side
* If not, you should provide the needed credentials in a dedicated pull secret (the standard approach)
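For the non-EKS case, a pull secret of type kubernetes.io/dockerconfigjson stores registry credentials as JSON under the .dockerconfigjson key. The sketch below only illustrates that format and is not kuik code: the helper names are hypothetical, and it assumes you already have an ECR token (for example from `aws ecr get-login-password`, whose matching username is `AWS`).

```python
import base64
import json


def build_dockerconfigjson(registry: str, username: str, password: str) -> str:
    """Build the JSON stored under .dockerconfigjson (hypothetical helper)."""
    # "auth" is base64("username:password"), as in ~/.docker/config.json.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return json.dumps(config)


def pull_secret_manifest(name: str, namespace: str, dockerconfigjson: str) -> dict:
    """Wrap the JSON in a Secret manifest; Secret `data` values are base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        "data": {".dockerconfigjson": base64.b64encode(dockerconfigjson.encode()).decode()},
    }


if __name__ == "__main__":
    # Placeholder registry and token: substitute real values.
    doc = build_dockerconfigjson("xxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com", "AWS", "<token>")
    print(json.dumps(pull_secret_manifest("ecr-pull", "default", doc), indent=2))
```

Pods would then reference the secret through imagePullSecrets. ECR tokens expire after 12 hours, so a static secret like this needs to be refreshed periodically.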

@riupie
Author

riupie commented Apr 2, 2024

@Nicolasgouze I deployed it on top of EKS.

“If EKS, the credentials should be managed automatically without any manipulation on our side”

That's weird. So what makes the image pull fail? Should I attach a service account to the registry deployment?
Any idea what I should check?

@plaffitt
Contributor

Hello,

Sorry, but I couldn't reproduce this bug in my EKS setup. Is there anything particular about your EKS setup? Can you pull this image without kuik? (You can use the controllers.webhook.ignoredImages value to make kuik ignore this specific image.)

@riupie
Copy link
Author

riupie commented Apr 23, 2024

Yes, I can pull the image without kuik, since it was already deployed from the start.
I don't know whether this will help you reproduce it, but I use IRSA to grant access:

  1. Create a policy that grants access to ECR
  2. Use the terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc module to create an IAM role
  3. Create a service account and annotate it with the role from step 2
  4. Attach the service account to each deployment that uses the private ECR
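Steps 3 and 4 come down to an annotated ServiceAccount plus a serviceAccountName in each Deployment's pod template. A minimal sketch that builds those manifests as Python dicts; the function names, account name, and role ARN are placeholders, not values from this thread:

```python
import json


def irsa_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """ServiceAccount annotated with the IAM role to assume via IRSA (step 3)."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def use_service_account(deployment: dict, sa_name: str) -> dict:
    """Point a Deployment's pods at the ServiceAccount (step 4); mutates and returns it."""
    pod_spec = deployment.setdefault("spec", {}).setdefault("template", {}).setdefault("spec", {})
    pod_spec["serviceAccountName"] = sa_name
    return deployment


if __name__ == "__main__":
    sa = irsa_service_account("ecr-reader", "default", "arn:aws:iam::111122223333:role/ecr-reader")
    dep = use_service_account({"apiVersion": "apps/v1", "kind": "Deployment"}, "ecr-reader")
    print(json.dumps([sa, dep], indent=2))
```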

@plaffitt
Contributor

plaffitt commented May 7, 2024

I'm not totally sure whether this is related, but maybe this could help: awslabs/amazon-ecr-credential-helper#581

@riupie
Author

riupie commented May 17, 2024

“the credentials should be managed automatically”

@Nicolasgouze how does kuik authenticate and pull images from a private repository? As I recall, we didn't need to set up a role or pull secret for kuik itself, only on the related app deployments.

@plaffitt
Contributor

@riupie in v1.7.1, the caching mechanism is implemented in registry.go#L105-L122, which uses the registry.GetKeychains function to retrieve keychains based on the environment and the CachedImage's pull secrets. Inside registry.GetKeychains, authentication against ECR in an EKS cluster is done with the AWS helper (authn.NewKeychainFromHelper(ecrLogin.NewECRHelper())). The implementation of this helper can be found here: https://github.com/awslabs/amazon-ecr-credential-helper
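To make the lookup order concrete, here is a simplified Python model, assuming the keychains are tried in order the way a multi-keychain does: the first keychain that returns credentials wins, and if none does the pull is anonymous, which a private ECR repository rejects with 401 Unauthorized. This is an illustrative sketch, not kuik's or go-containerregistry's code; all names are hypothetical, and the behavior of a failing ECR helper (falling back to anonymous instead of raising) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Credentials:
    username: str
    password: str


# A keychain maps a registry host to credentials, or None meaning "anonymous".
Keychain = Callable[[str], Optional[Credentials]]

# Simplified ECR host pattern; the real helper also handles other partitions.
ECR_HOST = re.compile(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com$")


def pull_secret_keychain(secrets: Dict[str, Credentials]) -> Keychain:
    """Credentials coming from pull secrets, keyed by registry host."""
    return lambda registry: secrets.get(registry)


def ecr_keychain(get_token: Callable[[str], str]) -> Keychain:
    """ECR-helper-like keychain.

    Assumption for this sketch: if fetching the token fails (for example because
    the instance metadata service is unreachable), the keychain answers
    "anonymous" instead of raising, so the failure only surfaces later as a 401.
    """
    def lookup(registry: str) -> Optional[Credentials]:
        if not ECR_HOST.match(registry):
            return None
        try:
            return Credentials("AWS", get_token(registry))
        except Exception:
            return None

    return lookup


def resolve_credentials(keychains: Sequence[Keychain], registry: str) -> Optional[Credentials]:
    """Try each keychain in order; the first non-anonymous answer wins."""
    for keychain in keychains:
        creds = keychain(registry)
        if creds is not None:
            return creds
    return None  # anonymous pull


if __name__ == "__main__":
    host = "111122223333.dkr.ecr.ap-southeast-3.amazonaws.com"

    def unreachable_imds(registry: str) -> str:
        raise TimeoutError("instance metadata service did not answer")

    chain = [pull_secret_keychain({}), ecr_keychain(unreachable_imds)]
    print(resolve_credentials(chain, host))  # None: anonymous pull, rejected by ECR with 401
```

In this model, a CachedImage without pull secrets depends entirely on the ECR keychain, so anything that stops the helper from obtaining a token turns into the 401 seen in the controller log.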

@riupie
Author

riupie commented May 17, 2024

Sorry, I need to revise my earlier statement that I use IRSA to grant access (an ECR policy, an IAM role created with terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc, and an annotated service account attached to each deployment that uses the private ECR). My EKS cluster doesn't use IRSA or pull secrets at all: ECR access comes from the IAM role attached to each EKS node group. Is that already supported?

@riupie
Author

riupie commented May 17, 2024

I think I found the culprit.
When kuik uses amazon-ecr-credential-helper, it calls the EC2 instance metadata service, right? In my case, I set HttpPutResponseHopLimit to 1 on EC2, which means the metadata service can only be reached from the EC2 instance itself. Reaching the metadata service from a container takes 2 hops, which gets rejected since I only set the hop limit to 1.
It's just a hypothesis: I don't have an AWS account where I can change the hop limit to test it.
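One way to test this hypothesis without changing the account is to run a small probe inside a pod on the affected node group, and again directly on the node. The sketch below assumes IMDSv2 and the standard 169.254.169.254 endpoint: it requests a session token with an HTTP PUT (the response that HttpPutResponseHopLimit applies to) and then reads the name of the instance role. A timeout on the PUT from the pod, while the same call succeeds on the node, would be consistent with the hop-limit explanation. The function names are hypothetical.

```python
import urllib.request
from typing import Optional

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def token_request(base_url: str = IMDS_BASE) -> urllib.request.Request:
    """IMDSv2 session-token request: an HTTP PUT carrying a TTL header."""
    return urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: "21600"},
    )


def fetch_token(base_url: str = IMDS_BASE, timeout: float = 2.0) -> Optional[str]:
    """Return an IMDSv2 token, or None if the PUT fails or times out.

    With a hop limit of 1, the PUT response is expected to be dropped before it
    reaches a container, which shows up here as a timeout.
    """
    try:
        with urllib.request.urlopen(token_request(base_url), timeout=timeout) as resp:
            return resp.read().decode()
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return None


def instance_role(token: str, base_url: str = IMDS_BASE, timeout: float = 2.0) -> Optional[str]:
    """Return the name of the IAM role attached to the instance, if any."""
    req = urllib.request.Request(
        f"{base_url}/latest/meta-data/iam/security-credentials/",
        headers={TOKEN_HEADER: token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode().strip() or None
    except OSError:
        return None


if __name__ == "__main__":
    token = fetch_token()
    if token is None:
        print("IMDSv2 token request failed or timed out")
    else:
        print("instance role:", instance_role(token))
```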

@plaffitt
Contributor

According to the AWS documentation, it is indeed recommended to set this value to 2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html

“In a container environment, we recommend setting the hop limit to 2.”

You should consider finding a way to set the hop limit to 2.
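For instances you manage directly, the hop limit can be changed per instance through the EC2 ModifyInstanceMetadataOptions API. A minimal sketch using boto3's `modify_instance_metadata_options`; the helper name is hypothetical, and the client is passed in so the function can be exercised without AWS access:

```python
from typing import Any

MIN_HOP_LIMIT, MAX_HOP_LIMIT = 1, 64  # range accepted by EC2 for HttpPutResponseHopLimit


def set_imds_hop_limit(ec2_client: Any, instance_id: str, hop_limit: int = 2) -> dict:
    """Set the IMDS PUT response hop limit on a single instance.

    `ec2_client` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2");
    only the hop limit is changed, other metadata options are left as they are.
    """
    if not MIN_HOP_LIMIT <= hop_limit <= MAX_HOP_LIMIT:
        raise ValueError(f"hop limit must be between 1 and 64, got {hop_limit}")
    return ec2_client.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpPutResponseHopLimit=hop_limit,
    )
```

Called as `set_imds_hop_limit(boto3.client("ec2"), instance_id)`, this only affects running instances. Assuming the node group's instances are created from a launch template, the same HttpPutResponseHopLimit setting would also need to go into the launch template's metadata options, otherwise replacement nodes come back with a hop limit of 1.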

I'm closing the issue, since there is nothing we can do on our side: the problem comes from a misconfiguration in your AWS account.


3 participants