DEV-2741: EKS FAQs #717

40 changes: 40 additions & 0 deletions docs/layers/eks/faq.mdx
@@ -38,3 +38,43 @@ launch and scale runners for GitHub automatically.

For more on how to set up ARC, see the
[GitHub Action Runners setup docs for EKS](/layers/github-actions/eks-github-actions-controller/).

## Managed nodes are successfully launching, but worker nodes are not joining the cluster

Managed node instances launch successfully, but the worker nodes never join the EKS cluster. This often happens when required cluster add-ons are missing, which leaves the worker nodes unable to communicate with the cluster.

Ensure that cluster add-ons compatible with your EKS cluster version are properly configured and included in your stack. Verify that the add-on stack file (e.g., `stacks/catalog/eks/mixins/k8s-1-29.yaml`) is imported into your stack. You can confirm this by checking the final rendered component stack with Atmos:

```bash
atmos describe component eks/cluster -s <stack>
```
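
As a complementary check from the AWS side, the sketch below lists the add-ons EKS reports as installed and the nodes that have registered with the cluster. It assumes the AWS CLI and `kubectl` are installed and authenticated; the function names are hypothetical helpers, not part of Atmos or the component.

```bash
# Sketch: helper functions for checking add-ons and joined nodes.
# The function names are illustrative; pass your own cluster name.

# Print the add-ons EKS reports as installed on the cluster
# (for example vpc-cni, kube-proxy, coredns).
check_cluster_addons() {
  local cluster_name
  cluster_name="$1"
  aws eks list-addons --cluster-name "$cluster_name" --query 'addons' --output text
}

# Print the nodes that have registered with the API server.
# An empty list means no worker node has joined yet.
check_joined_nodes() {
  kubectl get nodes -o wide
}
```

For example, run `check_cluster_addons <cluster-name>` followed by `check_joined_nodes`. If the add-on list is empty or is missing networking add-ons, revisit the add-on mixin import described above.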

## I'm able to ping the cluster endpoint but unable to connect to the cluster

You can ping the EKS cluster endpoint but cannot connect to it using `kubectl` or other tools. A successful ping only confirms ICMP reachability, so this usually indicates a networking issue that blocks HTTPS traffic to the cluster API.

Use the AWS Reachability Analyzer to diagnose the network path between your source and the EKS endpoint. Check for misconfigurations in security groups, Transit Gateway attachments, and subnet routes. Ensure that managed nodes are using private subnets by setting `cluster_private_subnets_only: true` in your EKS cluster configuration.
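
Reachability Analyzer can also be driven from the AWS CLI. The following is a minimal sketch, assuming you already know the ENI IDs for the source and the destination; the function name and the ENI arguments are placeholders, and TCP port 443 is assumed because the EKS API is served over HTTPS.

```bash
# Sketch: create a Reachability Analyzer path between two ENIs and start an analysis.
# Prints the analysis ID. The function name is illustrative.
analyze_path() {
  local source_eni
  local destination_eni
  local path_id
  source_eni="$1"
  destination_eni="$2"

  # Define the path to analyze (TCP/443 from source to destination).
  path_id="$(aws ec2 create-network-insights-path \
    --source "$source_eni" \
    --destination "$destination_eni" \
    --protocol tcp \
    --destination-port 443 \
    --query 'NetworkInsightsPath.NetworkInsightsPathId' \
    --output text)"

  # Start the analysis for that path and print its ID.
  aws ec2 start-network-insights-analysis \
    --network-insights-path-id "$path_id" \
    --query 'NetworkInsightsAnalysis.NetworkInsightsAnalysisId' \
    --output text
}
```

Once the analysis finishes, `aws ec2 describe-network-insights-analyses --network-insights-analysis-ids <analysis-id> --query 'NetworkInsightsAnalyses[0].NetworkPathFound'` reports whether a path was found. Swap the two ENI arguments to analyze the return direction as well.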

## AWS Client VPN clients not receiving routes to EKS cluster

VPN clients connected via AWS Client VPN are not receiving routes to the EKS cluster’s VPC, preventing access to the API endpoint.

Verify that the Client VPN endpoint has active routes to the EKS VPC CIDR and that these routes are associated with subnets attached to the Client VPN endpoint. Confirm that authorization rules permit access to the EKS VPC CIDR. Ensure that security groups associated with the Client VPN endpoint allow outbound traffic to the EKS VPC. After making changes, disconnect and reconnect VPN clients to receive updated routes.
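
The route and authorization checks above can be sketched with the AWS CLI as follows. The function names are hypothetical helpers, and the Client VPN endpoint ID you pass in is your own; compare the `DestinationCidr` values in the output against the EKS VPC CIDR.

```bash
# Sketch: inspect routes and authorization rules on a Client VPN endpoint.
# The function names are illustrative; pass your own endpoint ID.

# List each route's destination CIDR, target subnet, and status.
show_client_vpn_routes() {
  aws ec2 describe-client-vpn-routes \
    --client-vpn-endpoint-id "$1" \
    --query 'Routes[].[DestinationCidr,TargetSubnet,Status.Code]' \
    --output table
}

# List each authorization rule's destination CIDR, whether it applies
# to all clients, and its status.
show_client_vpn_authorization_rules() {
  aws ec2 describe-client-vpn-authorization-rules \
    --client-vpn-endpoint-id "$1" \
    --query 'AuthorizationRules[].[DestinationCidr,AccessAll,Status.Code]' \
    --output table
}
```

For example, `show_client_vpn_routes <client-vpn-endpoint-id>` should show a route covering the EKS VPC CIDR with an `active` status before clients reconnect.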

## Common troubleshooting steps when unable to connect to EKS cluster

1. Check EKS Cluster Security Groups: Ensure that inbound and outbound rules allow necessary traffic.
2. Verify Network ACLs: Confirm that Network ACLs permit the required inbound and outbound traffic.
3. Inspect Subnet Route Tables: Ensure that VPC route tables correctly route traffic between your source and the EKS cluster.
4. Confirm Transit Gateway Configuration: Verify that Transit Gateway attachments and route tables are properly set up.
5. Verify DNS Resolution: Check that the EKS API endpoint’s DNS name resolves correctly from your source.
6. *Use AWS Reachability Analyzer*: Analyze the network path to identify any connectivity issues. Set the VPN's ENI as the source and the EKS cluster endpoint's private IP as the destination. _Check both directions_.
7. Review EKS Cluster Endpoint Access Settings: Make sure the cluster’s endpoint access configuration aligns with your needs.
8. Check the EKS Cluster Subnets: Ensure that the EKS cluster subnets are correctly configured and associated with the cluster. We recommend using private subnets for managed nodes.
9. Check IAM Permissions: Ensure your IAM user or role has the necessary permissions to access the cluster.

For example, the following command tests connectivity to the EKS cluster's control plane endpoint. You can find this endpoint in the AWS web console or in the Terraform outputs:

```bash
curl -fsSk --max-time 5 "https://82F58026XXXXXXXXXXXXXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com/healthz"
```
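
A few of the troubleshooting steps above (DNS resolution, endpoint access settings, and IAM identity) can be checked from the command line as well. This is a rough sketch, assuming the AWS CLI is installed and authenticated; the function names are hypothetical helpers.

```bash
# Sketch: helpers for the DNS and endpoint access checks.
# The function names are illustrative.

# Strip the scheme and any path from an endpoint URL, leaving the hostname.
endpoint_host() {
  printf '%s\n' "$1" | sed -e 's#^https\{0,1\}://##' -e 's#/.*$##'
}

# Show whether the cluster endpoint is public, private, or both,
# and which CIDRs may reach the public endpoint.
show_endpoint_access() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.{public:endpointPublicAccess,private:endpointPrivateAccess,publicCidrs:publicAccessCidrs}' \
    --output json
}
```

For example, `nslookup "$(endpoint_host "<endpoint-url>")"` checks DNS resolution from your source, `show_endpoint_access <cluster-name>` prints the endpoint access configuration, and `aws sts get-caller-identity` confirms which IAM identity your requests are using.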