new: troubleshooting module - All scenarios #1182

Open. Wants to merge 106 commits into base: main.

Commits
2ed2045
Adding ALB troubleshooting scenario and Troubleshooting Methodologies
arcegacardenas Jun 11, 2024
f72bc88
Enable lab timings in PR website build
niallthomson Jul 1, 2024
a554a6c
- adding test automation validation to each section
Jul 27, 2024
1cc9bbd
Fix kubectl completion setup to be idempotent
niallthomson Jul 18, 2024
c16897f
Update flux lab for new sample app repo structure
niallthomson Jul 21, 2024
e74a4c8
Yaml component: Slight refactor, added zoomPath
niallthomson Jul 23, 2024
25f97a0
feat: Added Console button component to display links to AWS console …
niallthomson Jul 23, 2024
1ad5f44
update: Migrate OSS metrics lab to OpenTelemetry operator (#1017)
niallthomson Jul 23, 2024
8d3f887
update: Update Container Insights lab based on enhanced observability…
niallthomson Jul 23, 2024
e48f4af
new: Lab for Kubernetes Event-Driven Autoscaler (KEDA) (#1011)
dms486 Jul 24, 2024
d031b55
Remove debug statement
niallthomson Jul 24, 2024
68010af
Improve reliability of tests for observability labs
niallthomson Jul 27, 2024
ecd3034
Add wait to ArgoCD test while investigating flakiness
niallthomson Jul 27, 2024
cead511
Correct how OTel operator installed
niallthomson Jul 27, 2024
6b7a823
Add lab timings for keda
niallthomson Jul 27, 2024
1441444
Revert change
niallthomson Jul 27, 2024
e772770
Fixing formatting
arcegacardenas Jul 27, 2024
84e763c
Adding introduction for each section
arcegacardenas Jul 27, 2024
55382cb
Added 1.30 upgrade notice
niallthomson Jul 27, 2024
c735d08
Add upgrade header
niallthomson Jul 27, 2024
a1a429b
Migrated own account set up to use CloudFormation quick launch links
niallthomson Jul 29, 2024
1df2f48
Remove unnecessary packages from installer
niallthomson Aug 1, 2024
0ddad96
chore(deps): update dependency argoproj/argo-cd to v2.11.7 (#1023)
renovate[bot] Aug 1, 2024
0b0ea82
chore(deps): update dependency hashicorp/terraform to v1.9.3 (#1024)
renovate[bot] Aug 1, 2024
642bc0a
chore(deps): update dependency eksctl-io/eksctl to v0.188.0 (#1041)
renovate[bot] Aug 5, 2024
6209cfe
chore(deps): update helm release nginx to v18.1.7 (#1039)
renovate[bot] Aug 5, 2024
d3a5a7f
Bump sass from 1.77.6 to 1.77.8 in /website (#1037)
dependabot[bot] Aug 5, 2024
a2d98fa
Bump glob from 10.4.2 to 11.0.0 in /website (#1035)
dependabot[bot] Aug 5, 2024
d1be304
chore(deps): update helm release keda to v2.15.0 (#1033)
renovate[bot] Aug 5, 2024
e6790a5
chore(deps): update dependency kubernetes/autoscaler to v1.30.2 (#1026)
renovate[bot] Aug 5, 2024
b592f1b
chore(deps): update dependency kubernetes/kubernetes to v1.30.3 (#1032)
renovate[bot] Aug 5, 2024
830e84f
chore(deps): update dependency helm/helm to v3.15.3 (#1025)
renovate[bot] Aug 5, 2024
6fa9528
chore(deps): update helm release opentelemetry-operator to v0.65.1 (#…
renovate[bot] Aug 5, 2024
2a9c479
Bump typescript from 5.5.2 to 5.5.4 in /test/util (#1029)
dependabot[bot] Aug 6, 2024
c499738
Bump @types/chai from 4.3.14 to 4.3.17 in /test/util (#1027)
dependabot[bot] Aug 6, 2024
f51fc56
6-month steering committee rotation
svennam92 Aug 7, 2024
2591ff9
Fix network policies logging
niallthomson Aug 9, 2024
3c6ae3b
Fixed steering file formatting
niallthomson Aug 9, 2024
c64031b
Revert "chore(deps): update helm release opentelemetry-operator to v0…
niallthomson Aug 9, 2024
c58eb3d
chore(deps): update helm release argo-cd to v7 (#1002)
renovate[bot] Aug 9, 2024
b6253a1
Fix test utility to inject proper test output to after hooks
niallthomson Aug 9, 2024
6c7b40e
chore: Add markdown linting checks (#1047)
niallthomson Aug 12, 2024
f5a911c
update: Fix inconsistent deployment name in Resource View section (#1…
arkagang Aug 12, 2024
868e682
fix: Corrected typo in ADOT manifest breakdowns (#1051)
niallthomson Aug 14, 2024
88a7eee
update: Update IRSA and Pod Identity to new sample application versio…
DovAmir Aug 14, 2024
b045d0c
Bump mocha from 10.5.2 to 10.7.0 in /test/util (#1031)
dependabot[bot] Aug 14, 2024
b8d7fa3
Bump react-tooltip from 5.27.0 to 5.28.0 in /website (#1043)
dependabot[bot] Aug 14, 2024
d49d8e5
Bump @fortawesome/fontawesome-svg-core from 6.5.2 to 6.6.0 in /websit…
dependabot[bot] Aug 14, 2024
44030dc
Bump yaml from 2.4.5 to 2.5.0 in /test/util (#1028)
dependabot[bot] Aug 14, 2024
28bbd9b
Fixing formatting
arcegacardenas Aug 16, 2024
ef7bf9e
Merge branch 'aws-samples:main' into troubleshooting-module
arcegacardenas Aug 21, 2024
85c3120
Formatting
arcegacardenas Aug 21, 2024
1d74201
Merge branch 'main' of github.com:arcegacardenas/eks-workshop-v2 into…
arcegacardenas Sep 12, 2024
46882b4
Removing provisioner destroy since it was not working properly. Movin…
arcegacardenas Sep 21, 2024
95c8833
Merge branch 'main' into troubleshooting-module
arcegacardenas Sep 21, 2024
bfdef47
pod troubleshooting scenarios
Oct 24, 2024
205503b
Merge branch 'main' into troubleshooting-module
Oct 25, 2024
d352b90
initial creation of the branch
rimaulana Nov 4, 2024
49c3e1d
fixed linting
rimaulana Nov 4, 2024
a2509ca
Merge branch 'main' into troubleshooting-module
Nov 4, 2024
fed459f
added hook scripts
Nov 5, 2024
ab282aa
fixed pre-commit errors
Nov 5, 2024
63ed98f
modified clean-up logic and fix resource provisioning
rimaulana Nov 5, 2024
67f7c05
fixed pre-commit errors
Nov 6, 2024
668c661
fix pod crash
Nov 6, 2024
57c579f
fixed the readme content
Nov 6, 2024
730da0e
pod troubleshooting scenarios
Oct 24, 2024
79db7d9
added hook scripts
Nov 5, 2024
8f2bbb4
fixed pre-commit errors
Nov 5, 2024
3245a57
fixed pre-commit errors
Nov 6, 2024
d1b5c26
fix pod crash
Nov 6, 2024
1e13ac9
fixed the readme content
Nov 6, 2024
4523c81
Merge branch 'troubleshooting-module-pod' of https://github.com/arceg…
Nov 7, 2024
c058c8d
Merge pull request #5 from arcegacardenas/troubleshooting-module-pod
arcegacardenas Nov 9, 2024
96dac9f
module troubleshooting/dns all files
abencomoc Nov 10, 2024
db9e0c7
module dns format updates
abencomoc Nov 10, 2024
d97e7e7
tuning test timers module dns
abencomoc Nov 10, 2024
f6a66f8
Adding folders to the troubleshooting pod scenarios so the autmated test
Nov 12, 2024
c1c1ac5
fixing hook timeout
Nov 12, 2024
1cea126
Merge pull request #6 from arcegacardenas/troubleshooting-module-aben…
arcegacardenas Nov 12, 2024
7be5ab4
Merge pull request #7 from arcegacardenas/troubleshooting-module-pagilla
arcegacardenas Nov 13, 2024
614a3b2
fixed exported variables and add output of command
rimaulana Nov 13, 2024
f4cf781
Merge remote-tracking branch 'origin/troubleshooting-module-all-scena…
rimaulana Nov 13, 2024
155f89f
Merge pull request #8 from arcegacardenas/pull-4
rimaulana Nov 13, 2024
92afb0e
made all changes for troubleshooting module workernodes. Created webs…
robisoh88 Nov 13, 2024
46ba7a1
modified aws-auth script to modify existing entries and not create a …
robisoh88 Nov 14, 2024
0fddd79
Updating page indexes and adding CNI automated test timeout
Nov 16, 2024
ba0df43
Merge pull request #4 from arcegacardenas/troubleshooting-module-rima…
arcegacardenas Nov 16, 2024
bd1cd3e
Merge branch 'pull-4' into troubleshooting-module-all-scenarios
Nov 16, 2024
b2adf2c
made all changes for troubleshooting module workernodes. Created webs…
robisoh88 Nov 13, 2024
44f8d01
modified aws-auth script to modify existing entries and not create a …
robisoh88 Nov 14, 2024
8782e2b
Merge branch 'troubleshooting-module-robisoh88' of github.com:arcegac…
Nov 16, 2024
11b6b27
Merge branch 'main' of github.com:arcegacardenas/eks-workshop-v2 into…
Nov 16, 2024
59c73f0
Adding corect index weight and title of the new sub-module to the main
Nov 17, 2024
334793b
fixed image for efs console
Nov 18, 2024
3b650c9
Merge branch 'troubleshooting-module-all-scenarios' of https://github…
Nov 18, 2024
bd2cfef
Merge pull request #9 from arcegacardenas/troubleshooting-module-pagilla
arcegacardenas Nov 18, 2024
be2f679
removed comments and changed workernodes website weights
robisoh88 Nov 19, 2024
4232d52
Merge pull request #10 from arcegacardenas/troubleshooting-module-rob…
arcegacardenas Nov 19, 2024
862c235
Removed "section", "step" from titles
abencomoc Nov 19, 2024
ad45794
Spellcheck completed for workernodes scenario index.md files.
robisoh88 Nov 20, 2024
cdf33be
Spellcheck completed for workernodes scenario index.md files.
robisoh88 Nov 20, 2024
cac0462
Merge pull request #11 from arcegacardenas/troubleshooting-module-all…
arcegacardenas Nov 21, 2024
085fe41
fixed spelling and add unknown words into dictionary
rimaulana Nov 21, 2024
16b44c1
added more words to dictionary
rimaulana Nov 21, 2024
b71316f
Merge pull request #12 from arcegacardenas/troubleshooting-module-rim…
arcegacardenas Nov 25, 2024
7 changes: 6 additions & 1 deletion .spelling
@@ -127,4 +127,9 @@ sheetal
 joshi
 keda
 AIML
-DCGM
+DCGM
+IPVS
+xvda
+NACL
+routability
+xnew
25 changes: 25 additions & 0 deletions manifests/modules/troubleshooting/alb/.workshop/cleanup.sh
@@ -0,0 +1,25 @@
#!/bin/bash

set -e

logmessage "Restoring public subnet tags..."

# Function to restore the ELB role tag on the public subnets
restore_tags_on_subnets() {
subnets_vpc=$(aws ec2 describe-subnets --filters "Name=tag:Name,Values=*Public*" "Name=tag:created-by,Values=eks-workshop-v2" --query 'Subnets[*].SubnetId' --output text)
#logmessage "subnets_vpc: $subnets_vpc"


# Re-create the tag on each subnet with the AWS CLI
for subnet_id in $subnets_vpc; do
#logmessage "public subnets: $subnet_id"
aws ec2 create-tags --resources "$subnet_id" --tags Key=kubernetes.io/role/elb,Value='1' || logmessage "Failed to create tag on subnet $subnet_id"
done
return 0
}

restore_tags_on_subnets

kubectl delete ingress -n ui ui --ignore-not-found

uninstall-helm-chart aws-load-balancer-controller kube-system
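The cleanup script selects subnets by the Name and created-by tags only, while the Terraform data source `aws_subnets.public` also filters on the `env` tag (the cluster name). The sketch below is a hypothetical helper, `public_subnet_filters`, that prints all three filters so the cleanup can be scoped to one cluster. `EKS_CLUSTER_NAME` in the usage comment is an assumed environment variable.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the describe-subnets filters that mirror the
# Terraform data source "aws_subnets.public" (created-by, env and Name tags).
public_subnet_filters() {
  local cluster_name=$1
  printf '%s\n' \
    "Name=tag:created-by,Values=eks-workshop-v2" \
    "Name=tag:env,Values=${cluster_name}" \
    "Name=tag:Name,Values=*Public*"
}

# Usage (needs AWS credentials and bash 4+; EKS_CLUSTER_NAME is assumed):
#   mapfile -t filters < <(public_subnet_filters "$EKS_CLUSTER_NAME")
#   aws ec2 describe-subnets --filters "${filters[@]}" \
#     --query 'Subnets[*].SubnetId' --output text
```

Printing one filter per line keeps each filter a single argument when read back into an array, even though the Name value contains a glob.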
176 changes: 176 additions & 0 deletions manifests/modules/troubleshooting/alb/.workshop/terraform/main.tf
@@ -0,0 +1,176 @@
terraform {
required_providers {
# kubectl = {
# source = "gavinbunney/kubectl"
# version = ">= 1.14"
# }
}
}



provider "aws" {
region = "us-east-1"
alias = "virginia"
}

locals {
tags = {
module = "troubleshooting"
}
}

data "aws_vpc" "selected" {
tags = {
created-by = "eks-workshop-v2"
env = var.addon_context.eks_cluster_id
}
}

data "aws_subnets" "public" {
tags = {
created-by = "eks-workshop-v2"
env = var.addon_context.eks_cluster_id
}

filter {
name = "tag:Name"
values = ["*Public*"]
}
}


resource "time_sleep" "blueprints_addons_sleep" {
depends_on = [
module.eks_blueprints_addons
]

create_duration = "15s"
destroy_duration = "15s"
}


resource "null_resource" "break_public_subnet" {
triggers = {
public_subnets = join(" ", data.aws_subnets.public.ids)
always_run = timestamp()
}

lifecycle {
create_before_destroy = false
}


provisioner "local-exec" {
when = create
command = "aws ec2 delete-tags --resources ${self.triggers.public_subnets} --tags Key=kubernetes.io/role/elb,Value='1'"
}

}

module "eks_blueprints_addons" {
source = "aws-ia/eks-blueprints-addons/aws"
version = "1.16.2"

enable_aws_load_balancer_controller = true
aws_load_balancer_controller = {
wait = true
}

cluster_name = var.addon_context.eks_cluster_id
cluster_endpoint = var.addon_context.aws_eks_cluster_endpoint
cluster_version = var.eks_cluster_version
oidc_provider_arn = var.addon_context.eks_oidc_provider_arn

tags = merge(
var.tags,
local.tags
)

depends_on = [null_resource.break_public_subnet]

}


# create a new policy from json file
resource "aws_iam_policy" "issue" {
name = "issue"
path = "/"
policy = file("${path.module}/template/other_issue.json")
}

# attach issue policy to role
resource "aws_iam_role_policy_attachment" "issue_policy_attachment" {
role = module.eks_blueprints_addons.aws_load_balancer_controller.iam_role_name
policy_arn = aws_iam_policy.issue.arn
depends_on = [module.eks_blueprints_addons, time_sleep.blueprints_addons_sleep]
}

resource "null_resource" "detach_existing_policy" {
triggers = {
role_name = module.eks_blueprints_addons.aws_load_balancer_controller.iam_role_name,
always_run = timestamp()
}

provisioner "local-exec" {
command = "aws iam detach-role-policy --role-name ${self.triggers.role_name} --policy-arn ${module.eks_blueprints_addons.aws_load_balancer_controller.iam_policy_arn}"
when = create
}

depends_on = [aws_iam_role_policy_attachment.issue_policy_attachment]
}

resource "null_resource" "kustomize_app" {
triggers = {
always_run = timestamp()
}

provisioner "local-exec" {
command = "kubectl apply -k ~/environment/eks-workshop/modules/troubleshooting/alb/creating-alb"
when = create
}

depends_on = [aws_iam_role_policy_attachment.issue_policy_attachment]
}



# Example showing how to read variables from add-on outputs. DO NOT DELETE: the add-on and Helm documentation does not show exactly which output variables are returned
#resource "null_resource" "blue_print_output" {
# for_each = module.eks_blueprints_addons.aws_load_balancer_controller
# triggers = {
#
# timestamp = timestamp()
# }
#
# #count = length(module.eks_blueprints_addons.aws_load_balancer_controller)
# provisioner "local-exec" {
# command = "mkdir -p /eks-workshop/logs; echo \" key: ${each.key} Value:${each.value}\" >> /eks-workshop/logs/action-load-balancer-output.log"
# }
#
# depends_on = [module.eks_blueprints_addons,time_sleep.blueprints_addons_sleep]
#}

#option to run a bash script file
#resource "null_resource" "break2" {
# provisioner "local-exec" {
# command = "${path.module}/template/break.sh ${path.module} mod2"
# }
#
# triggers = {
# always_run = timestamp()
# }
# depends_on = [module.eks_blueprints_addons,time_sleep.blueprints_addons_sleep]
#}

#option to run a kubectl manifest
#resource "kubectl_manifest" "alb" {
# yaml_body = templatefile("${path.module}/template/ingress.yaml", {
#
# })
#
# depends_on = [null_resource.break_policy]
#}
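In `break_public_subnet`, Terraform substitutes the space-joined subnet IDs into the `local-exec` command, and the shell then splits that string into separate `--resources` arguments, so a single `delete-tags` call covers every public subnet. The sketch below reproduces that splitting locally; `split_subnet_ids` and `print_delete_tags_cmd` are hypothetical names, and the second function only prints the command (dry run).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Split a space-joined ID string (like the break_public_subnet trigger value)
# into one ID per line, the same word splitting the local-exec shell applies.
split_subnet_ids() {
  local -a ids=()
  read -r -a ids <<< "$1"
  printf '%s\n' "${ids[@]}"
}

# Dry run: print the single delete-tags call the provisioner effectively makes.
print_delete_tags_cmd() {
  local -a ids=()
  read -r -a ids <<< "$1"
  echo "aws ec2 delete-tags --resources ${ids[*]} --tags Key=kubernetes.io/role/elb,Value=1"
}
```

For example, `print_delete_tags_cmd 'subnet-a subnet-b'` prints one command with both IDs after `--resources`.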


@@ -0,0 +1,13 @@
output "environment_variables" {
description = "Environment variables to be added to the IDE shell"
value = merge({
VPC_ID = data.aws_vpc.selected.id,
LOAD_BALANCER_CONTROLLER_ROLE_NAME = module.eks_blueprints_addons.aws_load_balancer_controller.iam_role_name,
LOAD_BALANCER_CONTROLLER_POLICY_ARN_FIX = module.eks_blueprints_addons.aws_load_balancer_controller.iam_policy_arn,
LOAD_BALANCER_CONTROLLER_POLICY_ARN_ISSUE = aws_iam_policy.issue.arn,
LOAD_BALANCER_CONTROLLER_ROLE_ARN = module.eks_blueprints_addons.aws_load_balancer_controller.iam_role_arn
}, {
for index, id in data.aws_subnets.public.ids : "PUBLIC_SUBNET_${index + 1}" => id
}
)
}
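The `environment_variables` output exports one `PUBLIC_SUBNET_<n>` variable per public subnet, with `n` starting at 1. A minimal sketch, assuming those variables are present in the IDE shell, that collects them in numeric order; `public_subnet_ids` is a hypothetical helper, and the commented commands are one plausible remediation, not necessarily the module's intended solution.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the values of PUBLIC_SUBNET_1..N in numeric order.
public_subnet_ids() {
  local var
  # compgen -v lists shell variable names with the prefix; sort on the
  # number after the last "_" so PUBLIC_SUBNET_10 comes after PUBLIC_SUBNET_2.
  for var in $(compgen -v PUBLIC_SUBNET_ | sort -t_ -k3,3n || true); do
    printf '%s\n' "${!var}"
  done
}

# Possible remediation commands (run only against the workshop account):
#   aws ec2 create-tags --resources $(public_subnet_ids) \
#     --tags Key=kubernetes.io/role/elb,Value=1
#   aws iam attach-role-policy --role-name "$LOAD_BALANCER_CONTROLLER_ROLE_NAME" \
#     --policy-arn "$LOAD_BALANCER_CONTROLLER_POLICY_ARN_FIX"
```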
@@ -0,0 +1,155 @@
#!/usr/bin/env bash
#. .env

set -e

mkdir -p /eks-workshop/logs
log_file=/eks-workshop/logs/action-$(date +%s).log

# Merge stderr into stdout and open fd 7 on the log file used by logmessage
exec 2>&1
exec 7>>"$log_file"

logmessage() {
echo "$@" >&7
echo "$@" >&1
}
export -f logmessage

# Function to get the role name from a role ARN
get_role_name_from_arn() {
local role_arn=$1

# Extract the role name from the ARN
# The role name is the last "/"-separated segment of the ARN (global, used by the main section)
role_name="${role_arn##*/}"

if [ -n "$role_name" ]; then
logmessage "$role_name"
else
logmessage "Failed to retrieve role name from ARN: $role_arn"
return 1
fi
}

# Function to get the Kubernetes role attached to a service account
get_service_account_role() {
local namespace=$1
local service_account=$2

# Get the role ARN associated with the service account
role_arn=$(kubectl get serviceaccount "$service_account" -n "$namespace" -o jsonpath="{.metadata.annotations['eks\.amazonaws\.com\/role-arn']}")

if [ -n "$role_arn" ]; then
logmessage "Service Account: $service_account"
logmessage "Namespace: $namespace"
logmessage "Role ARN: $role_arn"
get_role_name_from_arn "$role_arn"
return 0
else
logmessage "Failed to retrieve role for service account '$service_account' in namespace '$namespace'"
return 1
fi

}

# Function to get the first policy ARN attached to a role (takes the role name)
get_first_policy_arn_from_role_arn() {
local role_name=$1

# Get the first attached policy; the CLI prints "None" in text output when there is none
policy_arn=$(aws iam list-attached-role-policies --role-name "$role_name" --query 'AttachedPolicies[0].PolicyArn' --output text)

if [ -n "$policy_arn" ] && [ "$policy_arn" != "None" ]; then
logmessage "First Policy ARN attached to role '$role_name':"
logmessage "Policy: $policy_arn"
return 0
else
logmessage "Failed to retrieve policy ARN for role '$role_name'"
return 1
fi
}

# Function to update the policy with new statement
update_policy_with_new_statement() {
local policy_arn=$1
local new_statement=$2

logmessage "PolicyARN: $policy_arn"
logmessage "Statement: $new_statement"
aws iam create-policy-version --policy-arn "$policy_arn" --policy-document "$new_statement" --set-as-default

}

# Function to remove an action from the policy's statements
remove_action_from_policy_statement() {
local policy_arn=$1
local action_to_remove=$2
local policy_document new_policy_document

# Get the policy document of version v1
policy_document=$(aws iam get-policy-version --policy-arn "$policy_arn" --version-id v1 --query 'PolicyVersion.Document' --output json)

# Drop the action from every statement whose Action is a list (a single-string Action is left as-is).
# jq string literals need double quotes, so the action is passed in with --arg.
if ! new_policy_document=$(echo "$policy_document" | jq -c --arg action "$action_to_remove" '.Statement |= map(if (.Action | type) == "array" then .Action |= map(select(. != $action)) else . end)'); then
logmessage "Failed to remove action from policy statement."
return 1
fi

# Update the policy with the modified document
logmessage "Policy Document"
logmessage "$new_policy_document"
#aws iam create-policy-version --policy-arn "$policy_arn" --policy-document "$new_policy_document" --set-as-default

logmessage "Action removed from policy statement successfully."
return 0
}

# Function to remove the ELB role tag from the public subnets of the VPC
remove_tags_from_subnets() {
local tag_spec="Key=kubernetes.io/role/elb,Value=1"

logmessage "Retrieving public subnet IDs in VPC: $vpc_id"

subnets_vpc=$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=$vpc_id" "Name=tag:Name,Values=*Public*" --query 'Subnets[*].SubnetId' --output text)
logmessage "subnets_vpc: $subnets_vpc"

# Remove the tag from each subnet with the AWS CLI
for subnet_id in $subnets_vpc; do
logmessage "public subnet: $subnet_id"
aws ec2 delete-tags --resources "$subnet_id" --tags "$tag_spec" || logmessage "Failed to remove tag from subnet $subnet_id"
done
return 0
}

# Getting the service role
path_tofile=$1
mode=$2
vpc_id=$3
public_subnets=$4
namespace="kube-system"
service_account="aws-load-balancer-controller-sa"
#new_statement="file://$path_tofile/template/iam_policy_incorrect.json"
new_statement="file://$path_tofile/template/other_issue.json"

logmessage "path_sent: $path_tofile"


# validate if mode is equal to mod1
logmessage "mode: $mode"
if [ "$mode" == "mod1" ]; then
logmessage "Removing subnet tags"
remove_tags_from_subnets
else
logmessage "Removing permissions"
get_service_account_role "$namespace" "$service_account"
get_first_policy_arn_from_role_arn "$role_name"
update_policy_with_new_statement "$policy_arn" "$new_statement"

fi
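`logmessage` writes each line to file descriptor 7 and to stdout, so fd 7 has to be opened on the log file before the first call, otherwise the redirection fails with "Bad file descriptor" and `set -e` stops the script. A minimal sketch of that setup, using a temporary file as a stand-in for `/eks-workshop/logs/action-<epoch>.log`:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for /eks-workshop/logs/action-<epoch>.log so the sketch runs anywhere.
log_file=$(mktemp)
trap 'rm -f "$log_file"' EXIT

# Open fd 7 for appending before logmessage writes to it.
exec 7>>"$log_file"

logmessage() {
  echo "$@" >&7  # copy to the log file
  echo "$@"      # and to stdout
}
```

Because fd 7 is inherited by subshells, the copy still reaches the log file when `logmessage` runs inside a command substitution.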

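The script derives the role name from the service account's role ARN and then asks IAM for the first attached policy in text output. A sketch of both steps in plain bash, assuming only the standard ARN layout `arn:aws:iam::<account>:role/<optional path>/<name>`; `role_name_from_arn` and `none_to_empty` are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return the last path segment of an IAM role ARN, which is the role name
# even when the role was created with a path.
role_name_from_arn() {
  local arn=$1
  printf '%s\n' "${arn##*/}"
}

# The AWS CLI prints the literal string "None" in text output when a --query
# expression matches nothing; treat that as empty.
none_to_empty() {
  local value=$1
  if [ "$value" = "None" ]; then
    value=""
  fi
  printf '%s' "$value"
}
```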

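The main section treats any mode other than `mod1` as the permissions scenario. A stricter alternative is a `case` dispatcher that rejects unknown modes; in the sketch below `select_action` and the returned action names are hypothetical, and `mod2` is the value passed by the commented-out `break2` resource in `main.tf`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dispatcher: map the mode argument to the action to take.
# mod1 removes the subnet tags; mod2 swaps the controller's IAM policy.
select_action() {
  case "${1:-}" in
    mod1) echo "remove_subnet_tags" ;;
    mod2) echo "replace_policy" ;;
    *)    echo "unknown mode: ${1:-<empty>}" >&2; return 1 ;;
  esac
}
```

A typo such as `mod3` then fails loudly instead of silently running the IAM branch.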

