This bootstraps the following stack in a few minutes:
- Expose services via HTTPS using nginx-ingress, NodePort, ALB, ACM and Route53.
- Bootstrap a cluster with the provided script.
- Manage the cluster using kubectl, helm, kops and terraform.
Make sure you have the following items:
- An AWS account
- An IAM user with these permissions
- A domain or subdomain, e.g. dev.example.com
Install the following tools:
# WSL/Ubuntu
sudo apt install awscli
./install.sh # Install kubectl, kops, helm, and terraform
Kube-bootstrapper supports a different cluster configuration for each environment. All environments are maintained in the envs directory; a dev environment is provided to get started. In envs/dev/variables.sh, replace the values within <<>> with appropriate values, then load them with the following command.
source envs/{{environment}}/variables.sh # dev in this example
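For reference, a dev variables file might look like the sketch below. The variable names are the ones referenced by commands later in this guide; the exact set required by the scripts may differ, and every value shown is illustrative.
# envs/dev/variables.sh (illustrative sketch; adjust values to your environment)
export AWS_PROFILE=dev
export AWS_DEFAULT_REGION=us-east-2
export kubernetes_ingress_domain=dev.example.com
export kubernetes_cluster_name=dev.example.com
export KOPS_CLUSTER_NAME=$kubernetes_cluster_name
export state_store_bucket_name=dev-example-com-state-store
export KOPS_STATE_STORE=s3://$state_store_bucket_name # standard kops variable pointing at the state bucket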
Configure your AWS credentials.
aws configure --profile "$AWS_PROFILE"
Create a public hosted zone for the domain:
aws route53 create-hosted-zone --name "$kubernetes_ingress_domain" --caller-reference "$(date)"
You may need to add the NS records to the parent zone.
Note: If you are using Kops 1.6.2 or later, then DNS configuration is optional. Instead, a gossip-based cluster can be easily created. The only requirement to trigger this is to have the cluster name end with .k8s.local. If a gossip-based cluster is created then you can skip this section.
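For example, an illustrative cluster name that would trigger gossip mode:
kubernetes_cluster_name=dev.k8s.local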
In order to build a Kubernetes cluster with kops, we need to prepare somewhere to build the required DNS records. There are three scenarios below and you should choose the one that most closely matches your AWS situation.
If you bought your domain with AWS, then you should already have a hosted zone in Route53. If you plan to use this domain then no more work is needed.
In this example you own example.com and your records for Kubernetes would look like etcd-us-east-1c.internal.clustername.example.com.
In this scenario you want to contain all kubernetes records under a subdomain of a domain you host in Route53. This requires creating a second hosted zone in route53, and then setting up route delegation to the new zone.
In this example you own example.com and your records for Kubernetes would look like etcd-us-east-1c.internal.clustername.subdomain.example.com.
This is copying the NS servers of your SUBDOMAIN up to the PARENT domain in Route53. To do this you should:
- Create the subdomain, and note your SUBDOMAIN name servers (If you have already done this you can also get the values)
# Note: This example assumes you have jq installed locally.
ID=$(uuidgen) && aws route53 create-hosted-zone --name subdomain.example.com --caller-reference $ID | \
jq .DelegationSet.NameServers
- Note your PARENT hosted zone id
# Note: This example assumes you have jq installed locally.
aws route53 list-hosted-zones | jq '.HostedZones[] | select(.Name=="example.com.") | .Id'
- Create a new JSON file with your values (subdomain.json)
Note: The NS values here are for the SUBDOMAIN
{
"Comment": "Create a subdomain NS record in the parent domain",
"Changes": [
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "subdomain.example.com",
"Type": "NS",
"TTL": 300,
"ResourceRecords": [
{
"Value": "ns-1.awsdns-1.co.uk"
},
{
"Value": "ns-2.awsdns-2.org"
},
{
"Value": "ns-3.awsdns-3.com"
},
{
"Value": "ns-4.awsdns-4.net"
}
]
}
}
]
}
- Apply the SUBDOMAIN NS records to the PARENT hosted zone.
aws route53 change-resource-record-sets \
--hosted-zone-id <parent-zone-id> \
--change-batch file://subdomain.json
Now traffic to *.subdomain.example.com will be routed to the correct subdomain hosted zone in Route53.
If you bought your domain elsewhere, and would like to dedicate the entire domain to AWS you should follow the guide here
If you bought your domain elsewhere, but only want to use a subdomain in AWS Route53 you must modify your registrar's NS (NameServer) records. We'll create a hosted zone in Route53, and then migrate the subdomain's NS records to your other registrar.
You might need to grab jq for some of these instructions.
- Create the subdomain, and note your name servers (If you have already done this you can also get the values)
ID=$(uuidgen) && aws route53 create-hosted-zone --name subdomain.example.com --caller-reference $ID | jq .DelegationSet.NameServers
- You will now go to your registrar's page and log in. You will need to create a new SUBDOMAIN, and use the 4 NS records received from the above command for the new SUBDOMAIN. This MUST be done in order to use your cluster. Do NOT change your top level NS record, or you might take your site offline.
- Information on adding NS records with Godaddy.com
- Information on adding NS records with Google Cloud Platform
By default the assumption is that NS records are publicly available. If you require private DNS records you should modify the commands we run later in this guide to include:
kops create cluster --dns private $NAME
If you have a mix of public and private zones, you will also need to include the --dns-zone argument with the hosted zone id you wish to deploy in:
kops create cluster --dns private --dns-zone ZABCDEFG $NAME
This section is not required if a gossip-based cluster is created.
You should now be able to dig your domain (or subdomain) and see the AWS Name Servers on the other end.
dig ns subdomain.example.com
Should return something similar to:
;; ANSWER SECTION:
subdomain.example.com. 172800 IN NS ns-1.awsdns-1.net.
subdomain.example.com. 172800 IN NS ns-2.awsdns-2.org.
subdomain.example.com. 172800 IN NS ns-3.awsdns-3.com.
subdomain.example.com. 172800 IN NS ns-4.awsdns-4.co.uk.
This is a critical component of setting up clusters. If you are experiencing problems with the Kubernetes API not coming up, chances are something is wrong with the cluster's DNS.
Please DO NOT MOVE ON until you have validated your NS records! This is not required if a gossip-based cluster is created.
Request a certificate for the wildcard domain:
aws acm request-certificate --domain-name "*.$kubernetes_ingress_domain" --validation-method DNS --region {{your region}} # us-east-2 in my case
You need to approve the DNS validation. Open https://console.aws.amazon.com/acm/home and click the "Create record in Route53" button. See AWS User Guide for more.
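If you prefer the CLI, the validation flow can be sketched as follows (the certificate ARN is a placeholder, and jq is assumed to be installed as elsewhere in this guide):
# Look up the CNAME record that ACM expects for DNS validation.
aws acm describe-certificate --certificate-arn <certificate-arn> --region {{your region}} \
  | jq '.Certificate.DomainValidationOptions[0].ResourceRecord'
# Create that CNAME in your hosted zone (e.g. with aws route53 change-resource-record-sets),
# then wait until the certificate is issued.
aws acm wait certificate-validated --certificate-arn <certificate-arn> --region {{your region}}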
Create an S3 bucket for the state store of kops and Terraform. You must enable bucket versioning.
aws s3api create-bucket \
--bucket "$state_store_bucket_name" \
--region "$AWS_DEFAULT_REGION" \
--create-bucket-configuration "LocationConstraint=$AWS_DEFAULT_REGION"
aws s3api put-bucket-versioning \
--bucket "$state_store_bucket_name" \
--versioning-configuration "Status=Enabled"
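You can confirm that versioning is enabled:
aws s3api get-bucket-versioning --bucket "$state_store_bucket_name"
# Expected output contains "Status": "Enabled"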
By default the script will create the following components:
- kops
  - 1 master (t2.medium) in a single AZ
  - 2 nodes (t2.medium) in a single AZ
- Terraform
  - An internet-facing ALB
  - A Route53 record for the internet-facing ALB
  - A security group for the internet-facing ALB
- kubectl
  - Create a ServiceAccount and ClusterRoleBinding for the Helm tiller
  - Patch StorageClass/gp2 to remove the default storage class
- Helm
Bootstrap a cluster.
./bootstrap.sh
You can change the instance type of the master:
kops edit ig "master-${AWS_DEFAULT_REGION}a"
You can change the instance type of the nodes:
kops edit ig "nodes-${AWS_DEFAULT_REGION}a"
Apply the changes:
kops update cluster
kops update cluster --yes
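Changing an instance type replaces the underlying instances, so a rolling update is usually needed after the update; a typical follow-up with standard kops commands:
kops rolling-update cluster --yes
kops validate cluster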
To change access control for the Kubernetes API and SSH:
kops edit cluster
spec:
kubernetesApiAccess:
- xxx.xxx.xxx.xxx/32
sshAccess:
- xxx.xxx.xxx.xxx/32
Apply the changes for the Kubernetes API and SSH:
kops update cluster
kops update cluster --yes
The following resources are needed so that the masters and nodes can access services in the VPC:
- An internal ALB
- A Route53 private hosted zone for the internal ALB
- A Route53 record for the internal ALB
- A security group for the internal ALB
To change access control for the internet-facing ALB, edit tf_config.tf:
variable "alb_external_allow_ip" {
default = [
"xxx.xxx.xxx.xxx/32",
"xxx.xxx.xxx.xxx/32",
]
}
variable "alb_internal_enabled" {
default = true
}
Apply the changes for the internet-facing ALB:
terraform apply
You can set up OIDC authentication for exposing Kubernetes Dashboard and Kibana.
If you want to use your Google Account, create an OAuth client on Google APIs Console and change the client ID and secret in envs/$environment/variables.sh as follows:
export oidc_discovery_url=https://accounts.google.com
export oidc_kubernetes_dashboard_client_id=xxx-xxx.apps.googleusercontent.com
export oidc_kubernetes_dashboard_client_secret=xxxxxx
export oidc_kibana_client_id=xxx-xxx.apps.googleusercontent.com
export oidc_kibana_client_secret=xxxxxx
See also the tutorial at int128/kubernetes-dashboard-proxy.
Terraform creates the security group allow-from-nodes.hello.k8s.local which allows access from the Kubernetes nodes.
For extra addons to enhance the stability and functionality of the cluster, please refer to the addons folder.
Currently available addons are listed below.
TL;DR: Automatically scale the number of nodes based on load. Requires metrics-server if used with HPA.
To check if metrics-server is installed, run the following commands.
kubectl top pods
kubectl top nodes
If the commands ☝️ return CPU and memory utilization, you have metrics server installed!
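If they do not, one common way to install metrics-server is the upstream manifest (a sketch; consider pinning a release that matches your cluster version instead of latest):
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml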
chmod a+x ./addons/autoscaling.sh
./addons/autoscaling.sh
Cluster Autoscaler scales the number of nodes available in an Instance Group based on maxSize and minSize (see the sketch after this list).
- minSize is the minimum number of nodes that will be available even when the cluster is not under load.
- maxSize is the maximum number of nodes the Cluster Autoscaler can request at times of peak load. (This is what makes your AWS bill go through the roof or stay in check.)
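As a sketch, minSize and maxSize are set on the instance group spec (values are illustrative):
# kops edit ig nodes
spec:
  machineType: t2.medium
  minSize: 2
  maxSize: 5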
Cluster Autoscaler kicks in when new pods are created AND no node has the space (CPU/memory available) to accommodate the newly created pod.
CA works best with Kubernetes deployments that have HPA (Horizontal Pod Autoscaler) enabled.
With HPA enabled, if a deployment increases its number of replicas under high load, CA automatically requests and provisions new nodes to schedule the newly created pods.
Beware: Although CA might take just a few seconds to respond to pods pending scheduling, it may take up to 10-15 minutes for the new node to actually join the cluster and become ready to be scheduled. So there is a chance of minor downtime when scaling up (it autorecovers quite quickly).
When the surge/load reduces, the number of pods automatically goes down (with an HPA); the CA also takes care of removing underutilized nodes.
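For example, a minimal HPA for a hypothetical deployment named my-app (the name and thresholds are illustrative):
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10
kubectl get hpa # shows current/target utilization and replica counts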
T2.medium/t3.medium/T-series nodes NotReady after CA
T-series instances have CPU burst credits, which allow the CPU to burst above the instance's baseline performance for short periods of time. All T-series nodes earn a set amount of CPU credits every hour, and spend burst credits whenever CPU utilization exceeds the baseline.
Autoscaling ensures all nodes are used to their maximum potential, which can mean CPU usage climbing towards 100%. Since the instances keep bursting above their baseline, they tend to exhaust their burst credits, and all nodes in the cluster end up in a NotReady state.
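One way to keep an eye on burst credits is the standard CPUCreditBalance CloudWatch metric; a sketch (the instance id is illustrative, and GNU date is assumed as on WSL/Ubuntu):
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=<instance-id> \
  --statistics Average --period 300 \
  --start-time "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"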
Share the following steps with your team members.
source envs/$environment/variables.sh
# Configure your AWS credentials.
aws configure --profile "$AWS_PROFILE"
# Initialize kubectl and Terraform.
./init.sh
source envs/$environment/variables.sh
# Now you can execute the following tools.
kops
terraform
helm
WARNING: The kops delete cluster command will delete all EBS volumes tagged for the cluster. You should take snapshots before destroying.
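A sketch of taking those snapshots, assuming the volumes carry the KubernetesCluster tag that kops applies (verify the tag on your volumes first):
for volume_id in $(aws ec2 describe-volumes \
    --filters "Name=tag:KubernetesCluster,Values=$KOPS_CLUSTER_NAME" \
    --query 'Volumes[].VolumeId' --output text); do
  aws ec2 create-snapshot --volume-id "$volume_id" \
    --description "pre-delete snapshot of $KOPS_CLUSTER_NAME"
done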
terraform destroy
kops delete cluster --name "$KOPS_CLUSTER_NAME" --yes
Running cost depends on number of masters and nodes.
Here is a minimum configuration with AWS Free Tier (first 1 year):
Role | Kind | Spec | Monthly Cost |
---|---|---|---|
Master | EC2 | m3.medium spot | $5 |
Master | EBS | gp2 10GB | free |
Master | EBS for etcd | gp2 5GB x2 | free |
Node | EC2 | m3.medium spot | $5 |
Node | EBS | gp2 10GB | free |
Cluster | EBS for PVs | gp2 | $0.1/GB |
Cluster | ALB | - | free |
Cluster | Route53 Hosted Zone | - | $0.5 |
Cluster | S3 | - | free |
The cluster name must be a domain name in order to avoid creating an ELB for the masters.
# envs/$environment/variables.sh
kubernetes_cluster_name=dev.example.com
Reduce size of the volumes:
# kops edit cluster
spec:
etcdClusters:
- etcdMembers:
- instanceGroup: master-us-west-2a
name: a
volumeSize: 5
name: main
version: 3.2.14
- etcdMembers:
- instanceGroup: master-us-west-2a
name: a
volumeSize: 5
name: events
version: 3.2.14
---
# kops edit ig master-us-west-2a
spec:
machineType: m3.medium
maxPrice: "0.02"
rootVolumeSize: 10
---
# kops edit ig nodes
spec:
machineType: m3.medium
maxPrice: "0.02"
rootVolumeSize: 10
subnets:
- us-west-2a
This is an open source software licensed under Apache License 2.0. Feel free to bring up issues or pull requests.
I've modified code from https://github.com/int128/kops-terraform-starter. Thanks to the creator.