Simplify the cluster provisioning process for a cluster with one master and multiple worker nodes. It should be secured with SSL and have all the default add-ons. There should not be significant differences in the provisioning process across deployment targets (cloud provider + OS distribution) once machines meet the node specification.
Cluster provisioning can be broken into a number of phases, each with its own exit criteria. In some cases, multiple phases will be combined to automate the cluster setup more seamlessly, but in all cases the phases can be run sequentially to provision a functional cluster.
It is possible that for some platforms we will provide an optimized flow that combines some of the steps together, but that is out of scope of this document.
Note: Exit criteria in the following sections are not intended to list all tests that should pass, but rather to list those that must pass.
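The idea of sequential phases gated by exit criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Phase` type, the `provision` function, and the two placeholder phases are hypothetical names introduced here, not part of any planned implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    name: str
    run: Callable[[], None]            # performs the phase's substeps
    exit_criteria: Callable[[], bool]  # True once the must-pass checks hold


def provision(phases: List[Phase]) -> List[str]:
    """Run phases in order; stop at the first phase whose exit criteria fail."""
    completed = []
    for phase in phases:
        phase.run()
        if not phase.exit_criteria():
            raise RuntimeError(f"exit criteria not met for phase: {phase.name}")
        completed.append(phase.name)
    return completed


# Placeholder phases (hypothetical); real phases would do actual work.
state = {"machines": False, "master": False}
phases = [
    Phase("provision machines",
          run=lambda: state.update(machines=True),
          exit_criteria=lambda: state["machines"]),
    Phase("deploy master",
          run=lambda: state.update(master=True),
          exit_criteria=lambda: state["machines"] and state["master"]),
]
print(provision(phases))
```

Keeping each phase's checks next to its substeps means a combined, optimized flow could still reuse the same exit criteria.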
Objective: Create a set of machines (master + nodes) where we will deploy Kubernetes.
For this phase to be completed successfully, the following requirements must be met on all nodes:
- Basic connectivity between nodes (i.e. nodes can all ping each other)
- Docker installed (in production setups, it should also be monitored to ensure it is always running)
- One of the supported OSes
We will provide a node specification conformance test that will verify if provisioning has been successful.
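As a rough illustration of what such a conformance check might look like, the sketch below evaluates the three requirements against facts already collected from a node. The function name, the fact keys, and the example OS names are all hypothetical; this document does not define the actual test or the list of supported OSes.

```python
from typing import Dict, List, Set


def check_node(facts: Dict[str, object], supported_os: Set[str]) -> List[str]:
    """Return the unmet requirements for one node (empty list means conformant)."""
    failures = []
    if not facts.get("can_ping_peers", False):
        failures.append("basic connectivity between nodes")
    if not facts.get("docker_installed", False):
        failures.append("Docker installed")
    if facts.get("os") not in supported_os:
        failures.append("one of the supported OSes")
    return failures


# Hypothetical values, for illustration only.
supported = {"example-os-1", "example-os-2"}
node = {"can_ping_peers": True, "docker_installed": False, "os": "example-os-1"}
print(check_node(node, supported))
```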
This step is provider-specific and will be implemented separately for each cloud provider + OS distribution using provider-specific technology (cloud formation, deployment manager, PXE boot, etc). Some OS distributions may meet the provisioning criteria without needing to run any post-boot steps, as they ship with all of the requirements for the node specification by default.
Substeps (using GCE as an example):
- Create network
- Create firewall rules to allow communication inside the cluster
- Create firewall rule to allow `ssh` to all machines
- Create firewall rule to allow `https` to master
- Create persistent disk for master
- Create static IP address for master
- Create master machine
- Create node machines
- Install docker on all machines
Exit criteria:
- Can `ssh` to all machines and run a test docker image
- Can `ssh` to master and nodes and ping other machines
Objective: Generate security certificates used to configure secure communication between client, master and nodes.
TODO: Enumerate certificates which have to be generated.
Objective: Run kubelet and all the required components (e.g. etcd, apiserver, scheduler, controllers) on the master machine.
Substeps:
- copy certificates
- copy manifests for static pods:
  - etcd
  - apiserver, controller manager, scheduler
- run kubelet in docker container (configuration is read from apiserver Config object)
- run kubelet-checker in docker container
v1.2 simplifications:
- kubelet-runner.sh - we will provide a custom docker image to run kubelet; it will contain the kubelet binary and will run it using `nsenter` to work around the problem with mount propagation
- kubelet config file - we will read the kubelet configuration file from disk instead of apiserver; it will be generated locally and copied to all nodes.
Exit criteria:
- Can run basic API calls (e.g. create, list and delete pods) from the client side (e.g. replication controller works - user can create RC object and RC manager can create pods based on that)
- Critical master components work:
  - scheduler
  - controller manager
Objective: Start kubelet on all nodes and configure the Kubernetes network. Each node can be deployed separately, and the implementation should make it practically impossible to break this assumption.
Substeps:
- copy certificates
- run kubelet in docker container (configuration is read from apiserver Config object)
- run kubelet-checker in docker container
v1.2 simplifications:
- kubelet config file - we will read kubelet configuration file from disk instead of apiserver; it will be generated locally and copied to all nodes.
Exit criteria:
- All nodes are registered, but not ready due to lack of kubernetes networking.
Objective: Configure the Kubernetes networking to allow routing requests to pods and services.
To keep the default setup consistent across open source deployments, we will use Flannel to configure Kubernetes networking. However, the implementation of this step will make it easy to plug in different network solutions.
Substeps:
- copy manifest for flannel server to master machine
- create a daemonset with flannel daemon (it will read assigned CIDR and configure network appropriately).
v1.2 simplifications:
- flannel daemon will run as a standalone binary (not in docker container)
- flannel server will assign CIDRs to nodes outside of kubernetes; this will require restarting kubelet after reconfiguring the network bridge on the local machine; this will also require running master and node differently (`--configure-cbr0=false` on node and `--allocate-node-cidrs=false` on master), which breaks encapsulation between nodes
Exit criteria:
- Pods correctly created, scheduled, run and accessible from all nodes.
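To make the per-node CIDR assignment mentioned above more concrete, here is a minimal sketch that carves non-overlapping subnets out of one cluster-wide range using Python's standard `ipaddress` module. It is not how flannel allocates subnets; the function name, the `10.244.0.0/16` range, the `/24` node prefix, and the node names are assumptions chosen for illustration.

```python
import ipaddress
from typing import Dict, List


def assign_node_cidrs(cluster_cidr: str, nodes: List[str],
                      node_prefix: int = 24) -> Dict[str, str]:
    """Give each node its own subnet carved out of the cluster-wide range."""
    network = ipaddress.ip_network(cluster_cidr)
    # subnets() yields non-overlapping subnets of the requested prefix length.
    subnets = network.subnets(new_prefix=node_prefix)
    assignment = {}
    for node in nodes:
        try:
            assignment[node] = str(next(subnets))
        except StopIteration:
            raise ValueError("cluster CIDR is too small for the number of nodes") from None
    return assignment


# Hypothetical range and node names.
print(assign_node_cidrs("10.244.0.0/16", ["node-1", "node-2", "node-3"]))
```

Each node's pods then draw addresses from that node's subnet, which is what lets routes between nodes be set up per subnet rather than per pod.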
Objective: Start all system daemons (e.g. kube-proxy)
Substeps:
- Create daemonset for kube-proxy
Exit criteria:
- Services work correctly on all nodes.
Objective: Add default add-ons (e.g. dns, dashboard)
Substeps:
- Create Deployments (and daemonsets if needed) for all add-ons
We will use Ansible as the default technology for deployment orchestration. It has low requirements on the cluster machines and seems to be popular in the Kubernetes community, which will help us maintain it.
For simpler UX we will provide simple bash scripts that will wrap all basic commands for deployment (e.g. `up` or `down`).
One disadvantage of using Ansible is that it adds a dependency on the machine that runs the deployment scripts. We will work around this by distributing the deployment scripts via a docker image, so that the user runs the following command to create a cluster:
docker run gcr.io/google_containers/deploy_kubernetes:v1.2 up --num-nodes=3 --provider=aws
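The command-line surface shown above (an `up` or `down` command plus `--num-nodes` and `--provider`) could be parsed roughly as follows. This is a sketch of the interface only, written in Python rather than the bash wrappers described earlier; the default node count and the choice to make `--provider` required are assumptions, not decisions made in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="deploy_kubernetes")
    # The two basic commands named in this document.
    parser.add_argument("command", choices=["up", "down"])
    # Default of 1 is an assumption made for this sketch.
    parser.add_argument("--num-nodes", type=int, default=1)
    # Treating the provider as mandatory is also an assumption.
    parser.add_argument("--provider", required=True)
    return parser


args = build_parser().parse_args(["up", "--num-nodes=3", "--provider=aws"])
print(args.command, args.num_nodes, args.provider)
```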