A proposal for the distribution of secrets (passwords, keys, etc) to the Kubelet and to containers inside Kubernetes using a custom volume type.
Secrets are needed in containers to access internal resources like the Kubernetes master or external resources such as git repositories, databases, etc. Users may also want behaviors in the kubelet that depend on secret data (credentials for image pull from a docker registry) associated with pods.
Goals of this design:
- Describe a secret resource
- Define the various challenges attendant to managing secrets on the node
- Define a mechanism for consuming secrets in containers without modification
- This design does not prescribe a method for storing secrets; storage of secrets should be pluggable to accommodate different use-cases
- Encryption of secret data and node security are orthogonal concerns
- It is assumed that node and master are secure and that compromising their security could also compromise secrets:
- If a node is compromised, the only secrets that could potentially be exposed should be the secrets belonging to containers scheduled onto it
- If the master is compromised, all secrets in the cluster may be exposed
- Secret rotation is an orthogonal concern, but it should be facilitated by this proposal
- A user who can consume a secret in a container can know the value of the secret; secrets must be provisioned judiciously
- As a user, I want to store secret artifacts for my applications and consume them securely in containers, so that I can keep the configuration for my applications separate from the images that use them:
- As a cluster operator, I want to allow a pod to access the Kubernetes master using a custom .kubeconfig file, so that I can securely reach the master
- As a cluster operator, I want to allow a pod to access a Docker registry using credentials from a .dockercfg file, so that containers can push images
- As a cluster operator, I want to allow a pod to access a git repository using SSH keys, so that I can push and fetch to and from the repository
- As a user, I want to allow containers to consume supplemental information about services such as username and password which should be kept secret, so that I can share secrets about a service amongst the containers in my application securely
- As a user, I want to associate a pod with a ServiceAccount that consumes a secret and have the kubelet implement some reserved behaviors based on the types of secrets the service account consumes:
- Use credentials for a docker registry to pull the pod's docker image
- Present kubernetes auth token to the pod or transparently decorate traffic between the pod and master service
- As a user, I want to be able to indicate that a secret expires and for that secret's value to be rotated once it expires, so that the system can help me follow good practices
Many configuration files contain secrets intermixed with other configuration information. For example, a user's application may contain a properties file that contains database credentials, SaaS API tokens, etc. Users should be able to consume configuration artifacts in their containers and be able to control the path on the container's filesystems where the artifact will be presented.
Most pieces of information about how to use a service are secrets. For example, a service that provides a MySQL database needs to provide the username, password, and database name to consumers so that they can authenticate and use the correct database. Containers in pods consuming the MySQL service would also consume the secrets associated with the MySQL service.
Service Accounts are proposed as a mechanism to decouple capabilities and security contexts from individual human users. A ServiceAccount contains references to some number of secrets. A Pod can specify that it is associated with a ServiceAccount. Secrets should have a Type field to allow the Kubelet and other system components to take action based on the secret's type.
As an example, the service account proposal discusses service accounts consuming secrets which contain kubernetes auth tokens. When a Kubelet starts a pod associated with a service account which consumes this type of secret, the Kubelet may take a number of actions:
- Expose the secret in a .kubernetes_auth file in a well-known location in the container's file system
- Configure that node's kube-proxy to decorate HTTP requests from that pod to the kubernetes-master service with the auth token, e.g. by adding a header to the request (see the LOAS Daemon proposal)
Another example use case is where a pod is associated with a secret containing docker registry credentials. The Kubelet could use these credentials for the docker pull to retrieve the image.
Rotation is considered a good practice for many types of secret data. It should be possible to express that a secret has an expiry date; this would make it possible to implement a system component that could regenerate expired secrets. As an example, consider a component that rotates expired secrets. The rotator could periodically regenerate the values for expired secrets of common types and update their expiry dates.
Some images will expect to receive configuration items as environment variables instead of files. We should consider what the best way to allow this is; there are a few different options:
- Force the user to adapt files into environment variables. Users can store secrets that need to be presented as environment variables in a format that is easy to consume from a shell:
$ cat /etc/secrets/my-secret.txt
export MY_SECRET_ENV=MY_SECRET_VALUE
The user could source the file at /etc/secrets/my-secret.txt prior to executing the command for the image, either inline in the command or in an init script.
- Give secrets an attribute that allows users to express the intent that the platform should generate the above syntax in the file used to present a secret. The user could consume these files in the same manner as the above option.
- Give secrets attributes that allow the user to express that the secret should be presented to the container as an environment variable. The container's environment would contain the desired values and the software in the container could use them without accommodation in the command or setup script.
For our initial work, we will treat all secrets as files to narrow the problem space. There will be a future proposal that handles exposing secrets as environment variables.
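To make the second option more concrete, here is a minimal sketch of how a platform component might generate the shell-sourceable syntax from variable/value pairs. The helper names `renderExports` and `shellQuote` are hypothetical and not part of this proposal; single-quoting is an assumed choice so that values containing shell metacharacters are treated literally.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shellQuote wraps a value in single quotes so a POSIX shell treats it
// literally; embedded single quotes are closed, escaped, and reopened.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// renderExports (hypothetical) produces one "export NAME=VALUE" line per
// entry, sorted by name so the generated file is deterministic.
func renderExports(vars map[string]string) string {
	names := make([]string, 0, len(vars))
	for name := range vars {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "export %s=%s\n", name, shellQuote(vars[name]))
	}
	return b.String()
}

func main() {
	fmt.Print(renderExports(map[string]string{
		"MY_SECRET_ENV": "MY_SECRET_VALUE",
	}))
}
```

The sketch does not validate that names are legal shell identifiers; a real implementation would likely need to.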
There are two fundamentally different use-cases for access to secrets:
- CRUD operations on secrets by their owners
- Read-only access to the secrets needed for a particular node by the kubelet
In use cases for CRUD operations, the user experience for secrets should be no different than for other API resources.
The data store backing the REST API should be pluggable because different cluster operators will have different preferences for the central store of secret data. Some possibilities for storage:
- An etcd collection alongside the storage for other API resources
- A collocated HSM
- An external datastore such as an external etcd, RDBMS, etc.
There should be a size limit for secrets in order to:
- Prevent DOS attacks against the API server
- Allow kubelet implementations that prevent secret data from touching the node's filesystem
The size limit should satisfy the following conditions:
- Large enough to store common artifact types (encryption keypairs, certificates, small configuration files)
- Small enough to avoid large impact on node resource consumption (storage, RAM for tmpfs, etc)
To begin discussion, we propose an initial value for this size limit of 1MB.
Defining a policy for limitations on how a secret may be referenced by another API resource and how constraints should be applied throughout the cluster is tricky due to the number of variables involved:
- Should there be a maximum number of secrets a pod can reference via a volume?
- Should there be a maximum number of secrets a service account can reference?
- Should there be a total maximum number of secrets a pod can reference via its own spec and its associated service account?
- Should there be a total size limit on the amount of secret data consumed by a pod?
- How will cluster operators want to be able to configure these limits?
- How will these limits impact API server validations?
- How will these limits affect scheduling?
For now, we will not implement validations around these limits. Cluster operators will decide how much node storage is allocated to secrets. It will be the operator's responsibility to ensure that the allocated storage is sufficient for the workload scheduled onto a node.
The use-case where the kubelet reads secrets has several additional requirements:
- Kubelets should only be able to receive secret data which is required by pods scheduled onto the kubelet's node
- Kubelets should have read-only access to secret data
- Secret data should not be transmitted over the wire insecurely
- Kubelets must ensure pods do not have access to each other's secrets
The Kubelet should only be allowed to read secrets which are consumed by pods scheduled onto that Kubelet's node and their associated service accounts. Authorization of the Kubelet to read this data would be delegated to an authorization plugin and associated policy rule.
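The rule an authorization plugin would enforce can be sketched as a set computation over the pods bound to a node. Everything named here (`podInfo`, `readableSecrets`, `kubeletMayRead`) is hypothetical and only illustrates the policy; it is not the authorization plugin interface, and it assumes secrets referenced through a service account have already been flattened into each pod's list.

```go
package main

import "fmt"

// podInfo is a hypothetical, flattened view of a scheduled pod: the node it
// is bound to and the secrets it consumes, directly or via its service account.
type podInfo struct {
	Node      string
	Namespace string
	Secrets   []string
}

func secretKey(namespace, name string) string { return namespace + "/" + name }

// readableSecrets returns the set of secrets consumed by pods on the node.
func readableSecrets(node string, pods []podInfo) map[string]bool {
	allowed := make(map[string]bool)
	for _, p := range pods {
		if p.Node != node {
			continue
		}
		for _, name := range p.Secrets {
			allowed[secretKey(p.Namespace, name)] = true
		}
	}
	return allowed
}

// kubeletMayRead reports whether the kubelet on node may read the secret.
func kubeletMayRead(node, namespace, name string, pods []podInfo) bool {
	return readableSecrets(node, pods)[secretKey(namespace, name)]
}

func main() {
	pods := []podInfo{
		{Node: "node-a", Namespace: "example", Secrets: []string{"prod-db-secret"}},
		{Node: "node-b", Namespace: "example", Secrets: []string{"test-db-secret"}},
	}
	fmt.Println(kubeletMayRead("node-a", "example", "prod-db-secret", pods))
	fmt.Println(kubeletMayRead("node-a", "example", "test-db-secret", pods))
}
```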
Consideration must be given to whether secret data should be allowed to be at rest on the node:
- If secret data is not allowed to be at rest, the size of secret data becomes another draw on the node's RAM - should it affect scheduling?
- If secret data is allowed to be at rest, should it be encrypted?
- If so, how should this be done?
- If not, what threats exist? What types of secret are appropriate to store this way?
For the sake of limiting complexity, we propose that initially secret data should not be allowed to be at rest on a node; secret data should be stored on a node-level tmpfs filesystem. This filesystem can be subdivided into directories for use by the kubelet and by the volume plugin.
The Kubelet will be responsible for creating the per-node tmpfs file system for secret storage. It is hard to make a prescriptive declaration about how much storage is appropriate to reserve for secrets because different installations will vary widely in available resources, desired pod to node density, overcommit policy, and other operation dimensions. That being the case, we propose for simplicity that the amount of secret storage be controlled by a new parameter to the kubelet with a default value of 64MB. It is the cluster operator's responsibility to handle choosing the right storage size for their installation and configuring their Kubelets correctly.
Configuring each Kubelet is not the ideal story for operator experience; it is more intuitive that the cluster-wide storage size be readable from a central configuration store like the one proposed in #1553. When such a store exists, the Kubelet could be modified to read this configuration item from the store.
When the Kubelet is modified to advertise node resources (as proposed in #4441), the capacity calculation for available memory should factor in the potential size of the node-level tmpfs in order to avoid memory overcommit on the node.
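As a small worked example of that capacity adjustment, the sketch below subtracts the reserved tmpfs size from total node memory. The function name `advertisedMemory` is hypothetical, and reserving the full tmpfs size up front (rather than its current usage) is an assumption chosen to be conservative.

```go
package main

import "fmt"

const mib = int64(1024 * 1024)

// advertisedMemory (hypothetical) returns the memory a node could advertise
// after setting aside the full size of the secret tmpfs.
func advertisedMemory(totalBytes, secretStorageBytes int64) int64 {
	if secretStorageBytes >= totalBytes {
		return 0
	}
	return totalBytes - secretStorageBytes
}

func main() {
	// A node with 4096 MiB of RAM and the proposed 64MB default reservation.
	fmt.Println(advertisedMemory(4096*mib, 64*mib)/mib, "MiB")
}
```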
Every pod will have a security context. Secret data on the node should be isolated according to the security context of the container. The Kubelet volume plugin API will be changed so that a volume plugin receives the security context of a volume along with the volume spec. This will allow volume plugins to implement setting the security context of volumes they manage.
Several proposals / upstream patches are notable as background for this proposal:
- Docker vault proposal
- Specification for image/container standardization based on volumes
- Kubernetes service account proposal
- Secrets proposal for docker (1)
- Secrets proposal for docker (2)
We propose a new Secret resource which is mounted into containers with a new volume type. Secret volumes will be handled by a volume plugin that does the actual work of fetching the secret and storing it. Secrets contain multiple pieces of data that are presented as different files within the secret volume (example: SSH key pair).
In order to remove the burden from the end user in specifying every file that a secret consists of, it should be possible to mount all files provided by a secret with a single VolumeMount entry in the container specification.
A new resource for secrets will be added to the API:
type Secret struct {
    TypeMeta
    ObjectMeta

    // Data contains the secret data. Each key must be a valid DNS_SUBDOMAIN.
    // The serialized form of the secret data is a base64 encoded string,
    // representing the arbitrary (possibly non-string) data value here.
    Data map[string][]byte `json:"data,omitempty"`

    // Used to facilitate programmatic handling of secret data.
    Type SecretType `json:"type,omitempty"`
}

type SecretType string

const (
    SecretTypeOpaque              SecretType = "Opaque"             // Opaque (arbitrary data; default)
    SecretTypeKubernetesAuthToken SecretType = "KubernetesAuth"     // Kubernetes auth token
    SecretTypeDockerRegistryAuth  SecretType = "DockerRegistryAuth" // Docker registry auth
    // FUTURE: other type values
)

const MaxSecretSize = 1 * 1024 * 1024
A Secret can declare a type in order to provide type information to system components that work with secrets. The default type is Opaque, which represents arbitrary user-owned data.
Secrets are validated against MaxSecretSize. The keys in the Data field must be valid DNS subdomains.
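The two validations could look roughly like the sketch below. It rests on assumptions the proposal does not settle: that the size limit applies to the summed length of all values, and that "valid DNS subdomain" means lowercase RFC 1123 labels joined by dots with a 253-character cap. The function name `validateSecretData` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

const MaxSecretSize = 1 * 1024 * 1024

// dnsSubdomain approximates an RFC 1123 subdomain: dot-separated labels of
// lowercase alphanumerics and '-', each starting and ending alphanumeric.
var dnsSubdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateSecretData checks every key and the total payload size.
func validateSecretData(data map[string][]byte) error {
	total := 0
	for key, value := range data {
		if len(key) > 253 || !dnsSubdomain.MatchString(key) {
			return fmt.Errorf("key %q is not a valid DNS subdomain", key)
		}
		total += len(value)
	}
	if total > MaxSecretSize {
		return fmt.Errorf("secret data is %d bytes, limit is %d", total, MaxSecretSize)
	}
	return nil
}

func main() {
	fmt.Println(validateSecretData(map[string][]byte{"id-rsa.pub": []byte("value-1")}))
	fmt.Println(validateSecretData(map[string][]byte{"Bad_Key": []byte("value-2")}))
}
```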
A new REST API and registry interface will be added to accompany the Secret resource. The default implementation of the registry will store Secret information in etcd. Future registry implementations could store the TypeMeta and ObjectMeta fields in etcd and store the secret data in another data store entirely, or store the whole object in another data store.
Initially there will be no validations for the number of secrets a pod references, or the number of secrets that can be associated with a service account. These may be added in the future as the finer points of secrets and resource allocation are fleshed out.
A new SecretSource type of volume source will be added to the VolumeSource struct in the API:
type VolumeSource struct {
    // Other fields omitted

    // SecretSource represents a secret that should be presented in a volume
    SecretSource *SecretSource `json:"secret"`
}

type SecretSource struct {
    Target ObjectReference
}
Secret volume sources are validated to ensure that the specified object reference actually points to an object of type Secret.
In the future, the SecretSource will be extended to allow:
- Fine-grained control over which pieces of secret data are exposed in the volume
- The paths and filenames for how secret data are exposed
A new Kubelet volume plugin will be added to handle volumes with a secret source. This plugin will require access to the API server to retrieve secret data and therefore the volume Host interface will have to change to expose a client interface:
type Host interface {
    // Other methods omitted

    // GetKubeClient returns a client interface
    GetKubeClient() client.Interface
}
The secret volume plugin will be responsible for:
- Returning a volume.Builder implementation from NewBuilder that:
- Retrieves the secret data for the volume from the API server
- Places the secret data onto the container's filesystem
- Sets the correct security attributes for the volume based on the pod's SecurityContext
- Returning a volume.Cleaner implementation from NewCleaner that cleans the volume from the container's filesystem
The Kubelet must be modified to accept a new parameter for the secret storage size and to create a tmpfs file system of that size to store secret data. Rough accounting of specific changes:
- The Kubelet should have a new field added called secretStorageSize; units are megabytes
- NewMainKubelet should accept a value for secret storage size
- The Kubelet server should have a new flag added for secret storage size
- The Kubelet's setupDataDirs method should be changed to create the secret storage
For use-cases where the Kubelet's behavior is affected by the secrets associated with a pod's ServiceAccount, the Kubelet will need to be changed. For example, if secrets of type docker-reg-auth affect how the pod's images are pulled, the Kubelet will need to be changed to accommodate this. Subsequent proposals can address this on a type-by-type basis.
For clarity, let's examine some detailed examples of some common use-cases in terms of the suggested changes. All of these examples are assumed to be created in a namespace called example.
To create a pod that uses an ssh key stored as a secret, we first need to create a secret:
{
  "apiVersion": "v1beta2",
  "kind": "Secret",
  "id": "ssh-key-secret",
  "data": {
    "id-rsa.pub": "dmFsdWUtMQ0K",
    "id-rsa": "dmFsdWUtMg0KDQo="
  }
}
Note: The serialized JSON and YAML values of secret data are encoded as base64 strings. Newlines are not valid within these strings and must be omitted.
Now we can create a pod which references the secret with the ssh key and consumes it in a volume:
{
  "id": "secret-test-pod",
  "kind": "Pod",
  "apiVersion": "v1beta2",
  "labels": {
    "name": "secret-test"
  },
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "secret-test-pod",
      "containers": [{
        "name": "ssh-test-container",
        "image": "mySshImage",
        "volumeMounts": [{
          "name": "secret-volume",
          "mountPath": "/etc/secret-volume",
          "readOnly": true
        }]
      }],
      "volumes": [{
        "name": "secret-volume",
        "source": {
          "secret": {
            "target": {
              "kind": "Secret",
              "namespace": "example",
              "name": "ssh-key-secret"
            }
          }
        }
      }]
    }
  }
}
When the container's command runs, the pieces of the key will be available in:
/etc/secret-volume/id-rsa.pub
/etc/secret-volume/id-rsa
The container is then free to use the secret data to establish an ssh connection.
Let's compare examples where a pod consumes a secret containing prod credentials and another pod consumes a secret with test environment credentials.
The secrets:
[{
  "apiVersion": "v1beta2",
  "kind": "Secret",
  "id": "prod-db-secret",
  "data": {
    "username": "dmFsdWUtMQ0K",
    "password": "dmFsdWUtMg0KDQo="
  }
},
{
  "apiVersion": "v1beta2",
  "kind": "Secret",
  "id": "test-db-secret",
  "data": {
    "username": "dmFsdWUtMQ0K",
    "password": "dmFsdWUtMg0KDQo="
  }
}]
The pods:
[{
  "id": "prod-db-client-pod",
  "kind": "Pod",
  "apiVersion": "v1beta2",
  "labels": {
    "name": "prod-db-client"
  },
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "prod-db-pod",
      "containers": [{
        "name": "db-client-container",
        "image": "myClientImage",
        "volumeMounts": [{
          "name": "secret-volume",
          "mountPath": "/etc/secret-volume",
          "readOnly": true
        }]
      }],
      "volumes": [{
        "name": "secret-volume",
        "source": {
          "secret": {
            "target": {
              "kind": "Secret",
              "namespace": "example",
              "name": "prod-db-secret"
            }
          }
        }
      }]
    }
  }
},
{
  "id": "test-db-client-pod",
  "kind": "Pod",
  "apiVersion": "v1beta2",
  "labels": {
    "name": "test-db-client"
  },
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "test-db-pod",
      "containers": [{
        "name": "db-client-container",
        "image": "myClientImage",
        "volumeMounts": [{
          "name": "secret-volume",
          "mountPath": "/etc/secret-volume",
          "readOnly": true
        }]
      }],
      "volumes": [{
        "name": "secret-volume",
        "source": {
          "secret": {
            "target": {
              "kind": "Secret",
              "namespace": "example",
              "name": "test-db-secret"
            }
          }
        }
      }]
    }
  }
}]
The specs for the two pods differ only in the value of the object referred to by the secret volume source. Both containers will have the following files present on their filesystems:
/etc/secret-volume/username
/etc/secret-volume/password