- 🌟 About the Project
- 🚀 Getting Started
- ⚒️ Advanced Configuration
- ♻️ Lifecycle
- 🧭 Roadmap
- 👋 Contributing
- ⚖️ License
- 💎 Acknowledgements
Hcloud Kubernetes is a Terraform module for deploying a fully declarative, managed Kubernetes cluster on Hetzner Cloud. It utilizes Talos, a secure, immutable, and minimal operating system specifically designed for Kubernetes, featuring a streamlined architecture with just 12 binaries and managed entirely through an API.
This project is committed to production-grade configuration and lifecycle management, ensuring all components are set up for high availability. It includes a curated selection of widely used and officially recognized Kubernetes components. If you encounter any issues, suboptimal settings, or missing elements, please file an issue to help us improve this project.
Tip
If you don't yet have a Hetzner account, feel free to use this Hetzner Cloud Referral Link to claim a €20 credit and support this project.
This setup includes several features for a seamless, best-practice Kubernetes deployment on Hetzner Cloud:
- Fully Declarative & Immutable: Utilize Talos Linux for a completely declarative and immutable Kubernetes setup on Hetzner Cloud.
- Cross-Architecture: Supports both AMD64 and ARM64 architectures, with integrated image upload to Hetzner Cloud.
- High Availability: Configured for production-grade high availability for all components, ensuring consistent and reliable system performance.
- Distributed Storage: Implements Longhorn for cloud-native block storage with snapshotting and automatic replica rebuilding.
- Autoscaling: Includes Cluster Autoscaler to dynamically adjust node counts based on workload demands, optimizing resource allocation.
- Plug-and-Play Kubernetes: Equipped with an optional Ingress Controller and Cert Manager, facilitating rapid workload deployment.
- Geo-Redundant Ingress: Supports high availability and massive scalability through geo-redundant Load Balancer pools.
- Dual-Stack Support: Employs Load Balancers with Proxy Protocol to efficiently route both IPv4 and IPv6 traffic to the Ingress Controller.
- Enhanced Security: Built with security as a priority, incorporating firewalls and encryption by default to protect your infrastructure.
- Automated Backups: Leverages Talos Backup with support for S3-compatible storage solutions like Hetzner's Object Storage.
This project includes commonly used and essential Kubernetes software, optimized for seamless integration with Hetzner Cloud.
- Talos Cloud Controller Manager (CCM): Manages node resources by updating them with cloud metadata, handling lifecycle deletions, and automatically approving node CSRs.
- Talos Backup: Automates etcd snapshots and S3 storage for backups in Talos Linux-based Kubernetes clusters.
- Hcloud Cloud Controller Manager (CCM): Manages the integration of Kubernetes clusters with Hetzner Cloud services, ensuring the update of node data, private network traffic control, and load balancer setup.
- Hcloud Container Storage Interface (CSI): Manages persistent storage in Kubernetes clusters using Hetzner Cloud Volumes, ensuring seamless storage integration and management.
- Longhorn: Delivers distributed block storage for Kubernetes, facilitating high availability and easy management of persistent volumes with features like snapshotting and automatic replica rebuilding.
- Cilium Container Network Interface (CNI): A high-performance CNI plugin that enhances and secures network connectivity and observability for container workloads through the use of eBPF technology in the Linux kernel.
- Ingress NGINX Controller: Provides a robust web routing and load balancing solution for Kubernetes, utilizing NGINX as a reverse proxy to manage traffic and enhance network performance.
- Cert Manager: Automates the management of certificates in Kubernetes, handling the issuance and renewal of certificates from various sources like Let's Encrypt, and ensures certificates are valid and updated.
- Cluster Autoscaler: Dynamically adjusts Kubernetes cluster size based on resource demands and node utilization, scaling nodes in or out to optimize cost and performance.
- Metrics Server: Collects and provides container resource metrics for Kubernetes, enabling features like autoscaling by interacting with Horizontal and Vertical Pod Autoscalers.
Talos Linux is a secure, minimal, and immutable OS for Kubernetes, removing SSH and shell access to reduce attack surfaces. Managed through a secure API with mTLS, Talos prevents configuration drift, enhancing both security and predictability. It follows NIST and CIS hardening standards, operates in memory, and is built to support modern, production-grade Kubernetes environments.
Firewall Protection: This module uses Hetzner Cloud Firewalls to manage external access to nodes. For internal pod-to-pod communication, support for Kubernetes Network Policies is provided through Cilium CNI.
Encryption in Transit: In this module, all pod network traffic is encrypted by default using WireGuard via Cilium CNI. It includes automatic key rotation and efficient in-kernel encryption, covering all traffic types.
Encryption at Rest: In this module, the STATE and EPHEMERAL partitions are encrypted by default with Talos Disk Encryption using LUKS2. Each node is secured with individual encryption keys derived from its unique `nodeID`.
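Because Cilium enforces Kubernetes Network Policies, standard `NetworkPolicy` resources can be applied once the cluster is running. Below is a minimal illustrative sketch; the namespace, names, and labels are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: sample-namespace       # hypothetical namespace
spec:
  # Select the pods this policy applies to
  podSelector:
    matchLabels:
      app: backend                  # hypothetical label
  policyTypes:
    - Ingress
  ingress:
    # Only allow traffic from frontend pods on the backend port
    - from:
        - podSelector:
            matchLabels:
              app: frontend         # hypothetical label
      ports:
        - protocol: TCP
          port: 8080
```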
- terraform to deploy Kubernetes on Hetzner Cloud
- packer to upload Talos Images to Hetzner Cloud
- talosctl to control the Talos Cluster
- kubectl to control Kubernetes (optional)
Important
Keep the CLI tools up to date. Ensure that `talosctl` matches your Talos version for compatibility, especially before a Talos upgrade.
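As a quick sketch, the installed tool versions can be checked before planning an upgrade (output formats vary between releases):

```bash
terraform version
packer version
# The client version should match the Talos version of the cluster
talosctl version --client
kubectl version --client
```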
Create a `kubernetes.tf` file with the module configuration:
module "kubernetes" {
source = "hcloud-k8s/kubernetes/hcloud"
version = "<version>"
cluster_name = "k8s"
hcloud_token = "<hcloud-token>"
# Export configs for Talos and Kube API access
cluster_kubeconfig_path = "kubeconfig"
cluster_talosconfig_path = "talosconfig"
# Optional Ingress Controller and Cert Manager
cert_manager_enabled = true
ingress_nginx_enabled = true
control_plane_nodepools = [
{ name = "control", type = "cax11", location = "fsn1", count = 3 }
]
worker_nodepools = [
{ name = "worker", type = "cax11", location = "fsn1", count = 3 }
]
}
Note
Each Control Plane node requires at least 4GB of memory and each Worker node at least 2GB. For High-Availability (HA), at least 3 Control Plane nodes and 3 Worker nodes are required.
Initialize Terraform and deploy the cluster:
```bash
terraform init --upgrade
terraform apply
```
Set config file locations:
```bash
export TALOSCONFIG=talosconfig
export KUBECONFIG=kubeconfig
```
Display cluster nodes:
```bash
talosctl get member
kubectl get nodes -o wide
```
Display all pods:
```bash
kubectl get pods -A
```
For more detailed information and examples, please refer to the module documentation.
To destroy the cluster, first disable the delete protection by setting:
```hcl
cluster_delete_protection = false
```
Apply this change before proceeding. Once the delete protection is disabled, you can tear down the cluster using the following Terraform commands:
```bash
terraform state rm 'module.kubernetes.talos_machine_configuration_apply.worker'
terraform state rm 'module.kubernetes.talos_machine_configuration_apply.control_plane'
terraform state rm 'module.kubernetes.talos_machine_secrets.this'
terraform destroy
```
Cluster Access
By default, the cluster is accessible over the public internet. The firewall is automatically configured to use the IPv4 address and /64 IPv6 CIDR of the machine running this module. To disable this automatic configuration, set the following variables to `false`:
```hcl
firewall_use_current_ipv4 = false
firewall_use_current_ipv6 = false
```
To manually specify source networks for the Talos API and Kube API, configure the `firewall_talos_api_source` and `firewall_kube_api_source` variables as follows:
```hcl
firewall_talos_api_source = [
  "1.2.3.0/32",
  "1:2:3::/64"
]
firewall_kube_api_source = [
  "1.2.3.0/32",
  "1:2:3::/64"
]
```
This allows explicit control over which networks can access your APIs, overriding the default behavior when set.
If your internal network is routed and accessible, you can directly access the cluster using internal IPs by setting:
cluster_access = "private"
For integrating Talos nodes with an internal network, configure a default route (`0.0.0.0/0`) in the Hetzner Network to point to your router or gateway. Additionally, add specific routes on the Talos nodes to encompass your entire network CIDR:
```hcl
talos_extra_routes = ["10.0.0.0/8"]

# Optionally, disable NAT for your globally routed CIDR
network_native_routing_cidr = "10.0.0.0/8"

# Optionally, use an existing Network
hcloud_network_id = 123456789
```
This setup ensures that the Talos nodes can route traffic appropriately across your internal network.
Optionally, a hostname can be configured to direct access to the Kubernetes API through a node IP, load balancer, or Virtual IP (VIP):
```hcl
kube_api_hostname = "kube-api.example.com"
```
For accessing the Kubernetes API from the public internet, choose one of the following options based on your needs:
- Use a Load Balancer (Recommended): Deploy a load balancer to manage API traffic, enhancing availability and load distribution.

  ```hcl
  kube_api_load_balancer_enabled = true
  ```

- Use a Virtual IP (Floating IP): A Floating IP is configured to automatically move between control plane nodes in case of an outage, ensuring continuous access to the Kubernetes API.

  ```hcl
  control_plane_public_vip_ipv4_enabled = true

  # Optionally, specify an existing Floating IP
  control_plane_public_vip_ipv4_id = 123456789
  ```
When accessing the Kubernetes API via an internal network, an internal Virtual IP (Alias IP) is utilized by default to route API requests within the network. This feature can be disabled with the following configuration:
```hcl
control_plane_private_vip_ipv4_enabled = false
```
To enhance internal availability, a load balancer can be used:
```hcl
kube_api_load_balancer_enabled = true
```
This setup ensures secure and flexible access to the Kubernetes API, accommodating different networking environments.
Cluster Autoscaler
The Cluster Autoscaler dynamically adjusts the number of nodes in a Kubernetes cluster based on demand, ensuring that there are enough nodes to run all pods and no unneeded nodes when the workload decreases.

Example `kubernetes.tf` snippet:
```hcl
# Configuration for cluster autoscaler node pools
cluster_autoscaler_nodepools = [
  {
    name     = "autoscaler"
    type     = "cax11"
    location = "fsn1"
    min      = 0
    max      = 6
    labels   = { "autoscaler-node" = "true" }
    taints   = [ "autoscaler-node=true:NoExecute" ]
  }
]
```
Optionally, pass additional Helm values to the cluster autoscaler configuration:
```hcl
cluster_autoscaler_helm_values = {
  extraArgs = {
    enforce-node-group-min-size   = true
    scale-down-delay-after-add    = "45m"
    scale-down-delay-after-delete = "4m"
    scale-down-unneeded-time      = "5m"
  }
}
```
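With the labels and taints defined above, only workloads that explicitly tolerate the `autoscaler-node` taint are scheduled onto autoscaled nodes, which is what triggers a scale-up. A minimal illustrative sketch of such a workload (names and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # Pin the pods to the autoscaled node pool defined above
      nodeSelector:
        autoscaler-node: "true"
      # Tolerate the NoExecute taint set on autoscaler nodes
      tolerations:
        - key: "autoscaler-node"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"
      containers:
        - name: worker
          image: busybox:1.36   # hypothetical image
          command: ["sleep", "infinity"]
```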
Egress Gateway
Cilium offers an Egress Gateway to ensure network compatibility with legacy systems and firewalls requiring fixed IPs. The use of Cilium Egress Gateway does not provide high availability and increases latency due to extra network hops and tunneling. Consider this configuration only as a last resort.
Example `kubernetes.tf` snippet:
```hcl
# Enable Cilium Egress Gateway
cilium_egress_gateway_enabled = true

# Define worker nodepools including an egress-specific node pool
worker_nodepools = [
  # ... (other node pool configurations)
  {
    name     = "egress"
    type     = "cax11"
    location = "fsn1"
    labels   = { "egress-node" = "true" }
    taints   = [ "egress-node=true:NoSchedule" ]
  }
]
```
Example Egress Gateway Policy:
```yaml
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: sample-egress-policy
spec:
  selectors:
    - podSelector:
        matchLabels:
          io.kubernetes.pod.namespace: sample-namespace
          app: sample-app
  destinationCIDRs:
    - "0.0.0.0/0"
  egressGateway:
    nodeSelector:
      matchLabels:
        egress-node: "true"
```
Please visit the Cilium documentation for more details.
Firewall Configuration
By default, a firewall is configured that can be extended with custom rules. If no egress rules are configured, outbound traffic remains unrestricted. However, inbound traffic is always restricted to mitigate the risk of exposing Talos nodes to the public internet, which could pose a serious security vulnerability.

Each rule is defined with the following properties:
- `description`: A brief description of the rule.
- `direction`: The direction of traffic (`in` for inbound, `out` for outbound).
- `source_ips`: A list of source IP addresses for inbound rules.
- `destination_ips`: A list of destination IP addresses for outbound rules.
- `protocol`: The protocol used (valid options: `tcp`, `udp`, `icmp`, `gre`, `esp`).
- `port`: The port number (required for `tcp` and `udp`, must not be specified for `icmp`, `gre`, and `esp`).
Example `kubernetes.tf` snippet:
```hcl
firewall_extra_rules = [
  {
    description = "Custom UDP Rule"
    direction   = "in"
    source_ips  = ["0.0.0.0/0", "::/0"]
    protocol    = "udp"
    port        = "12345"
  },
  {
    description = "Custom TCP Rule"
    direction   = "in"
    source_ips  = ["1.2.3.4", "1:2:3:4::"]
    protocol    = "tcp"
    port        = "8080-9000"
  },
  {
    description = "Allow ICMP"
    direction   = "in"
    source_ips  = ["0.0.0.0/0", "::/0"]
    protocol    = "icmp"
  }
]
```
For access to Talos and the Kubernetes API, please refer to the Cluster Access configuration section.
Ingress Load Balancer
The ingress controller uses a default load balancer service to manage external traffic. For geo-redundancy and high availability, `ingress_load_balancer_pools` can be configured as an alternative, replacing the default load balancer with the specified pool of load balancers. This setup distributes traffic from various locations across all targets in all regions.
Example `kubernetes.tf` configuration:
```hcl
ingress_load_balancer_pools = [
  {
    name     = "lb-nbg"
    location = "nbg1"
    type     = "lb11"
  },
  {
    name     = "lb-fsn"
    location = "fsn1"
    type     = "lb11"
  }
]
```
Configuring local traffic handling enhances network efficiency by reducing latency. Processing traffic closer to its source eliminates unnecessary routing delays, ensuring consistent performance for low-latency or region-sensitive applications.
Example `kubernetes.tf` configuration:
```hcl
ingress_nginx_kind                            = "DaemonSet"
ingress_nginx_service_external_traffic_policy = "Local"

ingress_load_balancer_pools = [
  {
    name          = "regional-lb-nbg"
    location      = "nbg1"
    local_traffic = true
  },
  {
    name          = "regional-lb-fsn"
    location      = "fsn1"
    local_traffic = true
  }
]
```
Key settings in this configuration:
- `local_traffic`: Limits load balancer targets to nodes in the same geographic location as the load balancer, reducing data travel distances and keeping traffic within the region.
- `ingress_nginx_service_external_traffic_policy` set to `Local`: Ensures external traffic is handled directly on the local node, avoiding extra network hops.
- `ingress_nginx_kind` set to `DaemonSet`: Deploys an ingress controller instance on every node, enabling requests to be handled locally for faster response times.
Topology-aware routing in ingress-nginx can optionally be enabled by setting the `ingress_nginx_topology_aware_routing` variable to `true`. This functionality routes traffic to the nearest upstream endpoints, enhancing efficiency for supported services. Note that this feature is only applicable to services that support topology-aware routing. For more information, refer to the Kubernetes documentation.
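Example `kubernetes.tf` snippet enabling this option:

```hcl
ingress_nginx_topology_aware_routing = true
```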
Network Segmentation
By default, this module calculates optimal subnets based on the provided network CIDR (`network_ipv4_cidr`). The network is segmented automatically as follows:

- 1st Quarter: Reserved for other uses such as classic VMs.
- 2nd Quarter:
  - 1st Half: Allocated for Node Subnets (`network_node_ipv4_cidr`)
  - 2nd Half: Allocated for Service IPs (`network_service_ipv4_cidr`)
- 3rd and 4th Quarters:
  - Full Span: Allocated for Pod Subnets (`network_pod_ipv4_cidr`)
Each Kubernetes node requires a `/24` subnet within `network_pod_ipv4_cidr`. To support this configuration, the optimal node subnet size (`network_node_ipv4_subnet_mask_size`) is calculated using the formula:

`32 - (24 - subnet_mask_size(network_pod_ipv4_cidr))`
With the default `10.0.0.0/16` network CIDR (`network_ipv4_cidr`), the following values are calculated:

- Node Subnet Size: `/25` (max. 128 nodes per subnet)
- Node Subnets: `10.0.64.0/19` (max. 64 subnets, each with a `/25`)
- Service IPs: `10.0.96.0/19` (max. 8192 services)
- Pod Subnet Size: `/24` (max. 256 pods per node)
- Pod Subnets: `10.0.128.0/17` (max. 128 nodes, each with a `/24`)
Please consider the following Hetzner Cloud limits:
- Up to 100 servers can be attached to a network.
- Up to 100 routes can be created per network.
- Up to 50 subnets can be created per network.
- A project can have up to 50 placement groups.
A `/16` network CIDR is sufficient to fully utilize Hetzner Cloud's scaling capabilities. It supports:

- Up to 100 nodes, each with its own `/24` Pod subnet route.
- Up to 50 nodepools, one nodepool per subnet, each with at least one placement group.
Here is a table with more example calculations:
| Network CIDR | Node Subnet Size | Node Subnets | Service IPs | Pod Subnets |
|---|---|---|---|---|
| 10.0.0.0/16 | /25 (128 IPs) | 10.0.64.0/19 (64) | 10.0.96.0/19 (8192) | 10.0.128.0/17 (128) |
| 10.0.0.0/17 | /26 (64 IPs) | 10.0.32.0/20 (64) | 10.0.48.0/20 (4096) | 10.0.64.0/18 (64) |
| 10.0.0.0/18 | /27 (32 IPs) | 10.0.16.0/21 (64) | 10.0.24.0/21 (2048) | 10.0.32.0/19 (32) |
| 10.0.0.0/19 | /28 (16 IPs) | 10.0.8.0/22 (64) | 10.0.12.0/22 (1024) | 10.0.16.0/20 (16) |
| 10.0.0.0/20 | /29 (8 IPs) | 10.0.4.0/23 (64) | 10.0.6.0/23 (512) | 10.0.8.0/21 (8) |
| 10.0.0.0/21 | /30 (4 IPs) | 10.0.2.0/24 (64) | 10.0.3.0/24 (256) | 10.0.4.0/22 (4) |
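For example, a smaller cluster can use one of the CIDRs from the table above; the node, service, and pod subnets are then derived automatically. A sketch, assuming the defaults otherwise:

```hcl
# A /17 network yields /26 node subnets, 10.0.48.0/20 for Service IPs,
# and 10.0.64.0/18 for Pod subnets (see table above)
network_ipv4_cidr = "10.0.0.0/17"
```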
Talos Backup
This module natively supports Hcloud Object Storage. Below is an example of how to configure backups with the MinIO Client (`mc`) and Hcloud Object Storage. While it is possible to create the bucket through the Hcloud Console, that method does not allow for the configuration of automatic retention policies.
Create an alias for the endpoint using the following command:
```bash
mc alias set <alias> \
  https://<location>.your-objectstorage.com \
  <access-key> <secret-key> \
  --api "s3v4" \
  --path "off"
```
Create a bucket with automatic retention policies to protect your backups:
```bash
mc mb --with-lock --region <location> <alias>/<bucket>
mc retention set GOVERNANCE 14d --default <alias>/<bucket>
```
Configure your `kubernetes.tf` file:
```hcl
talos_backup_s3_hcloud_url = "https://<bucket>.<location>.your-objectstorage.com"
talos_backup_s3_access_key = "<access-key>"
talos_backup_s3_secret_key = "<secret-key>"

# Optional: AGE X25519 Public Key for encryption
talos_backup_age_x25519_public_key = "<age-public-key>"

# Optional: Change schedule (cron syntax)
talos_backup_schedule = "0 * * * *"
```
For users of other object storage providers, configure `kubernetes.tf` as follows:
```hcl
talos_backup_s3_region   = "<region>"
talos_backup_s3_endpoint = "<endpoint>"
talos_backup_s3_bucket   = "<bucket>"
talos_backup_s3_prefix   = "<prefix>"

# Use path-style URLs (set true if required by your provider)
talos_backup_s3_path_style = true

# Access credentials
talos_backup_s3_access_key = "<access-key>"
talos_backup_s3_secret_key = "<secret-key>"

# Optional: AGE X25519 Public Key for encryption
talos_backup_age_x25519_public_key = "<age-public-key>"

# Optional: Change schedule (cron syntax)
talos_backup_schedule = "0 * * * *"
```
To recover from a snapshot, please refer to the Talos Disaster Recovery section in the Documentation.
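As a rough sketch, recovery generally means fetching the latest snapshot from the bucket and bootstrapping etcd from it on a control plane node; the alias, bucket, prefix, file name, and node IP below are placeholders, and the Talos documentation remains the authoritative procedure:

```bash
# Copy the most recent snapshot from the backup bucket (alias/bucket as configured above)
mc cp <alias>/<bucket>/<prefix>/<snapshot-file> ./db.snapshot

# If an AGE public key was configured, decrypt the snapshot first (e.g. with the age CLI)

# Bootstrap etcd from the snapshot on one control plane node
talosctl -n <control-plane-ip> bootstrap --recover-from=./db.snapshot
```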
The Talos Terraform Provider does not support declarative upgrades of Talos or Kubernetes versions. This module compensates for these limitations using `talosctl` to implement the required functionality. Any minor or major upgrade to Talos or Kubernetes will result in a major version change of this module. Please be aware that downgrades are typically neither supported nor tested.
Important
Before upgrading to the next major version of this module, ensure you are on the latest release of the current major version. Do not skip any major release upgrades.
| Hcloud K8s | K8s | Talos | Talos CCM | Hcloud CCM | Hcloud CSI | Longhorn | Cilium | Ingress NGINX | Cert Mgr. | Autoscaler |
|---|---|---|---|---|---|---|---|---|---|---|
| (2) | (1.32) | (1.9) | ? | ? | ? | ? | ? | ? | ? | ? |
| (1) | 1.31 | 1.8 | 1.8 | 1.21 | 2.10 | ? | (1.17) | (4.12) | 1.15 | 9.38 |
| 0 | 1.30 | 1.7 | 1.6 | 1.20 | 2.9 | 1.7.1 | 1.16 | 4.10.1 | 1.14 | 9.37 |
In this module, upgrades are conducted with care and conservatism. You will consistently receive the most tested and compatible releases of all components, avoiding the latest untested or incompatible releases that could disrupt your cluster.
Warning
Do not change any software versions in this project on your own. Each component is tailored to ensure compatibility with new Kubernetes releases. This project specifies versions that are supported and have been thoroughly tested to work together.
- Upgrade to Talos 1.8 and Kubernetes 1.31: Once all components have compatible versions, the upgrade can be performed.
- Integrate native IPv6 for pod traffic: Completion requires Hetzner's addition of IPv6 support to cloud networks, expected at the beginning of 2025 as announced at Hetzner Summit 2024.
Contributions are always welcome!
Distributed under the MIT License. See LICENSE for more information.
- Talos Linux for its impressively secure, immutable, and minimalistic Kubernetes distribution.
- Hetzner Cloud for offering excellent cloud infrastructure with robust Kubernetes integrations.
- Other projects like Kube-Hetzner and Terraform - Hcloud - Talos, where we’ve contributed and gained valuable insights into Kubernetes deployments on Hetzner.