Infrastructure Management
The following documentation describes the different steps to set up the VRE infrastructure and deploy it.
This documentation is generalised as much as possible. However, CERN makes use of certain resources that can only be used and/or managed by CERN users. The sections that are CERN-dependent are flagged throughout the document.
- Create a k8s cluster in CERN OpenStack
- How to connect to the cluster
- Software stack to manage the cluster
- Create a git repository to manage the cluster
- Cluster configuration
Note that CERN OpenStack is a CERN service. You will need to adapt this section to your own service provider.
The creation of a CERN OpenStack cluster can only be done from aiadm
. CERN k8s documentation (CERN restricted access) can be found in this link.
The cluster can be created either via the OpenStack GUI or via the command line. Please adapt the following snippet to your own use case.
#!/bin/bash
$ openstack coe cluster create <CLUSTER_NAME> \
--keypair <KEYPAIR> \
--cluster-template <K8S_VERSION> \
--master-count 1 \
--node-count 6 \
--flavor m2.large \
--master-flavor m2.medium \
--merge-labels \
--labels <key1>=<value1> \
--labels <key2>=<value2>
Once the cluster has been correctly created, use the following commands to download the configuration file needed to connect to the cluster:
$ openstack coe cluster list
# This command will create a `config` file in the current directory
$ openstack coe cluster config <CLUSTER_NAME>
Finally, move this file to the ~/.kube/
directory and export the following environment variable:
mv config ~/.kube/config_<CLUSTER_NAME>
export KUBECONFIG=$HOME/.kube/config_<CLUSTER_NAME>
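If you manage several clusters, KUBECONFIG also accepts a colon-separated list of files, which lets kubectl merge the contexts. A minimal sketch (the file names are examples, not the real VRE configs):

```shell
# Hypothetical example: keep one kubeconfig file per cluster and point
# KUBECONFIG at all of them; kubectl merges the contexts.
export KUBECONFIG="$HOME/.kube/config_vre:$HOME/.kube/config_dev"
echo "$KUBECONFIG"
# Switch between clusters with:
#   kubectl config use-context <context-name>
```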
For the CERN VRE, we are using the following software stack to manage and interact with the cluster. We highly recommend using a package/environment manager to isolate this software stack.
- kubectl (upstream documentation)
- k9s (upstream documentation)
- Helm (upstream documentation)
- Sealed Secrets (upstream documentation)
- Flux (upstream documentation)
- git (upstream documentation)
- Docker (upstream documentation)
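A quick way to check that the whole stack is available on your machine is to probe the PATH for each tool; a minimal sketch (it only reports presence, not versions):

```shell
# Report which of the cluster-management tools are installed locally.
for tool in kubectl k9s helm kubeseal flux git docker; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```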
Flux was installed manually via:
flux bootstrap github --owner=vre-hub --repository=vre --branch=main --path=infrastructure/cluster/flux-v2 --author-name flux-ops
The Flux version was pinned to v2.0.0-rc.5; higher Flux versions are incompatible with the current cluster version (v1.26.7 - April '24). To install this specific Flux version run
curl -s https://fluxcd.io/install.sh | sudo FLUX_VERSION=v2.0.0-rc.5 bash
- To bootstrap the repository you will need to pass a valid GitHub PAT.
- After running the above command, a new deploy-key will be automatically set up in the repository configuration under the username of the person that ran the command.
Using the above command, all the manifests inside the path infrastructure/cluster/flux-v2
will be automatically deployed to the VRE cluster.
Refer to the official Flux docs for information on how to add manifests, e.g. Helm charts, and add kustomizations.
Set up some nodes as ingress nodes for future ingress into the cluster:
#!/bin/bash
NODE_PREFIX=$(kubectl get nodes -l magnum.openstack.org/role=worker --sort-by .metadata.name -o name | head -n 1)
NODE_PREFIX=${NODE_PREFIX%-0}
echo $NODE_PREFIX
# example with 3 first nodes
FIRST_3NODES=$(kubectl get nodes -l magnum.openstack.org/role=worker --sort-by .metadata.name -o name | head -n 3)
for NODE in $FIRST_3NODES
do
kubectl label "$NODE" role=ingress
done
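The `${NODE_PREFIX%-0}` expansion in the snippet above strips the trailing `-0` index so the remaining prefix matches all worker node names. In isolation (the node name below is a made-up example):

```shell
# "%-0" removes the shortest trailing match of "-0" from the variable.
NODE="node/vre-cluster-abc123-node-0"   # hypothetical node name
echo "${NODE%-0}"                       # -> node/vre-cluster-abc123-node
```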
To create and request a new DB instance, go to: https://dbod.web.cern.ch/pages/dashboard and have a look at the corresponding section in the wiki.
Check the Rucio documentation or run the following snippet after adapting it:
#!/bin/bash
docker run --rm \
-e RUCIO_CFG_DATABASE_DEFAULT="<DB_URL>" \
-e RUCIO_CFG_BOOTSTRAP_USERPASS_IDENTITY="<USER>" \
-e RUCIO_CFG_BOOTSTRAP_USERPASS_PWD="<PASS>" \
rucio/rucio-init
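RUCIO_CFG_DATABASE_DEFAULT expects an SQLAlchemy-style connection string. A hypothetical sketch of how such a string is composed (every value is a placeholder, not a real credential or host):

```shell
# Compose a PostgreSQL connection string from its parts.
# All values below are placeholders for illustration only.
DB_USER="rucio"
DB_PASS="changeme"
DB_HOST="dbod-vre.cern.ch"   # hypothetical DBOD hostname
DB_PORT="6600"               # hypothetical port
DB_NAME="rucio"
DB_URL="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$DB_URL"
```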
Have a look at https://github.com/vre-hub/vre/wiki/Infrastructure-Management#secrets-management
You have two strategies to use Flux + Helm:
- either pushing the flux/k8s manifests directly to the cluster and debugging using Flux,
- or applying the manifests by hand (kubectl apply ...) and, after having debugged any problems, applying them to the cluster - Flux in this case won't do anything as the state has already been applied.
The following contains information on the infrastructure and its deployment.
Resources are created either via Terraform or flux. See more details here.
Namespace | Description |
---|---|
shared-services | Namespace for shared resources |
flux-system | Namespace for flux |
rucio-vre | Namespace for rucio resources |
reana | Namespace for reana resources |
jhub | Namespace for JupyterHub |
The two necessary services, Rucio and JupyterHub, are exposed to the internet. The ingress controllers are part of a LanDB set; see here for documentation. To add new nodes as additional ingress nodes, add them to the LanDB set VRE-INGRESS-NODES.
While Rucio relies on CERN Host- and Grid Robot Certificates, we also have a globally valid Sectigo certificate covering the following domains:
- vre.cern.ch (not used atm)
- rucio-auth-vre.cern.ch (not used atm)
- rucio-ui-vre.cern.ch (not used atm)
- rucio-vre.cern.ch (not used atm)
- reana-vre.cern.ch (not used atm)
- jhub-vre.cern.ch (used for JupyterHub)
The TLS key and cert can be found in tbag.
In the future, certificates shall be issued automatically within the cluster through cert-manager; please refer to the corresponding documentation for more information on the CERN setup.
The VRE databases run in an instance on CERN's Database On Demand (DBOD). The instance is called vre
and connection details can be found in tbag (each DB has a dedicated r/w user for usage with an application - do not use the main admin user).
URL: https://dbod.web.cern.ch/ User Guide: https://dbod-user-guide.web.cern.ch/
At the moment, there are three central databases in the vre postgres instance:
- rucio for the Rucio Helm Chart installation (rucio-cvre-*)
- reana for the REANA Helm Chart installation (reana-cvre-*)
- jhub for the Zero to JupyterHub Helm installation (jhub-cvre-*)
The output of the PSQL \l
command:
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
-----------+-----------+----------+-------------+-------------+------------+-----------------+-----------------------
admin | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
dod_dbmon | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
dod_pmm | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
jhub | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
postgres | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
reana | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
rucio | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =Tc/admin +
| | | | | | | admin=CTc/admin +
| | | | | | | rucio=CTc/admin
template0 | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
template1 | <output-omitted> | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
(9 rows)
Additional databases may be added; see the PSQL section. This requires adding the host to pg_hba.conf
(edit on the DBOD Dashboard - see the DBOD docs and here):
..
# TYPE DATABASE USER ADDRESS METHOD
host <db> <user> 0.0.0.0/0 md5
..
Create a sealed secret file running the command below:
kubectl create secret generic secret-name --dry-run=client --from-literal=foo=bar -o [json|yaml] | \
kubeseal \
--controller-name=sealed-secrets-controller \
--controller-namespace=kube-system \
--format yaml > mysealedsecret.[json|yaml]
kubectl create -f mysealedsecret.[json|yaml]
In order to (1) encrypt the secret from a .yaml file, (2) save it to the GitHub repository and (3) apply it to the cluster, run the following script after first setting the variables:
- namespace:
export NS="<your_ns>"
- the .yaml file containing your secret:
export yaml_file_secret="<your_yaml_file>"
- path to where you cloned this repo:
export SECRETS_STORE="<path_to_github_dir_>/vre-hub/vre/iac/secrets/rucio/"
# kubeseal controller namespace
CONTROLLER_NS="shared-services"
CONTROLLER_NAME="sealed-secrets-cvre"
YAML_PRFX="ss_"
# name of output secret to apply
OUTPUT_SECRET=${SECRETS_STORE}${YAML_PRFX}${yaml_file_secret}
echo "Create and apply main server secrets"
cat ${yaml_file_secret} | kubeseal --controller-name=${CONTROLLER_NAME} --controller-namespace=${CONTROLLER_NS} --format yaml --namespace=${NS} > ${OUTPUT_SECRET}
kubectl apply -f ${OUTPUT_SECRET}
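A worked example of how the variables above combine into the output path (all names and paths are placeholders):

```shell
# Hypothetical values showing how OUTPUT_SECRET is assembled.
export NS="rucio-vre"                                   # example namespace
export yaml_file_secret="rucio-db-secret.yaml"          # example secret file
export SECRETS_STORE="$HOME/vre-hub/vre/iac/secrets/rucio/"
YAML_PRFX="ss_"
OUTPUT_SECRET=${SECRETS_STORE}${YAML_PRFX}${yaml_file_secret}
echo "$OUTPUT_SECRET"   # ends in .../iac/secrets/rucio/ss_rucio-db-secret.yaml
```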
There are two ways (at CERN) to expose a service to the intranet or internet (with a firewall opening):
Option A: LoadBalancer
The LoadBalancer is created automatically (see LBaaS CERN, kubernetes-service-type-loadbalancer and LBaaS); after that, the URL needs to be set as the description:
openstack loadbalancer set --description my-domain-name mylb
Some useful commands:
openstack loadbalancer list
openstack loadbalancer show <lb>
openstack server list
openstack server show <server>
openstack server unset --property <property> <server>
openstack server show <name>
To open the firewall for the services that have been created, make a firewall request.
These are our firewall openings details at CERN:
- main Rucio server
- authentication Rucio server
- webui : TBD
If you want to expose the services for getting a certificate from e.g. Let's Encrypt, follow this documentation, which will create the certificates needed to allow external traffic. The CERN e-group is in our case called ESCAPE-WP2-CERN-ACME-LANDBSET
and we have a set associated to it called ESCAPE-WP2-CERN-ACME
.
If you experience slow service issues, or you want to find out more about load balancing at CERN:
- Firewall: https://clouddocs.web.cern.ch/containers/tutorials/firewall.html
- Session Persistence: https://clouddocs.web.cern.ch/networking/load_balancing.html#session-persistence
- Getting a keytab for a load balancer: https://clouddocs.web.cern.ch/networking/load_balancing.html#getting-a-keytab-for-a-load-balancer
Option B: NodePort and Ingress
This requires several worker nodes set with role=ingress:
kubectl label node <node-name> role=ingress
Infrastructure as code is used to create cluster resources. We use Terraform and Flux as tools for that purpose. The Terraform providers in use are listed below.
Terraform supports many different providers that can be used to create resources for a specific component like OpenStack, Helm, or K8s.
Terraform is very well documented and supported, please read through the docs if you're new to Terraform.
The Terraform Helm provider supports two ways of declaring Helm Chart values, either by values
file:
values = [
"${file("rucio/values-server.yaml")}"
]
or by setting single values:
set {
name = "config.database.default"
value = data.kubernetes_secret_v1.rucio_db_secret.data.dbconnectstring
}
so an entire Helm Chart definition with Terraform could look like this:
resource "helm_release" "rucio-server-chart" {
name = "rucio-server-${var.resource-suffix}"
repository = "https://rucio.github.io/helm-charts"
chart = "rucio-server"
version = "1.30.0"
namespace = var.ns-rucio
values = [
"${file("rucio/values-server.yaml")}"
]
set {
name = "config.database.default"
value = data.kubernetes_secret_v1.rucio_db_secret.data.dbconnectstring
}
}
Flux was installed manually via: flux bootstrap github --owner=vre-hub --repository=vre --branch=main --path=infrastructure/cluster/flux --author-name flux-ops
Manifests inside the path infrastructure/cluster/flux-v2
will be automatically deployed to the VRE cluster.
Refer to the official Flux docs for information on adding manifests, e.g. Helm charts and kustomizations.
EOS is mounted through a DaemonSet on labelled nodes, which propagates the mount to the users in JupyterLab.
For specific setup instructions, refer to the EOS Docs and the CERN K8s EOS Docs. It is important that the OAuth token has owner-only read/write permissions (chmod 0600).
CERN VRE Technical Documentation ©CERN 2023