Infrastructure Management

Domenic Gosein edited this page Aug 9, 2023 · 14 revisions

The following contains information on the infrastructure and its deployment.

Cluster Setup

Resources are created either via Terraform or flux. See more details here.

Namespaces

Namespace       | Description
----------------+-------------------------------
shared-services | Namespace for shared resources
flux-system     | Namespace for flux
rucio-vre       | Namespace for rucio resources
reana           | Namespace for reana resources
jhub            | Namespace for JupyterHub

Networking and Firewall

The two necessary services, Rucio and JupyterHub, are exposed to the internet. The ingress controllers are part of a LanDB set; see here for documentation. To add nodes as additional ingresses, add them to the LanDB set VRE-INGRESS-NODES.

Certificates and Domains

While Rucio relies on CERN Host- and Grid Robot Certificates, we also have a globally valid Sectigo certificate covering the following domains:

  • vre.cern.ch (not used atm)
  • rucio-auth-vre.cern.ch (not used atm)
  • rucio-ui-vre.cern.ch (not used atm)
  • rucio-vre.cern.ch (not used atm)
  • reana-vre.cern.ch (not used atm)
  • jhub-vre.cern.ch (used for JupyterHub)

The TLS key and cert can be found in tbag.

In the future, certificates shall be issued automatically within the cluster through cert-manager, please refer to for more information on CERN setup.

Database

The VRE databases run in an instance on CERN's Database On Demand (DBOD). The instance is called vre and connection details can be found in tbag (each db has a dedicated r/w user for usage with an application - do not use the main admin user).

URL: https://dbod.web.cern.ch/
User Guide: https://dbod-user-guide.web.cern.ch/

At the moment, there are three central databases in the vre postgres instance:

  • rucio for the Rucio Helm Chart Installation rucio-cvre-*
  • reana for the REANA Helm Chart Installation reana-cvre-*
  • jhub for the Zero to JupyterHub Helm Installation jhub-cvre-*

The output of the PSQL \l command:

List of databases
   Name    |   Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges   
-----------+-----------+----------+-------------+-------------+------------+-----------------+-----------------------
 admin     | <output-omitted>     | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | 
 dod_dbmon | <output-omitted> | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | 
 dod_pmm   | <output-omitted>    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | 
 jhub      | <output-omitted>       | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | 
 postgres  | <output-omitted>   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | 
 reana     | <output-omitted>      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | 
 rucio     | <output-omitted>      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =Tc/admin            +
           |           |          |             |             |            |                 | admin=CTc/admin      +
           |           |          |             |             |            |                 | rucio=CTc/admin
 template0 | <output-omitted>  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +
           |           |          |             |             |            |                 | postgres=CTc/postgres
 template1 | <output-omitted>   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +
           |           |          |             |             |            |                 | postgres=CTc/postgres
(9 rows)

Additional databases may be added; see the PSQL section. This requires adding the host to pg_hba.conf (edit it on the DBOD Dashboard; see the DBOD docs and here):

..
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    <db>            <user>          0.0.0.0/0               md5
..

Secrets Management

Create a sealed secret file by running the command below:

# -o yaml is used here; json works the same way if you prefer it
kubectl create secret generic secret-name --dry-run=client --from-literal=foo=bar -o yaml | \
kubeseal \
    --controller-name=sealed-secrets-controller \
    --controller-namespace=kube-system \
    --format yaml > mysealedsecret.yaml

kubectl create -f mysealedsecret.yaml

In order to (1) encrypt the secret from a .yaml file, (2) save it to the GitHub repository and (3) apply it to the cluster, first set the following variables and then run the script below:

  • namespace: export NS="<your_ns>"
  • the .yaml file containing your secret: export yaml_file_secret="<your_yaml_file>"
  • path to where you cloned this repo: export SECRETS_STORE="<path_to_github_dir_>/vre-hub/vre/iac/secrets/rucio/"

# kubeseal controller namespace and name
CONTROLLER_NS="shared-services"
CONTROLLER_NAME="sealed-secrets-cvre"
# prefix for the sealed secret file name
YAML_PRFX="ss_"

# path of the sealed secret to write and apply
OUTPUT_SECRET="${SECRETS_STORE}${YAML_PRFX}${yaml_file_secret}"

echo "Create and apply main server secrets"

# encrypt the plain secret with the cluster's sealed-secrets controller
kubeseal --controller-name="${CONTROLLER_NAME}" --controller-namespace="${CONTROLLER_NS}" --format yaml --namespace="${NS}" < "${yaml_file_secret}" > "${OUTPUT_SECRET}"

kubectl apply -f "${OUTPUT_SECRET}"
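
The script above assumes that the three variables are exported and that yaml_file_secret is a bare file name. As a minimal sketch of how those assumptions could be checked before sealing, the helpers below fail early on a missing variable and derive the output path from the file's basename. The function names check_inputs and sealed_secret_path are hypothetical and not part of the repository, and using the basename is a deliberate difference from the plain concatenation in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the vre repository).

# Abort the (non-interactive) shell with a message if a required
# variable is unset or empty.
check_inputs() {
  : "${NS:?export NS first}"
  : "${yaml_file_secret:?export yaml_file_secret first}"
  : "${SECRETS_STORE:?export SECRETS_STORE first}"
}

# Print <store>/<prefix><basename of the secret file>.
# A single trailing slash on the store path is tolerated, and the prefix
# defaults to "ss_" when YAML_PRFX is not set.
sealed_secret_path() {
  printf '%s/%s%s\n' "${1%/}" "${YAML_PRFX:-ss_}" "$(basename "$2")"
}
```

With these in place, the output path could be computed as OUTPUT_SECRET="$(sealed_secret_path "${SECRETS_STORE}" "${yaml_file_secret}")" after calling check_inputs.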

Services

At CERN, there are two ways to expose a service to the intranet or, with a firewall opening, to the internet:

Option A: LoadBalancer

The LoadBalancer is created automatically (see LBaaS CERN, kubernetes-service-type-loadbalancer and LBaaS). After that, the URL needs to be set as its description:

openstack loadbalancer set --description my-domain-name mylb

Some useful commands:

openstack loadbalancer list
openstack loadbalancer show <lb>
openstack server list
openstack server show <server>
openstack server unset --property <property> <server>
openstack server show <name>

To open the firewall for the services that have been created, make a firewall request.

These are the details of our firewall openings at CERN:

If you want to expose the services in order to obtain a certificate from e.g. Let's Encrypt, follow this documentation, which will create some certificates to allow external traffic. In our case, the CERN e-group is called ESCAPE-WP2-CERN-ACME-LANDBSET and the set associated with it is called ESCAPE-WP2-CERN-ACME.

If you experience slow service issues, or you want to find out more about load balancing at CERN, see:

  • Firewall: https://clouddocs.web.cern.ch/containers/tutorials/firewall.html
  • Session Persistence: https://clouddocs.web.cern.ch/networking/load_balancing.html#session-persistence
  • Getting a keytab for a load balancer: https://clouddocs.web.cern.ch/networking/load_balancing.html#getting-a-keytab-for-a-load-balancer

Option B: NodePort and Ingress

This requires several worker nodes labelled with role=ingress:

kubectl label node <node-name-1> role=ingress
kubectl label node <node-name-2> role=ingress
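
When several nodes need the label, the same call can be wrapped in a loop. The following is a minimal sketch; the function name label_ingress_nodes and the KUBECTL override variable are hypothetical conveniences, and --overwrite is added on the assumption that re-labelling an already labelled node should not fail.

```shell
#!/usr/bin/env bash
# Command used to talk to the cluster. Override it, e.g. with
# KUBECTL="echo kubectl", to preview the calls without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Label every node passed as an argument with role=ingress.
# Stops at the first failing call and returns a non-zero status.
label_ingress_nodes() {
  for node in "$@"; do
    ${KUBECTL} label node "${node}" role=ingress --overwrite || return 1
  done
}
```

For example, label_ingress_nodes <node-name-1> <node-name-2> labels both nodes, and kubectl get nodes -l role=ingress lists the nodes that currently carry the label.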

IaC

Infrastructure as code (IaC) is used to create cluster resources, with Terraform and Flux as the tools for that purpose. The Terraform providers in use are listed below.

Terraform

Terraform supports many different providers that can be used to create resources for a specific component like OpenStack, Helm, or K8s.

Terraform is very well documented and supported; please read through the docs if you're new to Terraform.

Terraform Providers

Terraform OpenStack Provider

⚠️ The OpenStack terraform provider does not support changes to existing resources. Every change will result in a replacement operation!

Terraform Helm Provider

The Terraform Helm provider supports two ways of declaring Helm Chart values, either by values file:

  values = [
    file("rucio/values-server.yaml")
  ]

or by setting single values:

  set {
    name  = "config.database.default"
    value = data.kubernetes_secret_v1.rucio_db_secret.data.dbconnectstring
  }

so an entire Helm Chart definition with Terraform could look like this:

resource "helm_release" "rucio-server-chart" {
  name       = "rucio-server-${var.resource-suffix}"
  repository = "https://rucio.github.io/helm-charts"
  chart      = "rucio-server"
  version    = "1.30.0"
  namespace  = var.ns-rucio

  values = [
    file("rucio/values-server.yaml")
  ]

  set {
    name  = "config.database.default"
    value = data.kubernetes_secret_v1.rucio_db_secret.data.dbconnectstring
  }
}

Flux

Flux was installed manually via:

flux bootstrap github --owner=vre-hub --repository=vre --branch=main --path=infrastructure/cluster/flux --author-name flux-ops

Manifests inside the path infrastructure/cluster/flux-v2 will be automatically deployed to the VRE cluster.

Refer to the official Flux docs for information on adding manifests, e.g. Helm charts and kustomizations.

EOS

EOS is mounted through a DaemonSet on labelled nodes, which allows the mount to be propagated to the users in JupyterLab. For specific setup instructions, refer to the EOS Docs and the CERN K8s EOS Docs. It is important that the OAuth token file has at least rw permission set for its owner (chmod 0600).
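
The permission requirement on the token can be checked from a shell. The sketch below assumes the token is stored in a regular file and that its mode should be exactly 0600, which is a stricter reading than "at least rw"; the function names file_mode and token_mode_ok are hypothetical.

```shell
#!/usr/bin/env bash
# Print the octal permission bits of a file.
# GNU stat is tried first, BSD stat is the fallback.
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Succeed only if the file's mode is exactly 600 (rw for the owner, nothing else).
token_mode_ok() {
  [ "$(file_mode "$1")" = "600" ]
}
```

A possible use is token_mode_ok <token-file> || chmod 0600 <token-file>, where <token-file> stands for wherever the OAuth token is written in your setup.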
