
Machine and k8s node Setup


Machine / VM setup

  • Setup a Debian host on a single partition
  • comment-out swap entries in /etc/fstab
  • mount -a
  • timedatectl set-timezone UTC
  • apt update && apt upgrade -y
  • apt install -y vim screen rsync htop ncdu fail2ban tree
  • Edit /etc/motd with the hostname as ASCII art (see the sketch after the aliases block below)
  • Enhance shell
cat <<'EOF' | tee -a /etc/profile.d/aliases.sh
export LS_OPTIONS='--color=auto'
eval "$(dircolors)"
alias ls='ls $LS_OPTIONS'
alias ll='ls $LS_OPTIONS -l'
alias l='ls $LS_OPTIONS -lA'

# Some more aliases to avoid making mistakes:
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
EOF
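
For the MOTD banner, a minimal sketch, assuming you are OK with installing figlet (not part of the package list above):

apt install -y figlet
figlet "$(hostname)" > /etc/motd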
  • Add the node's IP and DNS name to the bastion's /etc/hosts (otherwise ProxyJump might fail when using the DNS name to reach the target; see the ~/.ssh/config sketch below)
  • Set up the SSH allowlist (our bastion's and WMF's addresses), denying all other sources
cat <<EOF | tee -a /etc/hosts.allow
sshd: localhost
sshd: bastion.kiwix.org
sshd: 51.159.132.199
sshd: [2001:bc8:1200:570a::1]
sshd: 185.15.56.0/24
EOF
echo "sshd: ALL" | tee -a /etc/hosts.deny
  • if external block storage
lsblk -f
fdisk /dev/xxx # create a single primary Linux partition
mkfs.ext4 -L data /dev/xxx1
mkdir -p /data
echo "UUID=xxxxxxxxx /data ext4 defaults 0 0" | tee -a /etc/fstab
mount -a
ls -l /data

Configure k8s

  • execute Scaleway installation tool

(below is a copy of the Scaleway procedure; the program is also backed up at https://dev.kiwix.org/node-agent_linux_amd64_2023_08_31)

Note that this is the new procedure (as of September 2023), used only on the system and storage2 nodes for now. The other nodes (stats, services and storage) were provisioned with an older tool/procedure.

  1. Retrieve the node-agent program (suffix can be either amd64 or arm64 depending on the node architecture).
wget https://scwcontainermulticloud.s3.fr-par.scw.cloud/node-agent_linux_amd64 && chmod +x node-agent_linux_amd64
  2. Export the required environment variables. Replace the values where needed.
export POOL_ID=4a574aa5-737e-4993-961a-1a8d629ee4ea POOL_REGION=fr-par SCW_SECRET_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  3. Execute the program to attach the node to the Multi-cloud pool.
sudo -E ./node-agent_linux_amd64 -loglevel 0 -no-controller
  • Wait for the node to show up on Scaleway with status=ready (instead of status=creating)
http https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER/nodes "X-Auth-Token: $SCW_SECRET_KEY"
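
To get just the name and status of each node, a sketch assuming jq is installed and that the response lists nodes under .nodes with name and status fields:

http https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER/nodes "X-Auth-Token: $SCW_SECRET_KEY" | jq -r '.nodes[] | "\(.name) \(.status)"'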
  • check node status on cluster
kubectl get nodes -o wide
  • set role labels on node
kubectl label node NODENAME node-role.kubernetes.io/MYROLE=true
kubectl label node NODENAME k8s.kiwix.org/role=MYROLE
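
For example, for the storage2 node (the role name here is an assumption; use whatever role this node should carry):

kubectl label node storage2 node-role.kubernetes.io/storage=true
kubectl label node storage2 k8s.kiwix.org/role=storage
kubectl get node storage2 --show-labels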
  • enable kubelet service
systemctl enable kubelet.service

Special setup for dedicated machines on Hetzner

(the configuration below is the one from the storage2 node; adapt as needed)

If not already done, boot the server into Rescue mode.

Install OS with installimage from Hetzner:

  • enable only SSD disks
  • create only /boot + first LVM partition and logical volume for /
  • everything else does not need to be handled by installimage and will be done manually afterwards

Important items of the configuration

DRIVE1 /dev/nvme0n1
DRIVE2 /dev/nvme1n1
# DRIVE3 /dev/sda
# DRIVE4 /dev/sdb
# DRIVE5 /dev/sdc
# DRIVE6 /dev/sdd

SWRAID 1
SWRAIDLEVEL 1

HOSTNAME storage2

PART /boot ext3      1G
PART lvm   ssds_r1   250G

LV ssds_r1 root /    ext4  150G

This creates two RAID1 partitions on the NVMe disks: one (1G, ext3) for /boot and one (250G) for the ssds_r1 LVM VG. Inside the ssds_r1 VG it creates a 150G root LV, formatted as ext4 and mounted on /. The rest of the NVMe disks is left untouched by installimage, just like the HDDs.
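
After the first boot into the installed system, a quick sanity check of the resulting layout (a sketch; the exact md names depend on installimage):

lsblk # /boot and the LVM PV on the two NVMe RAID1 arrays, HDDs untouched
cat /proc/mdstat # the RAID1 arrays created by installimage
vgs ssds_r1 && lvs ssds_r1 # the 250G VG with its 150G root LV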

Create /dev/nvme0n1p3 and /dev/nvme1n1p3 with fdisk:

  • fdisk /dev/nvme0n1
  • n (new partition)
  • p (primary)
  • 3 (third primary partition)
  • t (change type)
  • 3 (select partition 3)
  • fd (Linux raid autodetect)
  • w (write changes)
  • repeat the same steps for /dev/nvme1n1
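
A quick check that both disks now carry the third partition with the expected type:

fdisk -l /dev/nvme0n1 /dev/nvme1n1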

Create the /dev/md2 and /dev/md3 RAID arrays:

mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mdadm --create /dev/md3 --level=6 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
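
You can follow the initial sync in /proc/mdstat; depending on what installimage already put in /etc/mdadm/mdadm.conf, you may also want to make sure the new arrays are recorded there so they assemble under the same names at boot (a sketch):

cat /proc/mdstat
mdadm --detail --scan # review the output and add the md2 / md3 ARRAY lines to /etc/mdadm/mdadm.conf if they are missing
update-initramfs -u # so the updated mdadm.conf is picked up at boot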

Target layout: see the excalidraw diagram (356133544-019b33aa-798a-40d8-bce3-10f72b73402d).

Configure LVM:

vgcreate hdds /dev/md3
lvcreate -L 30T -n data hdds
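
Quick check of the new VG and LV:

vgs hdds
lvs hdds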

Create /data FS:

mkfs.ext4 /dev/hdds/data

Modify fstab and reload

  • create mount point mkdir /data
  • add /dev/hdds/data /data ext4 defaults 0 0 to /etc/fstab
  • systemctl daemon-reload / mount -a
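
As plain commands, mirroring the bullets above:

mkdir /data
echo "/dev/hdds/data /data ext4 defaults 0 0" | tee -a /etc/fstab
systemctl daemon-reload
mount -a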

Final fstab content is something like this:

proc /proc proc defaults 0 0
UUID=f2d49190-2037-4734-b094-aa7160f8bab3 /boot ext3 defaults 0 0
/dev/ssds_r1/root  /  ext4  defaults 0 0
/dev/hdds/data  /data  ext4  defaults 0 0

Configure LVM for SSD cache:

pvcreate /dev/md2
vgextend hdds /dev/md2
lvcreate -L 1.5G -n cache-meta hdds /dev/md2
lvcreate -l 359035 -n cache hdds /dev/md2
lvconvert --type cache-pool --poolmetadata hdds/cache-meta hdds/cache
lvconvert --type cache --cachepool hdds/cache --cachemode writethrough hdds/data

Why 359035 extents in the third command? Because we additionally need to keep 384 extents for internal LVM pool metadata (384 was found by running lvcreate -l 100%FREE to create the cache LV: the subsequent lvconvert fails with an error mentioning these 384 missing extents), and the md2 PV has 359419 free extents left after creating cache-meta (visible with pvdisplay /dev/md2).
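
The arithmetic, spelled out (359419 being the free extent count reported for /dev/md2 on this particular machine):

echo $((359419 - 384)) # 359035 extents left for the cache LV once 384 are kept for pool metadata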

You can then check status with lvs --all --options +devices hdds

  LV                  VG   Attr       LSize  Pool          Origin       Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [cache_cpool]       hdds Cwi---C--- <1.37t                            0.09   0.38            0.00             cache_cpool_cdata(0)
  [cache_cpool_cdata] hdds Cwi-ao---- <1.37t                                                                    /dev/md2(384)
  [cache_cpool_cmeta] hdds ewi-ao----  1.50g                                                                    /dev/md2(0)
  data                hdds Cwi-aoC--- 30.00t [cache_cpool] [data_corig] 0.09   0.38            0.00             data_corig(0)
  [data_corig]        hdds owi-aoC--- 30.00t                                                                    /dev/md3(0)
  [lvol0_pmspare]     hdds ewi-------  1.50g                                                                    /dev/md3(7864320)

Should you need or want to remove the cache, it is straightforward (and you can then lvremove / vgreduce / pvremove if needed):

lvconvert --splitcache hdds/data
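
If you also want to retire the SSD devices from the VG entirely afterwards (a sketch; double-check the LV names with lvs first):

lvs hdds # the detached cache pool should be visible again
lvremove hdds/cache # drop the now-unused cache pool LV
vgreduce hdds /dev/md2 # take the SSD RAID out of the VG
pvremove /dev/md2 # wipe the LVM label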