Merge pull request #1 from nds-org/NDS-752
NDS-752: etcd json tree backup
craig-willis authored May 12, 2017
2 parents f89c2e5 + d9e653a commit 2d9894c
Showing 12 changed files with 271 additions and 73 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
env.sh
40 changes: 22 additions & 18 deletions Dockerfile
@@ -1,28 +1,32 @@
FROM debian:jessie

#
# Add the tools
# Install wget/ssh/cron/vim/pip via apt, and etcdumper via pip
#
RUN \
apt-get -y update && apt-get -y install \
wget \
bash vim-tiny \
cron \
openssh-client \
xfsdump && \
apt-get -y autoremove &&\
apt-get -y autoclean &&\
apt-get -y clean all &&\
rm -rf /var/cache/apk/*
RUN apt-get -qq update && \
apt-get -qq install --no-install-recommends \
wget \
vim \
cron \
openssh-client \
python-pip && \
pip install etcddump && \
apt-get -qq autoremove && \
apt-get -qq autoclean && \
apt-get -qq clean all && \
rm -rf /var/cache/apk/* /go

#
# kubectl
# Download kubectl binary
#
ADD http://storage.googleapis.com/kubernetes-release/release/v1.5.2/bin/linux/amd64/kubectl /usr/local/bin/kubectl
RUN chmod 555 /usr/local/bin/kubectl
ARG K8S_VERSION="1.5.2"
RUN wget --no-verbose http://storage.googleapis.com/kubernetes-release/release/v${K8S_VERSION}/bin/linux/amd64/kubectl -O /usr/local/bin/kubectl && \
chmod 555 /usr/local/bin/kubectl

COPY FILES.cluster-backup /
# Move scripts to WORKDIR
WORKDIR /root
RUN chmod 755 /usr/local/bin/* /etc/cron.d/*
CMD /usr/local/bin/entrypoint
COPY scripts/* ./
COPY crontab /etc/cron.d/backup
COPY Dockerfile entrypoint.sh /

CMD ["/entrypoint.sh"]
3 changes: 0 additions & 3 deletions FILES.cluster-backup/etc/cron.d/backup

This file was deleted.

44 changes: 0 additions & 44 deletions FILES.cluster-backup/usr/local/bin/backup

This file was deleted.

8 changes: 0 additions & 8 deletions FILES.cluster-backup/usr/local/bin/entrypoint

This file was deleted.

102 changes: 102 additions & 0 deletions README.md
@@ -0,0 +1,102 @@
# Workbench Cluster Backup
Automated nightly backup for etcd, shared filesystem, and cluster info dump for Workbench on Kubernetes

# Prerequisites
To build:
* Docker

To run:
* A remote machine's credentials: username / ssh key / hostname
* Kubernetes

# Build
The usual `docker build` command:
```bash
docker build -t ndslabs/cluster-backup:latest .
```

# Automated Backups
This container comes with cron installed and a crontab file that runs `backup.sh` nightly.

There are two ways to run this container:
* Kubernetes (supported / recommended)
* Docker (unsupported, but theoretically possible)

## Via Kubernetes
Create a Kubernetes secret named `backup-key` from the SSH key used to access the recipient of the backups:
```bash
kubectl create secret generic backup-key --from-file=ssh-privatekey=/path/to/backup.pem
```

Then modify `cluster-backup.yaml` to adjust `BACKUP_HOST` and `BACKUP_USER` to your liking and run:
```bash
kubectl create -f cluster-backup.yaml
```

## Via Docker
You will need to provide quite a few parameters to use this image without Kubernetes:
* `-v /path/to/your.pem:/root/.ssh/backup.pem`: Mount the .ssh key to access the backup machine into the container
* `-v /var/glfs:/var/glfs`: Mount the GlusterFS filesystem from the host into the container
* `-e ETCD_HOST`: The hostname of the etcd instance to back up
* `-e ETCD_PORT`: The port of the etcd instance to back up
* `-e HOSTNAME`: A short identifier for your cluster
* `-e BACKUP_HOST`: The hostname of the remote machine which will accept backups
* `-e BACKUP_USER`: The username to use to connect to the remote backup machine
* `-e BACKUP_KEY`: The path to the .pem file that we mounted above with `-v`
* `-e BACKUP_SRC`: The source path of the directory we wish to back up
* `-e BACKUP_DEST`: The destination path on the remote machine where we wish to store backups

NOTE: The kubectl dump portion of the backup will fail, since you are not running under Kubernetes in this instance.

```bash
docker run -d -it \
    -v /path/to/your.pem:/root/.ssh/backup.pem \
    -v /var/glfs:/var/glfs \
    -e BACKUP_USER=centos \
    -e BACKUP_HOST=xxx.xxx.xxx.xxx \
    -e BACKUP_KEY=/root/.ssh/backup.pem \
    -e BACKUP_SRC=/var/glfs \
    -e BACKUP_DEST=/ndsbackup \
    -e ETCD_HOST=xxx.xxx.xxx.xxx \
    -e ETCD_PORT=4001 \
    -e HOSTNAME=cluster-name \
    ndslabs/cluster-backup:latest bash
```

# List the Available Backups
```bash
./list-backups.sh
```

This will list all of the backups that exist on the remote machine for the given HOSTNAME:
```bash
Listing known backups for nds752:
17-04-29.2228
```
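
The backup names above are timestamps in the `date +%y-%m-%d.%H%M` format, and the cluster name is the part of `HOSTNAME` before the first hyphen, as in `scripts/backup.sh`. A minimal sketch of how the remote path is assembled; the `backup_path` function and the `nds752-master1` hostname are hypothetical and only illustrate the naming:

```shell
#!/bin/bash
# Hypothetical helper (not part of this repository): build the remote backup
# path the same way scripts/backup.sh does.
backup_path() {
  local hostname="$1" dest="$2" stamp="$3"
  local host
  # Split the hostname on '-' and keep the first piece as the cluster name.
  IFS='-' read -ra host <<< "${hostname}"
  echo "${dest}/${host[0]}/${stamp}"
}

# "nds752-master1" is an assumed example hostname.
backup_path "nds752-master1" "/ndsbackup" "17-04-29.2228"
```

With these example values the function prints `/ndsbackup/nds752/17-04-29.2228`.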

# Retrieve a Backup
```bash
./retrieve-backup.sh 17-04-29.2228
```

This will download the set of three backup files:
* `etcd-backup.json`: A backup of the Workbench etcd data: the service catalog, users, and their added applications
* `glfs-state.tgz`: A backup of the shared cluster filesystem: the GlusterFS volumes backing the users' applications
* `kubectl.dump`: A verbose set of YAMLs and available pod log output from the Kubernetes API server, useful for debugging (broken in Kubernetes 1.5.1)
```bash
Retrieving backup 17-04-29.2228 for nds752:
17-04-29.2228-etcd-backup.json
17-04-29.2228.glfs-state.tgz
17-04-29.2228-kubectl.dump
```
# Restore GLFS from Backup
Untar the glfs dump:
```bash
sudo tar zxvf ./17-04-29.2228.glfs-state.tgz -C /tmp
```
I recommend copying any inconsistent data from `/tmp` by hand.
WARNING: Using `-C /` instead will extract over the existing glfs data.
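
Before copying anything by hand, it can help to list the archive and extract it into a scratch directory first. The sketch below runs against a small stand-in tarball so that it touches no real data; the `WORK` directory and the demo file are assumptions, and it mirrors the `var/glfs/...` member names that GNU tar produces after stripping the leading `/` from `/var/glfs`:

```shell
#!/bin/bash
# Scratch area; nothing outside this directory is modified.
WORK="$(mktemp -d)"

# Stand-in for a real backup (hypothetical contents).
mkdir -p "${WORK}/src/var/glfs"
echo "demo" > "${WORK}/src/var/glfs/file.txt"
tar czf "${WORK}/demo.glfs-state.tgz" -C "${WORK}/src" var/glfs

# 1. List the archive members without extracting anything.
tar tzf "${WORK}/demo.glfs-state.tgz"

# 2. Extract into a scratch directory rather than over the live data.
mkdir "${WORK}/restore"
tar xzf "${WORK}/demo.glfs-state.tgz" -C "${WORK}/restore"

# 3. Compare the extracted tree against the current one before copying.
diff -r "${WORK}/src/var/glfs" "${WORK}/restore/var/glfs" && echo "no differences"
```

For a real restore, replace the stand-in tarball with the retrieved `glfs-state.tgz` and compare against `/var/glfs`.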
# Restore ETCD from backup
```bash
etcdumper --file=17-04-29.2228/17-04-29.2228-etcd-backup.json restore ${ETCD_HOST}:${ETCD_PORT}
```
NOTE: This is currently broken... we are investigating replacements for the `etcdumper` tool.
# Gotchas
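* cron does not pass the container's environment variables to its jobs, so `entrypoint.sh` writes them to `/root/env.sh` and the crontab entry loads them with `env -`
* Although the scripts will retrieve a set of backup files, the "restore" process is completely manual to avoid mishaps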
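
The environment workaround used by `entrypoint.sh` and the crontab entry can be tried in isolation. This is a minimal sketch under assumptions: `ENV_FILE` stands in for `/root/env.sh`, the `run_with_saved_env` function is hypothetical, and the two values are placeholders. Values containing spaces would be split into separate words by this approach.

```shell
#!/bin/bash
ENV_FILE="$(mktemp)"   # stands in for /root/env.sh

# Step 1 (entrypoint.sh does this with `env > /root/env.sh`): snapshot the
# variables the job needs. Only two placeholder values are written here.
printf '%s\n' "BACKUP_USER=centos" "BACKUP_HOST=backup.example.org" > "${ENV_FILE}"

# Step 2 (the crontab entry): start from an empty environment and replay the
# snapshot. The unquoted $(cat ...) is deliberate: each NAME=value line
# becomes one argument to env.
run_with_saved_env() {
  env - $(cat "${ENV_FILE}") "$@"
}

run_with_saved_env /bin/sh -c 'echo "${BACKUP_USER}@${BACKUP_HOST}"'
```

The final command prints `centos@backup.example.org`, showing that the job sees the saved variables even though it started from an empty environment.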
47 changes: 47 additions & 0 deletions cluster-backup.yaml
@@ -0,0 +1,47 @@
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: cluster-backup
  labels:
    app: cluster-backup
spec:
  template:
    metadata:
      labels:
        name: cluster-backup
    spec:
      hostNetwork: true
      containers:
      - image: ndslabs/cluster-backup:latest
        imagePullPolicy: Always
        name: cluster-backup
        env:
          - name: ETCD_HOST
            value: $(NDSLABS_ETCD_SERVICE_HOST)
          - name: ETCD_PORT
            value: $(NDSLABS_ETCD_SERVICE_PORT)
          - name: BACKUP_SRC
            value: "/var/glfs"
          - name: BACKUP_DEST
            value: /ndsbackup
          - name: BACKUP_HOST
            value:
          - name: BACKUP_USER
            value:
          - name: BACKUP_KEY
            value: "/etc/backup-key/ssh-privatekey"
        volumeMounts:
          - name: backup-src
            mountPath: /var/glfs
          - name: backup-key
            readOnly: true
            mountPath: /etc/backup-key/
      volumes:
        - name: backup-src
          hostPath:
            path: /var/glfs
        - name: backup-key
          secret:
            secretName: backup-key
            defaultMode: 0600
4 changes: 4 additions & 0 deletions crontab
@@ -0,0 +1,4 @@
# Run backup script nightly at 5:00AM UTC
0 5 * * * root env - $(cat /root/env.sh) /root/backup.sh >> /var/log/cron.log 2>&1


17 changes: 17 additions & 0 deletions entrypoint.sh
@@ -0,0 +1,17 @@
#!/bin/bash
LOG=/dev/cron.log
touch ${LOG}
ln -s ${LOG} /var/log/cron.log
cron

echo 'Automated cluster backups now running:'
echo "BACKUP_SRC: ${BACKUP_SRC}"
echo "BACKUP_HOST: ${BACKUP_HOST}"
echo "BACKUP_KEY: ${BACKUP_KEY}"
echo "BACKUP_USER: ${BACKUP_USER}"
echo "BACKUP_DEST: ${BACKUP_DEST}"

# cron doesn't get the same envs, so we use a trick to inject them
env > /root/env.sh && tail -f ${LOG}


36 changes: 36 additions & 0 deletions scripts/backup.sh
@@ -0,0 +1,36 @@
#!/bin/bash
[ $DEBUG ] && set -x

# XXX: Set this to "echo" for a dry-run
DEBUG=""

# Grab date / cluster name
DATE=$(date +%y-%m-%d.%H%M)
IFS='-' read -ra HOST <<< "${HOSTNAME:-localhost}"
TARGET_PATH="${BACKUP_DEST:-/ndsbackup}/${HOST[0]}/${DATE}"

echo "Backup started: ${DATE}"

# Use the above to build our base commands
SSH_ARGS="-i ${BACKUP_KEY:-backup.pem} -o StrictHostKeyChecking=no "
SSH_TARGET="${BACKUP_USER:-centos}@${BACKUP_HOST:-localhost}"

# Ensure data dir exists remotely
$DEBUG ssh ${SSH_ARGS} ${SSH_TARGET} "mkdir -p ${TARGET_PATH}"

# Dump shared BACKUP_SRC state
$DEBUG tar czf - ${BACKUP_SRC} | $DEBUG ssh ${SSH_ARGS} ${SSH_TARGET} "cat - > ${TARGET_PATH}/${DATE}.glfs-state.tgz"

# Dump etcd state
$DEBUG /usr/local/bin/etcdumper dump http://${ETCD_HOST:-localhost}:${ETCD_PORT:-2379} --file /tmp/${DATE}-etcd-backup.json
$DEBUG scp ${SSH_ARGS} /tmp/${DATE}-etcd-backup.json ${SSH_TARGET}:${TARGET_PATH}/${DATE}-etcd-backup.json

# Dump Kubernetes cluster state
# TODO: Verify kubeconfig is correct / present
# FIXME: kubectl cluster-info dump is currently incomplete, as it relies on the broken kubectl logs
# FIXME: See https://github.com/kubernetes/kubernetes/issues/38774
$DEBUG /usr/local/bin/kubectl cluster-info dump | $DEBUG ssh ${SSH_ARGS} ${SSH_TARGET} sudo "cat - > ${TARGET_PATH}/${DATE}-kubectl.dump"

echo "Backup complete: ${DATE}"

# TODO: Delete local backups after successful transfer?
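
Note on the dry-run switch: the scripts in this commit prefix each command with `$DEBUG`, so setting it to `echo` prints the command instead of running it. A minimal sketch of the pattern; the `remote_mkdir` function and the `backup.example.org` host are hypothetical and not part of the commit:

```shell
#!/bin/bash
# Dry-run switch, as in the scripts above: "echo" prints, "" executes.
DEBUG="echo"

# Hypothetical helper showing the $DEBUG prefix on a single command.
remote_mkdir() {
  # With DEBUG="echo" this prints the ssh command line instead of connecting.
  $DEBUG ssh -i backup.pem centos@backup.example.org "mkdir -p $1"
}

remote_mkdir /ndsbackup/nds752/17-04-29.2228
```

With `DEBUG="echo"` this prints the ssh command line and opens no connection. In a pipeline such as the tar-to-ssh step, only the last command's echo reaches the terminal, because the first echo's output is piped into the second.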
21 changes: 21 additions & 0 deletions scripts/list-backups.sh
@@ -0,0 +1,21 @@
#!/bin/bash
[ $DEBUG ] && set -x

# XXX: Set this to "echo" for a dry-run
DEBUG=""

# Grab date / cluster name
DATE=$(date +%y-%m-%d.%H%M)
IFS='-' read -ra HOST <<< "${HOSTNAME:-localhost}"
TARGET_PATH=${BACKUP_DEST:-/ndsbackup}/${HOST[0]}

echo "Listing known backups for ${HOST[0]}:"

# Use the above to build our base commands
SSH_ARGS="-i ${BACKUP_KEY:-backup.pem} -o StrictHostKeyChecking=no "
SSH_TARGET="${BACKUP_USER:-centos}@${BACKUP_HOST:-localhost}"

# Check contents of remote backup directory
$DEBUG ssh ${SSH_ARGS} ${SSH_TARGET} "ls -l ${TARGET_PATH}" | awk '{print $9}' | grep -v -e '^$'
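
Note on the listing filter: the `awk`/`grep` stage above keeps only the ninth field of each `ls -l` line and drops the empty result produced by the `total` line. A sketch against canned output; the `sample_listing` data and the `backup_names` function are hypothetical stand-ins for the remote call:

```shell
#!/bin/bash
# Canned `ls -l` output standing in for the remote listing (hypothetical data).
sample_listing() {
  cat <<'EOF'
total 8
drwxrwxr-x. 2 centos centos 4096 Apr 29 22:28 17-04-29.2228
drwxrwxr-x. 2 centos centos 4096 Apr 30 05:00 17-04-30.0500
EOF
}

# Same filter as list-backups.sh: field 9 is the entry name; the "total"
# line has no ninth field, so it yields an empty line that grep removes.
backup_names() {
  awk '{print $9}' | grep -v -e '^$'
}

sample_listing | backup_names
```

Because `awk` splits on whitespace, an entry name containing spaces would be truncated; the timestamp names used by `backup.sh` do not contain any.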


21 changes: 21 additions & 0 deletions scripts/retrieve-backup.sh
@@ -0,0 +1,21 @@
#!/bin/bash
[ $DEBUG ] && set -x

# XXX: Set this to "echo" for a dry-run
DEBUG=""

# Grab date / cluster name
IFS='-' read -ra HOST <<< "${HOSTNAME:-localhost}"
TARGET_PATH=${BACKUP_DEST:-/ndsbackup}/${HOST[0]}/$1

echo "Retrieving backup $1 for ${HOST[0]}:"

# Use the above to build our base commands
SSH_ARGS="-i ${BACKUP_KEY:-backup.pem} -o StrictHostKeyChecking=no "
SSH_TARGET="${BACKUP_USER:-centos}@${BACKUP_HOST:-localhost}"

# Retrieve contents of remote backup from the given string
$DEBUG scp -r ${SSH_ARGS} ${SSH_TARGET}:${TARGET_PATH} $(pwd)/$1


