Kubernetes - Backup/Restore etcd - Raspberry Pi - Rock64 (arm/amd64)

(**) Translated with www.DeepL.com/Translator

## Backup script (on the master)

File name: /usr/local/bin/kubernetes-backup-etcd.sh

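Add the following content, then make the file executable (`chmod +x /usr/local/bin/kubernetes-backup-etcd.sh`) so that cron can run it: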

```bash
#!/bin/bash
# Directory where the snapshots are stored
DIRBACKUP="/backup-etcd/snapshots/"
# Data directory that the restore procedure below may leave in the snapshot directory
DEFAULTETCD="/backup-etcd/snapshots/default.etcd"
# Retention period, in days
PRUNE=5
DEBUG=0
# Use the etcd v3 API
export ETCDCTL_API=3

if [ ! -d "$DIRBACKUP" ]; then
  mkdir -p "$DIRBACKUP" || exit 1
fi

# Remove a leftover default.etcd directory
if [ -d "$DEFAULTETCD" ]; then
  rm -rf "$DEFAULTETCD" || exit 1
fi

# Backup (stop with an error instead of failing silently)
if ! etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --endpoints=https://127.0.0.1:2379 \
    snapshot save "$DIRBACKUP/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db" > /dev/null; then
  echo "[ERR] - etcd snapshot failed" >&2
  exit 1
fi

# Prune: only snapshot files older than $PRUNE days, never the directory itself
FILESTODELETE=$(find "$DIRBACKUP" -maxdepth 1 -type f -name 'etcd-snapshot-*.db' -mtime +"$PRUNE")
if [ -n "$FILESTODELETE" ]; then
  if [ "$DEBUG" == "1" ]; then
    echo "-----------------"
    echo "Deleting obsolete files ($PRUNE days)"
    echo "-----------------"
  fi
  for F in $FILESTODELETE; do
    if [ "$DEBUG" == "1" ]; then
      echo "Deleting $F"
    fi
    if ! rm -f "$F"; then
      echo "[ERR] - unable to delete the file $F" >&2
      exit 1
    fi
  done
fi
```
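The pruning step can also be isolated in a small function, which makes it easy to try on a scratch directory before trusting it with real snapshots. This is a minimal sketch, assuming GNU findutils (for `-delete`); the function name `prune_snapshots` is hypothetical, and it only considers files named `etcd-snapshot-*.db` directly inside the given directory.

```bash
# Hypothetical helper: delete etcd-snapshot-*.db files older than N days.
# Only regular files directly inside the directory are considered, so the
# directory itself and a leftover default.etcd directory are never touched.
# Each deleted path is printed before it is removed.
prune_snapshots() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name 'etcd-snapshot-*.db' \
    -mtime +"$days" -print -delete
}
```

For example, `prune_snapshots /backup-etcd/snapshots 5` deletes the snapshots older than five days and lists them.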

## Cron job

I run this script once a day, at 23:00 (increase the frequency while testing). Create the cron file:

```bash
vi /etc/cron.d/kubernetes-backup
```

and add this entry (`root` is the user the job runs as):

```
# Backup etcd master
0 23 * * * root /usr/local/bin/kubernetes-backup-etcd.sh
```
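If you prefer not to open an editor, the same file can be written from a script. This is a minimal sketch; the function name `write_backup_cron` is hypothetical, and the optional second argument lets you use a shorter schedule while testing.

```bash
# Hypothetical helper: write the cron.d entry without an interactive editor.
# $1 = target file, $2 = cron schedule (defaults to daily at 23:00).
write_backup_cron() {
  local target="$1" schedule="${2:-0 23 * * *}"
  printf '# Backup etcd master\n%s root /usr/local/bin/kubernetes-backup-etcd.sh\n' \
    "$schedule" > "$target" && chmod 0644 "$target"
}
```

For example, `write_backup_cron /etc/cron.d/kubernetes-backup` writes the daily entry shown above, and `write_backup_cron /etc/cron.d/kubernetes-backup '*/30 * * * *'` runs the backup every 30 minutes while testing.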

## Restore

Before running the restore script, make sure the backup job cannot run during the restore: comment out its line in /etc/cron.d/kubernetes-backup.
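Commenting out the job and re-enabling it afterwards can be done with `sed`. This is a minimal sketch, assuming GNU sed (`-i` without a backup suffix) and the cron file from the previous section; `disable_backup_job` and `enable_backup_job` are hypothetical names.

```bash
# Hypothetical helpers: toggle the backup job by commenting or uncommenting
# the line that calls kubernetes-backup-etcd.sh. Other lines are untouched,
# and running either helper twice has no further effect.
disable_backup_job() {
  sed -i 's|^\([^#].*kubernetes-backup-etcd\.sh\)|#\1|' "$1"
}
enable_backup_job() {
  sed -i 's|^#\(.*kubernetes-backup-etcd\.sh\)|\1|' "$1"
}
```

Run `disable_backup_job /etc/cron.d/kubernetes-backup` before the restore and `enable_backup_job /etc/cron.d/kubernetes-backup` once the cluster is back.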

On the master, run the following script:

```bash
#!/bin/bash
# !!! This is the directory where the snapshots are saved
# (see DIRBACKUP in /usr/local/bin/kubernetes-backup-etcd.sh)
DIRBACKUP="/backup-etcd/snapshots/"
cd "$DIRBACKUP" || exit 1
# Delete partial backups (*.part files are not usable); -f: no error if there are none
rm -f -- *.part
# Retrieve the most recent snapshot before anything is deleted
FILETORESTORE=$(ls -t etcd-snapshot-*.db 2> /dev/null | head -n 1)
if [ -z "$FILETORESTORE" ]; then
  echo "[ERR] - no snapshot found in $DIRBACKUP" >&2
  exit 1
fi
# Stop kubelet
systemctl stop kubelet
# Delete the current etcd data directory
rm -rf /var/lib/etcd
# Restore (etcdctl writes the restored data to ./default.etcd)
ETCDCTL_API=3 etcdctl snapshot restore "./$FILETORESTORE" || exit 1
mv default.etcd /var/lib/etcd
# Start kubelet
systemctl start kubelet
```
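Picking the newest snapshot with `ls -t` works for the file names produced by the backup script. A variant that also skips directories and `*.part` files explicitly is sketched below, assuming GNU find (for `-printf`); the function name `latest_snapshot` is hypothetical.

```bash
# Hypothetical helper: print the file name of the newest complete snapshot
# (by modification time) in a directory. Only regular files matching
# etcd-snapshot-*.db are considered, so *.part files and default.etcd are
# ignored. Prints nothing if no snapshot exists.
latest_snapshot() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -name 'etcd-snapshot-*.db' -printf '%T@ %f\n' \
    | LC_ALL=C sort -n | tail -n 1 | cut -d' ' -f2-
}
```

In the restore script it would replace the `ls -t` line: `FILETORESTORE=$(latest_snapshot "$DIRBACKUP")`.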

Wait a few minutes, then check the cluster, for example with `kubectl get nodes`. The cluster should be up again. Remember to uncomment the backup line in /etc/cron.d/kubernetes-backup.
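Instead of waiting a fixed amount of time, you can poll until the API server answers. This is a minimal sketch of a generic retry loop; `wait_for` is a hypothetical name. Note that `kubectl get nodes` succeeding only means the API server responds, not that every node is `Ready`.

```bash
# Hypothetical helper: run a command every <interval> seconds until it
# succeeds or <timeout> seconds have elapsed (uses bash's SECONDS counter).
# Usage: wait_for <timeout> <interval> <command> [args...]
# Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@" > /dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, `wait_for 600 10 kubectl get nodes && kubectl get nodes` waits up to ten minutes, checking every ten seconds, then shows the nodes.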
