Safeguarding against ransomware and data loss

By Rainer Wichmann rainer@nullla-samhna.de    (last update: Jan 22, 2020)

Data loss has always been a risk underestimated by many. Students have lost their thesis work because of a stolen laptop or a disk failure. Companies go out of business because of unrecoverable data loss. People lose important personal data, or digital pictures of loved ones, because coffee was spilled over their computers. And in recent years, the threat of ransomware has made the risk of data loss even worse.

Ransomware is malware that, after infecting your computer, encrypts your data (thus making it inaccessible) and demands a ransom for the key required to decrypt it again. Often, these are targeted attacks against high-profile victims, e.g. companies or public institutions that are able to pay large ransoms. But private users and small companies are at risk as well. If you are using an outdated browser, just visiting a website contaminated with malware may infect your computer. Likewise, opening a malicious email attachment may have the same effect.

Safeguarding with backups (on Linux)

The best way to safeguard against data loss, whatever its cause, is to have backup copies. I'm using the plural because you should really have at least two, stored at different physical locations (in case, e.g., a fire breaks out at one location). You may even want to keep several backups taken at different times, so that if you discover that you accidentally lost an important file sometime last month, you can find it in the backup taken the month before.

You may think this sounds quite tedious and requires a lot of disk space, but the purpose of this article is to show you that it is actually quite simple to do on Linux. It doesn't even need much disk space, thanks to hardlink copies.

What are hardlinks and how can you use them for backups?

First, a file really consists of two parts:

  1. the file content: a blob of data stored somewhere on the disk
  2. the directory entry: giving the name of the file and pointing (via an index structure called the inode) to the data on the disk

Usually, when you copy a file, both parts are copied, so you end up with another blob of data on the disk (using as much disk space as the original one) and a new directory entry pointing to that copy of the data. However, you can also simply create a new directory entry pointing to the same original data. This is called a hardlink.
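To make this concrete, here is a small sketch that creates a hardlink with `ln` in a scratch directory and inspects both names. It assumes GNU coreutils (for `stat -c`); the file names are made up for the demo. On ext4, every name of a file refers to the same inode, so both names report the same inode number and a link count of 2.

```shell
#!/bin/sh
# Show that a hardlink is a second directory entry for the same data.
# Assumes GNU coreutils (stat -c); file names are made up for the demo.
set -e

demo=$(mktemp -d)                 # scratch directory, removed at the end
cd "$demo"

printf 'thesis draft\n' > original.txt
ln original.txt hardlink.txt      # new directory entry, no data copied

# Both names report the same inode number, and the link count is now 2.
inode_a=$(stat -c '%i' original.txt)
inode_b=$(stat -c '%i' hardlink.txt)
links=$(stat -c '%h' original.txt)
echo "inodes: $inode_a $inode_b, link count: $links"

# Data appended through one name is visible through the other.
printf 'chapter 1\n' >> hardlink.txt
content=$(cat original.txt)
echo "$content"

cd / && rm -rf "$demo"
```

Deleting one of the two names only decrements the link count; the data itself is freed only when its last name is removed.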

[Figure: File and hardlink]

Now, creating hardlink copies instead of copying the file content certainly avoids wasting disk space while keeping multiple copies of your backup taken at different times. There is still a problem, though: because all hardlinks of a file point to the same data, overwriting that data in place would change it in all your copies made at different times. So an additional step is needed: if a file has changed, its backup must be made by breaking the hardlink to earlier backups and creating a fresh copy.
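The difference between overwriting a hardlinked file and breaking the hardlink can be sketched in a scratch directory (file names are made up). Case (a) rewrites the shared data in place, so the "snapshot" name changes as well. Case (b) writes a complete new copy under a temporary name and renames it over the old name; only that directory entry is replaced, and the snapshot keeps the old data. As far as I know, rsync uses this temporary-file-then-rename approach by default, while its `--inplace` option switches to overwriting, which is exactly what hardlink snapshots cannot tolerate.

```shell
#!/bin/sh
# Contrast in-place overwriting with "new copy + rename" for a file
# that has a hardlinked snapshot. File names are made up for the demo.
set -e

demo=$(mktemp -d)
cd "$demo"

# (a) In-place overwrite: '>' truncates and rewrites the shared data,
#     so the snapshot silently changes too. A backup must avoid this.
printf 'version 1\n' > a.txt
ln a.txt a_snapshot.txt
printf 'version 2\n' > a.txt
inplace_snapshot=$(cat a_snapshot.txt)      # old version is lost

# (b) Break the hardlink: write a complete new copy under a temporary
#     name, then rename it over the old name. Only the directory entry
#     b.txt is replaced; b_snapshot.txt still points to the old data.
printf 'version 1\n' > b.txt
ln b.txt b_snapshot.txt
printf 'version 2\n' > b.txt.tmp
mv b.txt.tmp b.txt
renamed_snapshot=$(cat b_snapshot.txt)      # old version is kept
renamed_current=$(cat b.txt)

echo "in place:     snapshot now says '$inplace_snapshot'"
echo "copy+rename:  snapshot says '$renamed_snapshot', current says '$renamed_current'"

cd / && rm -rf "$demo"
```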

[Figure: File and old version]

Fortunately, this is exactly what the rsync utility does: if a file has not changed, it is not copied, but if it has changed, rsync writes a new, complete copy and puts it in place of the old directory entry, which removes the hardlink to the old data. So, the basic cycle is:

  1. use rsync to create a backup copy of your data
  2. make a hardlink copy of the data (as a snapshot at the time of backup)
  3. repeat the previous steps when you make a new backup

As a result, your backup copy always contains the most recent version of your data, while snapshots of earlier dates take up very little space: each of them only needs additional disk space for old versions of files that have changed since, plus a little for the directory entries themselves.
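The cycle above can be sketched end to end in a scratch directory. To keep the sketch runnable without rsync, step 1 is performed by `sync_files`, a hypothetical stand-in that only mimics the relevant rsync behaviour (skip unchanged files, otherwise write a new copy and rename it into place); step 2 uses `cp -al` exactly as the script below does. It assumes GNU `cp`, `stat` and `cmp`.

```shell
#!/bin/sh
# Sketch of the backup cycle with hardlink snapshots (GNU cp, stat, cmp).
# sync_files is a hypothetical stand-in for rsync, not rsync itself.
set -e

demo=$(mktemp -d)
cd "$demo"
mkdir source backup_last backup_by_date

printf 'unchanged\n' > source/notes.txt
printf 'draft 1\n'   > source/thesis.txt

# Skip files whose content is unchanged; otherwise write a complete new
# copy and rename it into place, breaking the hardlink to the old data.
sync_files () {
	for f in source/*; do
		name=$(basename "$f")
		if test ! -e "backup_last/$name" || ! cmp -s "$f" "backup_last/$name"; then
			cp "$f" "backup_last/$name.tmp"
			mv "backup_last/$name.tmp" "backup_last/$name"
		fi
	done
}

sync_files                                  # step 1: update backup copy
cp -al backup_last backup_by_date/day1      # step 2: hardlink snapshot

printf 'draft 2\n' > source/thesis.txt      # the user edits one file

sync_files                                  # step 3: repeat
cp -al backup_last backup_by_date/day2

# notes.txt is stored once: three names (backup_last, day1, day2).
notes_links=$(stat -c '%h' backup_last/notes.txt)
# thesis.txt: day1 keeps the old data, day2 shares the new data.
thesis_day1=$(cat backup_by_date/day1/thesis.txt)
thesis_day2=$(cat backup_by_date/day2/thesis.txt)
echo "notes.txt link count: $notes_links"
echo "day1 thesis: $thesis_day1, day2 thesis: $thesis_day2"

cd / && rm -rf "$demo"
```

On a real backup disk, `du -sh backup_last backup_by_date/*` makes the saving visible, because GNU `du` counts each hardlinked file only once per invocation, so later snapshots show up small.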

The simple backup solution

  • Buy two big external SSDs (remember, you want two backups at different locations).
  • Plug in the external disk and format it as ext4 (you can use e.g. the gnome-disks utility for that).
  • Note the path to the disk (it should appear under the path /media/<yourname>/<diskname>).
  • Copy the script below to a file backup.sh, make it executable (chmod +x backup.sh), and edit the first lines: replace /media/<yourname>/<diskname> with the actual path, and adjust the list of directories to back up to your needs.
  • Run the script with sudo ./backup.sh. It performs a backup and unmounts the disk afterwards, so you can unplug it.
  • Repeat for the other disk.
  • Remember to keep the disks in different locations.
  • Make it a habit to perform backups at fixed times (e.g. every evening, every Monday morning, or whatever suits your needs).

The script creates two directories on the disk: backup_last and backup_by_date. The former holds the latest backup, the latter holds all snapshots, labeled by date and time.


```shell
#!/bin/sh

# This is the path to the backup disk
backup_disk="/media/<yourname>/<diskname>"

# Directories to backup (no space in path allowed)
backup_directories="/home /etc /root"

# Latest backup, and hardlink snapshots labeled by date
backup_current="${backup_disk}/backup_last"
backup_dates="${backup_disk}/backup_by_date"

fatal () {
	echo "**ERROR $1" >&2
	exit 1
}

# -- Create directory structure on backup disk

if test -d "${backup_disk}"; then
	if test ! -d "${backup_current}"; then
		mkdir "${backup_current}" || fatal "mkdir ${backup_current}"
	fi
	if test ! -d "${backup_dates}"; then
		mkdir "${backup_dates}" || fatal "mkdir ${backup_dates}"
	fi
else
	fatal "disk ${backup_disk} not present"
fi

# -- Create directory labeled by current date

backup_now_date=$(date '+%Y-%m-%d_%H:%M')

mkdir "${backup_dates}/${backup_now_date}" \
	|| fatal "mkdir ${backup_dates}/${backup_now_date}"

# -- For each directory in list, backup with rsync, then make hardlink copy

echo "Start at $(date)"
for dir in ${backup_directories}; do
	# No trailing slash on $dir: rsync creates e.g. backup_last/home
	rsync -a --delete "${dir}" "${backup_current}/" \
		|| echo "**ERROR backup of ${dir} failed" >&2

	# Hardlink snapshot, e.g. backup_by_date/<date>/home
	cp -al "${backup_current}/${dir}" "${backup_dates}/${backup_now_date}/" \
		|| echo "**ERROR hardlink copy of ${dir} backup failed" >&2

	echo "Backup of ${dir} finished"
done

# -- Sync and umount the disk

sync || fatal "sync"
umount "${backup_disk}" || fatal "umount ${backup_disk}"
echo "End at $(date '+%Y-%m-%d_%H:%M')"
```
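When you need to recover an older version of a file, it helps to know in which snapshots it actually changed. The sketch below defines `list_versions`, a hypothetical helper (not part of backup.sh) that walks the backup_by_date/<date>/ directories created by the script and prints the snapshot copy of a file wherever its inode differs from the previous snapshot, i.e. wherever a new version first appears. It assumes GNU `stat` and that the disk is mounted again (the backup script unmounts it); the demo builds a fake disk layout in a scratch directory.

```shell
#!/bin/sh
# list_versions DISK PATH  (hypothetical helper, not part of backup.sh)
# Print the copy of PATH in each DISK/backup_by_date/<date>/ snapshot
# where a new version first appears (inode differs from the previous
# snapshot). Assumes GNU stat.
list_versions () {
	disk=$1
	path=$2
	prev=""
	# Names like 2020-01-22_18:00 sort chronologically in glob order.
	for snap in "$disk"/backup_by_date/*/; do
		f="${snap}${path#/}"
		test -e "$f" || continue
		ino=$(stat -c '%i' "$f")
		if test "$ino" != "$prev"; then
			echo "$f"
			prev=$ino
		fi
	done
}

# Demo on a fake disk: three snapshots, the file changes before the third.
demo=$(mktemp -d)
snaps="$demo/backup_by_date"
mkdir -p "$snaps/2020-01-20_18:00/home/user"
printf 'v1\n' > "$snaps/2020-01-20_18:00/home/user/thesis.tex"
cp -al "$snaps/2020-01-20_18:00" "$snaps/2020-01-21_18:00"
cp -al "$snaps/2020-01-21_18:00" "$snaps/2020-01-22_18:00"
# Same result as rsync replacing the changed file before this snapshot.
rm "$snaps/2020-01-22_18:00/home/user/thesis.tex"
printf 'v2\n' > "$snaps/2020-01-22_18:00/home/user/thesis.tex"

versions=$(list_versions "$demo" /home/user/thesis.tex)
echo "$versions"
rm -rf "$demo"
```

On a real disk you would call it as e.g. list_versions /media/<yourname>/<diskname> /home/<yourname>/thesis.tex and then copy the wanted version back with cp -a.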
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Germany License.