My NAS and Data Backup

I have been through a few painful hard drive failures that cost me data, although luckily none of the lost files were crucial. I couldn’t afford to make the same mistake again, so I added a backup mechanism to my existing storage system, and it has worked well. I recently made some adjustments to my backup plan, so I’ll briefly record them here.

To keep photo previews and image processing fast, I store the complete Capture One photo catalog on my Apple Silicon MacBook. To save storage space, I use my own script to convert all ARW files to Adobe DNG before importing them. Even just compressing them once with zip saves a significant amount of space. Storing data on a Mac is obviously risky: as we all know, ever since the MacBooks with the T2 chip, once the machine runs into a problem the data is gone for good. This kind of hardware will eventually fail, and hoping that it fails as late as possible is not a realistic plan. Therefore, I periodically copy my catalog to my NAS using rsync.
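
The periodic copy can be sketched as a small shell function. This is only an illustration: the catalog path, the NAS host and target directory, and the `sync_catalog` name are assumptions, not the actual values in use.

```shell
#!/bin/sh
# Hypothetical locations: adjust both to your own setup.
CATALOG_DIR="${CATALOG_DIR:-$HOME/Pictures/Photos.cocatalog}"
NAS_TARGET="${NAS_TARGET:-nas-v2:/srv/NAS/NTZYZ/photos/}"

# Copy the catalog directory to the NAS.
#   -a         archive mode: recurse and preserve times, permissions, symlinks
#   --partial  keep partially transferred files so an interrupted run can resume
# --delete is left out on purpose, so a file removed by accident on the Mac
# is not removed from the NAS copy as well.
sync_catalog() {
    rsync -a --partial "$CATALOG_DIR" "$NAS_TARGET"
}
```

Calling `sync_catalog` by hand or from a scheduled job copies only the files that changed since the previous run, because rsync skips files whose size and modification time already match.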

As for the NAS, it is just an ordinary Debian server with three directly attached 6TB HGST Ultrastar HC310 drives. Each drive is formatted as ext4, and the three are merged into a single 18TB union filesystem using mergerfs (hosted on GitHub). Multiple directories inside the merged filesystem categorize and store the files.

% mount
/dev/sdc1 on /media/HGST2 type ext4 (rw,relatime,data=ordered)
/dev/sdd1 on /media/HGST1 type ext4 (rw,relatime,data=ordered)
/dev/sde1 on /media/HGST0 type ext4 (rw,relatime,data=ordered)
0:1:2 on /srv/NAS type fuse.mergerfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other)

% ls /srv/NAS
ISCSI  REDACTED  NTZYZ  PUBLIC  TIME_MACHINE
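
A pool like the one in the mount output above could be assembled with a single mergerfs invocation. This is a minimal sketch and not necessarily the exact set of options used on this NAS: the `mount_pool` name is hypothetical, and only the `allow_other` option from the mount output is passed explicitly.

```shell
#!/bin/sh
# The three ext4 mount points from the mount output, joined with ':'.
BRANCHES="/media/HGST0:/media/HGST1:/media/HGST2"
# Where the merged view appears.
POOL="/srv/NAS"

# Mount the union filesystem (needs root).
# allow_other lets users other than the mounting user access the pool.
mount_pool() {
    mergerfs -o allow_other "$BRANCHES" "$POOL"
}
```

Growing the pool later means formatting and mounting a new disk, then appending its mount point to `BRANCHES`.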

Obviously, a storage scheme whose total capacity equals the sum of the disk capacities provides no redundancy. In that respect it is the same as RAID0. However, mergerfs still has some advantages over RAID0. All underlying disks carry independent ext4 filesystems, so expanding the storage capacity is easy. Additionally, if one disk fails and goes offline, only the files on that disk are affected rather than the whole pool. As long as there is a proper backup, only the data from the failed disk has to be restored. The price is a slight loss in performance, which is acceptable: personal file storage does not demand much performance, especially since my home network is only 1GbE.

The hard part is the “proper backup”. I chose mergerfs over RAID to reduce costs: setting up a 4U rack with a bunch of HDDs at home is too expensive and too noisy. Moreover, RAID provides redundancy, not backup. After an accidental deletion or a ransomware infection on the local network, the data would be lost all the same. Building my own offsite backup does not reduce costs much either, as it still requires several high-speed spinning disks somewhere. So, in the end, I turned to object storage and similar services offered by cloud providers.

After some research, I found that Restic perfectly meets my requirements:

  1. It supports various data storage solutions, including one’s personal local filesystem or object storage like S3.
  2. It provides a snapshot-based backup mode, allowing me to view the historical versions of multiple backups of the same directory using snapshots.
  3. It supports incremental backups, only uploading the differences between multiple backups of the same directory.
  4. It supports encryption, ensuring that the cloud service provider hosting the data is unaware of what I have uploaded.
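
The snapshot features in points 2 and 3 map onto a few restic subcommands. The following is a minimal sketch, assuming a repository has already been initialized and that restic's `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` environment variables are exported; the function names and the directory arguments are hypothetical.

```shell
#!/bin/sh
# List every snapshot that contains the given directory.
list_snapshots() {
    restic snapshots --path "$1"
}

# Show what changed between two snapshots
# (the IDs come from the output of list_snapshots).
diff_snapshots() {
    restic diff "$1" "$2"
}

# Restore the most recent snapshot of a directory into a scratch location,
# leaving the live data untouched.
restore_latest() {
    restic restore latest --path "$1" --target "$2"
}
```

For example, `restore_latest /srv/NAS/PUBLIC /tmp/restore` would place the latest backed-up copy of that directory under `/tmp/restore` for inspection.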

With the tool chosen, the next step was to figure out where to store the backups. Initially, I found that restic can reach the 21Vianet (Century Internet) edition of OneDrive for Business through rclone. So I reluctantly subscribed to Office E3 for a year, since 1500 CNY per year is still cheaper than buying object storage from Alibaba Cloud or Tencent Cloud separately (I have about 1TB of data to back up, and S3 would cost about 35 USD per month). Consumer cloud drives would reduce the cost significantly. The reason for not using Baidu Cloud is simple: I just dislike Baidu. The remaining services, such as OneDrive and Google Drive, are not usable in China. Later, Alibaba Cloud entered the cloud drive market with a product called Aliyun Drive, where 8TB of storage can be purchased for approximately 168 CNY. It is also easy to find tools that bridge Aliyun Drive to WebDAV, such as the one I am using, aliyundrive-webdav (hosted on GitHub). Since rclone, which restic supports, has a WebDAV backend as well, the whole chain is complete and satisfactory.

With all the dependencies sorted out, the remaining task is to deploy the whole stack on the NAS. The process is not complicated. After deploying and starting aliyundrive-webdav, use “rclone config” to add a new remote. Then initialize the repository with “restic init -r rclone:xxx:xxx”. With a systemd timer and a script of your own, you get scheduled incremental backups. The initial upload is inevitably slow, but subsequent backups are much faster.
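
The script that the timer runs can be quite short. What follows is a sketch under stated assumptions, not the script actually deployed: the `run_backup` name and the retention policy are made up for illustration, and the repository is expected in restic's `RESTIC_REPOSITORY` environment variable (in the `rclone:<remote>:<path>` form used for “restic init”) together with `RESTIC_PASSWORD_FILE`.

```shell
#!/bin/sh
# Back up the directories given as arguments, then thin out old snapshots.
run_backup() {
    # Refuse to run without a repository and a password file.
    : "${RESTIC_REPOSITORY:?set RESTIC_REPOSITORY to rclone:<remote>:<path>}"
    : "${RESTIC_PASSWORD_FILE:?set RESTIC_PASSWORD_FILE}"

    # Incremental by design: only data not yet in the repository is uploaded.
    restic backup "$@" || return 1

    # Assumed retention policy: 7 daily and 4 weekly snapshots.
    # --prune removes data that no remaining snapshot references.
    restic forget --keep-daily 7 --keep-weekly 4 --prune
}
```

A systemd service started by the timer could then call something like `run_backup /srv/NAS/PUBLIC /srv/NAS/NTZYZ`, with the two environment variables set in the unit.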

Here is the vnstat record from the first upload, which shows that 1.31TB was uploaded in three days, very smoothly:

ntzyz@nas-v2 ~ % vnstat -h

 ens18  /  hourly

         hour        rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
     2023-02-14
         15:00    606.69 MiB |   34.45 GiB |   35.04 GiB |   83.62 Mbit/s
         16:00    610.81 MiB |   34.25 GiB |   34.85 GiB |   83.16 Mbit/s
         17:00    623.61 MiB |   33.99 GiB |   34.60 GiB |   82.56 Mbit/s
         18:00    566.85 MiB |   29.15 GiB |   29.70 GiB |   70.87 Mbit/s
         19:00      7.76 GiB |   34.40 GiB |   42.16 GiB |  100.60 Mbit/s
         20:00     58.64 GiB |   35.24 GiB |   93.88 GiB |  224.01 Mbit/s
         21:00     43.52 GiB |   35.01 GiB |   78.52 GiB |  187.36 Mbit/s
         22:00     38.33 GiB |   35.04 GiB |   73.37 GiB |  175.07 Mbit/s
         23:00    576.09 MiB |   27.50 GiB |   28.06 GiB |   66.96 Mbit/s
     2023-02-15
         00:00    560.47 MiB |   26.72 GiB |   27.27 GiB |   65.06 Mbit/s
         01:00    652.31 MiB |   31.49 GiB |   32.13 GiB |   76.66 Mbit/s
         02:00    646.58 MiB |   32.78 GiB |   33.41 GiB |   79.73 Mbit/s
         03:00    628.83 MiB |   31.41 GiB |   32.02 GiB |   76.41 Mbit/s
         04:00    670.06 MiB |   33.76 GiB |   34.42 GiB |   82.12 Mbit/s
         05:00    675.27 MiB |   34.27 GiB |   34.93 GiB |   83.34 Mbit/s
         06:00    677.09 MiB |   34.08 GiB |   34.74 GiB |   82.90 Mbit/s
         07:00    699.17 MiB |   33.83 GiB |   34.52 GiB |   82.36 Mbit/s
         08:00    579.34 MiB |   26.45 GiB |   27.02 GiB |   64.46 Mbit/s
         09:00    524.03 MiB |   23.60 GiB |   24.11 GiB |   57.54 Mbit/s
         10:00    600.53 MiB |   29.05 GiB |   29.64 GiB |   70.73 Mbit/s
         11:00    662.55 MiB |   32.28 GiB |   32.93 GiB |   78.58 Mbit/s
         12:00    695.57 MiB |   33.62 GiB |   34.30 GiB |   81.83 Mbit/s
         13:00    715.18 MiB |   33.52 GiB |   34.22 GiB |   81.64 Mbit/s
         14:00     58.99 MiB |    2.78 GiB |    2.84 GiB |   81.31 Mbit/s
     ------------------------+-------------+-------------+---------------
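
As a sanity check on how the columns relate, the average rate can be recomputed from the hourly total. This is a small sketch; the `gib_to_mbit_per_s` helper is a hypothetical name, and it assumes that GiB means 2^30 bytes while Mbit/s means 10^6 bits per second.

```shell
#!/bin/sh
# Convert a volume in GiB moved over a number of seconds into Mbit/s.
#   $1: volume in GiB (2^30 bytes)
#   $2: duration in seconds
gib_to_mbit_per_s() {
    awk -v gib="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", gib * 1024 * 1024 * 1024 * 8 / secs / 1000000 }'
}

# The 15:00 row on 2023-02-14: 35.04 GiB in total over one hour.
gib_to_mbit_per_s 35.04 3600
```

This prints 83.61, which agrees with the 83.62 Mbit/s in the table up to the rounding of the GiB figure.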
Except where otherwise noted, content on this blog is licensed under CC-BY 2.0.