Re: [MLUG] Advice needed about ZFS

30 Aug 2020

      On Wed, Aug 26, 2020 at 10:43:30AM +0000, stripes theotoky wrote:
...
...
> > I would suggest looking at something like a Network Attached Storage
> > device, with multiple drives in a suitable RAID array.
>
> This is the ultimate plan to build a NAS from an HP Microserver. I am
> leaning towards Nas4Free on an SSD or internal USB and 3, 6TB mirrors. This
> is a project that has to wait because right now due to Covid19 and
> Brexit we are not sure where we are.  I am here and can't leave but
> expecting to be out of work (which won't stop my research), my husband is
> British/Australian, resident in Austria to avoid Brexit but is stranded by
> Covid in Greece. When it all settles down and we have a home again building
> this NAS is going to be pretty high on the list of things to do.

In the meantime, you can use a largish (>= 4 or 6 TB) external USB drive set
up to be a ZFS pool for backups.

Then 'zfs send' your snapshots to the USB drive, and keep a multi-year
snapshot history on it.  Aggressively expire the snapshots on your laptop to
minimise the amount of space they're taking.

You can have multiple USB backup drives like this - each one has to be
initialised with a full backup, but can then be incrementally updated with
newer snapshots.  Each backup pool should have a different name - like
backup1, backup2, etc.

You can automate much of this with some good scripting, but your scripts
will need to query the backup destination pool (with 'zfs list') to find out
what the latest backup snapshot on it is.  Incremental 'zfs send' updates
send the difference between two snapshots, so you need to know what the
latest snapshot on the backup pool is AND that snapshot has to still exist
on the source pool.
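
As a rough sketch of that lookup (not a finished script), the two shell
functions below find the newest backup snapshot that exists on both pools.
The function names are made up for illustration, and the sketch assumes
backup snapshots are named "@backup-YYYYMMDD" and taken on the top-level
dataset of each pool:

```shell
# list_backup_snaps POOL
# Print the backup-YYYYMMDD snapshot names found directly on POOL, one per line.
# (Hypothetical helper; assumes the @backup-YYYYMMDD naming scheme.)
list_backup_snaps() {
    zfs list -H -t snapshot -o name -d 1 "$1" |
        awk -F@ -v ds="$1" '$1 == ds && $2 ~ /^backup-[0-9]+$/ { print $2 }'
}

# latest_common_snap SRC DST
# Print the newest backup snapshot present on BOTH pools.
# Returns non-zero if there is none, i.e. a full backup is needed first.
latest_common_snap() {
    dst_snaps=$(list_backup_snaps "$2")
    # Keep only source snapshots whose name also appears on the destination;
    # YYYYMMDD names sort chronologically, so the last one is the newest.
    common=$(list_backup_snaps "$1" | grep -Fx -e "$dst_snaps" | sort | tail -n 1)
    [ -n "$common" ] && printf '%s\n' "$common"
}
```

e.g. 'prev=$(latest_common_snap source backup)' gives you the snapshot to
use as the starting point of the next incremental send.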

You should use a different snapshot naming scheme for the backup snapshots.
If your main snapshots are "@zfs-autosnap-YYYYMMDD" or whatever, then use
"@backup-YYYYMMDD".  Create that snapshot, and use it for a full zfs send,
then create new "@backup-YYYYMMDD" snapshots just before each incremental
send.

e.g. the initial full backup on a pool called "source" to a pool called
"backup", if you had done it yesterday:

    zfs snapshot source@backup-20200829
    zfs send -v -R source@backup-20200829 | /sbin/zfs receive -v -d -F backup

and to do an incremental backup of *everything* (including all snapshots
created manually or by zfs-autosnap) from @backup-20200829 to today between
the same pools:

    # source@backup-20200829 already exists from the last backup, no need to create it.
    # -I (capital i) includes every intermediate snapshot between the two in the stream.
    zfs snapshot source@backup-20200830
    zfs send -R -I source@backup-20200829 source@backup-20200830 | zfs receive -v -u -d backup

** NOTE: @backup-20200829 has to exist on both the source & backup pools **

Unless you need to make multiple backups to different pools, you can delete
the source@backup-20200829 snapshot at this point because the next backup will
be from source@backup-20200830 to some future @backup-YYYYMMDD snapshot.
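
Putting those steps together, something like the following function could do
one incremental run. It's only a sketch: the function name is made up, it
uses 'zfs snapshot -r' and 'zfs destroy -r' so that child datasets get the
same snapshot names (my assumption about how your pool is laid out), and it
uses -I so that intermediate snapshots are included in the stream.

```shell
# incremental_backup SRC DST PREV [NEW]
# SRC/DST are pool (or dataset) names, PREV is the last backup snapshot name
# (e.g. backup-20200829), NEW defaults to backup-<today's date>.
# Hypothetical helper, not part of ZFS.
incremental_backup() {
    src=$1 dst=$2 prev=$3
    new=${4:-backup-$(date +%Y%m%d)}

    # The previous snapshot must still exist on the source pool.
    zfs list -H -t snapshot -o name "$src@$prev" >/dev/null || return 1

    zfs snapshot -r "$src@$new" || return 1
    zfs send -R -I "$src@$prev" "$src@$new" | zfs receive -v -u -d "$dst" || return 1

    # Only reached if the receive succeeded: the old snapshot is no longer
    # needed on the source. Skip this if you back up to more than one pool.
    zfs destroy -r "$src@$prev"
}
```

e.g. 'incremental_backup source backup backup-20200829'.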

BTW, you don't have to backup to the top level of the backup pool. e.g. to
do the initial full backup to a dataset called "mylaptop" on pool backup:

    zfs create backup/mylaptop
    zfs snapshot source@backup-20200829
    zfs send -v -R source@backup-20200829 | zfs receive -v -u -d -F backup/mylaptop

Later incremental sends then use 'zfs receive -v -u -d backup/mylaptop' as
the destination.

(you'd do this if you wanted to backup multiple machines to the same backup
drive, or if you wanted to use it for backups AND for storage of other stuff
like images or videos or audio files).

and, oh yeah, get used to using the '-n' aka '--dry-run' and '-v'/'--verbose'
options with both 'zfs send' and 'zfs receive' until you understand how they
work and are sure they're going to do what you want.
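
A quick way to practise with those options is a small wrapper like this one
(the function name is just for illustration). Note that 'zfs send -n'
produces no stream at all, so the receive side gets its own real send piped
into 'zfs receive -n', which reports what it would do without writing
anything to the backup pool:

```shell
# preview_incremental SRC PREV NEW DST
# Show what an incremental backup would do without changing DST.
# Hypothetical helper; -I includes intermediate snapshots in the stream.
preview_incremental() {
    src=$1 prev=$2 new=$3 dst=$4

    # Sending side: -n means no stream is generated, -v prints what
    # would have been sent.
    zfs send -n -v -R -I "$src@$prev" "$src@$new" || return 1

    # Receiving side: feed it a real stream, but -n stops it from
    # actually receiving; -v prints the names it would have used.
    zfs send -R -I "$src@$prev" "$src@$new" | zfs receive -n -v -u -d "$dst"
}
```

e.g. 'preview_incremental source backup-20200829 backup-20200830 backup'.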

NOTE: as a single drive vdev, there will be no redundancy in the USB backup
drive, but I'm guessing that since you're using a laptop, it's probably also
a single drive and that you're only using ZFS for the auto compression and
snapshot capabilities.  If you want redundancy, you can always plug in two USB
drives at a time and set them up as a zfs mirrored pool, but then you have to
label them so that you know which pairs of drives belong together.

This is not as good as a NAS but it's cheap and easy and a lot better than
nothing.

I recommend using USB drive adaptors that allow you to use any drives in them
(i.e. USB to SATA adaptors), not pre-made self-contained external drives (just
a box with a drive in it and a USB socket or cable).

Sometimes you see them with names like "disk docking station", with a power
adaptor, a USB socket, and SATA slots for 1, 2, or 4 drives.  Other forms
include plain cables with a USB plug on one end and a SATA socket on the
other.

craig

ps: If your backup pool was on some other machine somewhere on the internet,
you can pipe the zfs send over ssh. e.g.

    zfs send -R -i source@backup-20200829 source@backup-20200830 | ssh remote-host zfs receive -u -d poolname/dataset

The pool on your laptop is probably small enough that you could do the initial
full backup over the internet too, but I wouldn't want to do a multi-terabyte
send from a home connection.

daily 'zfs send's of a few gigabytes or so should be no problem at all.

Also, the data stream from 'zfs send' can be piped into gpg to encrypt it
and then just redirected to a file on a dropbox or google drive or similar
account, or sent via ssh to any machine you have a shell account on.  To
restore from them, decrypt them with gpg, and pipe into 'zfs receive' to
restore to an appropriate pool.

    zfs send -R -i source@backup-20200829 source@backup-20200830 | gpg ... > /path/to/dropbox/backups/20200830.gpg

or

    # quote the remote command so the redirect happens on remotehost, not locally
    zfs send -R -i source@backup-20200829 source@backup-20200830 | gpg ... | ssh remotehost 'cat > ./backups/20200830.gpg'
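
Restoring goes the other way. A minimal sketch, assuming the file was
encrypted with a key or passphrase that gpg on the restoring machine can
use (the function name is made up for illustration):

```shell
# restore_encrypted FILE TARGET
# Decrypt a saved 'zfs send' stream and receive it into TARGET
# (a pool or dataset). Hypothetical helper, not part of ZFS or gpg.
restore_encrypted() {
    file=$1 target=$2
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }

    # gpg --decrypt writes the decrypted stream to stdout,
    # which is exactly what 'zfs receive' wants on stdin.
    gpg --decrypt "$file" | zfs receive -v -u -d "$target"
}
```

e.g. 'restore_encrypted ./backups/20200830.gpg backup'.  Incremental files
have to be restored in order, starting from a full backup, because each one
only contains the changes since the snapshot before it.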

--

craig sanders <cas@taz.net.au>
