On Thursday, 30 January 2020 6:05:56 PM AEDT Craig Sanders via luv-main wrote:
> > It really depends on the type of data.
>
> No, it really doesn't.
>
> > Backing up VM images via rsync is slow because they always have
> > relatively small changes in the middle of large files.
>
> rsyncing **ANY** large set of data is slow, whether it's huge files like
> VM images or millions of small files (e.g. on a mail server).
Here's what I wrote previously:
# It really depends on the type of data. Backing up VM images via rsync is
# slow because they always have relatively small changes in the middle of
# large files. Backing up large mail spools can be slow as there's a
# significant number of accounts with no real changes as well as a good number
# of accounts with only small changes (like the power users who have 10,000+
# old messages stored and only a few new messages at any time because they
# delete most mail soon after it arrives). But even for those corner cases
# rsync will work if your data volume isn't too big. For other cases it works
# pretty well.
I've used rsync to back up mail spools with up to about 20,000 accounts.
They weren't big mail stores, and I was only doing a backup twice a week.
The regular backups (for users who deleted the wrong messages) were ZFS
snapshots.
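As a rough sketch of that sort of setup (the paths, hostname, and pool
name here are invented for illustration):

  # copy the mail spool to the backup server, preserving hard links
  rsync -aH --delete /var/spool/mail/ backup:/backups/mail/

  # local read-only ZFS snapshot for restoring wrongly deleted messages
  zfs snapshot tank/mail@$(date +%Y-%m-%d)

ZFS snapshots are cheap to create, so keeping a few weeks of them costs
little more than the space taken by changed blocks.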
> rsync has to check at least the file sizes and timestamps, and then the
> block checksums on every run. On large sets, this WILL take many hours,
> no matter how much or how little has actually changed.
It's all a matter of scale.
I just did a test on a workstation with about 100G of storage in BTRFS. The
usual backups are weekly on Sunday night. A run now took 28 minutes (copying
5 days of data). A run immediately after (just rsync checking file dates) took
65 seconds. I could set that machine to have a backup every hour over the
Internet if I wanted to.
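The test was nothing special, just something like the following run twice
(the paths are hypothetical):

  time rsync -a /home/ backup:/backups/workstation/home/

The first run copied 5 days of changes; the second had nothing to
transfer, so the 65 seconds was pure metadata comparison.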
> (a minor benefit of this is that if a file or directory is moved to
> another directory in the same dataset, the only blocks that actually
> changed were the blocks containing the directory info, so they're the
> only blocks that need be sent. rsync, however, would send the entire
> directory contents

Yes, that's good for that case. Not a common case I deal with.

> because it's all "new" data. Transparent compression also helps 'zfs
> send' - compressed data requires fewer blocks to store it....rsync,
> though, can't benefit from transparent compression as it has to compare
> the source file's *uncompressed* data with the target copy)
Rsync compares checksums of the uncompressed data, but it can still send
compressed data over the wire if you use the -z option, and if you have
ssh configured to use compression then that applies too.
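For example (hypothetical paths):

  # compress the rsync data stream itself
  rsync -az /src/ user@backup:/dest/

  # or compress at the transport level instead
  rsync -a -e 'ssh -C' /src/ user@backup:/dest/

Either way the checksum comparison happens on uncompressed data;
compression only affects what goes over the wire.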
> rsync is still useful as a tool for moving/copying data from one location
> to another (whether on the same machine or to a different machine), but
> it's no longer a good choice for backups. it just takes too long - by the
> time it has finished, the source data will have changed. It's an improved
> "cp".
That depends on what you are backing up.

Rsync is a well-known program that doesn't require any special setup or
testing. The BTRFS and ZFS programs for sending changes would require more
testing.
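For comparison, a send/receive based incremental backup looks roughly
like this (snapshot and pool names invented):

  # ZFS: send only the blocks that changed between two snapshots
  zfs snapshot tank/data@today
  zfs send -i tank/data@yesterday tank/data@today | \
    ssh backup zfs receive backup/data

  # BTRFS equivalent, using read-only snapshots
  btrfs subvolume snapshot -r /data /snaps/today
  btrfs send -p /snaps/yesterday /snaps/today | \
    ssh backup btrfs receive /backups/data

It's not much typing, but it's the failure modes (a missing parent
snapshot, an interrupted receive) that need the testing.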
> I prefer to use the filesystem that's best for all machines on the
> network. If ZFS is in use on the file-server or backup-server, then that
> means zfs on everything else. If it's btrfs on the server, then it should
> be btrfs on everything.
Except when you have some systems storing large amounts of data that need
RAID-Z and other systems that need the flexibility that BTRFS offers.
> btrfs is not an option here because it just isn't as good as zfs...if i'm
Unless you want a RAID-1 array where disks of any size can be added or
removed at any time. This is a useful feature for a home server and
something ZFS doesn't support.
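For example, on a mounted BTRFS RAID-1 filesystem you can do (device
names are examples):

  # add a disk of any size to the array
  btrfs device add /dev/sdc /mnt/data
  btrfs balance start /mnt/data

  # or remove one, with BTRFS migrating the data off it first
  btrfs device remove /dev/sdb /mnt/data

ZFS mirrors can have devices attached and detached, but a RAID-Z vdev
can't be reshaped like this.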
--
My Main Blog
http://etbe.coker.com.au/
My Documents Blog
http://doc.coker.com.au/