
On Tue, Jul 16, 2013 at 05:49:49PM +1000, Matthew Cengia wrote:
> > it's also useful on backup servers where you end up with dozens or
> > hundreds of copies of the same files (esp. if you're backing up
> > entire systems, including OS)
> Are you actually talking about retroactive deduplication here, or
> just COW? IMHO taking a little extra care when copying VM images or
> taking backups, and ensuring use of snapshots and/or --reflink, is
> usually good enough, as opposed to going back and hunting for
> duplicate data.
neither. zfs has a de-dupe attribute which can be set. it then keeps a
table of block hashes, so that blocks about to be written whose hash is
already in the table get substituted with a pointer to the original
block. Like the compression attribute, it only affects writes performed
after the attribute has been set, so it isn't retroactive.

NOTE: de-duplication takes enormous amounts of RAM, and slows down
write performance significantly. it's really not worth doing unless you
are certain that you have a very large proportion of duplicate data.
and once it has been turned on for a pool, it can't be turned off
without destroying, re-creating, and restoring the pool (technically,
you can turn off the dedup attribute, but you'll still be paying the
overhead price for it without any benefit).

Here's the original post announcing de-duplication in zfs, from 2009
(still Sun, but old Sun webpages are now on oracle.com):

https://blogs.oracle.com/bonwick/entry/zfs_dedup

for more details, see http://zfsonlinux.org/docs.html which contains
links to:

. Illumos ZFS Documentation
. ZFS on Linux User Guide
. Solaris ZFS Administration Guide

there are also numerous blog posts (of varying quality and cluefulness)
describing how to do stuff with zfs or how something in zfs works, just
a google search away.
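to make the mechanism concrete, here's a toy sketch (python, nothing to
do with the actual zfs implementation - the DedupPool name and
structure are entirely made up) of what a de-dupe table does: hash each
block on write, and if the hash is already known, store only a
reference instead of the block.

```python
# toy model of block-level de-duplication - NOT zfs code
import hashlib

class DedupPool:
    def __init__(self):
        self.blocks = {}    # hash -> stored block data
        self.refcount = {}  # hash -> number of writers sharing the block

    def write(self, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        if h in self.blocks:
            self.refcount[h] += 1   # duplicate: just bump the refcount
        else:
            self.blocks[h] = data   # new data: actually store the block
            self.refcount[h] = 1
        return h                    # caller keeps a pointer (the hash)

pool = DedupPool()
a = pool.write(b"x" * 4096)
b = pool.write(b"x" * 4096)  # identical block: no new storage used
c = pool.write(b"y" * 4096)
```

the cost is that table: it has to be consulted on every single write,
which is exactly why dedup eats RAM and slows writes down.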
> That of course assumes you're using a relative size. If you're using
> an absolute size this is far less obvious. That's the other thing:
> lvextend -L 32G on a 64G LV will do nothing, as would lvreduce -L 64G
> on a 32G LV. This is useful when ensuring an LV meets minimum size
> requirements and saves significant (potentially buggy) testing code.
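the absolute-size semantics described above amount to a clamp, which
can be sketched like this (toy python modelling the no-op behaviour
described, made-up function names - not lvm code):

```python
# toy model of absolute-size lvextend/lvreduce behaviour - NOT lvm code

def lvextend_abs(current_gb: int, target_gb: int) -> int:
    # lvextend only grows: a target at or below the current size is a no-op
    return max(current_gb, target_gb)

def lvreduce_abs(current_gb: int, target_gb: int) -> int:
    # lvreduce only shrinks: a target at or above the current size is a no-op
    return min(current_gb, target_gb)
```

which is why calling one of them unconditionally to enforce a minimum
(or maximum) size is safe: the size only ever moves in the permitted
direction.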
or i could just use zfs where there is no risk of making a mistake
like that. IIRC, zfs will refuse to set the quota below current usage,
but even if you could accidentally set the quota on a subvol to less
than its current usage, zfs isn't going to start deleting or
truncating files to make them fit.

it's a different conceptual model to LVM, anyway. With lvm you
allocate chunks of disk to be dedicated to specific LVs. if you
allocate too much to one LV, you waste space. if you allocate too
little, you fill up the fs too quickly. detailed forward planning is
almost essential if you don't want to be micro-managing storage (as in
"i'm running out of space on /home so take 50GB from /usr and add it
to /home") all the time.

with zfs, it's a global pool of space usable by all filesystems
created on it - the default quota is no quota, so any and all
sub-volumes can use up all available space. you can set quotas so that
subvolumes (and children of that subvolume) are limited to using no
more than X amount of the total pool (but there's still no
guaranteed/reserved space - it's all shared space from the pool).

with zfs, if /home is running out of quota then I can just do
something like 'zfs set quota=xxxx export/home' as long as there's
still free space available in the pool.

you can also set a 'reservation' where a certain amount of storage in
the pool is reserved for just one subvolume (and its descendants).
reserved storage still comes from the pool, but no other subvolume or
zvol can use any of it.

both reservations and quotas are soft - you can change them at any
time, very easily and without any hassle. reservations are more
similar to lvm space allocations than quotas are, but even
reservations are flexible - you can increase, decrease, or unset them
at whim.

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #67:

descramble code needed from software company
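the quota-vs-reservation distinction can be spelled out in a toy model
(python, with made-up Pool/Subvol classes - not how zfs implements any
of this): a quota caps what one subvolume may use, while a reservation
keeps other subvolumes out of a chunk of the shared pool.

```python
# toy model of shared-pool quota/reservation accounting - NOT zfs code

class Pool:
    def __init__(self, size):
        self.size = size
        self.subvols = []

    def create(self, name, quota=None, reservation=0):
        sv = Subvol(self, name, quota, reservation)
        self.subvols.append(sv)
        return sv

    def free_for(self, sv):
        used = sum(s.used for s in self.subvols)
        # unused space reserved for *other* subvols is off-limits to sv
        others_resv = sum(max(s.reservation - s.used, 0)
                          for s in self.subvols if s is not sv)
        return self.size - used - others_resv

class Subvol:
    def __init__(self, pool, name, quota, reservation):
        self.pool, self.name = pool, name
        self.quota, self.reservation = quota, reservation
        self.used = 0

    def write(self, amount):
        if self.quota is not None and self.used + amount > self.quota:
            raise ValueError("quota exceeded")
        if amount > self.pool.free_for(self):
            raise ValueError("pool out of space")
        self.used += amount

pool = Pool(100)
home = pool.create("home", quota=30)            # capped at 30 of the 100
backup = pool.create("backup", reservation=20)  # 20 held for backup only
home.write(30)                                  # fine: within quota
```

note that both limits are just numbers in the accounting, which is why
they're soft: changing a quota or reservation rewrites a limit, it
never moves data around.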