
On Tue, Jul 16, 2013 at 05:49:49PM +1000, Matthew Cengia wrote:
> > it's also useful on backup servers where you end up with dozens or
> > hundreds of copies of the same files (esp. if you're backing up
> > entire systems, including OS)
> Are you actually talking about retroactive deduplication here, or
> just COW? IMHO taking a little extra care when copying VM images or
> taking backups, and ensuring use of snapshots and/or --reflink, is
> usually good enough, as opposed to going back and hunting for
> duplicate data.
neither. zfs has a de-dupe attribute which can be set. it then keeps a
table of block hashes, so that blocks about to be written whose hash is
already in the table get substituted with a pointer to the original
block. Like the compression attribute, it only affects writes performed
after the attribute has been set, so it isn't retroactive.

NOTE: de-duplication takes enormous amounts of RAM, and slows down
write performance significantly. it's really not worth doing unless you
are certain that you have a very large proportion of duplicate data.
and once it has been turned on for a pool, it can't be turned off
without destroying, re-creating, and restoring the pool (technically,
you can turn off the dedup attribute, but you'll still be paying the
overhead price for it without any benefit).

Here's the original post announcing de-duplication in zfs, from 2009
(still Sun, but old Sun webpages are now on oracle.com):

https://blogs.oracle.com/bonwick/entry/zfs_dedup

for more details, see http://zfsonlinux.org/docs.html which contains
links to:

. Illumos ZFS Documentation
. ZFS on Linux User Guide
. Solaris ZFS Administration Guide

there are also numerous blog posts (of varying quality and cluefulness)
describing how to do stuff with zfs or how something in zfs works, just
a google search away.
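to make the mechanism concrete, here's a toy sketch (python, nothing to
do with the actual zfs implementation - the DedupPool name and
structure are entirely made up) of what a de-dupe table does: hash each
block on write, and if the hash is already known, store only a
reference instead of the block.

```python
# toy model of block-level de-duplication - NOT zfs code
import hashlib

class DedupPool:
    def __init__(self):
        self.blocks = {}    # hash -> stored block data
        self.refcount = {}  # hash -> number of writers sharing the block

    def write(self, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        if h in self.blocks:
            self.refcount[h] += 1   # duplicate: just bump the refcount
        else:
            self.blocks[h] = data   # new data: actually store the block
            self.refcount[h] = 1
        return h                    # caller keeps a pointer (the hash)

pool = DedupPool()
a = pool.write(b"x" * 4096)
b = pool.write(b"x" * 4096)  # identical block: no new storage used
c = pool.write(b"y" * 4096)
```

the cost is that table: it has to be consulted on every single write,
which is exactly why dedup eats RAM and slows writes down.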
> That of course assumes you're using a relative size. If you're using
> an absolute size this is far less obvious. That's the other thing:
> lvextend -L 32G on a 64G LV will do nothing, as would lvreduce -L 64G
> on a 32G LV. This is useful when ensuring an LV meets minimum size
> requirements and saves significant (potentially buggy) testing code.
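the absolute-size semantics described above amount to a clamp, which
can be sketched like this (toy python modelling the no-op behaviour
described, made-up function names - not lvm code):

```python
# toy model of absolute-size lvextend/lvreduce behaviour - NOT lvm code

def lvextend_abs(current_gb: int, target_gb: int) -> int:
    # lvextend only grows: a target at or below the current size is a no-op
    return max(current_gb, target_gb)

def lvreduce_abs(current_gb: int, target_gb: int) -> int:
    # lvreduce only shrinks: a target at or above the current size is a no-op
    return min(current_gb, target_gb)
```

which is why calling one of them unconditionally to enforce a minimum
(or maximum) size is safe: the size only ever moves in the permitted
direction.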
or i could just use zfs where there is no risk of making a mistake
like that. IIRC, zfs will refuse to set the quota below current usage,
but even if you could accidentally set the quota on a subvol to less
than its current usage, zfs isn't going to start deleting or
truncating files to make them fit.

it's a different conceptual model to LVM, anyway. With lvm you
allocate chunks of disk to be dedicated to specific LVs. if you
allocate too much to one LV, you waste space. if you allocate too
little, you fill up the fs too quickly. detailed forward planning is
almost essential if you don't want to be micro-managing storage (as in
"i'm running out of space on /home so take 50GB from /usr and add it
to /home") all the time.

with zfs, it's a global pool of space usable by all filesystems
created on it - the default quota is no quota, so any and all
sub-volumes can use up all available space. you can set quotas so that
subvolumes (and children of that subvolume) are limited to using no
more than X amount of the total pool (but there's still no
guaranteed/reserved space - it's all shared space from the pool).

with zfs, if /home is running out of quota then I can just do
something like 'zfs set quota=xxxx export/home' as long as there's
still free space available in the pool.

you can also set a 'reservation' where a certain amount of storage in
the pool is reserved for just one subvolume (and its descendants).
reserved storage still comes from the pool, but no other subvolume or
zvol can use any of it.

both reservations and quotas are soft - you can change them at any
time, very easily and without any hassle. reservations are more
similar to lvm space allocations than quotas are, but even
reservations are flexible - you can increase, decrease, or unset them
at whim.

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #67:

descramble code needed from software company
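the quota-vs-reservation distinction can be spelled out in a toy model
(python, with made-up Pool/Subvol classes - not how zfs implements any
of this): a quota caps what one subvolume may use, while a reservation
keeps other subvolumes out of a chunk of the shared pool.

```python
# toy model of shared-pool quota/reservation accounting - NOT zfs code

class Pool:
    def __init__(self, size):
        self.size = size
        self.subvols = []

    def create(self, name, quota=None, reservation=0):
        sv = Subvol(self, name, quota, reservation)
        self.subvols.append(sv)
        return sv

    def free_for(self, sv):
        used = sum(s.used for s in self.subvols)
        # unused space reserved for *other* subvols is off-limits to sv
        others_resv = sum(max(s.reservation - s.used, 0)
                          for s in self.subvols if s is not sv)
        return self.size - used - others_resv

class Subvol:
    def __init__(self, pool, name, quota, reservation):
        self.pool, self.name = pool, name
        self.quota, self.reservation = quota, reservation
        self.used = 0

    def write(self, amount):
        if self.quota is not None and self.used + amount > self.quota:
            raise ValueError("quota exceeded")
        if amount > self.pool.free_for(self):
            raise ValueError("pool out of space")
        self.used += amount

pool = Pool(100)
home = pool.create("home", quota=30)            # capped at 30 of the 100
backup = pool.create("backup", reservation=20)  # 20 held for backup only
home.write(30)                                  # fine: within quota
```

note that both limits are just numbers in the accounting, which is why
they're soft: changing a quota or reservation rewrites a limit, it
never moves data around.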