
On Thu, 5 Apr 2012, Craig Sanders wrote:
Of course if you want to use large SATA disks with SSD and other forms of cache then things are totally different.
True. And for de-duping, ZFS will use your L2ARC (e.g. SSD) as well as your ARC (RAM) for the dedup hash tables. I still don't think it's worthwhile in the general case.
8GB sticks are cheap enough now that I could upgrade my home server from 16GB to 32GB for not too much money... but even though one of my zpools has a LOT of duplicate data (rsync backups of Linux systems), I still don't think it's worth the bother. I'd rather use that extra RAM for disk caching or for VMs, and upgrade the backup zpool from 4x1TB to 4x2TB. Or just save the money and wait for the inevitable improvements :)
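To put a rough number on the RAM cost of holding the dedup tables in ARC, here is a back-of-the-envelope sketch. The 320 bytes per table entry is a commonly quoted rule of thumb, not a figure from this thread, and the average block size is likewise an assumption; real pools will differ.

```python
def ddt_ram_bytes(unique_data_bytes, avg_block_bytes=128 * 1024, bytes_per_entry=320):
    """Estimate in-core dedup table size for a given amount of unique data.

    avg_block_bytes and bytes_per_entry are assumptions (rule-of-thumb
    values), not measurements from any particular pool.
    """
    # One table entry per unique block, rounding up to whole blocks.
    entries = -(-unique_data_bytes // avg_block_bytes)
    return entries * bytes_per_entry


if __name__ == "__main__":
    TIB = 2 ** 40
    GIB = 2 ** 30
    for block in (128 * 1024, 8 * 1024):
        est = ddt_ram_bytes(TIB, avg_block_bytes=block)
        print(f"1 TiB unique data, {block // 1024} KiB blocks: {est / GIB:.1f} GiB")
```

Under these assumptions the estimate grows quickly as the average block size shrinks, which is the usual reason dedup gets expensive on pools full of small files.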
Or use the proper software for the job. BackupPC with rsync already dedups, and since it knows its use case (files are going to be identical between backups, while individual blocks are not likely to be duplicates and hence are irrelevant to test against), it can do it with far fewer resources. My BackupPC server is a laptop with an eSATA disk, and the retention time is far greater than the time I've had it in operation: my oldest backups are from December 2008.

-- Tim Connors
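As an illustration of the file-level dedup idea described above (whole files compared, not individual blocks), here is a minimal sketch that hashes each file and hard-links identical ones together. It is not BackupPC's actual code; the function names `file_digest`, `dedup_tree` and `demo` are hypothetical, and it assumes everything lives on one filesystem that supports hard links.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_tree(root):
    """Hard-link files with identical content under root; return bytes saved."""
    seen = {}  # digest -> path of the first file seen with that content
    saved = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = file_digest(path)
            first = seen.get(digest)
            if first is None:
                seen[digest] = path
            elif not os.path.samefile(first, path):
                size = os.path.getsize(path)
                # Link under a temporary name, then rename over the duplicate,
                # so the path never goes missing if we are interrupted.
                tmp = path + ".dedup-tmp"
                os.link(first, tmp)
                os.replace(tmp, path)
                saved += size
    return saved


def demo():
    """Build two tiny 'backups' sharing one file and dedup them."""
    with tempfile.TemporaryDirectory() as root:
        for backup in ("backup-1", "backup-2"):
            os.makedirs(os.path.join(root, backup))
            with open(os.path.join(root, backup, "passwd"), "wb") as f:
                f.write(b"same content")
        saved = dedup_tree(root)
        a = os.stat(os.path.join(root, "backup-1", "passwd"))
        b = os.stat(os.path.join(root, "backup-2", "passwd"))
        return saved, a.st_ino == b.st_ino
```

The only state kept in memory is one digest per distinct file, which is why this kind of dedup can run on modest hardware; the trade-off is that two files differing by a single byte share nothing.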