
On Sat, 11 Feb 2012, Daniel Pittman <daniel@rimspace.net> wrote:
> There are some resources that suggest that one needs 2GB per TB of storage with deduplication [i] (in fact this is a misinterpretation of the text). In practice with FreeBSD, based on empirical testing and additional reading, it's closer to 5GB per TB.
>
> [i] http://wiki.freebsd.org/ZFSTuningGuide#Deduplication
>
> I am inclined to take their comment seriously, given reports from the few people I know who did try that. (OTOH, 64GB of RAM is US $379 for four consumer grade sticks, so you could realistically use that for a 12TB pool in hardware that is reasonably affordable.)
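For reference, the per-TB figures quoted above can be reproduced from the usual ZFS dedup table (DDT) sizing rule of thumb. The following is a minimal sketch, assuming roughly 320 bytes of RAM per in-core DDT entry and one entry per unique block; both are commonly quoted ballpark figures, not numbers taken from the message above.

# Rough DDT RAM estimate: one in-core entry per unique block.
# The ~320 bytes/entry figure is an assumption (a commonly quoted
# ballpark), not something stated in the quoted message.
DDT_ENTRY_BYTES = 320
TIB = 2**40
KIB = 2**10

def ddt_ram_gib(pool_tib, avg_block_kib, entry_bytes=DDT_ENTRY_BYTES):
    """Estimate GiB of RAM needed to keep the whole DDT in memory."""
    unique_blocks = (pool_tib * TIB) / (avg_block_kib * KIB)
    return unique_blocks * entry_bytes / 2**30

for block_kib in (128, 64, 8):
    print(f"12 TiB pool, {block_kib} KiB average block size: "
          f"~{ddt_ram_gib(12, block_kib):.0f} GiB for the DDT")
# Prints roughly 30, 60 and 480 GiB respectively.

With 128KiB average blocks that works out to about 2.5GB of RAM per TB; with smaller average block sizes (many small files, or zvols with small volblocksize) the 5GB per TB figure from the FreeBSD wiki is easy to reach or exceed.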
You could use 64G of RAM for deduplication on 12TB of disk; that would save some amount of storage and also increase cache efficiency. But you could instead use the 64G of RAM for caching and just buy some extra disk.

If you have a large number of DomUs then they probably aren't loaded up with a full KDE or GNOME environment. A basic DomU can have as little as 400M of data (excluding /var) - based on inspecting one of my servers. So if I were to use deduplication for DomUs then I might need more than 24,000 of them to fill the 12TB of storage before deduplication (rough arithmetic sketched below).

There are other ways that deduplication could really do some good, such as when a large MS-Word document is attached to an email that is sent to everyone in a company. Of course you would probably have to remove the Delivered-To: header to get a good result with that (I'm guessing that ZFS deduplication only works with block-aligned data). But there are other solutions to that problem, like using my maildir-bulletin program or some other delivery agent that is specific to the case of mass local mail-outs.

Given modern storage prices I find it difficult to imagine a situation where deduplication does some real good but where it can't be done better at the application level.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
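A rough check on the DomU arithmetic above, as a sketch that simply reuses the 400M per DomU and 12TB figures from the message (decimal units, no allowance for filesystem overhead):

# How many ~400M DomUs it takes to fill a 12TB pool before deduplication.
POOL_TB = 12
DOMU_MB = 400

domus_to_fill = (POOL_TB * 10**12) / (DOMU_MB * 10**6)
print(f"~{domus_to_fill:,.0f} DomUs of {DOMU_MB}M each to fill {POOL_TB}TB")
# => ~30,000 DomUs, consistent with the "more than 24,000" figure above.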