
On Wed, Oct 17, 2012 at 02:15:58PM +1100, Russell Coker wrote:
> On Wed, 17 Oct 2012, Craig Sanders <cas@taz.net.au> wrote:
> > Note that this zvol has compression enabled - this would be a good choice for a mail server's storage disk - mail is highly compressible. depending on available RAM in the server and the kind of mail typically received (e.g. multiple copies of the same email), de-duping the zvol may also be worthwhile.
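For reference, a compressed zvol like the one quoted above can be created with the standard `zfs` commands; the pool and volume names here are invented for illustration:

```shell
# create a 20G zvol with compression enabled (pool "tank" is hypothetical)
zfs create -V 20G -o compression=gzip tank/mailstore

# check the property and the achieved ratio once data is on it
zfs get compression,compressratio tank/mailstore
```

Newer ZFS releases also offer lz4, which is usually a better default than gzip for latency-sensitive workloads.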
> The last time I checked the average message size on a medium-size mail spool it was about 70K.
compression would bring that down to (very roughly) an average of about 5-15K per message.
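The 5-15K estimate is easy to sanity-check with any general-purpose compressor. A sketch using Python's zlib on a synthetic message (the text is made up, not real mail data):

```python
import zlib

# synthetic email-like text: repetitive headers plus a repetitive body,
# which is roughly why mail compresses so well
message = (
    "Received: from mx1.example.com by mail.example.com\r\n"
    "From: alice@example.com\r\nTo: bob@example.com\r\n"
    "Subject: Re: quarterly report\r\n\r\n"
    + "The quick brown fox jumps over the lazy dog. " * 200
)
raw = message.encode()
packed = zlib.compress(raw, 6)  # level 6, zlib's default trade-off
print(f"raw {len(raw)} bytes -> compressed {len(packed)} bytes "
      f"({len(packed) / len(raw):.0%})")
```

Real mail is less repetitive than this toy input, so real-world ratios are worse, but typically still well under half the raw size for text-heavy spools.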
> The headers are essentially impossible to dedup, as they differ in the final stage of delivery even if a single SMTP operation was used to send to multiple local users. Deduping the message body seems unlikely to provide a significant benefit, as there usually aren't that many duplicates, not even when you count spam and jokes -
the scenario I was thinking of was internal email memos sent to "all staff", with a stupidly large Word .doc or .pdf file attached. for an ISP mail server, de-duping isn't likely to help much (if at all). For a small-to-medium business or corporate mail server, it could help a lot.
> I'm assuming that ZFS is even capable of deduplicating files which have the duplicate part at different offsets, but I don't care enough about this to even look it up.
zfs de-duping is done at block level. if a block's hash is an exact match with another block's hash then it can be de-duped (the two blocks share one on-disk copy). it only works on identically-aligned blocks - the same data at a different offset hashes differently and won't be de-duped.
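The block-level, alignment-sensitive behaviour can be sketched in a few lines: hash fixed-size blocks at fixed offsets, the way a block-level deduper sees them. The 128K block size matches ZFS's default recordsize; the "attachment" is just random bytes standing in for a large file:

```python
import hashlib
import os

BLOCK = 128 * 1024  # zfs's default recordsize, assumed for this sketch

def count_blocks(data, block=BLOCK):
    """Hash fixed-size, fixed-alignment blocks and return (unique, total)."""
    seen = set()
    total = 0
    for off in range(0, len(data), block):
        seen.add(hashlib.sha256(data[off:off + block]).digest())
        total += 1
    return len(seen), total

payload = os.urandom(8 * BLOCK)           # stand-in for a big attachment

aligned = payload + payload               # second copy lands block-aligned
shifted = payload + b"X" * 100 + payload  # second copy is misaligned

print(count_blocks(aligned))  # -> (8, 16): every block of copy 2 dedups
print(count_blocks(shifted))  # -> (17, 17): the misaligned copy dedups nothing
```

This is why a duplicate attachment buried at a different offset inside two mbox files may not dedup at all, while duplicate whole files (one message per file, as in Maildir) dedup well.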
> For every server I run that has any duplicate content RAM is a more limited resource than disk space. For example the server which is full of raw files from digital cameras is never going to benefit from dedup even though it has enough RAM to run it. So there's no possibility of me gaining anything from it.
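The RAM-versus-disk trade can be put in rough numbers: ZFS keeps a dedup-table (DDT) entry in RAM for every unique block, commonly estimated at around 320 bytes each. A back-of-envelope sketch (the pool size is invented for illustration):

```python
# rule-of-thumb DDT cost: ~320 bytes of RAM per unique block
# (widely quoted estimate, not a guarantee)
DDT_ENTRY_BYTES = 320
RECORDSIZE = 128 * 1024   # zfs default 128K records
pool_data = 4 * 1024**4   # a hypothetical 4 TiB of unique data

blocks = pool_data // RECORDSIZE
ddt_bytes = blocks * DDT_ENTRY_BYTES
print(f"{blocks:,} blocks -> ~{ddt_bytes / 1024**3:.0f} GiB of dedup table")
# -> 33,554,432 blocks -> ~10 GiB of dedup table
```

Ten gigabytes of RAM to dedup four terabytes of mostly-unique data is a poor trade against simply buying another disk, which is the point being made here.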
me too. i don't use zfs de-dupe at all. it is, IMO, of marginal use. adding more disks (or replacing with larger disks) is almost always going to be cheaper and better. but there are some cases where it could be useful... so I don't want to dismiss it just because I have no personal need for it.

editing large video files, perhaps. multiple cycles of edit & versioned save would use not much more space than the original file + the size of the diffs.

VMs are quite often touted as a good reason for de-duping - hundreds of almost identical zvols. I remain far from convinced that de-duping is the best use of available RAM on a virtualisation server, or that upgrading/adding disks wouldn't be better.

craig

-- 
craig sanders <cas@taz.net.au>