
On Wed, 17 Oct 2012, Craig Sanders <cas@taz.net.au> wrote:
The last time I checked, the average message size on a medium-sized mail spool was about 70K.
compression would bring that down to (very roughly) an average of about 5-15K per message.
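(For what it's worth, that estimate is easy to check against a real spool. The rough Python sketch below compresses each file in a Maildir directory with zlib and reports the average raw and compressed sizes. The path and compression level are arbitrary placeholders, and ZFS compresses per record rather than per file, so treat the output as a ballpark figure only.)

#!/usr/bin/env python3
# Rough sketch: estimate per-message compression on a real Maildir.
# Path and compression level are placeholders, not anything specific to ZFS.
import os, sys, zlib

maildir = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/Maildir/cur")

total_raw = total_comp = count = 0
for name in os.listdir(maildir):
    path = os.path.join(maildir, name)
    if not os.path.isfile(path):
        continue
    data = open(path, "rb").read()
    total_raw += len(data)
    total_comp += len(zlib.compress(data, 6))   # gzip-6-ish, whole message at once
    count += 1

if count:
    print("messages: %d" % count)
    print("average raw size:        %6.1f KiB" % (total_raw / count / 1024.0))
    print("average compressed size: %6.1f KiB" % (total_comp / count / 1024.0))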
Yes, that could be a real win. If multiple processes are doing synchronous writes at the same time, does ZFS bundle them in the same transaction? At busy times I have 12 processes doing synchronous delivery at the same time. If ZFS were to slightly delay a couple of them so that 5+ synchronous file writes went out in the same operation, it could improve overall performance and leave more disk bandwidth for the occasional read.
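(That bundling question could be tested directly. Below is a rough Python sketch that spawns a dozen writer processes, each fsync()ing a stream of ~70K files into a directory on the filesystem under test, and reports the aggregate rate. The writer count, file count and file size are just stand-ins matching the figures above. Running it once on a ZFS dataset and once on another filesystem on the same disks would show whether concurrent synchronous writes are being batched.)

#!/usr/bin/env python3
# Sketch: measure aggregate throughput of many concurrent synchronous writers.
# All parameters are illustrative stand-ins for the figures mentioned above.
import os, sys, time
from multiprocessing import Process

TARGET_DIR = sys.argv[1] if len(sys.argv) > 1 else "/tmp/syncwrite-test"
WRITERS = 12           # matches the 12 concurrent deliveries mentioned above
MESSAGES = 200         # files written per writer
MSG_SIZE = 70 * 1024   # ~70K, the average message size quoted above

def writer(wid):
    payload = os.urandom(MSG_SIZE)
    for i in range(MESSAGES):
        path = os.path.join(TARGET_DIR, "w%02d-%05d" % (wid, i))
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.write(fd, payload)
            os.fsync(fd)           # synchronous delivery, as an MTA would do
        finally:
            os.close(fd)

if __name__ == "__main__":
    os.makedirs(TARGET_DIR, exist_ok=True)
    start = time.time()
    procs = [Process(target=writer, args=(w,)) for w in range(WRITERS)]
    for p in procs: p.start()
    for p in procs: p.join()
    elapsed = time.time() - start
    total = WRITERS * MESSAGES
    print("%d fsync'd files in %.1fs (%.1f files/sec)" % (total, elapsed, total / elapsed))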
The headers are essentially impossible to dedup as they differ in the final stage of delivery, even if a single SMTP operation was used to send to multiple local users. Deduping the message body seems unlikely to provide a significant benefit either, as there usually aren't that many duplicates, not even when you count spam and jokes.
the scenario I was thinking of was internal email memos sent to "all staff", with a stupidly large Word .doc or .pdf file attached.
for an ISP mail server, de-duping isn't likely to help much (if at all). For a small-medium business or corporate mail server, it could help a lot.
True. But if that sort of thing comprises a significant portion of your email then there are better ways of solving the problem, and a Wiki is often part of the solution.
I'm assuming that ZFS is even capable of deduplicating files which have the duplicate part at different offsets, but I don't care enough about this to even look it up.
zfs de-duping is done at block level. if a block's hash is an exact match with another block's hash then it can be de-duped.
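(Which is why the different-offsets case mentioned above is essentially hopeless: identical data shifted by a few hundred bytes of per-recipient headers lands in blocks that hash differently. A small illustrative Python sketch, with made-up block size, header lengths and attachment:)

#!/usr/bin/env python3
# Illustrative sketch: block-level dedup only matches whole identical blocks,
# so the same attachment at different byte offsets shares (almost) no blocks.
import hashlib, os

BLOCK = 128 * 1024                        # ZFS's default recordsize
attachment = os.urandom(5 * 1024 * 1024)  # the same 5MB attachment sent to two users

msg_a = b"X" * 700 + attachment           # different per-recipient headers,
msg_b = b"Y" * 950 + attachment           # so the attachment starts at a
                                          # different offset in each file

def block_hashes(data):
    return {hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)}

common = block_hashes(msg_a) & block_hashes(msg_b)
print("identical blocks between the two messages:", len(common))  # almost always 0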
So I guess it does no good for email then, unless your MTA stores attachments as separate files (i.e. not Maildir).
me too. i don't use zfs de-dupe at all. it is, IMO, of marginal use. adding more disks (or replacing with larger disks) is almost always going to be cheaper and better. but there are some cases where it could be useful...so I don't want to dismiss it just because I have no personal need for it.
Presumably the Sun people who dedicated a lot of engineering and testing time to developing the feature had some reason to do so.
Editing large video files, perhaps. multiple cycles of edit & versioned save wouldn't use much more space than the original file plus the size of the diffs.
For uncompressed video that could be the case. One of my clients currently has some problems with that sort of thing: they are using local non-RAID storage on Macs and then saving the result to the file server, because of problems transferring files >2G.
VMs are quite often touted as a good reason for de-duping - hundreds of almost identical zvols. I remain far from convinced that de-duping is the best use of available RAM on a virtualisation server, or that upgrading/adding disks wouldn't be better.
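(The RAM cost is easy to put rough numbers on. The Python sketch below uses the commonly quoted figure of roughly 320 bytes per in-core DDT entry, which is an assumption here rather than a measurement, to estimate the dedup table size for a 3TB pool at a few average block sizes.)

#!/usr/bin/env python3
# Back-of-the-envelope sketch of the RAM argument above.  The ~320 bytes per
# in-core DDT (dedup table) entry is a commonly quoted approximation, not a
# measured figure.
DDT_ENTRY_BYTES = 320

def ddt_ram_gib(pool_bytes, avg_block_bytes):
    """Approximate RAM needed to keep the whole dedup table in memory."""
    blocks = pool_bytes / avg_block_bytes
    return blocks * DDT_ENTRY_BYTES / 2**30

TIB = 2**40
for blocksize in (8 * 1024, 64 * 1024, 128 * 1024):
    print("3 TiB pool, %3dK average blocks: ~%5.1f GiB of DDT"
          % (blocksize // 1024, ddt_ram_gib(3 * TIB, blocksize)))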
For a VM you have something between 500M and 5G of OS data; if it's closer to 5G then it's probably fairly usage-specific, so there's less to dedup. For most of the VMs I run the application data vastly exceeds the OS data, so the savings would be at most 10%. Not to mention that the most common VM implementations use local storage, which means that with a dozen VMs on a single system running several different distributions there is little opportunity for dedup. Finally, if you have 5G OS images for virtual machines then you could fit more than 500 such images on a 3TB disk, so even if you can save disk space it's still going to be easier and cheaper to buy more disk.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/