
On Wed, 3 Jul 2013, Toby Corkindale <toby.corkindale@strategicdata.com.au> wrote:
>>> Having looked at it a bit more, it seems better suited to the SSD-caching scenario than ZFS; there are auto-tuning parameters in bcache to detect at what point to just bypass the cache and go straight to the disks, saving more cache room for blocks that will benefit.
>> This is precisely what the ZFS L2ARC is supposed to do.
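For anyone following along who hasn't looked at either system, the decision both of them are making is roughly the following. This is only a toy sketch of the idea, not bcache's or the L2ARC's actual code, and the thresholds are made-up example numbers:

# Toy illustration of the "bypass the cache for some I/O" idea that both
# bcache and the ZFS L2ARC implement in their own ways.  This is NOT the
# real algorithm of either project; the thresholds are made-up examples.

SEQUENTIAL_CUTOFF = 4 * 1024 * 1024   # bypass streams larger than 4MB
CONGESTION_LIMIT_US = 2000            # bypass if the cache device is this slow

def should_bypass_cache(request_bytes, contiguous_so_far, cache_latency_us):
    """Return True if this I/O should go straight to the backing disks."""
    # Big sequential streams are served well by spinning disks anyway,
    # so don't waste cache space (and SSD write endurance) on them.
    if contiguous_so_far + request_bytes > SEQUENTIAL_CUTOFF:
        return True
    # If the SSD itself is the bottleneck then sending more work to it
    # only makes things worse; fall through to the backing disks instead.
    if cache_latency_us > CONGESTION_LIMIT_US:
        return True
    return False

# Example: a 16MB streaming read gets bypassed, a 64KB random read is cached.
print(should_bypass_cache(16 * 1024 * 1024, 0, 500))   # True
print(should_bypass_cache(64 * 1024, 0, 500))          # False

From what I've read bcache's sequential_cutoff and congestion threshold tunables are the real knobs for this sort of thing, and the L2ARC feed logic makes a broadly similar call internally, but the real implementations are of course more involved.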
>>> And the write-ahead logging is limited only by the size of the cache. (Whereas ZFS' ZIL can't grow very large.)
>> I don't know enough about bcache writes to make a comparison, but the maximum ZIL size would only be dictated by write throughput.
> As I understand it, ZFS flushes the ZIL after at most five seconds. FAQs recommend the ZIL be sized at 10x your backing disk(s)'s maximum per-second write performance. (So if 200MB/sec, then a ZIL of 2GB.)
> So my understanding of that is that if you get a burst of small writes to the ZIL that the backing disks can't write out anywhere near as fast, you'll hit a wall in less than ten seconds.
http://en.wikipedia.org/wiki/ZFS#ZFS_cache:_ARC_.28L1.29.2C_L2ARC.2C_ZIL

One thing to note is that if you don't have separate devices for the ZIL (which could be a pair of SSDs or a pair of fast disks) then part of the zpool will be used as the ZIL. So the writes that go into the ZIL will be using up the precious IO capacity of your main storage (which probably isn't that great if you use RAID-Z).

http://en.wikipedia.org/wiki/ZFS#Copy-on-write_transactional_model

My understanding is that the transaction groups committed to the zpool will be larger because the ZIL allows them to be coalesced. The general concept of log-based filesystems is that you can increase performance by making all those small writes end up mostly contiguous on main storage.
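To put rough numbers on the sizing advice quoted above, this is just the arithmetic behind the rule of thumb using the figures already mentioned; the factor of two for outstanding transaction groups is my assumption about where the 10x comes from, not something I've verified in the code:

# Back-of-the-envelope ZIL sizing from the figures quoted above.
# Not authoritative - just the arithmetic behind the "10x per-second
# write performance" rule of thumb.

backing_write_mb_per_sec = 200   # example figure from the quoted FAQ advice
txg_commit_interval_sec = 5      # the five second flush interval quoted above

# Assumption: up to two transaction groups' worth of dirty data can be
# outstanding at once (one being written out, one still filling), which
# would explain the 10x figure.
zil_size_mb = backing_write_mb_per_sec * txg_commit_interval_sec * 2
print(f"Suggested separate ZIL size: ~{zil_size_mb} MB ({zil_size_mb / 1024:.1f} GB)")
# -> Suggested separate ZIL size: ~2000 MB (2.0 GB)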
> Whereas, if I understand bcache's design correctly, it will continue to write data to the SSD until it fills up, without a maximum dirty time. Because it's accumulating more writes before streaming them to the backing disks, there's a better chance of random writes being aggregated into linear ones. (From http://bcache.evilpiepirate.org/Design/ )
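The win being described there comes from writing the accumulated dirty blocks back in block order rather than arrival order. A quick toy simulation (nothing to do with bcache's real writeback code) shows the difference in head movement:

import random

# Toy illustration of why accumulating dirty blocks and writing them back
# in LBA order helps: the disk head sweeps mostly in one direction instead
# of seeking all over the platter.  Not bcache's actual writeback code.

random.seed(1)
dirty_blocks = [random.randrange(1_000_000) for _ in range(1000)]  # random LBAs

def total_seek_distance(blocks):
    """Sum of head movement if blocks are written in the given order."""
    return sum(abs(b - a) for a, b in zip(blocks, blocks[1:]))

unsorted_seeks = total_seek_distance(dirty_blocks)          # write-through order
sorted_seeks = total_seek_distance(sorted(dirty_blocks))    # writeback in LBA order

print(f"seek distance, arrival order: {unsorted_seeks}")
print(f"seek distance, sorted order:  {sorted_seeks}")
# The sorted pass covers roughly the span of the disk once, instead of
# bouncing back and forth for every write.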
The problem is that a filesystem like Ext4 isn't designed to make writes contiguous in the first place. It is designed to keep some metadata near the data (which reduces seeks and is a good thing) but it will always incur some significant seeks. The elevator algorithm should be more efficient when it can write more data in each pass, but it still shouldn't compare with a filesystem that is designed to make small writes contiguous.

That said, the last time I compared BTRFS to Ext4 with Bonnie++, Ext4 won easily on write performance; when I used Postal the results were fairly similar. But that was a while ago and BTRFS has improved a lot since. I'll have to do another test with a recent version of BTRFS and also see how it compares to ZFS.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/