
On 02/07/13 10:33, Toby Corkindale wrote:
I can't be the only one who's been waiting for the bcache stuff to hit mainstream kernels. I rebooted into a stable 3.10 kernel yesterday. Due to the requirement to reformat disks, I haven't started using bcache yet. Is anyone else here already onto it? I'd be curious to hear how it compares to the zfs+l2arc setup some of us have been using previously.
bcache.txt from the linux kernel:
https://github.com/torvalds/linux/blob/master/Documentation/bcache.txt
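From memory, the basic setup in that document boils down to something like this (device names are just examples, and newer udev rules do the registration step for you):

  # format a backing device and a cache device
  make-bcache -B /dev/sdb
  make-bcache -C /dev/sdc

  # register them with the kernel
  echo /dev/sdb > /sys/fs/bcache/register
  echo /dev/sdc > /sys/fs/bcache/register

  # attach the backing device to the cache set, then use /dev/bcache0
  echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach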
I do wonder if this has landed a bit too late though. Back when they started, good SSDs were expensive and small; but now you can pick up relatively large and fast drives relatively cheaply. You can afford to use one as your primary drive, and just offload big media files to spinning drive arrays. (Which are fine for that access pattern of linear reads and writes)
Even the documentation is showing its age, using Intel X-25 drives as the example, which are now four years old.
I'm sure there's still a place for this technology when you don't *want* to have to manually choose where to store different categories of files - such as in NAS/storage appliances.
Some database loads might benefit, although for PostgreSQL at least you can (and should) configure it to put the transaction log and suchlike on SSDs anyway, which gives you most of the benefit.
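For anyone who hasn't done that before, the usual trick is simply to relocate pg_xlog onto the SSD; roughly (the paths and version are just the Debian defaults, adjust to taste):

  # stop the cluster first
  service postgresql stop
  mv /var/lib/postgresql/9.1/main/pg_xlog /ssd/pg_xlog
  ln -s /ssd/pg_xlog /var/lib/postgresql/9.1/main/pg_xlog
  service postgresql start

(Or use initdb's -X option when creating a new cluster.)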
Having looked at it a bit more, it seems better suited to the SSD-caching scenario than ZFS; bcache has auto-tuning parameters to detect when to just bypass the cache and go straight to the disks (e.g. for sequential I/O), saving cache room for blocks that will actually benefit. And the write-ahead logging is limited only by the size of the cache (whereas ZFS's ZIL can't grow very large).
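For the curious, those knobs live in sysfs. Roughly, going from bcache.txt (the exact defaults may differ):

  # writes larger than this are treated as sequential and bypass the cache
  echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

  # if the SSD itself gets congested, bypass it rather than queue behind it
  echo 2000  > /sys/fs/bcache/<cache-set-uuid>/congested_read_threshold_us
  echo 20000 > /sys/fs/bcache/<cache-set-uuid>/congested_write_threshold_us

  # writeback mode is what gives you the cache-sized write buffering
  echo writeback > /sys/block/bcache0/bcache/cache_mode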
The cases where I can see bcache winning are things like multiple VMs, where several writes that would each be linear on their own turn out not to be, because they are interleaved with writes from other VMs. And while SSDs are cheap, they still aren't cheap if you want TBs of data; TBs of spinning disk plus a 200GB bcache SSD is cheap though, and then you don't have to think too hard about what data to place on your precious SSD.

bcache also has some awareness of the underlying block device, in particular RAID5, where it attempts to ensure that whole stripes are written out where possible. I don't use RAID5 anymore except in some very special circumstances that aren't performance-heavy anyway, but it's good to know. bcache also optimises its access patterns for SSDs, which in theory reduces write amplification.

I ran bcache a while ago and it was a bit of a struggle getting it all going (bcache defaults to a 4K block size, which Windows on Xen had problems with...), and I didn't go any further than a bit of testing because there was just too much overhead in patching my own kernel and keeping Xen going. Debian is now up to 3.9 in sid and 3.10-rcX in experimental, so hopefully I'll be able to get back to testing again.

I hope PCI-E SSDs become affordable soon. I attended a Dell tech class ("Master Class") recently where they talked about how their SANs move "hot" extents (they call them pages) to faster storage, and also try to balance load by moving extents around so each array carries an equal share, which removes all the guesswork about where to put data files and transaction logs. It would be very cool if Linux could do this sort of thing natively.

James