
On Mon, 16 Apr 2012, Colin Fee <tfeccles@gmail.com> wrote:
> > Of course it might turn out that RAID-5 is the killer issue. Servers start
> > becoming a lot more expensive if you want more than 8 disks, and even 6
> > disks is a significant price point. An 8 disk RAID-5 gives something like
> > 21TB usable space vs 12TB on a RAID-10, and a 6 disk RAID-5 gives about
> > 15TB vs 9TB on a RAID-10.
> >
> > Anything else I should consider?
> Not that I've got anything to add re ZFS vs BTRFS, having no specialist
> knowledge either way, but in other posts haven't you advocated for RAID-6
> over RAID-5? Or is this something mandated on the client side?
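To put concrete numbers on the capacity comparison quoted above, here's a quick Python sketch (assuming the 3TB disks discussed below; the rest is just arithmetic, ignoring filesystem overhead and decimal vs binary TB):

    # Rough usable capacity for the RAID levels under discussion.
    def usable_tb(disks, disk_tb, level):
        if level == "RAID-5":        # one disk's worth of parity
            return (disks - 1) * disk_tb
        if level == "RAID-6":        # two disks' worth of parity
            return (disks - 2) * disk_tb
        if level == "RAID-10":       # mirrored pairs
            return disks // 2 * disk_tb
        raise ValueError(level)

    for disks in (6, 8):
        for level in ("RAID-5", "RAID-6", "RAID-10"):
            print(disks, "disks,", level, "->", usable_tb(disks, 3, level), "TB")

For 8 disks that gives 21TB, 18TB, and 12TB respectively, and for 6 disks 15TB, 12TB, and 9TB, matching the numbers quoted above.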
If you use Linux software RAID-6 then reconstruction apparently doesn't check the data against both sets of parity, it just regenerates the parity from whatever data is available. So RAID-6 covers you for the case where two disks entirely die, but that is rare - it's still something you want coverage for, but you don't get the potential benefit of double parity being used to identify corrupt blocks. I have no reason to believe that any other RAID implementation which conforms to the basic RAID-6 design does any better, although I acknowledge that there are lots of implementations which aren't well documented, so anything is possible.

http://en.wikipedia.org/wiki/Zfs

If you use ZFS with RAID-5 it will check the hashes on every block and regenerate the data if they don't match. Also it's possible to go back in time and get an earlier copy of the data if there is a corrupted block in the latest copy and no redundancy left (see the above Wikipedia page for more information).

So if you compare Linux software RAID-5, which only properly copes with a disk entirely dying or returning read errors, to ZFS, then ZFS wins in the following situations:

1) A disk entirely dies (or is being replaced due to sporadic errors) and another disk has a single read error during recovery. ZFS can flag an error on its RAID-5 and allow you to get an earlier version of the data. Linux software RAID just loses and leaves corruption for a fsck or an application-level scrub of the data files to find.

2) Two disks in a RAID-5 have a few read errors - a reasonably common failure case, as most drive failures in production involve some read failures rather than a total death. Linux software RAID fails: it kicks out one disk and then you lose when the second disk has a read error. ZFS SHOULD just read from the other disks in the stripe for each error (which is detected by a hash mismatch) and reconstruct the data. NB The only time I've seen two disks in a RAID set fail was with RAID-1, and Linux software RAID lost data then.

3) A disk returns corrupt data for any reason.

Linux software RAID-6 deals with case 1. It also deals with case 2, although if a third disk suddenly gives a few read errors (which could happen due to heat) then you lose. In theory a ZFS RAID-5 (AKA RAID-Z) could cope better with some failure conditions than a Linux software RAID-6!

That said, ZFS supports RAID-6, AKA RAID-Z2. Given the prices of 3TB disks, and the fact that reasonably affordable servers can handle 8 disks, which allows 18TB of RAID-6 storage, a RAID-Z2 with ZFS seems like the clearly better choice for most uses (the copy-on-write design of ZFS apparently removes the worst performance problems of RAID-5 and RAID-6).

Anyway, in my previous message I just wasn't concerned with RAID-5 vs RAID-6. As BTRFS supports neither, ZFS supports both, and the two have very similar amounts of usable capacity in the 8 disk case, it's not an issue at this stage of planning. But I think that another general discussion of RAID technology is a good thing at this time, so your question was a good one and deserved a long answer.

As for my client, I will give them some options with prices and ask how much more they want to pay for reliability. I expect that they will pay for RAID-6, not because of any business analysis of the risk (which they can't do), but because it doesn't cost much and it would really suck to have down-time and data loss just to save such a small amount of money and disk space.
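To illustrate the verify-then-reconstruct idea behind cases 2 and 3 above, here's a toy Python model (my own simplification for illustration, not ZFS's actual format): a single XOR parity block per stripe plus a hash per data block. A plain RAID-5 read has no way of noticing a block that a disk returned silently corrupted, but with the hashes a mismatch identifies the bad block and the stripe can be rebuilt from the surviving blocks plus parity:

    # Toy model: per-block checksums over single-parity striping.
    import hashlib
    from functools import reduce

    def xor(blocks):
        # XOR equal-sized blocks together (single parity, as in RAID-5).
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def write_stripe(data):
        parity = xor(data)
        sums = [hashlib.sha256(b).digest() for b in data]
        return data, parity, sums

    def read_block(i, data, parity, sums):
        if hashlib.sha256(data[i]).digest() == sums[i]:
            return data[i]                   # checksum OK, trust the disk
        # Checksum mismatch: rebuild block i from the other blocks + parity.
        others = [b for j, b in enumerate(data) if j != i]
        rebuilt = xor(others + [parity])
        assert hashlib.sha256(rebuilt).digest() == sums[i]
        return rebuilt

    data, parity, sums = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
    data[1] = b"BxBB"                        # one disk silently corrupts a block
    print(read_block(1, data, parity, sums)) # -> b'BBBB'

The real thing is more involved - ZFS keeps the checksums in the parent block pointers (a Merkle tree), so a disk that corrupts both a block and its checksum still gets caught - but the recovery logic is the same idea.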
http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

As an aside, the above page about recovery timeouts for disk read operations should also be of interest to some people here, given the previous discussions about JBOD vs RAID modes for disks.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/