Re: ZFS vs RAID (was gpt and grub)

12 Apr 2013

      On Fri, 12 Apr 2013, James Harper <james.harper@bendigoit.com.au> wrote:
...
...
...
. Online resize/reconfigure
both btrfs and zfs offer this.
Can it seamlessly continue over reboot? Obviously it can't progress while
the system is rebooting like a hardware raid but I'd hope it could pick up
where it left off automatically.
Traditional RAID systems (hardware and software) have fixed sectors on each 
disk for RAID stripes.  So if you have 5 disks in a RAID-5 then every 5th 
block is a parity block.
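
As a rough illustration of that fixed layout, here is a minimal Python sketch of a 5-disk RAID-5 stripe with rotating parity. The function names and the particular rotation order are assumptions made for the example; real implementations (Linux md, hardware controllers) offer several layouts.

```python
from functools import reduce


def parity_disk(stripe: int, ndisks: int) -> int:
    """Disk index holding parity for a stripe (assumed rotation order)."""
    return (ndisks - 1 - stripe) % ndisks


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(stripe, data_blocks, ndisks):
    """Return one block per disk, with parity at its fixed position."""
    assert len(data_blocks) == ndisks - 1
    blocks = list(data_blocks)
    blocks.insert(parity_disk(stripe, ndisks), xor_blocks(data_blocks))
    return blocks


def rebuild(stripe_blocks, lost):
    """Recover the block of a failed disk by XORing the survivors."""
    survivors = [b for i, b in enumerate(stripe_blocks) if i != lost]
    return xor_blocks(survivors)
```

Because the parity position depends only on the stripe number and the disk count, changing the number of disks changes where every block belongs, which is why a traditional reshape has to rewrite the whole array.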

I believe that both ZFS and BTRFS do it differently.  I believe that in both 
cases if you write an amount of data corresponding to an entire stripe then it 
will write it in the traditional manner.  But if you have a small write that 
needs to be synchronous (IE filesystem metadata) then it may be written in a 
RAID-1 manner instead of a RAID-5 (or some similar construct that involves a 
subset of the disks).

In that case adding a new disk doesn't really require that all data be shuffled 
around on all disks.  For example if you had a BTRFS or ZFS RAID-Z type array 
that was 10% used and you added some more disks there wouldn't be a great need 
to balance it immediately.  You could just let the filesystem allocate new data 
across all disks and balance itself gradually.
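
The gradual balancing idea can be shown with a toy model. This is not the allocator of either ZFS or BTRFS; it is a hypothetical policy that places each new block on whichever disk is currently least full, and it ignores striping and redundancy entirely.

```python
def allocate(used, capacity, nblocks):
    """Place nblocks one at a time on the least-full disk (toy policy)."""
    used = list(used)
    for _ in range(nblocks):
        # Only disks that still have free space are candidates.
        candidates = [i for i in range(len(used)) if used[i] < capacity[i]]
        if not candidates:
            raise RuntimeError("array is full")
        # Choose the disk with the lowest fill fraction.
        target = min(candidates, key=lambda i: used[i] / capacity[i])
        used[target] += 1
    return used


# Three disks at 10% use, then two empty disks are added.
before = [100, 100, 100, 0, 0]
after = allocate(before, [1000] * 5, 200)
```

In this model the new disks absorb all new writes until they catch up with the old ones, so the array evens out without any existing data being moved.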
...
My last remaining reservation on going ahead with some testing is: is there
an equivalent of clvm for zfs? Or is that the right approach for zfs? My
main server cluster is:
2 machines each running 2 x 2TB disks with DRBD, with the primary exporting
the whole disk as an iSCSI volume.
2 machines each importing the iSCSI volume, running lvm (clvm) on top, and
using the lv's as backing stores for xen VM's.
How would this best be done using zfs?
There are scripts to use zfs send/receive in a tight loop, synchronising it as 
often as every minute.  This is almost as good as DRBD.  The difference is 
that DRBD allows a synchronous write to be delayed until the data is committed 
to disks on both servers, while the zfs send/receive option would allow a write 
to succeed before it appears on the other system.  This could be bad for a 
database but for other tasks wouldn't necessarily be so bad.
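
A minimal sketch of such a loop in Python follows. It is not one of the existing scripts; the function names, the snapshot naming scheme and the use of ssh to reach the other server are assumptions. It also assumes the first snapshot already exists on both sides, and it never deletes old snapshots.

```python
import subprocess
import time


def replication_commands(dataset, prev_snap, new_snap, remote_host):
    """Build the three commands for one incremental replication cycle."""
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    receive = ["ssh", remote_host, "zfs", "receive", "-F", dataset]
    return snapshot, send, receive


def replicate_once(dataset, prev_snap, new_snap, remote_host):
    """Snapshot locally, then pipe the incremental stream to the remote side."""
    snapshot, send, receive = replication_commands(
        dataset, prev_snap, new_snap, remote_host)
    subprocess.run(snapshot, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # let the sender see a broken pipe if the receiver dies
    receive_status = receiver.wait()
    send_status = sender.wait()
    if send_status != 0 or receive_status != 0:
        raise RuntimeError(f"replication of {dataset}@{new_snap} failed")


def replicate_forever(dataset, remote_host, first_snap, interval=60):
    """Repeat the cycle; anything written since the last cycle is at risk."""
    prev = first_snap
    while True:
        new = time.strftime("repl-%Y%m%d-%H%M%S")
        replicate_once(dataset, prev, new, remote_host)
        prev = new
        time.sleep(interval)
```

With an interval of 60 seconds, a write acknowledged on the primary may be missing from the secondary for up to a minute plus the transfer time, which is the window discussed above.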

I've considered using zfs send/receive for a mail server.  If the primary 
failed then some delivered mail could disappear when the secondary became 
active (which would be bad).  But if the primary's failure didn't involve 
enough disks dying to break its RAID then after recovering from the problem 
the extra email could be copied across.  Email is in some ways an easier problem 
to solve because the important operation is file creation and files that matter 
aren't modified.  If the file exists somewhere then it can be copied across and 
everything's good.
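
Copying the extra mail across could look something like the sketch below. The function name and directory arguments are hypothetical, and it compares paths only. A real Maildir store renames messages when they move from new/ to cur/, so a production script would need to match on the unique part of the file name instead.

```python
import os
import shutil


def copy_missing_files(recovered_root, active_root):
    """Copy files that exist under recovered_root but not under active_root.

    Existing files on the active server are never overwritten.
    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(recovered_root):
        rel_dir = os.path.relpath(dirpath, recovered_root)
        dest_dir = os.path.normpath(os.path.join(active_root, rel_dir))
        for name in filenames:
            dest = os.path.join(dest_dir, name)
            if os.path.exists(dest):
                continue  # already present on the active server
            os.makedirs(dest_dir, exist_ok=True)
            shutil.copy2(os.path.join(dirpath, name), dest)
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Since mail files that matter are created once and not modified, "copy if absent" is enough here; the same approach would not be safe for files that are updated in place.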

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

Russell Coker