
On Fri, 12 Apr 2013, James Harper <james.harper@bendigoit.com.au> wrote:
Online resize/reconfigure
Both btrfs and zfs offer this.
Can it seamlessly continue over a reboot? Obviously it can't make progress while the system is rebooting, the way a hardware RAID can, but I'd hope it could pick up where it left off automatically.
Traditional RAID systems (hardware and software) use fixed sectors on each disk for RAID stripes. So if you have 5 disks in a RAID-5, one block in every 5 is a parity block. I believe that both ZFS and BTRFS do it differently. I believe that in both cases, if you write an amount of data corresponding to an entire stripe, it will be written in the traditional manner. But if you have a small write that needs to be synchronous (e.g. filesystem metadata), it may be written in a RAID-1 manner instead of RAID-5 (or some similar construct that involves only a subset of the disks). In that case adding a new disk doesn't require that all data be shuffled around on all disks. For example, if you had a BTRFS or ZFS RAID-Z array that was 10% used and you added some more disks, there wouldn't be a great need to balance it immediately. You could just let the filesystem allocate new data across all disks and balance itself gradually.
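To make the difference concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how ZFS or BTRFS actually allocate space: `raid5_location` assumes a simple rotating-parity RAID-5 layout, and `variable_stripe_write` is a hypothetical helper that puts k data blocks plus one XOR parity block on k + 1 disks. The interesting case is k = 1: single parity over one block is just a copy of that block, so a small write ends up effectively mirrored (RAID-1 style) on two disks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single-parity RAID)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_location(logical_block: int, n_disks: int) -> tuple[int, int]:
    """Return (stripe, disk) for a data block in a fixed-layout RAID-5.

    Assumption: a simple rotating-parity layout where stripe s keeps its
    parity on disk (n_disks - 1 - s % n_disks) and the data blocks fill
    the remaining disks in order. Every stripe always spans every disk.
    """
    data_per_stripe = n_disks - 1
    stripe, index = divmod(logical_block, data_per_stripe)
    parity_disk = n_disks - 1 - stripe % n_disks
    disk = index if index < parity_disk else index + 1
    return stripe, disk


def variable_stripe_write(data_blocks: list[bytes], n_disks: int) -> list[bytes]:
    """Hypothetical RAID-Z-style write: k data blocks + 1 parity on k + 1 disks."""
    if not 1 <= len(data_blocks) <= n_disks - 1:
        raise ValueError("need between 1 and n_disks - 1 data blocks")
    return data_blocks + [xor_blocks(data_blocks)]


if __name__ == "__main__":
    # Full-stripe write on 5 disks: 4 data blocks + 1 parity, like RAID-5.
    full = variable_stripe_write([b"\x01", b"\x02", b"\x04", b"\x08"], 5)
    print(len(full), full[-1])  # 5 b'\x0f'
    # Any one lost block can be rebuilt by XORing the survivors.
    print(xor_blocks(full[:2] + full[3:]))  # b'\x04'
    # Small synchronous write: 1 data block + parity touches only 2 disks,
    # and the parity is an exact copy of the data, i.e. effectively RAID-1.
    small = variable_stripe_write([b"meta"], 5)
    print(len(small), small[0] == small[1])  # 2 True
    # Fixed RAID-5: logical block 7 on 5 disks lands in stripe 1, disk 4.
    print(raid5_location(7, 5))  # (1, 4)
```

In the fixed layout, the disk count is baked into every block's address, so adding a disk means recomputing every location. In the variable-width model each write records its own width, so new disks only widen the choices for future writes, which is the intuition behind not needing an immediate rebalance.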
My last remaining reservation about going ahead with some testing is: is there an equivalent of clvm for zfs? Or is that even the right approach for zfs? My main server cluster is:
- 2 machines, each with 2 x 2TB disks running DRBD, with the primary exporting the whole disk as an iSCSI volume
- 2 machines, each importing the iSCSI volume, running LVM (clvm) on top and using the LVs as backing stores for Xen VMs
How would this best be done using zfs?
There are scripts that run zfs send/receive in a tight loop, synchronising as often as every minute. That gets close to DRBD, with one important difference: DRBD allows a synchronous write to be delayed until the data is committed to disk on both servers, whereas the zfs send/receive approach lets a write succeed before it appears on the other system. This could be bad for a database, but for other tasks it wouldn't necessarily be so bad. I've considered using zfs send/receive for a mail server. If the primary failed, some delivered mail could disappear when the secondary became active (which would be bad). But if the primary's failure didn't involve enough disks dying to break its RAID, then after recovering from the problem the extra email could be copied across. Email is in some ways an easier problem to solve, because the important operation is file creation and the files that matter aren't modified. If a file exists somewhere, it can be copied across and everything's good.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/