
On Fri, Apr 06, 2012 at 03:55:19PM +1000, Craig Sanders wrote:
> On Thu, Apr 05, 2012 at 10:59:10PM -0400, Robin Humble wrote:
> > how did ZoL handle the drive failure? gracefully?
>
> no problem at all. as gracefully as possible under the circumstances (a
> missing disk). 'zpool status' reported that the pool was degraded, and
> that one of the component disks was offline and needed to be replaced.

cool. what about sector re-writes - have you seen ZFS do any of those?
I presume ZFS does this too, but again, I haven't seen it yet in my
testing.

I'm familiar with these from md raid1/5/6, which, when they hit a disk
read error, over-write the unreadable blocks with data reconstructed (if
necessary) from the other drives. the write causes the drive to remap
the sector to one of its internal spares and fixes the read error. we
couldn't survive without this... a vital feature.
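
on the md side the kick for that is just a check/repair pass; I'd assume
the ZFS equivalent is a scrub, which should rewrite anything it can't
read cleanly as it goes. roughly this sort of thing (array and pool
names are just examples):

  # md: read every block; unreadable sectors get rewritten from the
  # other drives, and 'repair' also fixes up any mismatches it finds
  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt

  # ZFS: a scrub verifies every checksum and repairs from redundancy
  zpool scrub tank
  zpool status -v tank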

> the entire process was smooth and painless and Just Worked. i've had
> similar experiences with mdadm and hw raid resyncs, but this just felt
> smoother.

nice :)

> and unlike block-level raid, it didn't have to sync every block, just
> those blocks that needed to have a copy on the replacement.

yeah, that's a great feature.
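
for anyone else following along, I'm picturing the replace step as
something like the below - 'tank' and the device names are invented:

  zpool status tank            # pool DEGRADED, failed disk UNAVAIL/OFFLINE
  zpool replace tank sdg sdq   # swap in the new disk; resilver starts
  zpool status tank            # resilver progress - only live data is copied

  # versus md, which resyncs the entire member device regardless of usage:
  cat /proc/mdstat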

> > I ask because I've been hammering ZoL (with Lustre) for a while now
> > looking at performance and stability, but haven't tested with any
> > drive failures (real or simulated) yet.
>
> how's lustre looking?

improving. it's quite an 'experimental' lustre version that they're
using, even without the ZFS additions, so lots of new and shiny things
to break! :)
  https://github.com/chaos/lustre/tags
unfortunately the zfs backend usually still deadlocks when I push it
hard :-/ the md backend is ok.

my typical test config is 4 lustre clients to one lustre server with 40
SAS disks in raidz2 8+2's - either in 4 separate zpools (4 OSTs in
Lustre-speak) or all 4 raidz2's in one zpool (one OST). using 8 RPCs in
flight per client works, but 32 (more clients would probably have the
same effect) isn't stable yet IMHO.

Lustre on ZFS is very fast for writes when it works though... even
random 1M writes. md is faster for reads, but then again, it isn't
checksumming anything.
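
for the record, there's nothing exotic in the layout - roughly the
below, with made-up device names, and the client-side knob is just the
usual max_rpcs_in_flight parameter:

  # one 8+2 raidz2 vdev per pool (x4), or all four vdevs in one pool
  zpool create ost0 raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk

  # on the lustre clients: outstanding RPCs per OSC
  lctl set_param osc.*.max_rpcs_in_flight=8    # fine
  lctl set_param osc.*.max_rpcs_in_flight=32   # not stable for me yet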

> i know it's the major motivation behind behlendorf @ LLNL's work on
> zfsonlinux, but haven't been following news about it.

I suspect we should know more after LUG in a few weeks' time:
  http://www.opensfs.org/lug/program

judging by their hw config and my benchmarks (and assuming it's the same
config as in Brian's talk at LUG last year), I think they'll easily get
to their target of 1TB/s writes, but 1TB/s reads will be a bit harder. I
suspect they'll still get there though.

cheers,
robin