
On Fri, Apr 06, 2012 at 03:55:19PM +1000, Craig Sanders wrote:
> On Thu, Apr 05, 2012 at 10:59:10PM -0400, Robin Humble wrote:
> > how did ZoL handle the drive failure? gracefully?
>
> no problem at all. as gracefully as possible under the circumstances (a
> missing disk). 'zpool status' reported that the pool was degraded, and
> that one of the component disks was offline and needed to be replaced.

cool. what about sector re-writes - have you seen ZFS do any of those?
I presume ZFS does this too, but again, I haven't seen it yet in my
testing.

I'm familiar with these from md raid1/5/6, which, when they hit a disk
read error, over-write the unreadable blocks with data reconstructed (if
necessary) from the other drives. the write causes the drive to remap
the sector to one of its internal spares and fixes the read error. we
couldn't survive without this... a vital feature.
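
on the md side the kick for that is just a check/repair pass; I'd assume
the ZFS equivalent is a scrub, which should rewrite anything it can't
read cleanly as it goes. roughly this sort of thing (array and pool
names are just examples):

  # md: read every block; unreadable sectors get rewritten from the
  # other drives, and 'repair' also fixes up any mismatches it finds
  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt

  # ZFS: a scrub verifies every checksum and repairs from redundancy
  zpool scrub tank
  zpool status -v tank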

> the entire process was smooth and painless and Just Worked. i've had
> similar experiences with mdadm and hw raid resyncs, but this just felt
> smoother.

nice :)

> and unlike block-level raid, it didn't have to sync every block, just
> those blocks that needed to have a copy on the replacement.

yeah, that's a great feature.
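
for anyone else following along, I'm picturing the replace step as
something like the below - 'tank' and the device names are invented:

  zpool status tank            # pool DEGRADED, failed disk UNAVAIL/OFFLINE
  zpool replace tank sdg sdq   # swap in the new disk; resilver starts
  zpool status tank            # resilver progress - only live data is copied

  # versus md, which resyncs the entire member device regardless of usage:
  cat /proc/mdstat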

> > I ask because I've been hammering ZoL (with Lustre) for a while now
> > looking at performance and stability, but haven't tested with any
> > drive failures (real or simulated) yet.
>
> how's lustre looking?

improving. it's quite an 'experimental' lustre version that they're
using, even without the ZFS additions, so lots of new and shiny things
to break! :)
  https://github.com/chaos/lustre/tags
unfortunately the zfs backend usually still deadlocks when I push it
hard :-/ the md backend is ok.

my typical test config is 4 lustre clients to one lustre server with 40
SAS disks in raidz2 8+2's - either in 4 separate zpools (4 OSTs in
Lustre-speak) or all 4 raidz2's in one zpool (one OST). using 8 RPCs in
flight per client works, but 32 (more clients would probably have the
same effect) isn't stable yet IMHO.

Lustre on ZFS is very fast for writes when it works though... even
random 1M writes. md is faster for reads, but then again, it isn't
checksumming anything.
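
for the record, there's nothing exotic in the layout - roughly the
below, with made-up device names, and the client-side knob is just the
usual max_rpcs_in_flight parameter:

  # one 8+2 raidz2 vdev per pool (x4), or all four vdevs in one pool
  zpool create ost0 raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk

  # on the lustre clients: outstanding RPCs per OSC
  lctl set_param osc.*.max_rpcs_in_flight=8    # fine
  lctl set_param osc.*.max_rpcs_in_flight=32   # not stable for me yet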

> i know it's the major motivation behind behlendorf @ LLNL's work on
> zfsonlinux, but haven't been following news about it.

I suspect we should know more after LUG in a few weeks' time:
  http://www.opensfs.org/lug/program

judging by their hw config and my benchmarks (and assuming it's the same
config as in Brian's talk at LUG last year), I think they'll easily get
to their target of 1TB/s writes, but 1TB/s reads will be a bit harder. I
suspect they'll still get there though.

cheers,
robin