
On Thu, Apr 05, 2012 at 10:59:10PM -0400, Robin Humble wrote:
> how did ZoL handle the drive failure? gracefully?
no problem at all. as gracefully as possible under the circumstances (a missing disk). 'zpool status' reported that the pool was degraded and that one of the component disks was offline and needed to be replaced. the error reporting in zpool status is quite good: an error code, a short meaningful paragraph, and a URL for a web page with more details. (earlier versions referred to sun.com URLs, but they now point to pages under http://zfsonlinux.org/msg/ - the sun.com zfs msg links weren't properly redirected when oracle rebranded the old sun web sites.)
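
for reference, checking on the pool is just the usual commands (a minimal sketch - 'tank' is a placeholder pool name, not my actual pool):

    zpool status -x       # brief summary: either "all pools are healthy" or the troubled pool(s)
    zpool status tank     # full per-vdev detail, including the error code and URL mentioned above
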
> no problems putting in a new disk?
i issued the 'zpool replace ...' command and it started resilvering the data from the available disks onto the new disk. i continued using the system, and barely even noticed when it finished. the entire process was smooth and painless and Just Worked. i've had similar experiences with mdadm and hardware raid resyncs, but this just felt smoother. and unlike block-level raid, it didn't have to resync every block on the disk, only the blocks that actually needed a copy on the replacement.
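
the replace itself is a one-liner (again a sketch with placeholder pool and device names - in practice i used the /dev/disk/by-id/ paths for the failed and replacement disks):

    zpool replace tank old_disk new_disk   # start resilvering onto the new disk
    zpool status tank                      # shows resilver progress and estimated time remaining
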
> I ask because I've been hammering ZoL (with Lustre) for a while now looking at performance and stability, but haven't tested with any drive failures (real or simulated) yet.
how's lustre looking? i know it's the major motivation behind behlendorf @ LLNL's work on zfsonlinux, but haven't been following news about it.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #446:

Mailer-daemon is busy burning your message in hell.