
On Thu, Jul 25, 2013 at 06:31:54PM +1000, Russell Coker wrote:
> I'm getting some errors on a zpool scrub operation.  It's obvious that
> sdd has the problem both from the below output of zpool status and from
> the fact that the kernel message log has read errors about sdd.
what sort of controller is it on, and do you have standby/spin-down enabled? that can cause drives to be booted from a raid array or zpool if they don't respond fast enough. that's the reason for using IT mode firmware rather than RAID mode firmware in LSI and similar cards - it's far more forgiving of consumer drives and their slow responses when waking up from standby. RAID mode firmware pretty much expects enterprise drives with standby disabled.
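a quick way to check whether standby/spin-down is actually in play on a
given drive (assuming a SATA disk that answers hdparm and smartctl; the
device name sdd here is just the one from the report below):

  # hdparm -C /dev/sdd                 (current power state: active/idle vs standby)
  # hdparm -B /dev/sdd                 (APM level; values from 1 to 127 allow spin-down)
  # smartctl -i -n standby /dev/sdd    (query the drive, but skip it rather than wake it if asleep)

if the drive is spinning down on its own, its wake-up latency is a likely
cause of the timeouts the controller is complaining about.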
> But how can I get detail about what has gone wrong?  Has sdd given
> corrupt data in addition to read failures?  Presumably the "(repairing)"
> string will disappear as soon as the scrub finishes; when that happens,
> how would I determine that sdd was to blame without the kernel error log?
> # zpool status
>   pool: tank
>  state: ONLINE
>   scan: scrub in progress since Thu Jul 25 16:38:01 2013
>     1.01T scanned out of 10.3T at 164M/s, 16h26m to go
>     1.40M repaired, 9.80% done
> config:
>
>     NAME        STATE     READ WRITE CKSUM
>     tank        ONLINE       0     0     0
>       raidz1-0  ONLINE       0     0     0
>         sda     ONLINE       0     0     0
>         sdb     ONLINE       0     0     0
>         sdc     ONLINE       0     0     0
>         sdd     ONLINE       0     0     0  (repairing)
corrupt data on sdd will have been detected and corrected by zfs. if the
block read doesn't match the (sha256 IIRC) hash then it will be corrected
from the redundant copies on the other drives in the pool.

zpool status will tell you how much data was corrected when it has
finished. e.g. on my backup pool, zpool status says this:

  scan: scrub repaired 160K in 4h21m with 0 errors on Sat Jul 20 06:03:58 2013

also the numbers in the READ WRITE and CKSUM columns will show you the
number of errors detected for each drive.

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #202:

kernel panic: write-only-memory (/dev/wom0) capacity exceeded.
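for reference, the per-device READ WRITE CKSUM counters are not reset when
the scrub finishes, only by "zpool clear" (or a reboot), so the blame can
still be assigned afterwards. a rough sketch, assuming ZFS on Linux
("zpool events" may not be available on other platforms):

  # zpool status -v tank      (per-device error counters; -v also lists any
                               files with unrecoverable errors)
  # zpool events tank         (the zfs module's own event log for the pool)
  # smartctl -a /dev/sdd      (the drive's SMART error log, independent of
                               the kernel ring buffer)
  # zpool clear tank sdd      (reset sdd's counters once you're satisfied)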