
On Fri, 26 Jul 2013 12:53:32 +1000 Craig Sanders <cas@taz.net.au> wrote:
> On Fri, Jul 26, 2013 at 12:38:29PM +1000, Russell Coker wrote:
> > The scrub has completed and I now get the following output. It seems
> > that the status has been cleared, so if it wasn't for the kernel
> > messages I wouldn't know which disk had the problem.
> that suggests to me that there is nothing wrong with the drive - if
> anything is going to stress a drive with problems, a 10+ hour 'zfs
> scrub' will do it.
>
> the drive was probably in standby mode and took too long to wake up and
> respond, causing the kernel to complain. or possibly a dodgy power
> connector or an underpowered PSU or similar.
Entries such as the following from the kernel message log seem to clearly
indicate a drive problem. Also smartctl reports a history of errors.

[1515513.068668] ata4.00: status: { DRDY ERR }
[1515513.068669] ata4.00: error: { UNC }
[1515513.103259] ata4.00: configured for UDMA/133
[1515513.103294] sd 3:0:0:0: [sdd] Unhandled sense code
[1515513.103296] sd 3:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1515513.103298] sd 3:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
[1515513.103301] Descriptor sense data with sense descriptors (in hex):
[1515513.103303]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[1515513.103307]         2f 08 4a d0
[1515513.103310] sd 3:0:0:0: [sdd] Add. Sense: Unrecovered read error - auto reallocate failed
[1515513.103313] sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 2f 08 4a 80 00 01 00 00
[1515513.103318] end_request: I/O error, dev sdd, sector 789072592
[1515513.103333] ata4: EH complete
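As a sanity check that these messages all refer to the same failing sector, the hex fields in the log can be decoded by hand. A small sketch (the hex values are copied from the log above; the 512-byte logical sector size is an assumption typical of drives of that era):

```python
# Information field from the descriptor sense data: "2f 08 4a d0"
sense_lba = int("2f084ad0", 16)

# Read(10) CDB from the log: 28 00 2f 08 4a 80 00 01 00 00
#   bytes 2-5 hold the starting LBA, bytes 7-8 the transfer length
cdb = bytes.fromhex("28002f084a8000010000")
cdb_lba = int.from_bytes(cdb[2:6], "big")   # start of the failed read
cdb_len = int.from_bytes(cdb[7:9], "big")   # length in sectors

print("sense LBA: ", sense_lba)             # 789072592
print("read start:", cdb_lba, "length:", cdb_len)
print("byte offset:", sense_lba * 512)      # assuming 512-byte sectors

# The sense LBA matches the sector end_request reported, and falls
# inside the range covered by the Read(10) command.
assert sense_lba == 789072592
assert cdb_lba <= sense_lba < cdb_lba + cdb_len
```

The unrecovered-read-error LBA from the sense data (0x2f084ad0 = 789072592) is exactly the sector named in the end_request line, 80 sectors into the 256-sector read, which is consistent with a genuine media error rather than a whole-drive timeout.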
> i've seen the standby/sleep phantom problem happen with several drives
> over the years - even replaced a few until i figured out what was going
> on (i've since re-used those drives with no problems).
> > I haven't run any "clear" command, so zfs decided by itself to remove
> > the data.
> no, zfs didn't SEE any problem. when it accessed the drive, there were
> no errors. I interpret this as very strong evidence that there is
> nothing wrong with the drive.
It reported 1.4MB of data that needed to be regenerated from parity, so it definitely saw problems.
> if it had seen any errors, they'd be in the error counts in the zfs
> status report.
The status report got it wrong.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/