
On Fri, Jul 26, 2013 at 02:24:22PM +1000, Russell Coker wrote:
Entries such as the following from the kernel message log seem to clearly indicate a drive problem. Also smartctl reports a history of errors.
[1515513.068668] ata4.00: status: { DRDY ERR }
[1515513.068669] ata4.00: error: { UNC }
[1515513.103259] ata4.00: configured for UDMA/133
[1515513.103294] sd 3:0:0:0: [sdd] Unhandled sense code
[1515513.103296] sd 3:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1515513.103298] sd 3:0:0:0: [sdd] Sense Key : Medium Error [current] [descriptor]
[1515513.103301] Descriptor sense data with sense descriptors (in hex):
[1515513.103303]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[1515513.103307]         2f 08 4a d0
[1515513.103310] sd 3:0:0:0: [sdd] Add. Sense: Unrecovered read error - auto reallocate failed
[1515513.103313] sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 2f 08 4a 80 00 01 00 00
[1515513.103318] end_request: I/O error, dev sdd, sector 789072592
[1515513.103333] ata4: EH complete
If it wasn't for your mention of smartctl errors, I'd suspect the SATA port as an equally likely culprit, and I still wouldn't rule it (or dodgy power/data connectors) out.

BTW, I recall reading a few years ago that drives only reallocate or remap a sector on a WRITE failure, not a READ failure, so the only way to force a good sector to be remapped over a bad sector on a read error is to write to that sector. I'm not 100% sure this is still the case. Googling for it, I haven't found the page where I originally read that, but found this instead:

http://www.sj-vs.net/forcing-a-hard-disk-to-reallocate-bad-sectors/

The suggestion from there is to use 'hdparm --read-sector' to verify that sector 789072592 has a problem, then 'hdparm --write-sector' to rewrite it. This should force the drive to remap the bad sector. 'hdparm --write-sector' will overwrite the sector with zeroes, but the next zfs scrub (or a read of the file using that sector in normal usage) will detect and correct the error.

You can also force a resilver of the entire drive: 'zpool offline' the disk, use dd to erase it (and thus force a write and remap of any bad sectors), and then 'zpool replace' it with itself.
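Roughly like this (untested, from memory), assuming the device really is /dev/sdd, the bad sector is 789072592 as reported in the kernel log above, and the pool is called 'tank' - substitute your own pool name:

    # verify the sector is actually unreadable (read-only, safe)
    hdparm --read-sector 789072592 /dev/sdd

    # rewrite it with zeroes so the drive remaps it -- this destroys the old
    # contents of that one sector, which zfs will repair from parity
    hdparm --write-sector 789072592 --yes-i-know-what-i-am-doing /dev/sdd

    # let zfs find and fix the zeroed sector
    zpool scrub tank

    # or the heavy-handed alternative: offline the disk, wipe it (forcing
    # writes, and remaps of any bad sectors), then replace it with itself
    # and let the pool resilver
    zpool offline tank sdd
    dd if=/dev/zero of=/dev/sdd bs=1M
    zpool replace tank sdd

The dd-and-replace route obviously takes much longer (it rewrites and then resilvers the whole disk), so the per-sector approach is worth trying first.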
I haven't run any "clear" command, so zfs decided by itself to remove the data.
No, zfs didn't SEE any problem. When it accessed the drive, there were no errors. I interpret this as very strong evidence that there is nothing wrong with the drive.
It reported 1.4MB of data that needed to be regenerated from parity, so it definitely saw problems.
No, that's like saying "there's corruption in this one .tar.gz file, so that proves the entire disk is failing". There are any number of reasons why some data may be corrupt while the disk is still good, and some of those reasons are exactly why error-detecting and error-correcting filesystems like zfs are necessary. If zfs had seen any read errors while scrubbing the disk, it would have shown them in the status report.

If it had seen any errors, they'd be in the error counts in the zfs status report.
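For reference, these are the places to look - 'tank' is just a placeholder pool name here:

    zpool status -v tank    # per-device READ/WRITE/CKSUM counters, the scrub
                            # summary, and any files with permanent errors
    zpool clear tank        # the command that resets those counters and the
                            # error log -- you'd know if you'd run it
    smartctl -a /dev/sdd    # the drive's own view: its error log plus attributes
                            # like Reallocated_Sector_Ct and Current_Pending_Sector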
The status report got it wrong.
Or maybe there's a tiny, minuscule chance that you're just misinterpreting what it's saying because of unfamiliarity with zfs.

craig

--
craig sanders <cas@taz.net.au>