James Harper <james.harper(a)bendigoit.com.au>
It's frustrating because a simple "if hard read errors > 0 || failed
self tests > 0 then drive = not okay" would have meant I could just
read the SMART health indicator and eject the drive from the array (or
whatever it belonged to).
IIRC, in an array of heterogeneous disks I once had, one pair of disks
was showing 10x the number of errors of the other pair. It turned out
that Seagate was reporting only uncorrectable errors and WD was
reporting all errors; the Seagate had an extra field where it reported
the raw error rate.
If you are gonna script a "not okay" heuristic, be careful not to
overgeneralize from one vendor to the next.
That's why I want the vendors to make the leap that "unrecoverable read error =
unhealthy disk". The reported counters are not reliable, as you say.
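
For illustration only, here is a rough Python sketch of the "not okay"
heuristic discussed above, with the vendor caveat built in. Everything in
it is an assumption: the DriveReport type, the "read_errors" field name
and the per-vendor table are hypothetical placeholders, not real SMART
attribute names or any tool's output format. The point is only that the
check should refuse to guess when it does not know what a vendor's
counter means.

```python
from dataclasses import dataclass, field

# Hypothetical table: which reported field (if any) counts only
# *uncorrectable* read errors for a given vendor. These entries mirror
# the anecdote in this thread and are assumptions, not vendor documentation.
UNCORRECTABLE_FIELD = {
    "seagate": "read_errors",  # assumed to count uncorrectable errors only
    "wd": None,                # assumed to count all errors, so unusable here
}


@dataclass
class DriveReport:
    """Hypothetical container for counters already parsed from a drive."""
    vendor: str
    failed_self_tests: int = 0
    counters: dict = field(default_factory=dict)


def drive_not_okay(report):
    """Return True (not okay), False (looks okay) or None (cannot tell)."""
    # A failed self test is treated as unhealthy regardless of vendor.
    if report.failed_self_tests > 0:
        return True

    # Only trust an error counter if we know what it means for this vendor.
    name = UNCORRECTABLE_FIELD.get(report.vendor.lower())
    if name is None or name not in report.counters:
        return None

    return report.counters[name] > 0
```

A caller would treat None as "needs a human to look at it" rather than
as healthy, which avoids overgeneralizing from one vendor's counters to
another's.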