
On Tue, May 21, 2013 at 01:04:36PM +1000, Russell Coker wrote:
> On Tue, 21 May 2013, "Trent W. Buck" <trentbuck@gmail.com> wrote:
> > Am I right in thinking they become slow/erratic/unusable because of the extra time spent seeking back and forth between the original track and the spare track

nah, remapped sectors are fine (up until you run out of them). it's the half-there 'sometimes readable' sectors that are evil - I call them heisensectors.

best case for these is a very long delay (or a sequence of short delays from adjacent blocks, each under TLER) that causes a raid layer to kick the drive out. usually when it's kicked out SMART says the drive looks pretty good, but it's not, it's insane. a more usual case is just a lot of short delays, and no matter how many times you dd over the sector (or md rewrites it) it just keeps coming back - annoying but not fatal until it gets worse. worst case is a scsi driver hang due to a disk that is only half responding, presumably because the drive firmware got confused, but not quite confused/hung enough to watchdog and reset itself. that happened about once a month. by driver hang I mean at least one sas port was hung (24 disks), sometimes a scsi host (48 disks), and sometimes all 96 disks.

anyway, every disk is different (the above is re: 1200 seagate 'enterprise' es1 1tb drives) but all disks are basically analogue and basically crazy.
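fwiw, a rough way to spot heisensector candidates is just to time raw reads and look for outliers. a minimal python sketch (the device path, region and threshold are made-up placeholders; dropping the cache via fadvise is only approximate, and readahead can still smooth things over):

import os, time

DEV = "/dev/sdX"        # hypothetical device - change before use
BLOCK = 4096            # read in 4 KiB chunks
START = 0               # byte offset to start scanning at
COUNT = 256 * 1024      # number of blocks to scan (1 GiB here)
SLOW_MS = 100.0         # flag anything slower than this

fd = os.open(DEV, os.O_RDONLY)
try:
    for i in range(COUNT):
        off = START + i * BLOCK
        # drop any cached copy so we (mostly) hit the platter
        os.posix_fadvise(fd, off, BLOCK, os.POSIX_FADV_DONTNEED)
        t0 = time.monotonic()
        data = os.pread(fd, BLOCK, off)
        ms = (time.monotonic() - t0) * 1000.0
        if len(data) < BLOCK or ms > SLOW_MS:
            print(f"offset {off}: {ms:.1f} ms, {len(data)} bytes read")
finally:
    os.close(fd)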
> > -- or just repeatedly trying to read a not-quite-dead sector?
>
> If you look at the contiguous IO performance of a brand new disk (which presumably has few remapped sectors) you will see a lot of variance in read times. The variance is so great that the occasional extra seek for a remapped sector is probably lost in the noise.

ack. in my experience all the layers of kernel caching and readahead and firmware buffering will make the occasional big seek to a remapped sector basically free. I guess you could argue "what if I had one hot file with a remapped sector in it" but if you're always re-reading the same file off disk in a tight loop then you're probably doing something wrong :-)
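if you want to see the cache effect for yourself, a trivial python sketch (the path is a placeholder, and the first pass is only "cold" if the file isn't already cached):

import time

PATH = "/path/to/some/large/file"    # placeholder

def timed_read(path):
    t0 = time.monotonic()
    with open(path, "rb") as f:
        while f.read(1 << 20):       # read in 1 MiB chunks
            pass
    return time.monotonic() - t0

cold = timed_read(PATH)   # only truly cold if the file isn't cached yet
hot = timed_read(PATH)    # second pass comes straight from the page cache
print(f"first read: {cold:.2f}s, cached re-read: {hot:.2f}s")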
> Also I'd hope that the manufacturers do smart things about remapping. For example they could have reserved tracks at various parts of the disk instead of just reserving one spot and thus giving long seeks for remapped sectors.

IIRC that's one thing that enterprise drives claim to have that consumer drives don't - more spare sectors that are more distributed across the drive and smarter firmware to choose the closest spare.
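either way you can at least watch how many spares a drive has burned through. a small python sketch around smartctl (needs smartmontools installed; the device path is a placeholder, and attribute names vary a bit between vendors):

import subprocess

DEV = "/dev/sdX"    # placeholder
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable")

out = subprocess.run(["smartctl", "-A", DEV],
                     capture_output=True, text=True).stdout
for line in out.splitlines():
    if any(attr in line for attr in WATCH):
        print(line.strip())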
> Cheaper disks also tend to be a lot bigger. Both the capacity and the price make it feasible to use greater levels of redundancy. For example a RAID-Z3 array of "desktop" disks is likely to give greater capacity and lower price than a RAID-5 of "enterprise" disks.

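for concreteness, a back-of-envelope version of that comparison in python (disk sizes, counts and prices are made-up placeholders, not real quotes):

desktop = {"size_tb": 3.0, "price": 120, "count": 8}     # RAID-Z3: 3 parity disks
enterprise = {"size_tb": 1.0, "price": 250, "count": 8}  # RAID-5: 1 parity disk

def usable_tb(d, parity):
    return (d["count"] - parity) * d["size_tb"]

def cost(d):
    return d["count"] * d["price"]

print(f"RAID-Z3 desktop:    {usable_tb(desktop, 3):.0f} TB usable, "
      f"${cost(desktop)}, survives 3 failed disks")
print(f"RAID-5 enterprise:  {usable_tb(enterprise, 1):.0f} TB usable, "
      f"${cost(enterprise)}, survives 1 failed disk")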
more drives == more Watts, which is a growing concern. I'm back to using single large drives (with spun-down or offline backups) at home 'cos I don't like the power usage and noise of lots of raid drives. a few * 3TB is enough - I don't need 24TB always online. essentially I'm doing raid1 but with very delayed and power-friendly mirroring, and I'm prepared to do a fair bit of work and/or lose some data when I get unreadable sectors.

cheers,
robin
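ps: rough numbers behind the "more drives == more Watts" point, in python, with an assumed per-drive idle draw and electricity price (both are illustrative guesses, not measurements):

IDLE_W_PER_DRIVE = 6.0    # assumed typical 3.5" idle draw, Watts
PRICE_PER_KWH = 0.30      # assumed electricity price, $/kWh
HOURS_PER_YEAR = 24 * 365

for drives in (1, 8, 24):
    kwh = drives * IDLE_W_PER_DRIVE * HOURS_PER_YEAR / 1000.0
    print(f"{drives:2d} drive(s): ~{drives * IDLE_W_PER_DRIVE:.0f} W idle, "
          f"~{kwh:.0f} kWh/yr, ~${kwh * PRICE_PER_KWH:.0f}/yr")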