
On 14/05/13 16:35, James Harper wrote:
I've had a few disks fail with uncorrectable read errors just recently, and in the past my process has been that any disk with any sort of error gets discarded and replaced, especially in a server. I did some reading though (see previous emails about SMART vs actual disk failures) and found that simply writing back over those sectors is often enough to clear the error and allow them to be remapped, possibly extending the life of the disk, depending on the cause of the error.
In actual fact, after writing over the entire failed disk with /dev/zero the other day, all the SMART attributes are showing a healthy disk - no pending reallocations and no reallocated sectors yet - so maybe it wrote over the bad sector and determined it was good again without requiring a remap. I'm deliberately using some old hardware to test ceph to see how it behaves in various failure scenarios, and it has been pretty good so far despite 3 failed disks over the few weeks I've been testing.
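For reference, roughly what that looks like from the command line (sdX and the LBA are placeholders, the disk is assumed to be unmounted, and both dd invocations are destructive):

  # whole-disk overwrite, as above - destroys everything on the disk
  dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct

  # or, to hit just one sector: find the failing LBA from a SMART self-test,
  # then write zeros over that sector alone (512-byte logical sectors assumed)
  smartctl -t long /dev/sdX
  smartctl -l selftest /dev/sdX        # see the LBA_of_first_error column
  dd if=/dev/zero of=/dev/sdX bs=512 seek=<bad_lba> count=1 oflag=direct

  # afterwards, check whether the sector was remapped or just rewritten in place
  smartctl -A /dev/sdX | grep -Ei 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'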
What can cause these unrecoverable read errors? Is losing power mid-write enough to cause this to happen? Or maybe a knock while writing? I grabbed these 1TB disks out of a few old PCs and NASes I had lying around the place, so their history is entirely uncertain. I definitely can't tell whether the errors were already present when I started using ceph on them.
Is Linux MD software RAID smart enough to rewrite a bad sector with good data to clear this type of error (while keeping track of error counts so it knows when to eject the disk from the array)? What about btrfs/zfs? It's trickier with something like ceph, where ceph runs on top of a filesystem which isn't itself redundant...
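For md at least, a verify pass can be kicked off by hand, and btrfs/zfs have scrubs that repair from redundant copies where they exist; something like the following, where md0, the mount point and the pool name are all just example names:

  # md: full verify of the array; a member sector that fails to read should
  # get rebuilt from the other members and rewritten
  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt

  # btrfs / zfs scrubs
  btrfs scrub start /mnt/point
  zpool scrub tank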
A while back, when 4096-byte sectors went native, I had a disk with what I think was a CRC error on one sector. The interesting thing was that when I read the sector with "dd conv=noerror" I got 4096 bytes, 7/8 of which was clearly valid directory info (NTFS) and 512 bytes of which were garbage. Go figure. Writing this sector back cleared the read error, but there was a bit of damage to the file system from the 512 bytes of dud info.

Now, to add to the strange error messages from drives, I'm getting this one:

[ 317.144766] EXT4-fs (sdb1): error count: 1
[ 317.144777] EXT4-fs (sdb1): initial error at 1345261136: ext4_find_entry:1209: inode 2
[ 317.144785] EXT4-fs (sdb1): last error at 1345261136: ext4_find_entry:1209: inode 2

sdb1 is mounted noatime, and this message turns up at around the same time after boot. SMART tests and file system checks pass, so I guess I'll just have to read the entire 1TB+ disk out to /dev/null to see if that trips anything useful.
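If I get the syntax right, something like this should do for that full read (sdb as in the log above; reads only, nothing destructive):

  # read the whole disk; any unreadable sectors should show up in dmesg
  dd if=/dev/sdb of=/dev/null bs=1M iflag=direct conv=noerror

  # or a read-only surface scan
  badblocks -sv /dev/sdb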