
On Fri, Feb 10, 2012 at 12:23:27PM -0500, Robin Humble wrote:
> it can always eject a 2nd (or 3rd) disk for the same reason it ejected the first: typically rewriting the bad sector failed, or there were too many bad sectors too close together, or ... so kick it out.
another annoying cause of disks being kicked from mdadm arrays (and zfs and presumably btrfs arrays too) is disk read timeouts caused by the drive sleeping. this is particularly common when you have a HW raid card of some sort in JBOD mode. these cards often have much lower timeouts than bare disks on a motherboard SATA interface, because the assumption is that they're in a high-end server with high-end "enterprise" drives (where spares are budgeted for and quickly available) rather than commodity drives in a home server. the card reports the drive as dead/dying/failed, and mdadm/zfs/btrfs kicks it... even though there's nothing wrong with the disk, it just took too long to wake up from sleep.

the solution, at least for LSI 9211-8i SAS cards(*) like I have, is to re-flash the card's firmware in IT (Initiator Target) mode rather than RAID mode.

(this particular issue annoyed the hell out of me before I figured out what was going on)

(*) BTW, these cards are an extraordinarily cheap way of adding 8 SAS/SATA 6Gbps ports to your system. there are numerous re-badged models (from IBM, Dell, Supermicro, and others), and they sell on ebay for anywhere from about $65 to $150. these are 8-port SAS 6Gbps cards... you can't even buy 4-port SATA cards for that. they do raid1/0/10 natively, but for linux you don't want that. just re-flash them with the IT firmware and run them as HBAs. ideal for mdadm, zfs, and btrfs. they use the mpt2sas driver in linux... GPL, and in the mainline kernel. nice.

craig

ps: speaking of sleeping drives, why do all drive manufacturers make their drives wake up when you query their temperature with SMART? it means you can set them to sleep when idle (e.g. with hdparm -S) to reduce power usage and temperature, *OR* you can monitor their temperature, but you can't do both. you can use SMART to query some other drive data without waking them up, but not the temperature. I've seen this with WD, Seagate, Hitachi, and Samsung drives. WTF?

-- 
craig sanders <cas@taz.net.au>
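The ps describes a real trade-off, and nothing here gets you the temperature of a sleeping drive. A common partial workaround is to ask the drive for its power state first with hdparm -C (the ATA CHECK POWER MODE query, which doesn't spin the drive up) and only read the SMART temperature from drives that are already awake. The Python sketch below assumes hdparm -C prints a "drive state is:" line and that smartctl -A shows a 194 Temperature_Celsius row, which not every drive reports; the function names are hypothetical. smartctl's own -n standby option does a similar "skip if asleep" check.

```python
import subprocess


def is_spun_down(hdparm_c_output):
    """True if `hdparm -C` output says the drive is in standby or sleeping."""
    for line in hdparm_c_output.splitlines():
        if "drive state is:" in line:
            state = line.split("drive state is:", 1)[1].strip()
            return state in ("standby", "sleeping")
    return False  # state not reported: assume the drive is awake


def temperature_from_smartctl(smartctl_a_output):
    """Return the raw value of attribute 194 (Temperature_Celsius), or None."""
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE ...
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            raw = fields[9]  # e.g. "36", possibly followed by "(Min/Max 20/49)"
            return int(raw) if raw.isdigit() else None
    return None


def read_temp_if_awake(device):
    """Return the drive temperature, or None if it is asleep (so we don't wake it)."""
    state = subprocess.run(["hdparm", "-C", device],
                           capture_output=True, text=True).stdout
    if is_spun_down(state):
        return None
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    return temperature_from_smartctl(out)
```

Usage would be something like read_temp_if_awake("/dev/sda") run as root from a monitoring script. A None result means "asleep, temperature unknown", which is exactly the gap the ps is complaining about.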