
Hi all,

I've inherited a system which has an Adaptec 2420SA RAID controller and 4x SSDs running in RAID6. There's no end to the problems experienced with this setup; a disk seems to time out and get ejected from the controller at least every week. It's not always the same disk though, and I've tried replacing each of the SSDs with their spares, to no avail.

Is there any known issue with using SSDs in hardware RAID that I've overlooked? Performance isn't *that* much of an issue, since it's only running Apache + mod_perl and a ~100GB MySQL database.

Any thoughts appreciated.

Thanks,
Hannah

On Thu, Jul 24, 2014 at 09:45:45AM +1000, hannah commodore wrote:
I've inherited a system which has an Adaptec 2420SA RAID controller and 4x SSDs running in RAID6.
given that the 2420SA doesn't do RAID6 natively, i guess you must be using mdadm linux software raid...

if the motherboard has four spare SATA ports, have you tried plugging the SSDs directly into the m/b? i.e. i suspect that the problem is more likely with the adaptec card than with either the SSDs or the software raid.

craig

--
craig sanders <cas@taz.net.au>

On 24 Jul 2014, at 10:00, Craig Sanders <cas@taz.net.au> wrote:
On Thu, Jul 24, 2014 at 09:45:45AM +1000, hannah commodore wrote:
I've inherited a system which has an Adaptec 2420SA RAID controller and 4x SSDs running in RAID6.
given that the 2420SA doesn't do RAID6 natively, i guess you must be using mdadm linux software raid...
It does support RAID6, as well as 5EE, 10, 50, 60, etc.
if the motherboard has four spare SATA ports, have you tried plugging the SSDs directly into the m/b? i.e. i suspect that the problem is more likely with the adaptec card than with either the SSDs or the software raid.
I've tested these disks individually when they were kicked out of the array, and they always seem fine. I've only seen 1 actually failed disk in almost a year of this array running.

On 24 Jul 2014, at 11:29, Trent W. Buck <trentbuck@gmail.com> wrote:
Are you confident the disks are the problem? Since this is hardware RAID, you should have a spare RAID card. Swap that in, and see if the problem goes away. If it does, RMA the original card.
I'm not confident, no. I do have a replacement 2420SA I will try out. The write-cache battery has also reported as Failed recently, so I'm thinking more and more that it's the controller rather than anything else.

On 24 Jul 2014, at 12:07, Russell Coker <russell@coker.com.au> wrote:
This could be related to TRIM. If you have it enabled it can cause significant delays and if you don't then eventually the SSD will need to clear erase blocks and cause delays for random writes.
Thanks. I'll have to see about how to change the TRIM settings on these Plextor drives. They don't show up as individual disks to the OS, but just the single logical volume

hannah commodore <hannah@tinfoilhat.net> wrote:
Thanks. I'll have to see about how to change the TRIM settings on these Plextor drives. They don't show up as individual disks to the OS, but just the single logical volume
Does loading the sg module create individual devices (/dev/sg0, /dev/sg1 etc.)? It does for me, but mine is an LSI controller with SAS drives.
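For reference, a minimal sketch of poking at the physical disks that way, assuming hypothetical device names and a controller that passes the commands through (which, for the 2420SA, is exactly what's in doubt):

    # load the SCSI generic driver; it can expose per-device /dev/sg* nodes
    # even when the OS only sees the controller's logical volume
    modprobe sg
    lsscsi -g                    # list SCSI devices along with their /dev/sg* names (lsscsi package)
    smartctl -a -d sat /dev/sg1  # hypothetical node; only useful if ATA commands are passed through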

On Thu, Jul 24, 2014 at 12:36:47PM +1000, hannah commodore wrote:
On 24 Jul 2014, at 10:00, Craig Sanders <cas@taz.net.au> wrote:
On Thu, Jul 24, 2014 at 09:45:45AM +1000, hannah commodore wrote:
I've inherited a system which has an Adaptec 2420SA RAID controller and 4x SSDs running in RAID6.
given that the 2420SA doesn't do RAID6 natively, i guess you must be using mdadm linux software raid...
It does support RAID6, as well as 5EE, 10, 50, 60, etc.
oops, you're right. i was confusing it with the 1430SA (which I have), a 4-port PCI-e SATA card that only supports RAID levels 0, 1, & 10, whereas the 2420SA is a PCI-X RAID card.
if the motherboard has four spare SATA ports, have you tried plugging the SSDs directly into the m/b? i.e. i suspect that the problem is more likely with the adaptec card than with either the SSDs or the software raid.
I've tested these disks individually when they were kicked out of the array, and they always seem fine. I've only seen 1 actually failed disk in almost a year of this array running.
it really does sound like it's the adaptec card that's the problem.

ok, the adaptec raid6 format is proprietary so you won't be able to just plug the SSDs into motherboard ports... but you mentioned you had spares for each of the SSDs. you could plug them into m/b SATA ports, configure them as raid6 (or raid10 or btrfs or whatever) and rsync from the adaptec raid array to the new array.

if the adaptec raid is the boot drive, you'll also need to re-configure/re-run grub.

craig

--
craig sanders <cas@taz.net.au>
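A rough sketch of that migration under stated assumptions (hypothetical device names /dev/sd[b-e] for the spare SSDs on the motherboard ports, ext4, a hypothetical /data as the filesystem being copied, and /mnt/new as a scratch mountpoint):

    # build a new md RAID6 from the four spare SSDs
    mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.ext4 /dev/md0
    mkdir -p /mnt/new && mount /dev/md0 /mnt/new
    # copy everything across, preserving hard links, ACLs and xattrs
    rsync -aHAX --numeric-ids /data/ /mnt/new/
    # if the adaptec array is the boot volume, grub, mdadm.conf and the initramfs
    # will need updating as well; the details depend on distro and partition layout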

hannah commodore <hannah@tinfoilhat.net> writes:
On 24 Jul 2014, at 12:07, Russell Coker <russell@coker.com.au> wrote:
This could be related to TRIM. If you have it enabled it can cause significant delays and if you don't then eventually the SSD will need to clear erase blocks and cause delays for random writes.
Thanks. I'll have to see about how to change the TRIM settings on these Plextor drives. They don't show up as individual disks to the OS, but just the single logical volume
You change it by mounting the filesystem with -o discard (or not) or calling fstrim(8). The drives either support it or they don't (they probably do). The iffy bit will be whether the controller tells the OS "this [logical] drive doesn't support TRIM", or passes it through to the physical disks, or whether you can configure either behaviour.

I doubt TRIM is going to be the issue here.
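For reference, the commands in question, with a hypothetical /dev/sda and mountpoint; whether they do anything useful behind this controller is exactly the iffy bit above:

    # does the block device (logical or physical) advertise discard/TRIM support?
    lsblk --discard /dev/sda     # non-zero DISC-GRAN and DISC-MAX mean discard is available
    # one-off trim of a mounted filesystem
    fstrim -v /
    # or continuous discard via the mount option
    mount -o remount,discard /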

On 24 July 2014 16:45, Trent W. Buck <trentbuck@gmail.com> wrote:
hannah commodore <hannah@tinfoilhat.net> writes:
On 24 Jul 2014, at 12:07, Russell Coker <russell@coker.com.au> wrote:
This could be related to TRIM. If you have it enabled it can cause significant delays and if you don't then eventually the SSD will need to clear erase blocks and cause delays for random writes.
Thanks. I'll have to see about how to change the TRIM settings on these Plextor drives. They don't show up as individual disks to the OS, but just the single logical volume
You change it by mounting the filesystem with -o discard (or not) or calling fstrim(8). The drives either support it or they don't (they probably do). The iffy bit will be whether the controller tells the OS "this [logical] drive doesn't support TRIM", or passes it through to the physical disks, or whether you can configure either behaviour.
I really doubt that particular RAID card supports TRIM. Sorry.
I doubt TRIM is going to be the issue here.
Agreed. Losing it hurts your performance a bit in the long run, but it wouldn't cause SSDs to get kicked out of arrays.

On 24 July 2014 09:45, hannah commodore <hannah@tinfoilhat.net> wrote:
I've inherited a system which has an Adaptec 2420SA RAID controller and 4x SSDs running in RAID6. There's no end to problems experienced with this setup; a disk seems to time out and get ejected from the controller at least every week. It's not always the same disk though, and I've tried replacing each of the SSDs with their spares, to no avail.
Why the hell would someone choose to create a four-disk RAID6 array? It doesn't make any sense!

I'd start by breaking off two of the disks and turning them into half of a degraded RAID10 array; port the data over, then take the remaining raid6 disks and add them too. You now have the same level of redundancy but much better performance.

While you're there, I'd switch from hardware raid to linux software raid, as hardware raid generally doesn't support SSDs very well at the moment. (More in the sense of handling TRIM and SMART rather than kicking them out of the array... don't know what is causing that.)

-T
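A minimal sketch of the shuffle Toby describes, with hypothetical device names; "missing" keeps slots open for the two disks still holding the old array:

    # degraded 4-device RAID10 from the two disks pulled out of the RAID6
    mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sdd missing /dev/sde missing
    # ...copy the data across, then hand over the remaining two disks...
    mdadm --manage /dev/md1 --add /dev/sdb
    mdadm --manage /dev/md1 --add /dev/sdc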

On Thu, 24 Jul 2014 10:43:01 Toby Corkindale wrote:
On 24 July 2014 09:45, hannah commodore <hannah@tinfoilhat.net> wrote:
I've inherited a system which has an Adaptec 2420SA RAID controller and 4x SSDs running in RAID6. There's no end to problems experienced with this setup; a disk seems to time out and get ejected from the controller at least every week. It's not always the same disk though, and I've tried replacing each of the SSDs with their spares, to no avail.
This could be related to TRIM. If you have it enabled it can cause significant delays and if you don't then eventually the SSD will need to clear erase blocks and cause delays for random writes.
Why the hell would someone choose to create a four-disk RAID6 array? It doesn't make any sense!
http://en.wikipedia.org/wiki/Linux_software_RAID

Sure it does. A RAID-10 can be made from 2*RAID-1 arrays, in which case there's a 1/3 probability that a second disk failure will lose data. If a RAID-10 has every data block on 2 disks with the duplicated data striped (as described in the above Wikipedia page), then losing 2 disks is guaranteed to lose data.

Now if your RAID supports a "replace" operation (where constructing the new disk takes data from the old disk OR the parity disks) then you could lose 2 disks and probably not lose data (most disk "failures" don't involve total loss of disk function; thousands of bad sectors in a terabyte disk is a significant failure). That's good for people who use BTRFS or ZFS. People who use Linux software RAID and other less capable RAID systems need to remove one disk from the array before they add a replacement, so any error found during the process of regenerating the replacement disk will lose data.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
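To spell out the 1/3 figure for the 2*RAID-1 case: label the mirror pairs (A1, A2) and (B1, B2) and suppose A1 has already failed. Of the three surviving disks, only A2 is fatal to lose; losing B1 or B2 still leaves the other half of that mirror intact, so a random second failure loses data with probability 1/3.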

hannah commodore <hannah@tinfoilhat.net> writes:
I've inherited a system which has an Adaptec 2420SA RAID controller and 4x SSDs running in RAID6. There's no end to problems experienced with this setup; a disk seems to time out and get ejected from the controller at least every week. It's not always the same disk though, and I've tried replacing each of the SSDs with their spares, to no avail.
Are you confident the disks are the problem?

Since this is hardware RAID, you should have a spare RAID card. Swap that in, and see if the problem goes away. If it does, RMA the original card.

If you have a complete backup *server*, swap the disks into it and call it the master. That way you can test for faults in the hotswap backplane &c as well.

On 24/07/2014 9:15 am, hannah commodore wrote:
Any thoughts appreciated
Hi

If they're not Enterprise SAS SSDs they will have issues being in an array, because the firmware is not optimised for it; the same way a standard spinning-media drive has issues with being in a RAID array if it's not configured for it.

Never run SSDs in more than a RAID mirror, unless the drive actually indicates it's OK to be used in a RAID.

Cheers
Mike

On Wed, 6 Aug 2014, Mike O'Connor wrote:
On 24/07/2014 9:15 am, hannah commodore wrote:
Any thoughts appreciated
Hi
If they're not Enterprise SAS SSDs they will have issues being in an array, because the firmware is not optimised for it; the same way a standard spinning-media drive has issues with being in a RAID array if it's not configured for it.
Never run SSDs in more than a RAID mirror, unless the drive actually indicates it's OK to be used in a RAID.
[Citation needed]

It's just data.

--
Tim Connors

Tim Connors <tim.w.connors@gmail.com> writes:
On Wed, 6 Aug 2014, Mike O'Connor wrote:
If they're not Enterprise SAS SSDs they will have issues being in an array, because the firmware is not optimised for it; the same way a standard spinning-media drive has issues with being in a RAID array if it's not configured for it.
Never run SSDs in more than a RAID mirror, unless the drive actually indicates it's OK to be used in a RAID.
[Citation needed]
It's just data.
I assume he's working from the first principles that:

1. the FTL will be optimized for FAT (or maybe NTFS), because that's what "normal" people put on it, and
2. a RAID5/6 workload doesn't look like a FAT workload.

In the same way that in a spinning rust drive, "enterprise" or "NAS" firmwares are better because they are programmed to give up QUICKLY -- so the RAID controller can go "oh OK" and grab the block from another disk. (Also maybe they have tighter QA controls, but meh.)

If it's for a super duper RDBMS cluster or something, the suboptimal performance might matter, but if it's just serving office documents then nobody is going to give a shit. Just give it a bunch more RAM so it can cache the most popular disk blocks.
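For what it's worth, the "give up QUICKLY" knob on spinning drives is the SCT ERC timeout (aka TLER), which smartctl can query and, on drives that permit it, set. A minimal sketch, assuming a hypothetical /dev/sda and a controller that passes SMART commands through:

    # query the drive's current SCT error-recovery timeouts (if it supports SCT ERC)
    smartctl -l scterc /dev/sda
    # cap read/write error recovery at 7 seconds (values are tenths of a second)
    smartctl -l scterc,70,70 /dev/sda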

The issue with hard drives in RAID arrays is the read timeout. A drive that is optimised for non-RAID use will do more retries when reading, in the hope of getting good data, while RAID-optimised drives will return an error quickly and let the other drives have a go. In spite of this, desktop drives work well in RAID arrays apart from poor performance when things go wrong.

I am not aware of SSDs having such timeout issues. Can you cite a reference?

In terms of what the disks do, there's no difference between a mirror and RAID5+.

On 7 August 2014 12:19:28 AM AEST, Mike O'Connor <mike@pineview.net> wrote:
On 24/07/2014 9:15 am, hannah commodore wrote:
Any thoughts appreciated
Hi
If there not Enterprise SAS SSD's they will have issues being in array because the firmware is not optimised fir it. The same way as standard spinning media drive has issues with being in a raid array if its not configured for it.
Never run SSD's in more than a raid mirror, unless the drive actually indicates it ok to be used in a RAID.
Cheers
Mike
-- Sent from my Samsung Galaxy Note 2 with K-9 Mail.
participants (8): Craig Sanders, hannah commodore, Jason White, Mike O'Connor, Russell Coker, Tim Connors, Toby Corkindale, trentbuck@gmail.com