
disk has worn out), and in our case the performance went to crap sometime after the counter hit 1, causing considerable frustration to all involved.
Do you know of any good software to measure performance changes in disks? Performance often changes before drives develop faults, and it would be good to measure that. E.g. if the number of IOPS the disk delivers when it's near 100% utilisation according to the iostat algorithm suddenly changes by an order of magnitude, then it's probably due for replacement.
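To make that heuristic concrete, here is a rough sketch in Python. It is not an existing tool. It assumes you already collect (utilisation %, IOPS) pairs yourself, for example from the %util and r/s + w/s columns of `iostat -x`. The function names, the 95% saturation cutoff and the factor of 10 are all made up for illustration.

```python
from statistics import median

def saturated_iops(samples, util_threshold=95.0):
    """Keep the IOPS readings taken while the disk was (nearly) saturated.

    samples: iterable of (util_percent, iops) pairs. The 95% default is an
    assumed cutoff for "near 100%", not anything iostat defines.
    """
    return [iops for util, iops in samples if util >= util_threshold]

def iops_dropped(baseline, recent, factor=10.0, util_threshold=95.0):
    """Return True if the median saturated IOPS fell by at least `factor`.

    baseline and recent are lists of (util_percent, iops) pairs from an
    earlier known-good period and from the current period. Returns False
    when either period has no saturated samples, since there is then
    nothing comparable to judge by.
    """
    base = saturated_iops(baseline, util_threshold)
    now = saturated_iops(recent, util_threshold)
    if not base or not now:
        return False
    return median(now) * factor <= median(base)

if __name__ == "__main__":
    # Hypothetical numbers, purely to show the call.
    good_week = [(99.0, 40000), (97.0, 42000), (50.0, 9000)]
    this_week = [(98.0, 3500), (99.0, 4000)]
    print(iops_dropped(good_week, this_week))
```

Comparing medians of only the saturated samples avoids flagging a disk just because it was idle, but it would still miss a failure that only shows up intermittently.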
The failure mode was really strange in this case. The performance issue actually came good by itself on 2 occasions before we finally brought the server down for investigation and found that the disks were 'worn out' (we couldn't check the SMART counters "through" the RAID controller, so that had to be done offline). Any tests we did on the disks individually showed no problems, although we didn't test them extensively. SSDs are cheap enough (even DC-rated ones) that if you suspect problems, and further "try it and see" testing would cause pain to an office full of employees, it is best to just replace them.
Zabbix is what I use to graph performance on servers (% utilisation and disk queue length vs IOPS are useful measures), but as best I could tell there was no real gradual reduction in performance; it just suddenly went bad. It didn't help that the Intel RAID controller wasn't very good at giving up any details about cache utilisation either.
James