
On Thu, Apr 05, 2012 at 01:44:00PM +1000, Marcus Furlong wrote:
> We have issues where the monthly mdadm raid check grinds the system to a halt.
do you find that these monthly cron jobs are actually useful? i've never found them to be so, and suspect that they can actually cause problems, because the heavy io load might be enough to push a borderline drive into failure. flushing out a weak drive like that is probably what you want in a data center with spare disks ready and waiting, but not really what you want happening at home on a sunday morning. the computer shops are shut, the nearest swap meet might be the other side of town that week, and fixing a dead fs with the cheery sound of lawnmowers in the background is enough to send you postal :)
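if you do keep the monthly check, one way to stop it hogging the disks is to lower the md speed ceiling while it runs. a minimal sketch, assuming the standard /proc/sys/dev/raid/speed_limit_max sysctl (KiB/s per device); the function name and the 10000 figure are made up for illustration, and the right value depends on your disks and workload:

```python
from pathlib import Path

# md sysctl: upper bound on resync/check speed, in KiB/s per device.
SPEED_LIMIT_MAX = Path("/proc/sys/dev/raid/speed_limit_max")


def cap_check_speed(kib_per_sec, limit_file=SPEED_LIMIT_MAX):
    """Set a new md check/resync speed ceiling and return the previous one.

    `cap_check_speed` is a hypothetical helper name, not part of mdadm.
    Writing to the real sysctl needs root.
    """
    limit_file = Path(limit_file)
    previous = int(limit_file.read_text().strip())
    limit_file.write_text(f"{kib_per_sec}\n")
    return previous


# example (as root), 10000 KiB/s is an arbitrary illustrative value:
#   old = cap_check_speed(10000)
#   ... let checkarray run ...
#   cap_check_speed(old)   # restore the previous ceiling afterwards
```

the check takes longer this way, but normal io gets a look in. whether that trade is worth it is a judgment call.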
> Initially we thought this was due to crappy raid cards, with the disks in JBOD mode, and using software raid (to get the battery backed cache). Removing the raid cards and plugging disks directly into the motherboard did alleviate the problems somewhat, and all the monthly checkarray scripts completed.
raid cards often apply their raid-mode timeouts even in jbod mode. those timeouts tend to have enterprise grade 15K RPM drives in mind, so a slower drive without TLER that spends too long retrying a bad sector can blow the timeout and get dropped. some raid cards have alternate firmware (generally referred to as "Initiator Target" or "IT" mode) which alleviates that problem. that's probably why the problem partly cleared up when you switched to using motherboard drive ports. recommended practice when using mdadm (or zfs or any other software-raid-like thing) is to use plain sata ports or IT mode firmware. BTW, this is why supermicro motherboards with LSI SAS controllers built-in have a raid-mode/IT mode switch right next to the drive sockets.
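on plain sata ports the timeout that matters is the kernel's own per-disk command timeout, which lives in /sys/block/<disk>/device/timeout (in seconds). a rough sketch of raising it for a drive that has no TLER, assuming that sysfs file is present; the helper name and the 180 second figure are assumptions for illustration, not something from this thread:

```python
from pathlib import Path


def raise_scsi_timeout(disk, seconds=180, sysfs_root="/sys"):
    """Raise the kernel command timeout for one disk, never lower it.

    `disk` is a kernel name such as "sda". Returns the timeout in effect
    afterwards. Hypothetical helper; writing to the real file needs root.
    """
    path = Path(sysfs_root) / "block" / disk / "device" / "timeout"
    current = int(path.read_text().strip())
    if seconds > current:
        path.write_text(f"{seconds}\n")
        return seconds
    return current


# example (as root), for the members of an array:
#   for disk in ("sda", "sdb"):
#       raise_scsi_timeout(disk)
```

the setting doesn't survive a reboot, so it would need to be reapplied at boot (udev rule, rc.local, whatever you prefer).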
> Overall ganeti is really nice, but it feels like drbd has some missing pieces that would help in debugging issues.
i'm wondering if iscsi kind of obsoletes drbd, and if mdadm raid1 over two iscsi exports would be better than drbd. part of my curiosity is due to the fact that i prefer zfs to lvm, and iscsi ... when i get time i intend to experiment with ganeti and see if i can come up with a zfs+iscsi+mdadm storage module for it as an alternative to lvm+drbd. this would also allow skipping the io-hogging mdadm check (replaced with a weekly or monthly zpool scrub).

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #291: Due to the CDA, we no longer have a root account.