
Now I have another RAID problem. Each disk is partitioned like:

  /dev/sd[abcd]1 - BIOS boot partition
  /dev/sd[abcd]2 - md0 - /boot (I believe grub can now see inside lvm, but this config predates that)
  /dev/sd[abcd]3 - md1 - lvm - /root + others

I have been progressively replacing disks. sdd was fine, sdc was fine, then I replaced sda to test that booting worked (changed the BIOS to boot from sdb, added the new sda, added it to the RAID, etc) and that didn't work because I mucked up the GPT BIOS boot partition.

Having fixed that, I rebooted to test booting again while md1 was still rebuilding onto sda3, and it drops into the initramfs shell on boot: mdadm first sees sda, then tries to add sd[bcd]3, but because each of those disks says that sda3 is not consistent, the assembly fails, leaving just sda3 present as a 'spare' drive. In the initramfs console I just manually stop the array, add sd[bcd]3, then sda3, and it's all good again, but I didn't think this was the way it was supposed to work. On the most recent boot I zeroed the superblock of sda3 before re-adding it, thinking maybe it had picked up some bitrot somewhere along the way, but I'm not confident.

Because the server is responsible for running backups, and last night's backups didn't run because it was down, I'm not going to touch it again until tonight's backups are complete, but any hints on how to make this go smoother next time would be appreciated!

And just to recap: rebooting a RAID comprising /dev/sd[abcd]3, when /dev/sda3 is being rebuilt, results in a boot that drops into the initramfs shell, because it appears that mdadm tries to add /dev/sda3 first and then rejects the other 3 disks because they say /dev/sda3 is inconsistent (which it is).

Thanks

James
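PS: for reference, the manual recovery from the initramfs shell goes something like this (a sketch only, assuming the array is /dev/md1 as above; if --re-add refuses the disk, a plain --add will kick off a full resync instead):

    # stop the half-assembled array that contains only the spare
    mdadm --stop /dev/md1
    # assemble from the three consistent members first
    mdadm --assemble /dev/md1 /dev/sdb3 /dev/sdc3 /dev/sdd3
    # then put the rebuilding disk back so the resync can continue
    mdadm /dev/md1 --re-add /dev/sda3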

Hi James

RAID10 or? You didn't specify in this thread.

Regards,
Arjen.

--
Exec.Director @ Open Query (http://openquery.com) MariaDB/MySQL services
Sane business strategy explorations at http://upstarta.com.au
Personal blog at http://lentz.com.au/blog/

On Fri, Apr 12, 2013 at 01:28:01AM +0000, James Harper wrote:
> And just to recap: rebooting a RAID comprising /dev/sd[abcd]3, when /dev/sda3 is being rebuilt, results in a boot that drops into the initramfs shell, because it appears that mdadm tries to add /dev/sda3 first and then rejects the other 3 disks because they say /dev/sda3 is inconsistent (which it is).
maybe try swapping sda and sdb so sda3 is good and sdb3 is inconsistent. this may allow the system to boot properly and the raid resync to proceed.

BTW don't forget to make sure your system can boot with the drives swapped around. this may mean installing grub into the MBR on all drives, or just changing the BIOS setting (or using the BIOS boot menu) to boot off the drive that used to be sda but is now sdb.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #11: magnetic interference from money/credit cards
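p.s. installing grub on all drives is something like this (a sketch, assuming grub2; on gpt each drive also needs its own bios boot partition, which this layout already has):

    # put the grub boot code on every drive so any of them can boot
    for d in a b c d ; do
        grub-install /dev/sd$d
    done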

On Fri, Apr 12, 2013 at 01:28:01AM +0000, James Harper wrote:
> > And just to recap: rebooting a RAID comprising /dev/sd[abcd]3, when /dev/sda3 is being rebuilt, results in a boot that drops into the initramfs shell, because it appears that mdadm tries to add /dev/sda3 first and then rejects the other 3 disks because they say /dev/sda3 is inconsistent (which it is).
> maybe try swapping sda and sdb so sda3 is good and sdb3 is inconsistent.
> this may allow the system to boot properly and the raid resync to proceed.
I can get into the initramfs shell remotely (now that I have IPMI working properly, even if it doesn't work in grub) and sort it out so it boots; I'm just bothered that any intervention is required at all.
> BTW don't forget to make sure your system can boot with the drives swapped around. this may mean installing grub into the MBR on all drives, or just changing the BIOS setting (or using the BIOS boot menu) to boot off the drive that used to be sda but is now sdb.
Yep, I have tested booting off sd[a-d] and they all work (current RAID situation excepted). What I don't know how to tell is whether the BIOS is reading the boot sector from (say) sdd but grub is then reading everything else from sda. I could pull sda, but then the disks all shuffle down a letter and I still can't tell which one is being used for boot.

Also, when I put in a brand new sda disk with no boot sector, the BIOS just says it can't boot and doesn't proceed to sdb, which is the next in the boot sequence - a bit frustrating. This is one place where fakeraid wins over Linux md.

James
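PS: one crude way to at least confirm that every drive carries boot code is to look at the first sector of each (just a sanity check; it won't tell you which drive the BIOS actually booted from):

    # show what the first 512 bytes of each drive contain
    for d in a b c d ; do
        dd if=/dev/sd$d bs=512 count=1 2>/dev/null | file -
    done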

> And just to recap: rebooting a RAID comprising /dev/sd[abcd]3, when /dev/sda3 is being rebuilt, results in a boot that drops into the initramfs shell, because it appears that mdadm tries to add /dev/sda3 first and then rejects the other 3 disks because they say /dev/sda3 is inconsistent (which it is).
This is all fixed now. The problem was that the mdadm superblock was version 0.9, which is stored at the end of the device. Because the total /dev/sd[a-d] disk sizes were an exact multiple of the partition granularity, mdadm could see a superblock in the right place (end of device for 0.9) on both /dev/sd[a-d]3 and the whole disks /dev/sd[a-d], and it got quite upset when it assembled using /dev/sd[a-d]. I shrunk the partitions a bit, zeroed the leftover 'superblock' on the whole disks /dev/sd[a-d], and it boots cleanly every time now.

The 'when /dev/sda3 is being rebuilt' part was a red herring - that just happened to be the case each time I booted.

Also, I think I asked somewhere if you could store data in the BIOS boot partition too. It doesn't work, as grub-install stomps all over your data.

James
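PS: for anyone hitting the same thing, the diagnosis and cleanup look roughly like this (a sketch; note --zero-superblock is run against the whole disk, not the partition, and only after the partitions have been shrunk so the two superblock locations no longer coincide):

    # the phantom 0.90 superblock is visible on the whole disk...
    mdadm --examine /dev/sda
    # ...as well as the real one on the partition
    mdadm --examine /dev/sda3
    # wipe the stale copy at the end of the whole disk
    mdadm --zero-superblock /dev/sda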