
I run a bunch of servers with Linux software RAID-1. I use bitmaps on all of them because the ongoing overhead (*) of bitmaps is preferable to the occasional overhead of a full resync.

Recently one of my servers suddenly decided to do a complete RAID-1 resync for no apparent reason. Other servers with the same versions of all software (Debian/Squeeze with all updates) didn't do it. The server in question did crash a few times recently (**).

Is a server crash likely to result in an entire RAID resync even when bitmaps are used? Does anyone have any advice other than throwing the server in the bin?

(*) I really doubt that the overhead is as bad as some people claim. I plan to test it but haven't had time so far.

(**) Currently dmesg output includes the following:

[87347.834590] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[87347.844958] BUG: soft lockup - CPU#0 stuck for 94s! [swapper:0]

I'm not sure if this is related to the crashes. I suspected a problem with eth1 and turned off TSO etc., which seems to have helped.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/