Re: drbd rebooting

On Sat, 25 Feb 2012, Daniel Pittman <daniel@rimspace.net> wrote:
Adjust the behaviour settings in drbd.conf around split brain; they have a bunch of configuration choices. See the "handlers" in the manual for the situations and responses.
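For reference, the split-brain choices mentioned above are set in the net and handlers sections of drbd.conf. The fragment below is a minimal sketch, assuming DRBD 8.3-style syntax; the resource name r0 and the particular policies are illustrative assumptions, not settings taken from this thread.

```
# Hypothetical resource; only the split-brain related options are shown.
resource r0 {
  net {
    # Automatic recovery policy, chosen by how many nodes are primary
    # at the moment the split brain is detected.
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  handlers {
    # Invoked when a split brain is detected; the notify script ships
    # with DRBD, but its install path may differ per distribution.
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
}
```

Whether any of these policies is appropriate depends on how much data loss on the discarded node is acceptable, so treat the values as placeholders.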
What does drbd consider to be a split brain situation?

root@nodeb# iptables -A OUTPUT -d $NODEA -j DROP

I've set up a couple of nodes running under Xen. I ran the above command, and since then write commands on the ext4 filesystem mounted on nodea have blocked, and I'm seeing lots of messages like the following in the kernel message log on nodea:

[ 1831.024174] block drbd0: [drbd0_worker/844] sock_sendmsg time expired, ko = 4294967162

After 960 seconds (and some kernel panics from the ext4 code) it restarted itself. It seems that netfilter isn't catching all kernel-generated packets, because it managed to synchronise again.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
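A side note on the log line above: the ko value is just under 2^32. The sketch below computes how far below, which would be consistent with an unsigned 32-bit counter that has been decremented past zero. That reading is an assumption and is not confirmed anywhere in this thread; wrap_distance is a hypothetical helper name.

```shell
#!/bin/sh
# Distance of a value below 2^32 (4294967296).
# Interpreting the result as a wrapped counter is an assumption.
wrap_distance() {
    echo $(( 4294967296 - $1 ))
}

wrap_distance 4294967162   # prints 134
```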

On Sat, 25 Feb 2012, Daniel Pittman <daniel@rimspace.net> wrote:
Adjust the behaviour settings in drbd.conf around split brain; they have a bunch of configuration choices. See the "handlers" in the manual for the situations and responses.
What does drbd consider to be a split brain situation?
If both nodes have been accessed as primary without synchronising with each other. You could test this like:

1. Shut down node B.
2. Make resource primary on node A
3. Write something to the resource
4. Shut down node A
5. Start up node B
6. Make resource primary on node B
7. Write something to the resource
8. Start up node A

You'll probably need to do some forcing when you start up node B, as it will tend to want to wait for a long time. In any case, your brain will be well and truly split at this point.

You could probably construct a few other scenarios too, but this should be enough to do your testing.

James
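The eight steps above could be written out as concrete commands roughly as follows. This is a dry-run sketch that only prints the plan and executes nothing. The resource name r0 and the mount point /mnt/test are hypothetical; /dev/drbd0 is taken from the kernel log quoted earlier in the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the eight test steps as commands, runs nothing.
# r0 and /mnt/test are assumed names, not taken from the thread.

split_brain_plan() {
    res="$1"
    cat <<EOF
1. nodeb# shutdown -h now
2. nodea# drbdadm primary $res
3. nodea# mount /dev/drbd0 /mnt/test && touch /mnt/test/from-a && umount /mnt/test
4. nodea# shutdown -h now
5. (power on nodeb; DRBD may wait for its peer at boot)
6. nodeb# drbdadm primary $res
7. nodeb# mount /dev/drbd0 /mnt/test && touch /mnt/test/from-b && umount /mnt/test
8. (power on nodea, then check 'cat /proc/drbd' on both nodes)
EOF
}

# Print the plan for the resource given as the first argument (default r0).
split_brain_plan "${1:-r0}"
```

Step 6 is where the forcing mentioned above may be needed. The exact force option differs between DRBD versions, so check the drbdadm manual page for the installed release rather than relying on this sketch.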

On Sun, 26 Feb 2012, James Harper <james.harper@bendigoit.com.au> wrote:
What does drbd consider to be a split brain situation?
If both nodes have been accessed as primary without synchronising with each other. You could test this like:
I have a server which NEVER gets into such a situation. It is running with one node as primary and the other as secondary, and then suddenly the split-brain handler is called for no good reason that I can determine.
1. Shut down node B.
2. Make resource primary on node A
3. Write something to the resource
4. Shut down node A
5. Start up node B
6. Make resource primary on node B
7. Write something to the resource
8. Start up node A
I believe that will require that the data store is invalidated for one node before it can connect again. That's not what I'm interested in here. I'm interested in why the primary node spontaneously reboots.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
participants (2)
- James Harper
- Russell Coker