Re: drbd rebooting

25 Feb 2012

      On Sat, 25 Feb 2012, Daniel Pittman <daniel@rimspace.net> wrote:
...
> The issue with doing this is that you will have two nodes writing
> actively when you have a network problem, so you will be unable to
> sync data back together correctly.  The same issue, in fact, that any
> two node cluster has in retaining function during a network split.

You will only have two nodes writing at once if the secondary is made primary 
while the link is down.  If resource failover is managed by a cluster manager 
which does something smarter than just looking at a single network interface 
(*) then it may be able to deal with this when DRBD can't.  There is also the 
case of a manual failover, where the person doing it can determine whether 
there is no data loss or an acceptable amount of data loss.
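
As a rough illustration of "smarter than just looking at a single network 
interface", here is a minimal Python sketch.  It is not how DRBD or any 
particular cluster manager works; the names peer_is_down and tcp_probe are 
made up for this example.  The idea is that the secondary is only considered 
for promotion when every independent path to the peer has failed.

```python
import socket
from typing import Callable, Iterable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a probe that reports whether a TCP connection to host:port succeeds.

    Hypothetical helper: each probe should use a different network path
    (for example the replication link and the service LAN).
    """
    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return probe


def peer_is_down(probes: Iterable[Callable[[], bool]]) -> bool:
    """Return True only if at least one probe ran and every probe failed."""
    results = [probe() for probe in probes]
    # With no probes there is no evidence, so never claim the peer is down.
    return bool(results) and not any(results)
```

A failover script would call peer_is_down() with one probe per path and 
refuse to promote the secondary if it returns False.  This narrows the 
window for two primaries but does not remove it, since all paths can fail 
while the peer is still running.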

The situation I'm trying to deal with is where there is an outage which isn't 
even long enough to raise a Nagios alert.  It would be nice if DRBD could just 
keep working in that situation.  In this case the risk of data loss is 
mitigated by the fact that any network problem which can prevent the DRBD code 
from communicating would also prevent writes as the daemons which write to 
DRBD filesystems communicate via the same network.
...
> This is why two nodes and HA don't really go together in most cases:
> you can't handle a whole bunch of problems in that case.  Though, you
> might find that turning off the DRBD handling and using pacemaker to
> manage connectivity over some alternate media helps improve general
> reliability.

Thanks, I'll investigate that.

(*) Does any cluster manager do that?  It is theoretically possible to use 
Ethernet bonding to make a single device out of multiple Ethernet 
ports/switches, but in practice there are many situations where that isn't 
possible.  Among other things, the last time I tested it (years ago) I had 
problems with certain Ethernet cards, and it seemed to rely on a working 
router.
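
For anyone who does try bonding, the state of each port can be checked from 
a script.  The sketch below parses text in the layout the Linux bonding 
driver usually exposes under /proc/net/bonding/ (the device name and the 
sample text here are assumptions for illustration, not output from a real 
machine) and reports the MII status of each slave interface.

```python
from typing import Dict, Optional

# Assumed sample, shaped like the contents of /proc/net/bonding/bond0.
SAMPLE = """Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def slave_link_states(bonding_text: str) -> Dict[str, str]:
    """Map each slave interface name to its reported MII status."""
    states: Dict[str, str] = {}
    current: Optional[str] = None
    for line in bonding_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # An MII Status line before the first slave describes the bond
            # itself, so it is skipped by the check above.
            states[current] = value
    return states


print(slave_link_states(SAMPLE))
```

A monitoring check could alert when any slave reports something other than 
"up", which would catch a dead port before the second one fails.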

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

Russell Coker