
On Sat, 25 Feb 2012, Daniel Pittman <daniel@rimspace.net> wrote:
> The issue with doing this is that you will have two nodes writing actively when you have a network problem, so you will be unable to sync data back together correctly. The same issue, in fact, that any two node cluster has in retaining function during a network split.

You will only have two nodes writing at once if the secondary is made primary while the link is down. If resource failover is managed by a cluster manager which does something smarter than just looking at a single network interface (*) then it may be able to deal with this when DRBD can't. There is also the case of a manual failover, where the person doing it can determine whether there is no data loss or only an acceptable amount of data loss.

The situation I'm trying to deal with is an outage which isn't even long enough to raise a NAGIOS alert. It would be nice if DRBD could just keep working in that situation. In this case the risk of data loss is mitigated by the fact that any network problem which prevents the DRBD code from communicating would also prevent writes, as the daemons which write to the DRBD filesystems communicate via the same network.

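To illustrate what "smarter than just looking at a single network interface" could mean, here is a rough Python sketch of a promotion decision. It is not how DRBD or any particular cluster manager works; `PathStatus`, `should_promote` and the 30 second threshold are hypothetical names and values made up for this example. The idea is simply that the secondary is only promoted if the peer is unreachable over every independent path and the outage has lasted longer than a brief blip.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PathStatus:
    """Result of probing the peer over one independent path (hypothetical type)."""
    name: str             # e.g. "eth0", "eth1", "serial"
    peer_reachable: bool  # True if the peer answered over this path


def should_promote(paths: Sequence[PathStatus],
                   outage_seconds: float,
                   min_outage_seconds: float = 30.0) -> bool:
    """Decide whether the secondary may be promoted to primary.

    Promotion is allowed only when the peer is unreachable over ALL paths
    and the outage has lasted at least min_outage_seconds, so a short
    network blip never triggers a failover.

    Note: this does not replace fencing. A peer that is unreachable on
    every path may still be alive and writing.
    """
    if not paths:
        # No information about the peer: do nothing rather than guess.
        return False
    if outage_seconds < min_outage_seconds:
        return False
    return all(not p.peer_reachable for p in paths)
```

With this rule, an outage of a few seconds on the replication link alone would leave the roles unchanged, which is the behaviour I would like for outages too short to raise an alert.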
> This is why two nodes and HA don't really go together in most cases: you can't handle a whole bunch of problems in that case. Though, you might find that turning off the DRBD handling and using pacemaker to manage connectivity over some alternate media helps improve general reliability.

Thanks, I'll investigate that.

(*) Does any cluster manager do that? It is theoretically possible to use Ethernet bonding to make a single device out of multiple Ethernet ports/switches, but in practice there are many situations where that isn't possible. Among other things, the last time I tested it (years ago) I had problems with certain Ethernet cards, and it seemed to rely on a working router.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/