
On Wed, 27 Feb 2013, James Harper <james.harper@bendigoit.com.au> wrote:
> > Last time I tried DRBD it was killing my systems. It seems that the default configuration for DRBD is to reboot a node if certain failure conditions occur - which can be triggered by network problems. I never managed to get it to stop doing that.
> There was a debian bug against those default options for exactly the reasons you noted, unless it was an actual code bug and not a default configuration bug that you ran into?
It was an issue with the default configuration, which was hard-coded into either the utilities or the kernel (I can't remember which).
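For illustration, the sort of thing I'm talking about looks like the handlers section of the example drbd.conf, reproduced here from memory so treat the exact commands as approximate (whether the same actions were also hard-coded elsewhere I never worked out):

  handlers {
    # on these failure conditions the node forcibly shuts itself down
    # (newer example configs use "echo b > /proc/sysrq-trigger ; reboot -f")
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
  }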
> > http://etbe.coker.com.au/2012/02/08/more-drbd-performance-tests/
> >
> > Also in my tests, DRBD with the secondary disconnected gave performance suspiciously similar to that of a non-DRBD system with Ext4 mounted with the barrier=0 option. Presumably this means that data on a DRBD system is at the same risk from a power failure as with barrier=0.
> >
> > I'm looking at either using bcache or moving the metadata to an SSD to try to avoid these performance problems.
> drbd barrier and flush are configurable, and there are lots of warnings about turning them off.
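They are; the relevant knobs live in the disk section of drbd.conf. The option names below are the 8.3-style ones from memory (8.4 spells them differently, eg "disk-barrier no;"), so take this as a sketch rather than something to paste in:

  disk {
    no-disk-barrier;    # don't send write barriers to the backing device
    no-disk-flushes;    # don't issue cache flushes for data writes
    no-md-flushes;      # don't issue cache flushes for metadata writes
  }

My concern is what the default settings do, though.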
If you read my above blog post you'll notice that default Ext4 performance was 1663, Ext4 with barrier=0 was 2875, and DRBD in its default configuration with the secondary disconnected was 2409. I conclude that the data-loss protection provided by barriers costs performance (2875 down to 1663) and that DRBD, by bringing performance back up to 2409, gives up some of that protection.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/