
On Tue, 26 Feb 2013, James Harper <james.harper@bendigoit.com.au> wrote:
I think drbd is going to be the best way to go, especially as it's part of Linux these days. My plan is:
Last time I tried DRBD it was killing my systems. It seems that the default configuration for DRBD is to reboot a node if certain failure conditions occur - which can be triggered by network problems. I never managed to get it to stop doing that.
There was a debian bug against those default options for exactly the reasons you noted, unless it was an actual code bug rather than a default configuration bug that you ran into?
In the default mode of operation DRBD writes everything synchronously to the secondary system; if the link between the systems is slow and your primary system is doing synchronous writes (database server or mail server) then it's going to suck. DRBD supports an asynchronous mode which, due to bugs, was slower than the synchronous mode in my tests on a local GigE network; maybe its asynchronous support wouldn't look as bad over a slow link.
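For anyone comparing the modes: the replication protocol is set per resource in the DRBD config. A minimal sketch, assuming DRBD 8.3-style syntax; the resource, device, volume group and host names here are all made up:

```
resource r0 {
    protocol C;       # synchronous: a write completes only once it is on both nodes
    # protocol A;     # asynchronous: a write completes once it is in the local TCP send buffer
    on alpha {
        device    /dev/drbd0;
        disk      /dev/vg0/data;
        meta-disk internal;
        address   10.0.0.1:7789;
    }
    on beta {
        device    /dev/drbd0;
        disk      /dev/vg0/data;
        meta-disk internal;
        address   10.0.0.2:7789;
    }
}
```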
http://etbe.coker.com.au/2012/02/08/more-drbd-performance-tests/
Also, in my tests, DRBD with the secondary disconnected gave performance suspiciously similar to that of a non-DRBD system with Ext4 mounted with the barrier=0 option. Presumably this means that on a power failure, data on a DRBD system is at the same risk as with barrier=0.
I'm looking at either using bcache or moving the metadata to an ssd to try and avoid these performance problems. drbd barrier and flush are configurable, and there are lots of warnings about turning them off.
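For reference, the knobs in question live in the disk section of the resource definition. A fragment only, assuming DRBD 8.3-style option names; the warnings apply, this is only safe with a battery-backed write cache:

```
# fragment of a resource definition, not a complete config
disk {
    no-disk-barrier;   # don't pass write barriers down to the backing device
    no-disk-flushes;   # don't send cache flushes to the backing device
    # no-md-flushes;   # likewise for the DRBD metadata area
}
```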
On Tue, 26 Feb 2013, "Trent W. Buck" <trentbuck@gmail.com> wrote:
What I probably WOULD try is to lvm snapshot a day before, and sync that. It will be incomplete and incoherent, but you don't care because on the day you rsync --only-write-batch against the snapshot and then upload only the diff and apply it. Since only the changed blocks from the last 24h have changed, that ought to reduce the downtime.
You mean running rsync on a block device? Is that even possible?
rsync can do block devices with a few patches here and there. lvmsync (https://github.com/mpalmer/lvmsync if you missed it the first time) appears to be a much better option than this if you are actually using lvm - the procedure is:

. take a snapshot
. dd the snapshot to the destination, at your leisure
. take the vm offline
. use lvmsync to copy the changes

lvmsync looks at the snapshot, figures out what extents the snapshot holds (which are by definition in the snapshot because they have changed in the original) and copies the matching original extents to the destination.

I tested drbd last night and the performance of the vm dropped to the point where it may as well have been offline, so it looks like I'll be doing my next test with lvmsync.

drbd has a ($$$) proxy option that would make this much better, in that it allows a fairly large buffer to build up when there is congestion between the primary and secondary, meaning things don't slow down to the speed of the link. The value of $$$ isn't particularly small though, especially for a one-off migration.

James
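Spelled out as commands, the lvmsync procedure above looks something like the following. This is a sketch only, to be run as root on real LVM volumes; vg0, vm-disk, migrate-snap and desthost are made-up names:

```sh
# take a snapshot of the running VM's disk
lvcreate --snapshot --size 10G --name migrate-snap /dev/vg0/vm-disk
# bulk copy at your leisure while the VM keeps running
dd if=/dev/vg0/migrate-snap bs=1M | ssh desthost 'dd of=/dev/vg0/vm-disk bs=1M'
# ... take the VM offline ...
# ship only the extents that changed since the snapshot was taken
lvmsync /dev/vg0/migrate-snap desthost:/dev/vg0/vm-disk
```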