
On 10/04/12 11:45, Russell Coker wrote:
On Tue, 10 Apr 2012, Toby Corkindale <toby.corkindale@strategicdata.com.au> wrote:
On 05/04/12 17:42, Craig Sanders wrote:
Overall Ganeti is really nice, but it feels like DRBD is missing some pieces that would help in debugging issues.
I'm wondering if iSCSI kind of obsoletes DRBD, and whether mdadm RAID1 over two iSCSI exports would be better than DRBD.
Oooh, no, don't do that. We've tried it. It didn't work out.
It sounds like a good idea at first, but every time you need to reboot one or the other of the iSCSI targets (e.g. for kernel updates or suchlike) you'll need to rebuild the RAID array, and rebuild performance over Ethernet blows.
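For reference, the setup being discussed would look roughly like this (a sketch only; the portal addresses, IQNs and device names are made up):

    # import a LUN from each of the two iSCSI targets
    iscsiadm -m discovery -t sendtargets -p 192.168.1.10
    iscsiadm -m discovery -t sendtargets -p 192.168.1.11
    iscsiadm -m node -T iqn.2012-04.example:store1 -p 192.168.1.10 --login
    iscsiadm -m node -T iqn.2012-04.example:store2 -p 192.168.1.11 --login

    # mirror the two imported LUNs (appearing here as sdb and sdc)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc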
If you use an internal bitmap to indicate which parts of the RAID aren't synchronised then there shouldn't be much data to transfer.
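For what it's worth, adding a write-intent bitmap to an existing md array is a one-liner (the array must be clean):

    # record dirty regions so a resync only copies what changed
    mdadm --grow /dev/md0 --bitmap=internal
    cat /proc/mdstat    # shows the bitmap state alongside the array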
http://www.coker.com.au/bonnie++/zcav/results.html
Also, given that the maximum contiguous transfer rates I've seen are under 120MB/s, it seems unlikely that GigE will be a significant bottleneck. I'm sure there are disks faster than the 1TB disk I tested, but it should be noted that the inner tracks of that 1TB disk ran at about half GigE speed. Also, if performance matters when synchronising a RAID array, then you probably have other load, which means that synchronisation speed is well below the maximum speed of the disk.
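Rough numbers, assuming about 10% protocol overhead on GigE (the disk figures are from the zcav results above):

    GigE raw:              1000 Mbit/s / 8 = 125 MB/s
    usable after TCP/IP + iSCSI overhead  ~= 110-115 MB/s
    1TB disk, outer tracks                 < 120 MB/s
    1TB disk, inner tracks                ~=  60 MB/s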
If you're designing one of these systems for high availability, it's because you have lots of I/O all the time and can't afford to stop it during a RAID rebuild. Random I/O interspersed with the rebuild I/O totally trashes rebuild performance. Random I/O over iSCSI has also sucked on the stable Debian kernels (I believe the better-performing iSCSI drivers, a totally independent rewrite, have finally made it into wheezy though). So you end up in a situation where the real I/O performs badly, AND the rebuild takes so long that there's a sizeable window in which another disk error could occur.
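The kernel's md resync throttles are the usual knobs for trading rebuild time against foreground I/O; a sketch, with illustrative values:

    # per-device resync floor/ceiling, in KB/s
    sysctl dev.raid.speed_limit_min    # default 1000
    sysctl dev.raid.speed_limit_max    # default 200000

    # raise the floor so a rebuild still makes progress under load
    sysctl -w dev.raid.speed_limit_min=10000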
Some people claim that RAID bitmaps hurt performance; I haven't tested that yet. But a full RAID rebuild is going to seriously hurt performance for a long time, so if performance matters it's probably better to take a small loss all the time than a large loss for the hours or days a full rebuild requires. Also note that a long rebuild increases the probability of a second failure while it's running...
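If anyone wants to test that claim, the bitmap can be toggled on a clean array, and a larger bitmap chunk means fewer bitmap updates (the chunk size here is illustrative; measure your own workload):

    mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=65536  # 64MB; mdadm takes the size in KB
    # ... run the benchmark, then remove the bitmap and rerun ...
    mdadm --grow /dev/md0 --bitmap=none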
Agreed with your sentiment that it's better to have a constant small performance loss (which you can design for) than an occasional massive one. If you try RAID with a bitmap over iSCSI, I'd be interested to hear how it works out for you. In the long run, though, I think cluster filesystems are a better bet. Still waiting on GlusterFS, Ceph, etc. to reach maturity :(

Toby