
On Tue, 10 Apr 2012, Toby Corkindale <toby.corkindale@strategicdata.com.au> wrote:
> It sounds like a good idea at first, but every time you need to reboot one or other of the iSCSI targets (e.g. for kernel updates or suchlike) you'll need to rebuild the RAID array, and the performance of that over ethernet blows.
If you use an internal bitmap to indicate which parts of the RAID aren't synchronised then there shouldn't be much data to transfer.
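To illustrate why a bitmap keeps the resync small, here is a toy model of the idea. It is only a sketch, not how the Linux md driver implements it: the class name, method names and the 64 MiB chunk size are all made up for the example. With Linux software RAID the real feature is enabled through mdadm's `--bitmap=internal` option.

```python
class WriteIntentBitmap:
    """Toy model of a write-intent bitmap: one dirty flag per fixed-size chunk.

    Hypothetical illustration only; names and sizes are assumptions.
    """

    def __init__(self, device_bytes, chunk_bytes):
        self.chunk_bytes = chunk_bytes
        # Round up so a partial final chunk still gets a flag.
        self.n_chunks = -(-device_bytes // chunk_bytes)
        self.dirty = set()

    def record_write(self, offset, length):
        """Mark every chunk touched by a write made while a mirror was absent."""
        first = offset // self.chunk_bytes
        last = (offset + length - 1) // self.chunk_bytes
        self.dirty.update(range(first, last + 1))

    def resync_bytes(self):
        """Upper bound on the data to copy when the absent mirror returns."""
        return len(self.dirty) * self.chunk_bytes


MiB = 1024 ** 2
# A 1 TB device tracked in 64 MiB chunks (chunk size chosen arbitrarily).
bitmap = WriteIntentBitmap(device_bytes=10**12, chunk_bytes=64 * MiB)

# Two writes arrive while one iSCSI target is rebooting.
bitmap.record_write(offset=0, length=4096)
bitmap.record_write(offset=200 * MiB, length=8 * MiB)

print(f"{len(bitmap.dirty)} of {bitmap.n_chunks} chunks dirty, "
      f"{bitmap.resync_bytes() // MiB} MiB to resync")
```

Only the chunks written during the outage are copied when the target comes back, so the cost of a reboot scales with the amount written in that window rather than with the size of the array.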
http://www.coker.com.au/bonnie++/zcav/results.html
Also, given that the maximum contiguous transfer rates I've seen are under 120MB/s, it seems unlikely that GigE is going to be a significant bottleneck. I'm sure that there are disks that are faster than the 1TB disk I tested, but it should be noted that the inner tracks of that 1TB disk were about half GigE speed. Also, when synchronising a RAID array, if performance matters then you probably have other load, which means that the synchronisation speed is well below the maximum speed of the disk.
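The comparison can be put as rough arithmetic. This is a back-of-the-envelope sketch: the 120MB/s ceiling and the "half GigE" inner-track figure come from the paragraph above, while the 125MB/s link figure is the raw line rate and ignores TCP and iSCSI overhead, so the real margin is thinner than it looks. The function names are made up for the example.

```python
# Figures marked "from the thread" come from the text above; the rest are assumptions.
GIGE_MB_S = 1000 / 8              # raw 1 Gbit/s line rate in MB/s (ignores protocol overhead)
DISK_OUTER_MB_S = 120             # from the thread: contiguous rates seen were under this
DISK_INNER_MB_S = GIGE_MB_S / 2   # from the thread: inner tracks "about half GigE speed"
DISK_MB = 1_000_000               # 1 TB disk in decimal megabytes


def effective_rate(disk_mb_s, link_mb_s=GIGE_MB_S):
    """A resync streams at the speed of the slower of disk and link."""
    return min(disk_mb_s, link_mb_s)


def hours_to_copy(megabytes, rate_mb_s):
    """Time to copy the given amount at a sustained rate, in hours."""
    return megabytes / rate_mb_s / 3600


for label, disk_rate in [("outer tracks", DISK_OUTER_MB_S),
                         ("inner tracks", DISK_INNER_MB_S)]:
    rate = effective_rate(disk_rate)
    limit = "disk" if disk_rate <= GIGE_MB_S else "network"
    print(f"{label}: {rate:.1f} MB/s, limited by {limit}, "
          f"full 1 TB copy in {hours_to_copy(DISK_MB, rate):.1f} h")
```

These are best-case sequential numbers for an otherwise idle disk. Any competing load pushes the disk figure down further, which makes the network even less likely to be the limiting factor.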
> If you're designing one of these systems so that you have high availability of your system, then it's because you do have lots of I/O all the time and can't afford to stop it during a RAID rebuild.
Yes, that is why bitmaps are a good thing.
> The random I/O interspersed with the rebuild I/O has the effect of totally trashing the rebuild performance. Random I/O over iSCSI has sucked on the stable Debian kernels. (I believe the better-performing iSCSI drivers, which are a totally independent rewrite, have finally made it into wheezy though.)
How can someone write iSCSI drivers that hurt random I/O? The reports I have seen about command queuing in hard drives indicate that it generally doesn't give more than about a 10% benefit. So an iSCSI driver that lacks command queuing and loses about 10% probably wouldn't count as making performance suck.
> In the long run, I think cluster filesystems are a better bet though. Still waiting on GlusterFS, Ceph, etc. to reach maturity :(
I think that BigTable-type systems are the way to go. It seems that cluster filesystems generally either try for full POSIX compliance or implement a subset that doesn't match the subset you want. When applications use Cassandra or other distributed database technologies they can relax the consistency requirements as they wish. For example, when designing a mail server there is no need to have the creation of a new message appear instantly; it just has to reliably appear.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/