
On Thu, Apr 05, 2012 at 06:31:49PM +1000, Russell Coker wrote:
On Thu, 5 Apr 2012, Craig Sanders <cas@taz.net.au> wrote:
On Thu, Apr 05, 2012 at 01:44:00PM +1000, Marcus Furlong wrote:
We have issues where the monthly mdadm raid check grinds the system to a halt.
do you find that these monthly cron jobs are actually useful? [...]
deb http://www.coker.com.au squeeze misc
In the above Debian repository for i386 and amd64 I have a version of mdadm patched to send email when the disks have different content. I am seeing lots of errors from all systems; it seems that the RAID code in the kernel reports 128 sectors (64K) of disk space as wrong for every error (all reported numbers are multiples of 128).
if mdadm software raid is doing that, then to me it says "don't use mdadm raid" rather than "stress-test raid every month and hope for the best". however, i've been using mdadm for years without seeing any sign of that (and yes, with the monthly mdadm raid checks enabled. i used to grumble about it slowing my system down but never made the decision to disable it).

first question that occurs to me is: is there a bug in the raid code itself, or is the bug in the raid checking code?
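(for anyone who wants to poke at this by hand: debian's monthly job is basically a wrapper around the kernel's own scrub interface, so you can trigger it, watch it and read the mismatch counter yourself. md0 below is just a placeholder for whatever your array is called, and i'm going from memory, so check the md/mdadm docs before trusting any of it.)

    # what the cron job boils down to: ask md to scrub the array
    echo check > /sys/block/md0/md/sync_action

    # watch progress
    cat /proc/mdstat

    # sectors that didn't compare equal across mirrors/parity
    # (on raid1 this is counted in fairly coarse chunks, which would
    # explain the "multiples of 128 sectors" -- 128 x 512 bytes = 64K)
    cat /sys/block/md0/md/mismatch_cnt

    # if the check is what's grinding the box to a halt, the raid
    # speed sysctls will throttle it, e.g.
    sysctl -w dev.raid.speed_limit_max=10000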
Also I suspect that the Squeeze kernel has a bug in regard to this. I'm still tracking it down.
i never really used squeeze for long on real hardware (as opposed to on VMs)...except in passing when sid was temporarily rather similar to what squeeze became. and i've always used later kernels - either custom-compiled or (more recently) by installing the later linux-image packages.
If you have a RAID stripe that doesn't match then you really want it to be fixed even if replacing a disk is not possible. Having two reads from the same address on a RAID-1 give different results is a bad thing. Having the data on a RAID-5 or RAID-6 array change in the process of recovering from a dead disk is also a bad thing.
true, but as above that's a "don't do that, then" situation. if you are getting symptoms like the above then either your hardware is bad or your kernel version is broken. in either case, don't do that. back up your data immediately and do something else that isn't going to lose your data.
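(if you do want md to make the copies consistent rather than just count the differences, sync_action also takes "repair". note that on raid1 md has no idea which copy is "right" -- it just propagates one of them -- so this makes the array self-consistent, it doesn't guarantee the data is the data you wanted. md0 is again just a placeholder.)

    # rewrite mismatched blocks so the copies agree again
    echo repair > /sys/block/md0/md/sync_action

    # then re-run a check; mismatch_cnt should come back as 0
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt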
Now the advantage of DRBD is that it's written with split-brain issues in mind. The Linux software RAID code is written with the idea that it's impossible for the two disks to be separated and used at the same time. In the normal case this is not possible unless a disk is physically removed.
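(for reference, and from memory so check the drbd handbook, the split-brain handling is configurable per-resource in drbd.conf, something like:)

    resource r0 {
      net {
        # what to do after a split-brain is detected, depending on how
        # many nodes were primary at the time:
        after-sb-0pri discard-zero-changes;   # neither node was primary
        after-sb-1pri discard-secondary;      # one node was primary
        after-sb-2pri disconnect;             # both were primary: give up
                                              # and leave it to the admin
      }
    }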
yep, and the re-sync is a pain, even with bitmaps.

this interesting article from 2006 that i just spotted may point to an alternative: ZFS on iscsi

http://www.cuddletech.com/blog/pivot/entry.php?id=566

in short: it's possible to build a zpool using iscsi devices (rough sketch below the sig). whether it's reliable if one of the iscsi devices disappears, i don't know. zfs already copes well with degraded vdevs...with a mirrored vdev, it shouldn't be a problem (and fairly easily repaired with zpool online if it reappears or zpool replace if it's gone for good). with raidz-n, it would depend on how many disappeared and which ones.

and this far more recent post (Aug 2011):

http://cloudcomputingresourcecenter.com/roll-your-own-fail-over-san-cluster-...

in short: zfs and glusterfs, written by someone who'd given up on drbd.

craig

ps: one of the reasons i love virtualisation is that it makes it so easy to experiment with this stuff and get an idea of whether it's worthwhile trying on real hardware. spinning up a few new vms is much less hassle than scrounging parts to build another test system.

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #336:

the xy axis in the trackball is coordinated with the summer solstice
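(rough sketch of the zpool-over-iscsi idea mentioned above. the target names and portal addresses are made up, the device paths will vary on your box, and i haven't tried this myself, so treat it as a starting point rather than a recipe:)

    # log in to two iscsi targets with open-iscsi (names/addresses are examples)
    iscsiadm -m node -T iqn.2012-04.example:disk0 -p 192.0.2.10 --login
    iscsiadm -m node -T iqn.2012-04.example:disk1 -p 192.0.2.11 --login

    # mirror the two imported luns into a pool
    zpool create tank mirror \
        /dev/disk/by-path/ip-192.0.2.10:3260-iscsi-iqn.2012-04.example:disk0-lun-0 \
        /dev/disk/by-path/ip-192.0.2.11:3260-iscsi-iqn.2012-04.example:disk1-lun-0

    # if one target drops out and comes back (device name is whatever
    # zpool status shows for it):
    zpool online tank ip-192.0.2.10:3260-iscsi-iqn.2012-04.example:disk0-lun-0

    # if it's gone for good, swap in a replacement device:
    zpool replace tank <old-device> <new-device>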