RE: RAID-1 synchronisation

5 Feb 2012

      ...
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0] sdb2[1]
      2917680447 blocks super 1.2 [2/2] [UU]
      [===========>.........]  check = 55.9% (1631568128/2917680447) finish=765.1min speed=28013K/sec
      bitmap: 1/22 pages [4KB], 65536KB chunk
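
For what it's worth, the finish= figure above looks like nothing more than the remaining blocks divided by the current speed. A rough sketch of that arithmetic, assuming the block counts are in 1 KiB units and the speed is in KiB/sec (eta_minutes is just a name made up for illustration, not an mdadm tool):

```shell
# Estimate the remaining check time in whole minutes from /proc/mdstat figures.
# $1 = total blocks, $2 = blocks already checked, $3 = current speed in K/sec.
# Assumption: blocks are 1 KiB each, so blocks / (K/sec) gives seconds.
eta_minutes() {
    total=$1
    checked=$2
    speed=$3
    # Integer division throughout, so the result is rounded down.
    echo $(( (total - checked) / speed / 60 ))
}

# Figures taken from the md1 output above; prints 765
eta_minutes 2917680447 1631568128 28013
```

That is nearly 13 hours still to go at the current rate, and the rate will drop whenever the server gets busy.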
It seems that this is from /etc/cron.d/mdadm having a checkarray command
which runs on the 3rd of the month. My slowest server didn't complete that
in a reasonable amount of time, while the other servers, which aren't disk IO
bound, completed it before I noticed.
Of course... I thought it was the 1st Sunday of the month, but maybe that's just a Debian thing.

# By default, run at 00:57 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
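
The trick in that entry is that cron only selects "every Sunday", and the shell test then throws away any Sunday after the 7th. A minimal sketch of just that test, with a made-up function name (in_first_week) and a few sample days instead of the real date call:

```shell
# Succeeds when the given day of the month falls within the first seven days.
# Combined with cron's day-of-week field (0 = Sunday) this leaves only the
# first Sunday of each month.
in_first_week() {
    [ "$1" -le 7 ]
}

# Sample values in the zero-padded form that `date +%d` produces.
for day in 03 05 07 08 12; do
    if in_first_week "$day"; then
        echo "day $day: check would run, if it is a Sunday"
    else
        echo "day $day: skipped"
    fi
done
```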

Is the server that much slower, or that much more I/O bound, that it would make a significant difference? You could still have a hardware problem causing a drop in disk IOPS.
...
The question is whether the checkarray command does any good.  I've run a
lot of systems with Linux software RAID and don't recall ever seeing it do any
good, while a multi-day cron job with performance implications is going to
do some harm.
There is obviously the performance hit to consider, but I bet the failure rate of disks is higher on the 1st Sunday of the month (or whenever your distribution automatically schedules it) than at other times.

One thing it does do for you is 'touch' unused blocks, and finding out that those are bad now rather than later is better IMO. Also, verifying consistency and finding that you have a silent corruption problem early can only be a good thing. This is especially important for RAID5 without a battery-backed write cache, as the check can detect the effects of the RAID5 write hole (http://en.wikipedia.org/wiki/RAID_5_write_hole). Maybe write-intent bitmaps get around this these days though?

I wonder if you can fiddle with the settings so that it uses a smaller amount of idle bandwidth (lower than --idle), if there is such a thing as idle bandwidth on your system?
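
One knob that does exist is the md speed limit: system-wide in /proc/sys/dev/raid/speed_limit_min and speed_limit_max, and per array in sync_speed_min and sync_speed_max under /sys/block/mdX/md/, all in KiB/sec. Something like the following might be enough to hold a running check back. It is an untested sketch: set_md_speed_cap is a made-up helper name, the 5000 figure is arbitrary, and the optional third argument only exists so the function can be tried against a scratch directory instead of the real /sys.

```shell
# Cap the check/resync rate of one md array, in KiB/sec.
# $1 = array name (e.g. md1), $2 = limit in KiB/sec,
# $3 = sysfs root (optional, defaults to /sys; hypothetical testing aid).
set_md_speed_cap() {
    array=$1
    limit=$2
    root=${3:-/sys}
    echo "$limit" > "$root/block/$array/md/sync_speed_max"
}

# Example, as root: hold md1 to roughly 5 MB/sec while the check runs.
# set_md_speed_cap md1 5000
```

A lower cap makes the check take proportionally longer, so on a box that is already I/O bound it trades a shorter, heavier hit for a longer, lighter one.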

James

James Harper