
On 2013-04-11 02:10, James Harper wrote: [...]
with disks (and raid arrays) of that size, you also have to be concerned about data errors as well as disk failures - you're pretty much guaranteed to get some, either unrecoverable errors or, worse, silent corruption of the data.
Guaranteed over what time period? It's easy to fault your logic as I just did a full scan of my array and it came up clean. If you say you are "guaranteed to get some" over, say, a 10 year period, then I guess that's fair enough. But as you don't specify a timeframe I can't really contest the point. [...]
With a pair of 2TB Western Digital SATA drives in my server, both in RAID 1:
| mattcen@adam:tmp$ zgrep -h 'mismatches found' /var/log/syslog* | sort -n
| 2012-02-05T22:05:03.118792+11:00 adam mdadm[1545]: RebuildFinished event detected on md device /dev/md/1, component device mismatches found: 10496
| 2012-03-04T17:00:12.084923+11:00 adam mdadm[1724]: RebuildFinished event detected on md device /dev/md/1, component device mismatches found: 11008
...
Interesting. And somewhat alarming! What does smartctl -H and smartctl -a report for those drives?

Does "11008" mismatches mean that 11008 bytes were found to be different, or that 11008 sectors were found to be different? In either case I would suggest to you that you have a serious problem with your servers and that this is not normal. I have many servers running linux md RAID1 and have never seen such a thing.

James
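
For anyone wanting to compare these figures across months, here is a rough Python sketch that pulls the counts out of the mdadm lines quoted above. The unit conversion is an assumption, not something the thread confirms: as far as I recall, the kernel's md documentation describes the mismatch count in 512-byte sectors and notes it can overstate the real number of differing sectors, because most RAID levels compare in page-sized units. The names `parse_mismatches`, `to_bytes` and `SECTOR_BYTES` are made up for this example.

```python
import re

# Pattern for the mdadm "RebuildFinished" syslog lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ mdadm\[\d+\]: RebuildFinished event detected "
    r"on md device (?P<dev>[^,\s]+), "
    r"component device mismatches found: (?P<count>\d+)"
)

# ASSUMPTION: the count is in 512-byte sectors. Check the md documentation
# for your kernel before relying on this.
SECTOR_BYTES = 512


def parse_mismatches(lines):
    """Return (timestamp, device, count) for each matching log line."""
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            results.append((m.group("ts"), m.group("dev"), int(m.group("count"))))
    return results


def to_bytes(count):
    """Convert a mismatch count to bytes under the sector assumption above."""
    return count * SECTOR_BYTES


# The two log lines quoted earlier in the thread.
SAMPLE = [
    "2012-02-05T22:05:03.118792+11:00 adam mdadm[1545]: RebuildFinished event "
    "detected on md device /dev/md/1, component device mismatches found: 10496",
    "2012-03-04T17:00:12.084923+11:00 adam mdadm[1724]: RebuildFinished event "
    "detected on md device /dev/md/1, component device mismatches found: 11008",
]

if __name__ == "__main__":
    for ts, dev, count in parse_mismatches(SAMPLE):
        print(ts, dev, count, "sectors (assumed), about", to_bytes(count) // 1024, "KiB")
```

If the sector assumption holds, both checks report a few MiB at most on a 2TB mirror, and the figure is an upper bound rather than an exact count of bad sectors.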