
On Wed, Apr 10, 2013 at 04:48:17AM +0000, James Harper wrote:
On 2013-04-09 02:40, James Harper wrote:
I have a server that had 4 x 1.5TB disks installed in a RAID5 configuration (except /boot is a 'RAID1' across all 4 disks). One of the disks failed recently and so was replaced with a 3TB disk,
I'd be very wary of running RAID5 on disks >2TB.
Remember that, when you have a disk failure, in order to rebuild the array, it needs to scan every sector of every remaining disk, then write to every sector of the replacement disk.
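To put a rough number on that (assuming the commonly quoted consumer-drive spec of one unrecoverable read error per 10^14 bits read - drive datasheets vary, so treat this as illustrative only):

    rebuilding a 4 x 3TB RAID5 means reading the 3 surviving disks = 9TB
    9TB = 7.2 x 10^13 bits
    expected UREs per rebuild = 7.2 x 10^13 / 10^14 = ~0.7
    chance of hitting at least one = 1 - e^(-0.7) = ~50%

By that spec, a rebuild of an array that size is roughly a coin flip.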
Debian does a complete scan of the array every month anyway, and an HP RAID controller will basically be doing the same thing constantly, running a slow background scan during periods of low use.
And a full resync on my 4x3TB array only takes 6 hours, so the window is pretty small.
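For reference, the monthly scan mentioned above is mdadm's checkarray cron job; the same check can be started and watched by hand (md0 below is a placeholder for the array name):

    # start a check, equivalent to what the monthly cron job does
    echo check > /sys/block/md0/md/sync_action

    # watch progress and see any mismatches found
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt

    # check/resync speed is bounded by these (KB/s per device); raising the
    # minimum shrinks the rebuild window at the cost of foreground I/O
    cat /proc/sys/dev/raid/speed_limit_min
    cat /proc/sys/dev/raid/speed_limit_max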
With disks (and RAID arrays) of that size, you also have to be concerned about data errors as well as disk failures - you're pretty much guaranteed to get some, either unrecoverable errors or, worse, silent corruption of the data.
Guaranteed over what time period? It's easy to fault that logic, as I just did a full scan of my array and it came up clean. If you'd said you're "guaranteed to get some" over, say, a 10 year period, then I guess that's fair enough, but as you don't specify a timeframe I can't really contest the point. I can say, though, that I do monitor the SMART attributes that track corrected and uncorrected error rates, and extrapolating those figures I can say with confidence that unrecoverable errors are not a guarantee.
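For reference, the counters in question can be pulled with smartctl from smartmontools (sda below is a placeholder, and attribute names vary a little between vendors):

    # full attribute table
    smartctl -A /dev/sda

    # the attributes most relevant to media errors
    smartctl -A /dev/sda | egrep -i 'Reallocated|Pending|Uncorrect|CRC'

    # kick off a long surface self-test, then read the results later
    smartctl -t long /dev/sda
    smartctl -l selftest /dev/sda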
This is why error-detecting and error-correcting filesystems like ZFS and btrfs exist - they're not just a good idea, they're essential with the large disk and storage array sizes common today.
See, for example:
The part that says "not visible to the host software" kind of bothers me. AFAICS these errors are reported via SMART and are entirely visible, with the exception of some poor SMART implementations.
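For what it's worth, the class of error being argued about here is the one the drive itself never reports: data that reads back "successfully" but doesn't match what was written. That's what ZFS's end-to-end checksums and a periodic scrub are meant to catch, roughly like this (pool name "tank" is a placeholder):

    # read every block in the pool and verify it against its checksum,
    # repairing from redundancy where possible
    zpool scrub tank

    # the CKSUM column counts blocks that read back without an I/O error
    # but failed their checksum - i.e. corruption SMART never saw
    zpool status -v tank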
Personally, I wouldn't use RAID-5 (or RAID-6) any more. I'd use ZFS RAID-Z (the RAID-5 equivalent) or RAID-Z2 (the RAID-6 equivalent, with 2 parity disks) instead.
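As a sketch of what that looks like in practice (pool and device names are placeholders; whole-disk by-id names are usually preferred over sdX):

    # raidz2: 4 disks, any 2 can fail (raid6 equivalent)
    zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # raidz (single parity, raid5 equivalent) would instead be:
    # zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde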
Putting the error correction/detection in the filesystem bothers me. Putting it at the block device level would benefit a lot more infrastructure - LVM volumes for VMs, swap partitions, etc. I understand you can run those things on top of a filesystem too, but if you're doing that just to get the benefit of error correction then I think you might be doing it wrong. Actually, when I was checking over this email before hitting send, it occurred to me that maybe I'm wrong about this, knowing next to nothing about ZFS as I do. Is a zpool virtual device like an LVM LV, i.e. can I use it for things other than running ZFS filesystems on?
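For what it's worth, ZFS does have this: a pool can expose "zvols", block devices carved out of the pool much like LVM LVs, with the pool's checksumming and redundancy underneath. Roughly (names are placeholders):

    # a 20G block device for a VM; appears as /dev/zvol/tank/vm1-disk0
    zfs create -V 20G tank/vm1-disk0

    # a block device for swap (works, though swap-on-zvol has its own caveats)
    zfs create -V 4G tank/swap0
    mkswap /dev/zvol/tank/swap0
    swapon /dev/zvol/tank/swap0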
Actually, I wouldn't have used RAID-5 without a good hardware RAID controller with non-volatile write cache - the performance sucks without that - but ZFS allows you to use an SSD as a ZIL (ZFS Intent Log, i.e. a synchronous write cache) and as a read cache.
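A rough sketch of how those get attached (device paths are placeholders; partitioning one SSD for both is common):

    # a small, fast partition as a separate intent log; sync writes are
    # logged here so they can be acknowledged quickly
    zpool add tank log /dev/sdf1

    # another partition as a second-level read cache (L2ARC)
    zpool add tank cache /dev/sdf2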
For anything where performance is a constraint I don't use RAID5 at all. This case is an exception in that it stores backup volumes from Bacula (i.e. streaming writes) and only needs to write as fast as data can come off the 1Gbit/s wire, so disk performance isn't an issue here: my array can easily handle 100MB/s of streaming writes, and backup compression means it never gets sent data that fast anyway.
If performance were more important than capacity, I'd use RAID-1, so-called RAID-"10", or ZFS mirrored disks - a ZFS pool of mirrored pairs is similar to RAID-10 but with all the extra benefits of ZFS (error detection, volume management, snapshots, etc.).
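For comparison, a pool of mirrored pairs looks roughly like this (placeholders again) - data is striped across the mirrors, and more pairs can be added later to grow it:

    # two mirrored pairs, striped - roughly raid-10
    zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde

    # grow the pool later by adding another pair
    zpool add tank mirror /dev/sdf /dev/sdg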
Yes, I use RAID10 almost exclusively these days.
ZFSonLinux just released version 0.6.1, which is the first release they're happy to say is ready for production use. I've been using prior versions for a year or two now(*) with no problems, and just switched from my locally compiled packages to their release .debs (for amd64 wheezy, although they work fine with sid too).
Despite my reservations mentioned above, ZFS is still on my (long) list of things to look into and learn about, more so given that you say it is now considered stable :)
BTW, btrfs just got raid5/6 emulation support too... in a year or so (after the early-adopter guinea pigs have discovered the bugs), it could be worth considering that as an alternative. My own personal experience with btrfs raid1 & raid10 emulation was quite bad, but some people swear by it, and lots of bugs have been fixed since I last used it. For large disks and large arrays, it's still a better choice than ext3/4 or XFS.
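For completeness, the btrfs raid1/raid10 profiles mentioned (and a scrub) look roughly like this - device names and mount point are placeholders:

    # create a filesystem with data and metadata both in the raid10 profile
    mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # verify checksums of everything on a mounted filesystem
    btrfs scrub start /mnt/backup
    btrfs scrub status /mnt/backup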
As above, but I'll continue to let others find bugs :)

James