
On Sun, Jan 19, 2020 at 05:34:46PM +1100, russell@coker.com.au wrote:
> I generally agree that RAID-1 is the way to go. But if you can't do that
> then BTRFS "dup" and ZFS "copies=2" are good options, especially with SSD.
I don't see how that can help much (if at all). Making a second copy of the data on the same drive that's failing doesn't add much redundancy, but it does add significantly to the drive's workload (increasing the risk of failure). It might be OK on a drive with only a few bad sectors, or in conjunction with some kind of RAID, but it's not a substitute for RAID.
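For reference, setting up that kind of single-device duplication looks something like this (from memory, so check the man pages before running anything):

  # btrfs: keep two copies of both data and metadata on one device
  mkfs.btrfs -m dup -d dup /dev/sdX

  # zfs: store two copies of every block in an existing dataset
  zfs set copies=2 tank/mydata

/dev/sdX and tank/mydata are just placeholder names. Note that copies=2 only applies to data written after you set it, and neither option protects you from the whole drive dying.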
> So far I have not seen a SSD entirely die, the worst I've seen is a SSD
> stop [...]
I haven't either, but I've heard & read of it. Andrew's rootfs SSD seems to have died (or is possibly just corrupted so badly it can't be mounted; I'm not sure). I've seen LOTS of HDDs die, though. Even at home I've had dozens die on me over the years - I've got multiple stacks of dead drives of various ages and sizes cluttering up shelves (mostly waiting for me to need another fridge magnet or a shiny coffee-cup coaster :)
> I've also seen SSDs return corrupt data while claiming it to be good, but
> not in huge quantities.
That's one of the things that btrfs and zfs can detect...and correct if there's any redundancy in the storage.
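i.e. a scrub reads every block, verifies the checksums, and rewrites any corrupt copy from a good one where redundancy exists. Something like this (again from memory, check the docs):

  # zfs: scrub the whole pool, then check the result
  zpool scrub tank
  zpool status -v tank

  # btrfs: scrub a mounted filesystem, then check the result
  btrfs scrub start /mnt/data
  btrfs scrub status /mnt/data

('tank' and /mnt/data are just example names.)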
> For hard drives also I haven't seen a total failure (like stiction) for
> many years. The worst hard drive problem I've seen was about 12,000 read
> errors, that sounds like a lot but is a very small portion of a 3TB disk
> and "dup" or "copies=2" should get most of your data back in that
> situation.
If a drive is failing, all the read or write retries kill performance on a zpool, and that drive will eventually be evicted from the pool. Lose enough drives and your pool goes from "DEGRADED" to "FAILED", and your data goes with it.

craig

--
craig sanders <cas@taz.net.au>