
On Mon, 6 Apr 2015 01:42:33 PM Douglas Ray wrote:
> Years ago I had an internal disk giving corrupted large files, which turned out to be a RAM problem which only showed up on large file accesses. The intermittent RAM fault seemed to have fallen on statically allocated kernel buffers which only got accessed on large file writes (and maybe reads, can't remember).
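
A rough way to look for this kind of corruption on a filesystem without checksums is to write a large file of known content, read it back, and compare hashes. The sketch below is only an illustration: the function name write_and_verify, the 1 MiB chunk size and the file name are my own assumptions, and a read served from the page cache may never touch the disk, so a matching hash does not prove the hardware is good.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # 1 MiB per write (arbitrary choice for this sketch)


def write_and_verify(path, total_chunks=64, seed=b"seed"):
    """Write deterministic data to path, read it back, and return
    (hash_of_data_written, hash_of_data_read) as hex strings."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for i in range(total_chunks):
            # Deterministic block: a 32-byte digest repeated to fill 1 MiB.
            digest = hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
            block = digest * (CHUNK // len(digest))
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            read_back.update(block)
    return written.hexdigest(), read_back.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w, r = write_and_verify(os.path.join(d, "bigfile.bin"))
        print("match" if w == r else "MISMATCH: possible corruption")
```

A mismatch only says that something between the write and the read changed the data; it does not say whether the RAM, the disk, or the cabling is at fault.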

Filesystems like BTRFS and ZFS make it easier to detect these problems. I had one system that gave unusual BTRFS consistency errors on 2 occasions, and the BTRFS developers suggested testing the memory. Memtest86+ reported errors, and one of the DIMMs still produced Memtest86+ errors when moved to another system. I replaced that DIMM and things worked a lot better afterwards.

Filesystems that don't have checksums on all data and metadata (i.e. everything other than BTRFS and ZFS) will just get corrupted files when such things happen.

As an aside, the ZFS "resilver" operation can really mess things up if you run it when you have memory errors. ECC RAM is a really good thing.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/