In any case, I'm limited to non-ECC RAM by the form factor of my bookshelf.

I wonder if better hardware testing would be an area worth looking into - for example, monthly online memory/CPU tests?
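
Something like this is roughly what I have in mind on the memory side - an untested sketch in Python that just writes and verifies patterns in a big buffer. It's nowhere near as thorough as memtest86+ or memtester (it only exercises whatever pages the allocator happens to hand it, and the 256 MB size is an arbitrary choice of mine), but it could be run from cron without taking the box offline:

    import sys

    def pattern_test(size_mb=256, patterns=(0x55, 0xAA, 0x00, 0xFF)):
        """Write each pattern into a buffer, read it back, and report the
        first mismatch found (or None if everything verified)."""
        size = size_mb * 1024 * 1024
        buf = bytearray(size)
        for p in patterns:
            expected = bytes([p]) * size
            buf[:] = expected
            if buf != expected:
                # Locate the first differing byte for the error report.
                offset = next(i for i in range(size) if buf[i] != p)
                return "mismatch at offset %d: wrote %#x, read %#x" % (offset, p, buf[offset])
        return None

    if __name__ == "__main__":
        error = pattern_test()
        if error:
            print(error, file=sys.stderr)
            sys.exit(1)
        print("memory pattern test passed")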

I also wonder whether there are deterministic tests we could run proactively to catch corruption at a higher level - for example, scanning any file type that includes a checksum (e.g. .zip) for corruption and comparing the results to previous runs.
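
For the zip idea, something along these lines - a sketch where the scan root and report path are made up, using ZipFile.testzip(), which re-reads every member and verifies its stored CRC. Keeping the previous run's results around lets us flag archives that used to check out but now fail, which is the bit-rot signature we care about:

    import json, zipfile
    from pathlib import Path

    REPORT = Path("/var/lib/zipscan/report.json")  # hypothetical report location

    def scan(root):
        results = {}
        for path in Path(root).rglob("*.zip"):
            try:
                with zipfile.ZipFile(path) as zf:
                    bad = zf.testzip()     # first member with a CRC error, or None
                results[str(path)] = bad or "ok"
            except zipfile.BadZipFile:
                results[str(path)] = "unreadable"
        return results

    def compare(new, old):
        # Archives that passed last time but fail now - likely bit rot.
        return [p for p, status in new.items()
                if status != "ok" and old.get(p) == "ok"]

    if __name__ == "__main__":
        previous = json.loads(REPORT.read_text()) if REPORT.exists() else {}
        current = scan("/srv/data")        # hypothetical data root
        for path in compare(current, previous):
            print("NEW corruption: %s" % path)
        REPORT.parent.mkdir(parents=True, exist_ok=True)
        REPORT.write_text(json.dumps(current, indent=2))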

It seems the real problem is uncaught hardware failure. If we minimize the window during which a failure goes unnoticed, we increase the chance of being able to compare the source to backups and recover the information.
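
And for the backup comparison, a rough sketch assuming the backup copy is mounted read-only at a made-up path: hash each source file and its counterpart in the backup, and flag any pair that differs so you know where to start digging:

    import hashlib
    from pathlib import Path

    def sha256(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    def diff_against_backup(source_root, backup_root):
        mismatches = []
        for src in Path(source_root).rglob("*"):
            if not src.is_file():
                continue
            bak = Path(backup_root) / src.relative_to(source_root)
            # Only compare files present in both copies; a differing hash means
            # at least one side has changed since the backup was taken.
            if bak.is_file() and sha256(src) != sha256(bak):
                mismatches.append(src)
        return mismatches

    if __name__ == "__main__":
        for path in diff_against_backup("/srv/data", "/mnt/backup/data"):  # hypothetical mounts
            print("differs from backup: %s" % path)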


On 2 July 2014 13:54, Danny Robson <danny@nerdcruft.net> wrote:
On Wed, 2 Jul 2014 12:34:48 +1000
"Noah O'Donoghue" <noah.odonoghue@gmail.com> wrote:

> Also, if I am going to error check memory then why stop at the file
> server? It means I have to have ECC memory in all clients that touch
> the data, including mobile devices, to cater for data corruption in
> RAM being written to disk.

The file server is a good halfway measure given it's the biggest single
point of failure. If a client experiences memory failure then only
their data is corrupted. If the server experiences memory failure then
all the clients have potential issues.

I imagine the point at which ECC memory starts making sense depends
entirely on your workload and the number of clients.
_______________________________________________
luv-main mailing list
luv-main@luv.asn.au
http://lists.luv.asn.au/listinfo/luv-main