
On Tue, 1 Jul 2014 10:03:44 Rohan McLeod wrote:
Noah O'Donoghue wrote:
Hi All,
After reading about bitrot and feeling guilty for storing my most valuable data on cheap drives (although with backups!) I've been thinking about moving to something more resilient.
Out of curiosity I googled "bitrot". Whilst there seems to be some usage of the term in relation to RAM, mostly it seems to be used in the context of storage media. As a novice I found:
http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/
http://en.wikipedia.org/wiki/Write_Anywhere_File_Layout
That article claims that ZFS is the oldest of the "next generation filesystems". WAFL did it first and NetApp (the developer of WAFL) sued Sun alleging patent violation in ZFS.
Informative, but apart from a suggestion that it might be related to 'cosmic rays' and thermal magnetic effects, I couldn't seem to find (a) a definition which gives a measure of bitrot and (b) actual measurements of this phenomenon in various media and under differing conditions.
Presumably, as a probabilistic phenomenon, bitrot might be defined in terms of the half-life of the data?
http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
The above paper is the best reference I've seen. Half-life isn't a good measure as you can expect to lose ~50 sectors at a time on a TB+ disk.

On Tue, 1 Jul 2014 12:29:39 Peter Ross wrote:
For Russell: Have you seen this?
The first TODO entry is about file(1) and magic.
That is about ZFS dump files (the output of "zfs send") not the block devices. As I have never run zfs send and don't have any immediate plans to do so this hasn't been a concern for me. Thanks for the suggestion though, I've attached it to the Debian bug report.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

Russell Coker wrote:
On Tue, 1 Jul 2014 10:03:44 Rohan McLeod wrote:
Noah O'Donoghue wrote:
Hi All,
After reading about bitrot and feeling guilty for storing my most valuable data on cheap drives (although with backups!) I've been thinking about moving to something more resilient.
[snip]
Informative, but apart from a suggestion that it might be related to 'cosmic rays' and thermal magnetic effects, I couldn't seem to find (a) a definition which gives a measure of bitrot and (b) actual measurements of this phenomenon in various media and under differing conditions.
Presumably, as a probabilistic phenomenon, bitrot might be defined in terms of the half-life of the data?
http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
The above paper is the best reference I've seen. Half-life isn't a good measure as you can expect to lose ~50 sectors at a time on a TB+ disk.

Thanks for responding, Russell.
This paper doesn't seem to distinguish between corruption attributable to drive 'malfunction' and corruption which would have happened anyway, e.g. while the drive was switched off?
But I notice: "(ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality". Presumably bitrot due to temperature would not show "high spatial locality"?
https://en.wikipedia.org/wiki/Curie_temperature
So perhaps we can deduce that cosmic rays are a more likely cause than temperature?
regards Rohan McLeod
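
As an aside on the "high spatial locality" finding quoted above, one rough way to check for it on your own data is to ask what fraction of bad sectors have another bad sector close by. The following is only a sketch under assumptions: the function name `neighbour_fraction`, the `radius` parameter and the sample sector numbers are all made up for illustration, and this is not the method used in the paper.

```python
from bisect import bisect_left, bisect_right


def neighbour_fraction(bad_sectors, radius):
    """Return the fraction of bad sectors that have at least one other
    bad sector within `radius` sectors (hypothetical locality measure)."""
    sectors = sorted(set(bad_sectors))
    if not sectors:
        return 0.0
    with_neighbour = 0
    for s in sectors:
        # Count the bad sectors that fall in [s - radius, s + radius].
        lo = bisect_left(sectors, s - radius)
        hi = bisect_right(sectors, s + radius)
        if hi - lo > 1:  # more than just s itself
            with_neighbour += 1
    return with_neighbour / len(sectors)


# Made-up examples: one run of 50 adjacent sectors vs. widely scattered ones.
clustered = list(range(1000, 1050))
scattered = [0, 10_000, 20_000, 30_000]
print(neighbour_fraction(clustered, radius=10))
print(neighbour_fraction(scattered, radius=10))
```

A value near 1 would suggest the errors come in clusters (as with the ~50 sectors at a time mentioned above), while a value near 0 would be more consistent with independent, scattered errors.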

On Tue, 1 Jul 2014 14:30:07 Rohan McLeod wrote:
Presumably, as a probabilistic phenomenon, bitrot might be defined in terms of the half-life of the data?
http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
The above paper is the best reference I've seen. Half-life isn't a good measure as you can expect to lose ~50 sectors at a time on a TB+ disk.
Thanks for responding, Russell.
This paper doesn't seem to distinguish between corruption attributable to drive 'malfunction' and corruption which would have happened anyway, e.g. while the drive was switched off?
Their business is in running drives 24x7, as is that of almost everyone who would have such statistics. Someone might do some research on drives maintaining data when turned off, but it seems unlikely that anyone would have good access to both types of data. Also, they only care about data loss, not why it happens. Maybe someone at archive.org could do something in that area.
But I notice: "(ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality". Presumably bitrot due to temperature would not show "high spatial locality"?
Silent bitrot doesn't tend to happen due to temperature; the Curie point would be well above the temperature at which mechanical failure occurs. In the disks that I've seen fail when overheated there has been obvious spatial locality, but I haven't seen a statistically significant sample.
https://en.wikipedia.org/wiki/Curie_temperature
So perhaps we can deduce that cosmic rays are a more likely cause than temperature?
Cosmic rays don't seem a likely cause. The steel case of a PC, the steel case of the hard drive, and whatever building you live in all cut down radiation. Something going wrong when data is written seems to be the most likely cause. Although it's worth noting that the ZFS "resilver" option must be there for a reason, so I guess there's some data loss over time. But that might be due to writing adjacent tracks.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
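
For anyone wondering how per-block checksums catch this kind of silent corruption regardless of its cause, here is a minimal sketch. It is an illustration only, not how ZFS implements it: the 4096-byte block size, the choice of SHA-256 and the helper names `checksum_blocks` and `find_corrupt_blocks` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def checksum_blocks(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def find_corrupt_blocks(data, expected, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose current checksum differs from the
    checksum recorded at write time. Assumes `data` has the same number of
    blocks as `expected`."""
    actual = checksum_blocks(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Record checksums at "write" time, then flip a single bit in block 1.
stored = bytearray(3 * BLOCK_SIZE)
recorded = checksum_blocks(bytes(stored))
stored[BLOCK_SIZE + 7] ^= 0x01
print(find_corrupt_blocks(bytes(stored), recorded))
```

Detection alone doesn't repair anything; a second good copy (mirror, RAID-Z or a backup) is still needed to restore the damaged block.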