btrfs/ZFS, sans raid and bitrot

Hi All,

After reading about bitrot and feeling guilty for storing my most valuable data on cheap drives (although with backups!) I've been thinking about moving to something more resilient.

My current setup is an Ubuntu laptop with 2 external drives:
1x 2TB ext4 for data storage
1x 3TB ext4 for backup (using CrashPlan commercial backup software)

My question is: if I change the first drive to BTRFS or ZFS, will I gain resiliency against bitrot? My understanding is that I need 2 drives in at least a RAID-1 to get automatic healing from bitrot, but if I at least use a filesystem with checksumming support then I will be able to restore my affected files from my CrashPlan backups (which are compressed, then checksummed, and regularly checked for errors automatically), and I won't have the risk of my main drive corrupting my backups, because the read will FAIL if it doesn't pass the checksum.

Is my understanding correct?

Cheers,
Noah

On Mon, 30 Jun 2014 18:21:06 Noah O'Donoghue wrote:
My current setup is an Ubuntu laptop with 2 external drives.
1x 2TB ext4 for data storage, 1x 3TB ext4 for backup (using CrashPlan commercial backup software).
My question is: if I change the first drive to BTRFS or ZFS, will I gain resiliency against bitrot?
BTRFS in a default configuration will use "dup" for metadata, so a bad metadata block can be corrected. But a bad data block causes data loss - at least you know you have data loss (as opposed to silent data corruption on older filesystems).

ZFS keeps one more copy of metadata than of data. If you have 1 copy of data (the default) then you have 2 copies of metadata. If you want protection against read errors or data corruption on a single disk with ZFS you can use the "copies=" option to store multiple copies of data; the number of copies of metadata blocks is then 1 greater than the number of copies of data. The copies= option can be set per "filesystem" (where a ZFS "filesystem" is like a subdirectory on a traditional filesystem).
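For reference, a rough sketch of the commands involved - the device, mount point and dataset names below are placeholders, adjust to suit:

  # BTRFS: dup metadata can be requested at mkfs time
  # (it is already the default on a single rotating disk)
  mkfs.btrfs -m dup -d single /dev/sdX

  # or an existing single-device filesystem can be converted
  btrfs balance start -mconvert=dup /mnt/data

  # ZFS: keep 2 copies of data (and therefore 3 of metadata) for one dataset;
  # note this only applies to data written after the property is set
  zfs set copies=2 tank/important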
My understanding is I need 2 drives in at least a RAID 1 to get automatic healing from bitrot,
No, "dup" for BTRFS metadata and "copies=" for data on ZFS give you this on a single disk.
but if I at least use a filesystem with checksumming support then I will be able to restore my affected files from my CrashPlan backups (which are compressed, then checksummed, and regularly checked for errors automatically), and I won't have the risk of my main drive corrupting my backups, because the read will FAIL if it doesn't pass the checksum.
Yes.

On 30 June 2014 20:12, Russell Coker <russell@coker.com.au> wrote:
No, "dup" for BTRFS metadata and "copies=" for data on ZFS give you this on a single disk.
Ouch, that sounds like it could lead to some "spiral of death" type scenarios, as a failing drive continually tries to write to... a failing drive.

On reading more about ZFS, is it true that the latest source code isn't available? So Sun is withholding new features, fixes etc. from the codebase?

On Mon, 30 Jun 2014 20:25:31 Noah O'Donoghue wrote:
On 30 June 2014 20:12, Russell Coker <russell@coker.com.au> wrote:
No, "dup" for BTRFS metadata and "copies=" for data on ZFS give you this on a single disk.
Ouch, that sounds like it could lead to some "spiral of death" type scenarios, as a failing drive continually tries to write to... a failing drive.
http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html

I don't think you are at risk of that. The above paper shows that drive corruption usually involves a small number of sectors, and ~50 errors out of a 3TB disk isn't much. The "dup" and "copies=" options just allow you to have a copy of each important block that's not on one of those 50 sectors.
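For what it's worth, the mechanism that actually exercises those spare copies is a scrub: the filesystem reads everything back, verifies the checksums, and rewrites any block that fails from a good copy. Roughly (the mount point and pool name below are just examples):

  # BTRFS: scrub a mounted filesystem, then check the result
  btrfs scrub start /mnt/data
  btrfs scrub status /mnt/data

  # ZFS: scrub a pool, then check the result
  zpool scrub tank
  zpool status tank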
On reading more about ZFS, is it true the latest source code isn't available for ZFS? So Sun is withholding new features, fixes etc from the codebase?
Not sure. But the current version works pretty well, better than any other filesystem for large amounts of storage where reliability is desired.

On Mon, 30 Jun 2014 08:12:45 pm Russell Coker wrote:
BTRFS in a default configuration will use "dup" for metadata. So a bad metadata block can be corrected. But a bad data block causes data loss - at least you know you have data loss (as opposed to silent data corruption on older filesystems).
What does data loss look like? Does the whole file become unavailable or are there just error codes delivered along with the data?

--
Anthony Shipman, als@iinet.net.au
Mamas don't let your babies grow up to be outsourced.

On Mon, 30 Jun 2014 22:28:52 Anthony Shipman wrote:
On Mon, 30 Jun 2014 08:12:45 pm Russell Coker wrote:
BTRFS in a default configuration will use "dup" for metadata. So a bad metadata block can be corrected. But a bad data block causes data loss -
at least you know you have data loss (as opposed to silent data corruption on older filesystems).
What does data loss look like? Does the whole file become unavailable or are there just error codes delivered along with the data?
If only data blocks are corrupted and you read from a part of the file that doesn't contain those blocks then the kernel won't know that some other part of the file is corrupted. So it's possible to get some data back from a corrupted file.

On 30/06/14 23:24, Russell Coker wrote:
On Mon, 30 Jun 2014 22:28:52 Anthony Shipman wrote:
On Mon, 30 Jun 2014 08:12:45 pm Russell Coker wrote:
BTRFS in a default configuration will use "dup" for metadata. So a bad metadata block can be corrected. But a bad data block causes data loss -
at least you know you have data loss (as opposed to silent data corruption on older filesystems).
What does data loss look like? Does the whole file become unavailable or are there just error codes delivered along with the data?
If only data blocks are corrupted and you read from a part of the file that doesn't contain those blocks then the kernel won't know that some other part of the file is corrupted. So it's possible to get some data back from a corrupted file.
You can "easily" recover the rest of the data on any file system if you are prepared to go low level unless the sector in question is a hard write error, and even then I think it will work if the drive's firmware does a bad sector remap. Use dd with the noerror flag to read the physical sector (I have had a case where a one of the eight logical sectors was dud) then write it back, and hey presto the file can be read. Actually in the case above it was in a fat32 partition directory block and we got a lot of files back. Glory to bootable linux sticks.

On Mon, 30 Jun 2014 18:21:06 Noah O'Donoghue wrote:
After reading about bitrot and feeling guilty for storing my most valuable data on cheap drives (although with backups!) I've been thinking about moving to something more resilient.
Another thing you should consider is the possibility of bitrot inside your PC. A while ago I had a damaged DIMM in my PC and it corrupted the BTRFS filesystem twice before I realised the cause. As BTRFS and ZFS are more complex than most filesystems there are more ways that things can go wrong in the face of sustained random corruption. If you scrub or resilver a ZFS pool (to read data and write it back, covering the case where magnetic fields fade over time) while you have memory errors, it can write back bad data.

http://www.dell.com/au/business/p/poweredge-t110-2/pd

The Dell PowerEdge T110 is a cheap system that takes ECC RAM. It's worth considering for a home ZFS or BTRFS file server; I have one running a BTRFS RAID-1 array on 2*3TB disks.

On 30 June 2014 20:27, Russell Coker <russell@coker.com.au> wrote:
Another thing you should consider is the possibility of bitrot inside your PC. A while ago I had a damaged DIMM in my PC and it corrupted the BTRFS filesystem twice before I realised the cause.
Wouldn't this still cause detectable bitrot though? If the RAM somehow corrupts data being written to disk, it's not going to be able to write a checksum that matches, so something is going to fail at the next read?

On Mon, 30 Jun 2014 20:47:28 Noah O'Donoghue wrote:
On 30 June 2014 20:27, Russell Coker <russell@coker.com.au> wrote:
Another thing you should consider is the possibility of bitrot inside your PC. A while ago I had a damaged DIMM in my PC and it corrupted the BTRFS filesystem twice before I realised the cause.
Wouldn't this still cause detectable bitrot though? If the RAM somehow corrupts data being written to disk, it's not going to be able to write a checksum that matches, so something is going to fail at the next read?
It is possible to have a data block corrupted just before the checksum is calculated; that wouldn't register as a filesystem error. In the cases I know of, the filesystem metadata blocks didn't match each other and the kernel panicked. I copied all the data off the filesystem both times, but I have no way of ever knowing whether some data was corrupted first.

"Noah O'Donoghue" <noah.odonoghue@gmail.com> writes:
After reading about bitrot and feeling guilty for storing my most valuable data on cheap drives (although with backups!) I've been thinking about moving to something more resilient.
My current setup is an Ubuntu laptop with 2 external drives.
1x 2TB ext4 for data storage, 1x 3TB ext4 for backup (using CrashPlan commercial backup software).
IME USB caddies do not pass SMART through to the drive. Get eSATA and run a short self test once a month (and check the results), so you know when a drive is dying.
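For example (the device name is a placeholder; some USB-SATA bridges will pass SMART through if you add "-d sat", but many won't):

  # start a short self-test
  smartctl -t short /dev/sdX

  # a few minutes later, check the self-test log and overall health
  smartctl -l selftest /dev/sdX
  smartctl -H /dev/sdX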

trentbuck@gmail.com (Trent W. Buck) writes:
"Noah O'Donoghue" <noah.odonoghue@gmail.com> writes:
After reading about bitrot and feeling guilty for storing my most valuable data on cheap drives (although with backups!) I've been thinking about moving to something more resilient.
My current setup is an Ubuntu laptop with 2 external drives.
1x 2TB ext4 for data storage, 1x 3TB ext4 for backup (using CrashPlan commercial backup software).
IME USB caddies do not pass SMART through to the drive. Get eSATA and run a short self test once a month (and check the results), so you know when a drive is dying.
Derp. I missed that it was a laptop. I guess for SMART your only option is to put the drives into a "real" computer (e.g. a NAS chassis) instead of just a bus adapter. I had a look at them recently and I couldn't find one I liked that could also be reflashed with a real distro, so I gave up.

On Tuesday, July 1, 2014, Trent W. Buck <trentbuck@gmail.com> wrote:
Derp. I missed that it was a laptop. I guess for SMART your only option is to put the drives into a "real" computer (e.g. a NAS chassis) instead of just a bus adapter.
Actually, it is one of those rare laptops with eSATA ports, so the main (data) drive is on eSATA while the backup is on USB. As the backup software does its own verification this should suffice.

-Noah
participants (5)
- Allan Duncan
- Anthony Shipman
- Noah O'Donoghue
- Russell Coker
- trentbuck@gmail.com