
From: "Noah O'Donoghue" <noah.odonoghue@gmail.com>
On reading more about ZFS, is it true the latest source code isn't available for ZFS? So Sun is withholding new features, fixes etc. from the codebase?

From: "Peter Ross" <Petros.Listig@fdrive.com.au>

AFAIK, the open source ZFS "lives" on illumos, FreeBSD and zfsonlinux, coordinated via OpenZFS. The development seems to be independent from Sun/Oracle these days. I am not aware of active contributions from Oracle, but I am not 100% sure. E.g. the newest open source ZFS versions have "feature flags" instead of the version numbers used by Oracle.
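(You can see this from the command line; "tank" below is just a placeholder pool name.)

=== cut ===
# List the feature flags this OpenZFS build supports.
zpool upgrade -v

# Show the state of each feature flag on an existing pool.
zpool get all tank | grep 'feature@'
=== cut ===

For Russell: Have you seen this? https://wiki.freebsd.org/ZFS The first TODO entry is about file(1) and magic.

Regards
Peter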

On 1 July 2014 12:29, Peter Ross <Petros.Listig@fdrive.com.au> wrote:
The development seems to be independent from Sun/Oracle these days. I am not aware of active contributions from Oracle, but I am not 100% sure.
I think this rules out ZFS for me... The main thing in ZFS's favor was to have it backed by Sun, but if it's been forked and is going off in its own direction then that kinda puts it on equal footing with btrfs from my perspective.

To address Russell's comment on ECC RAM, I think I'm going to take the position that it's probably taking it a bit too far, at least until I see some research on non-ECC memory causing bit-rot on checksummed file systems. I tend to think faulty ram is going to become obvious and not hide beneath the surface, and result in symptoms like the kernel panics that Russell experienced.

Also, if I am going to error check memory then why stop at the file server? It means I have to have ECC memory in all clients that touch the data, including mobile devices, to cater for data corruption in RAM being written to disk.

On 2 July 2014 12:34, Noah O'Donoghue <noah.odonoghue@gmail.com> wrote:
I tend to think faulty ram is going to become obvious and not hide beneath the surface, and result in symptoms like the kernel panics that Russell experienced.
I have seen at least one computer where memory errors were resulting in silent corruption of files. It was going to be the new file server, but fortunately I noticed random seg faults occurring before deploying. I didn't get any kernel panics. In fact I already had Samba up and running, and it seemed fine. I didn't initially realize files were silently being corrupted until later on in the debugging process (from memory). This was 2007.
From my email, sent internally. I can't remember why I was trying to compile ssh at the time; I suspect I may not have noticed the problem otherwise. After this I convinced management to purchase a new computer for the file server.
=== cut ===
root@apple2:~# aptitude install dpkg-dev
Reading package lists... Done
Segmentation faulty tree... 50%
root@apple2:~# mv /var/cache/apt/*.bin /tmp/
root@apple2:~# aptitude install dpkg-dev
Reading package lists... Done
Segmentation faulty tree... 50%
root@apple2:~# mv /var/cache/apt/*.bin /tmp/
root@apple2:~# mv /var/cache/apt/*.bin /tmp/
mv: cannot stat `/var/cache/apt/*.bin': No such file or directory
root@apple2:~# aptitude install dpkg-dev
Reading package lists... Done
Building dependency tree... Done
Reading extended state information
Initializing package states... Done
Building tag database... Done
The following packages have been kept back:
  linux-image-powerpc linux-restricted-modules-powerpc
0 packages upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Writing extended state information... Done
/bin/sh: line 1: 10560 Segmentation fault      /usr/sbin/dpkg-preconfigure --apt

I also get random errors in the initial stages of trying to build a package:

dpkg-source: building openssh using existing openssh_4.2p1.orig.tar.gz
dpkg-source: building openssh in openssh_4.2p1-7ubuntu3.1.bam.1.diff.gz
dpkg-source: internal error: unknown line from diff -u on configure: `'
-----
dpkg-source: building openssh using existing openssh_4.2p1.orig.tar.gz
dpkg-source: building openssh in openssh_4.2p1-7ubuntu3.1.bam.1.diff.gz
/usr/bin/dpkg-buildpackage: line 173: 10917 Segmentation fault      "$@"
----
po2debconf debian/openssh-server.templates.master > debian/openssh-server.templates
/bin/sh: line 1: 11529 Segmentation fault      po2debconf debian/openssh-server.templates.master >debian/openssh-server.templates
make: *** [clean] Error 139
----
po2debconf debian/openssh-server.templates.master > debian/openssh-server.templates
execute_command: bad command type: 1073741828
Aborting.../bin/sh: line 1: 11939 Aborted      po2debconf debian/openssh-server.templates.master >debian/openssh-server.templates
make: *** [clean] Error 134
=== cut ===

I get a different error every time!
--
Brian May <brian@microcomaustralia.com.au>

On Wed, 2 Jul 2014 12:34:48 +1000 "Noah O'Donoghue" <noah.odonoghue@gmail.com> wrote:
Also, if I am going to error check memory then why stop at the file server? It means I have to have ECC memory in all clients that touch the data, including mobile devices, to cater for data corruption in RAM being written to disk.
The file server is a good halfway measure given it's the biggest single point of failure. If a client experiences memory failure then only their data is corrupted. If the server experiences memory failure then all the clients have potential issues. I imagine the point at which ECC memory starts making sense depends entirely on your workload and the number of clients.

In any case I'm limited to non-ECC ram by the form factor of my bookshelf..

I wonder if better hardware tests would be an area worth looking into, for example, monthly online memory/CPU tests, etc? I wonder also if there are deterministic tests we can proactively do to catch corruptions at a higher level, for example scanning any file type that includes a checksum eg .zip for corruptions and comparing to previous runs.

It seems the problem is uncaught hardware failure. If we minimize the window in which the failure is unknown then we increase the chance of being able to compare the source to backups and recover information.

On 2 July 2014 13:54, Danny Robson <danny@nerdcruft.net> wrote:
On Wed, 2 Jul 2014 12:34:48 +1000 "Noah O'Donoghue" <noah.odonoghue@gmail.com> wrote:
Also, if I am going to error check memory then why stop at the file server? It means I have to have ECC memory in all clients that touch the data, including mobile devices, to cater for data corruption in RAM being written to disk.
The file server is a good halfway measure given it's the biggest single point of failure. If a client experiences memory failure then only their data is corrupted. If the server experiences memory failure then all the clients have potential issues.
I imagine the point at which ECC memory starts making sense depends entirely on your workload and the number of clients.

On Wed, 2 Jul 2014 12:34:48 Noah O'Donoghue wrote:
On 1 July 2014 12:29, Peter Ross <Petros.Listig@fdrive.com.au> wrote:
The development seems to be independent from Sun/Oracle these days. I am not aware of active contributions from Oracle, but I am not 100% sure.
I think this rules out ZFS for me... The main thing in ZFS's favor was to have it backed by Sun, but if it's been forked and is going off in its own direction then that kinda puts it on equal footing with btrfs from my perspective.
I disagree. Working code doesn't suddenly stop working when support disappears. The current ZFS code will continue working just as well as it does now for the foreseeable future, and it's currently working better than any other filesystem by most objective measures. ZFS doesn't seem suitable for a Linux root filesystem, but that doesn't have much impact on its utility.
To address Russell's comment on ECC RAM, I think I'm going to take the position that it's probably taking it a bit too far, at least until I see some research on non-ECC memory causing bit-rot on checksummed file systems. I tend to think faulty ram is going to become obvious and not hide beneath the surface, and result in symptoms like the kernel panics that Russell experienced.
Memory errors can cause corruption anywhere. While they can cause checksum failures or metadata inconsistency (which is what I've seen), they can also cause data corruption.
Also, if I am going to error check memory then why stop at the file server? It means I have to have ECC memory in all clients that touch the data, including mobile devices, to cater for data corruption in RAM being written to disk.
For a memory corruption to corrupt stored data it has to miss corrupting anything that will cause a system crash or application SEGV. While it is possible for a memory corruption to affect kernel data structures and make an application write to the wrong file, it would be less likely to corrupt kernel data structures and not crash something.

The most likely case when a client has memory corruption is that it will only affect files that you are deliberately writing to. For example, while reading mail via IMAP it's conceivable that an error might cause the deletion of a recent message you wanted to keep (just an index error on which message to delete). But it's very unlikely that your archive of mail from last year will be corrupted. Filesystem corruption could affect entire sub-trees.

On Wed, 2 Jul 2014 13:08:42 Brian May wrote:
I have seen at least one computer where memory errors were resulting in silent corruption of files. It was going to be the new file server, but fortunately I noticed random seg faults occurring before deploying. I didn't get any kernel panics. In fact I already had Samba up and running, and it seemed fine. I didn't initially realize files were silently being corrupted until later on in the debugging process (from memory).
A couple of years ago I was given a bunch of old AMD64 computers for free. I installed Linux on a PentiumD system from that batch and I got lots of SEGVs from applications for no good reason (EG "gzip < /dev/urandom | gzip -d | gzip | gzip -d" would get a SEGV fairly quickly). Then I ran debsums and discovered that about 1% of the installed files had checksum mismatches (real errors, verified by putting the disk in another PC).

That was a fairly extreme case, and I'm sure that there are lots of other systems with similar errors that occur less frequently. As an aside, the RAM from that system worked perfectly in another system, so it would have been a CPU or motherboard problem. But it does show that electronic problems can cause data loss.
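That sort of stress test is easy to script (a rough sketch, untested as written; the one-hour timeout is an arbitrary choice):

=== cut ===
# On healthy hardware this pipeline runs until the timeout kills it
# (exit status 124); on the bad box one of the gzips got a SEGV quickly.
timeout 3600 sh -c 'gzip < /dev/urandom | gzip -d | gzip | gzip -d > /dev/null'

# Compare installed files against their package checksums;
# -s makes debsums print errors only.
debsums -s
=== cut ===

On Wed, 2 Jul 2014 14:14:29 Noah O'Donoghue wrote: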
In any case I'm limited to non-ECC ram by the form factor of my bookshelf..
There's not much you can do about that right now. But small servers with ECC RAM that would fit your shelf should appear soon enough.
I wonder if better hardware tests would be an area worth looking into, for example, monthly online memory/CPU tests, etc?
Debian has a package named "memtester" that might suit your requirements in that regard. One problem with it is that memory errors aren't always random, so an error that happens to always hit a bit of RAM that contains kernel buffers wouldn't be found.
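Something like this could run from cron (a sketch only; the amount of memory to test and the log path are arbitrary):

=== cut ===
# Test 512MB of RAM for one full pass (needs root so it can lock
# the memory); memtester exits non-zero if any test fails.
memtester 512M 1 > /var/log/memtester.log 2>&1 \
    || echo "memtester reported errors, see /var/log/memtester.log" \
       | mail -s "RAM errors" root
=== cut ===

Dropping that in /etc/cron.monthly/ would give the monthly test Noah asked about, though it can only exercise memory that's free at the time it runs.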
I wonder also if there are deterministic tests we can proactively do to catch corruptions at a higher level, for example scanning any file type that includes a checksum eg .zip for corruptions and comparing to previous runs.
You could do that. You could have a list of checksums of your files and verify them, maybe like tripwire. But BTRFS and ZFS do enough checks internally for this.
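A rough sketch of the tripwire-like approach (the paths, baseline location and pool name are placeholders):

=== cut ===
# Build a baseline of checksums for everything under /data.
find /data -type f -print0 | xargs -0 sha256sum > /var/lib/checksums.baseline

# Later, verify against the baseline; mismatches are reported as FAILED.
sha256sum -c --quiet /var/lib/checksums.baseline

# Noah's zip idea: zip archives carry internal CRCs, so they
# can be tested directly.
find /data -name '*.zip' -print0 | xargs -0 -n1 unzip -tqq

# On BTRFS or ZFS a scrub does the equivalent at the filesystem level:
btrfs scrub start /data    # or: zpool scrub tank
=== cut ===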
It seems the problem is uncaught hardware failure. If we minimize the window in which the failure is unknown then we increase the chance of being able to compare the source to backups and recover information.
True.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On 02/07/14 12:34, Noah O'Donoghue wrote:
On 1 July 2014 12:29, Peter Ross <Petros.Listig@fdrive.com.au> wrote:
The development seems to be independent from Sun/Oracle these days. I am not aware of active contributions from Oracle, but I am not 100% sure.
I think this rules out ZFS for me... The main thing in ZFS's favor was to have it backed by Sun, but if it's been forked and is going off in its own direction then that kinda puts it on equal footing with btrfs from my perspective.
FWIW, XFS over LVM might be another alternative. RHEL7 appears to be going down that path:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/htm...

It does not do snapshots directly, leaving that job for the volume manager.

https://en.wikipedia.org/wiki/XFS#Snapshots

(I haven't personally used XFS yet, or ZFS or Btrfs for that matter. IKR, I haven't lived! ;])

Crispy.
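For reference, a snapshot of an XFS filesystem at the LVM layer looks roughly like this (volume group and names invented for the example):

=== cut ===
# XFS on a logical volume.
mkfs.xfs /dev/vg0/data

# Take a copy-on-write snapshot at the LVM layer; device-mapper
# freezes the mounted filesystem briefly while the snapshot is made.
lvcreate --snapshot --name data-snap --size 1G /dev/vg0/data

# Mount the snapshot read-only; nouuid is needed because the
# snapshot has the same filesystem UUID as the origin.
mount -o ro,nouuid /dev/vg0/data-snap /mnt/snap
=== cut ===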

On Wed, 2 Jul 2014 21:01:48 Tony Crisp wrote:
FWIW, XFS over LVM might be another alternative. RHEL7 appears to be going down that path:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.0_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.0_Release_Notes-File_Systems.html
It does not do snapshots directly, leaving that job for the volume manager.
http://en.wikipedia.org/wiki/Comparison_of_file_systems

XFS is listed as having "partial" checksums in the above URL; from memory it does checksums on some metadata. It won't deal with the "bitrot" that started this discussion.
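Recent xfsprogs can turn those metadata checksums on at mkfs time (a sketch; the device name is a placeholder), but it still covers metadata only:

=== cut ===
# Enable metadata CRCs (the v5 superblock format); file data
# remains unchecksummed, so this doesn't address bitrot.
mkfs.xfs -m crc=1 /dev/sdb1
=== cut ===

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/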

On Wed, 2 Jul 2014 12:34:48 PM Noah O'Donoghue wrote:
I think this rules out ZFS for me... The main thing in ZFS's favor was to have it backed by Sun, but if it's been forked and is going off in its own direction then that kinda puts it on equal footing with btrfs from my perspective.
On the other hand btrfs isn't (yet) the foundation for a 55 PB Lustre filesystem for the #3 (former #1) supercomputer on the Top500 list. :-)

http://cdn.opensfs.org/wp-content/uploads/2013/04/Morrone_Sequoia_LUG2013.pd...

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Chris Samuel wrote:
................snip
On the other hand btrfs isn't (yet) the foundation for a 55 PB Lustre filesystem for the #3 (former #1) supercomputer on the Top500 list. :-)
http://cdn.opensfs.org/wp-content/uploads/2013/04/Morrone_Sequoia_LUG2013.pd...
and I did like that throw-away line: "The designers of ZFS famously claimed that flipping every bit in a maximum-sized zpool would 'require enough energy to boil every ocean on the planet.'"

http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows...

regards
Rohan McLeod