
On Thu, Apr 11, 2013 at 11:43:12PM +1000, Russell Coker wrote:
> I've got a BTRFS filesystem that was corrupted by a RAM error (I discarded a DIMM after doing all the relevant Memtest86+ tests). Currently I have been unable to get btrfsck to work on it and make it usable again. But at least I know the data was corrupted which is better than having the system keep going and make things worse.
yeah, well, btrfs is currently buggy. it's the main reason I use zfs instead of btrfs (if it was just the incomplete feature set compared to zfs, i probably wouldn't have bothered switching). i have no doubt that btrfs will eventually get to a safely usable state, and I hear that it's getting close... but i'm already committed to ZFS on my current machines/drives. i've read that the bugs which caused me to abandon btrfs and switch to zfs have been fixed, but i just don't have any compelling reason to go back right now.
> Putting the error correction/detection in the filesystem bothers me. Putting it at the block device level would benefit a lot more infrastructure - LVM volumes for VM's, swap partitions, etc.
having used ZFS for quite some time now, it makes perfect sense to me for it to be in the filesystem layer rather than at the block level - it's the filesystem that knows about the data, what/where it is, and whether it's in use or not (so, faster scrubs - it only needs to check blocks that are in use rather than all blocks).
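for instance, kicking off and checking a scrub is a one-liner (a rough sketch only - "tank" is just a placeholder pool name, same as in the examples further down):

  # scrub walks only the allocated blocks, verifying checksums as it goes
  zpool scrub tank

  # shows scrub progress and any checksum errors found
  # (and repaired, where the pool has redundancy to repair from)
  zpool status tank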
> http://etbe.coker.com.au/2012/04/27/btrfs-zfs-layering-violations/
excellent post. thanks for the reminder about it.
> There are real benefits to having separate layers, I've written about this at the above URL.
yep, there are. personally, i think that the practical advantages of integrating the layers (as btrfs and zfs do) more than outweigh the disadvantages. in particular, the reason why RAID-Z is so much better than mdadm RAID (which is, in turn, IMO much better than most hardware RAID) is that the "raid" layer knows about the filesystem and the data, allowing ZFS to fix data corruption as it discovers it (you lose this ability if you give ZFS a raid array to work with rather than JBOD).

there are also the usability benefits of the btrfs and zfs tools - using them is far simpler and far less hassle than using mdadm and lvm. for many people, this will be reason enough in itself to use btrfs or zfs, as the complexity of mdadm and LVM is a significant barrier to entry.
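to give a rough idea of what i mean (a sketch only - the device names and sizes are made up, and the mdadm/LVM side glosses over plenty of detail):

  # ZFS: one command builds a single-parity pool with checksumming and
  # self-healing, and mounts it at /tank
  zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde

  # a roughly equivalent mdadm + LVM + mkfs sequence
  # (no data checksums, no self-healing)
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[bcde]
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 500G -n data vg0
  mkfs.ext4 /dev/vg0/data
  mkdir -p /mnt/data
  mount /dev/vg0/data /mnt/data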
> But there are also significant benefits to doing things in the way that BTRFS and ZFS do it and it seems that no-one is interested in developing any other way of doing it (EG a version of Linux Software RAID that does something like RAID-Z).
probably because development effort in that direction is going into btrfs and zfs, and it's hard to see any good reason to re-implement parts of zfs or btrfs in mdadm - it would be just a tick-a-box feature without the practical benefits offered by btrfs and zfs. IMO with btrfs gaining raid5/6-like support, there'll be even less reason to use mdadm (once the initial bugs have been shaken out), even for people who don't want to use out-of-tree code like zfsonlinux.

My guess is that within a few years btrfs will be the mainstream default choice (possibly with ZFS being the second most common option), and technologies like mdadm and LVM, and "old-fashioned" filesystems like ext2/3/4, XFS, etc., will be considered obsolete, existing mostly on legacy systems (and on VMs running on block devices exported from zfs or btrfs servers). even laptops with single small drives will commonly use btrfs because of its snapshotting and btrfs send/receive for backups (same concept as zfs send/receive). both btrfs and zfs offer enough really compelling advantages over older filesystems that I see this as inevitable (and a Good Thing).
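as a concrete example of that send/receive workflow (a sketch only - it assumes /home is a btrfs subvolume and /mnt/backup is another btrfs filesystem; the names and dates are made up):

  # take a read-only snapshot of the home subvolume
  mkdir -p /home/.snapshots
  btrfs subvolume snapshot -r /home /home/.snapshots/home-20130411

  # send the whole snapshot to a btrfs filesystem on a backup disk
  # (or pipe it over ssh to another machine)
  btrfs send /home/.snapshots/home-20130411 | btrfs receive /mnt/backup

  # later, send only the changes relative to the previous snapshot
  btrfs subvolume snapshot -r /home /home/.snapshots/home-20130412
  btrfs send -p /home/.snapshots/home-20130411 /home/.snapshots/home-20130412 | btrfs receive /mnt/backup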
> Also if you use ZVOLs then ZFS can be considered to be a LVM replacement with error checking (as Craig already noted).
(and, finally, we get to the bit that motivated me to reply) it can also be considered an LVM replacement even if you don't use ZVOLs. while there are other uses for them, ZVOLs are mostly of interest to people who run kvm or xen or similar, or who want to use iscsi rather than NFS or Samba to export a chunk of storage space for use by other systems.

One of the common uses for LVM is to divide up a volume group (VG) into logical volumes (LVs) to be formatted and mounted as particular directories - e.g. one for /, /home, /var, /usr or whatever. With LVM you have to decide in advance how much of each VG is going to be dedicated to each LV you create. LVs *can* be resized and (depending on the filesystem it's formatted with) the fs can be grown to match the new size (e.g. with xfs_growfs or resize2fs), but the procedure is moderately complicated, and shrinking (where the fs supports it at all) can't be done while the fs is mounted and in use. Practically, you can increase the size of an LV, but shrinking it is best done by backing up, deleting the LV, recreating it and restoring.

With ZFS, the analogous concept is a sub-volume or filesystem. You can create and change a filesystem at any time, and you can resize it while it is in use (including shrinking it to any size >= the currently used space). In fact, you don't even have to set a quota or a reservation on it if you don't want to - its size will be limited by the total size of the pool (shared with all other sub-volumes).

(FYI a quota sets a limit on the filesystem's maximum size but does not reserve space for that fs. a reservation guarantees that space in the pool WILL be available / reserved for that filesystem: http://docs.oracle.com/cd/E23823_01/html/819-5461/gazvb.html)

e.g. if i have a zpool called "tank" and want to create a filesystem (aka sub-volume) to be mounted as /home with a quota of 100G and compression enabled:

  zfs create tank/home
  zfs set quota=100G tank/home
  zfs set compression=on tank/home
  zfs set mountpoint=/home tank/home

if i start running out of space in my 100G /home, it is trivial to change the quota:

  zfs set quota=200G tank/home

i don't need to unmount it (a PITA if i have open files on it, as is extremely likely with /home) or run xfs_growfs on it or do anything else. from memory, it's just as easy to do the same thing with btrfs. similarly, if i've reserved way too much space for e.g. /var and urgently need more space in /home, i can shrink /var's reservation and increase /home's quota.

back in the bad old days of small disks, allocating too much space for one partition and not enough for another used to be extremely common, and solving it involved time-consuming and tedious amounts of downtime for filesystem juggling (backup, repartition, format, restore)... which is pretty much why the idea of "one big root filesystem" took over from the idea of lots of separate small partitions for /, /home, /var, /usr, and so on. btrfs and zfs give us back the benefits of separating filesystems like that, but without the drawbacks (LVM did too, but it was much more difficult to use, so most people didn't unless they had a good reason to).

BTW, you can also use zfs sub-volumes for container-style virtualisation (e.g. Solaris Containers, FreeBSD Jails, OpenVZ on Linux, and the like), and apparently it works quite well for saving disk space with de-duping if you have hundreds of very similar VMs (with the caveat that de-duping takes shitloads of RAM, and disk space is much cheaper than RAM. OTOH de-duping can offer significant performance benefits due to disk caching of the duped blocks/files).

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #431:

Borg implants are failing