
On Thu, Apr 11, 2013 at 11:43:12PM +1000, Russell Coker wrote:
> I've got a BTRFS filesystem that was corrupted by a RAM error (I discarded a DIMM after doing all the relevant Memtest86+ tests). Currently I have been unable to get btrfsck to work on it and make it usable again. But at least I know the data was corrupted which is better than having the system keep going and make things worse.
yeah, well, btrfs is currently buggy. it's the main reason I use zfs instead of btrfs (if it was just the incomplete feature set compared to zfs, i probably wouldn't have bothered switching). i have no doubt that btrfs will eventually get to a safely usable state, and I hear that it's getting close... but i'm already committed to ZFS on my current machines/drives. i've read that the bugs which caused me to abandon btrfs and switch to zfs have been fixed, but i just don't have any compelling reason to go back right now.
> Putting the error correction/detection in the filesystem bothers me. Putting it at the block device level would benefit a lot more infrastructure - LVM volumes for VM's, swap partitions, etc.
having used ZFS for quite some time now, it makes perfect sense to me for it to be in the filesystem layer rather than at the block level - it's the filesystem that knows about the data, what/where it is, and whether it's in use or not (so, faster scrubs - it only needs to check blocks that are in use rather than all blocks).
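for instance, kicking off and checking a scrub is a one-liner (a rough sketch only - "tank" is just a placeholder pool name, same as in the examples further down):

  # scrub walks only the allocated blocks, verifying checksums as it goes
  zpool scrub tank

  # shows scrub progress and any checksum errors found
  # (and repaired, where the pool has redundancy to repair from)
  zpool status tank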
> http://etbe.coker.com.au/2012/04/27/btrfs-zfs-layering-violations/
excellent post. thanks for the reminder about it.
> There are real benefits to having separate layers, I've written about this at the above URL.
yep, there are. personally, i think that the practical advantages of integrating the layers (as btrfs and zfs do) more than outweigh the disadvantages. in particular, the reason why RAID-Z is so much better than mdadm RAID (which is, in turn, IMO much better than most hardware RAID) is that the "raid" layer knows about the filesystem and the data, allowing ZFS to fix data corruption as it discovers it (you lose this ability if you give ZFS a raid array to work with rather than JBOD).

there are also the usability benefits of the btrfs and zfs tools - using them is far simpler and far less hassle than using mdadm and lvm. for many people, this will be reason enough in itself to use btrfs or zfs, as the complexity of mdadm and LVM is a significant barrier to entry.
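to give a rough idea of what i mean (a sketch only - the device names and sizes are made up, and the mdadm/LVM side glosses over plenty of detail):

  # ZFS: one command builds a single-parity pool with checksumming and
  # self-healing, and mounts it at /tank
  zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde

  # a roughly equivalent mdadm + LVM + mkfs sequence
  # (no data checksums, no self-healing)
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[bcde]
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 500G -n data vg0
  mkfs.ext4 /dev/vg0/data
  mkdir -p /mnt/data
  mount /dev/vg0/data /mnt/data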
> But there are also significant benefits to doing things in the way that BTRFS and ZFS do it and it seems that no-one is interested in developing any other way of doing it (EG a version of Linux Software RAID that does something like RAID-Z).
probably because development effort in that direction is going into btrfs and zfs, and it's hard to see any good reason to re-implement parts of zfs or btrfs in mdadm - it would be just a tick-a-box feature without the practical benefits offered by btrfs and zfs. IMO with btrfs gaining raid5/6-like support, there'll be even less reason to use mdadm (once the initial bugs have been shaken out), even for people who don't want to use out-of-tree code like zfsonlinux.

My guess is that within a few years btrfs will be the mainstream default choice (possibly with ZFS being the second most common option), and technologies like mdadm and LVM, and "old-fashioned" filesystems like ext2/3/4, XFS, etc., will be considered obsolete, existing mostly on legacy systems (and on VMs running on block devices exported from zfs or btrfs servers). even laptops with single small drives will commonly use btrfs because of its snapshotting and btrfs send/receive for backups (same concept as zfs send/receive). both btrfs and zfs offer enough really compelling advantages over older filesystems that I see this as inevitable (and a Good Thing).
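as a concrete example of that send/receive workflow (a sketch only - it assumes /home is a btrfs subvolume and /mnt/backup is another btrfs filesystem; the names and dates are made up):

  # take a read-only snapshot of the home subvolume
  mkdir -p /home/.snapshots
  btrfs subvolume snapshot -r /home /home/.snapshots/home-20130411

  # send the whole snapshot to a btrfs filesystem on a backup disk
  # (or pipe it over ssh to another machine)
  btrfs send /home/.snapshots/home-20130411 | btrfs receive /mnt/backup

  # later, send only the changes relative to the previous snapshot
  btrfs subvolume snapshot -r /home /home/.snapshots/home-20130412
  btrfs send -p /home/.snapshots/home-20130411 /home/.snapshots/home-20130412 | btrfs receive /mnt/backup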
> Also if you use ZVOLs then ZFS can be considered to be a LVM replacement with error checking (as Craig already noted).
(and, finally, we get to the bit that motivated me to reply) it can also be considered an LVM replacement even if you don't use ZVOLs. while there are other uses for them, ZVOLs are mostly of interest to people who run kvm or xen or similar, or who want to use iscsi rather than NFS or Samba to export a chunk of storage space for use by other systems.

One of the common uses for LVM is to divide up a volume group (VG) into logical volumes (LVs) to be formatted and mounted as particular directories - e.g. one for /, /home, /var, /usr or whatever. With LVM you have to decide in advance how much of each VG is going to be dedicated to each LV you create. LVs *can* be resized and (depending on the filesystem it's formatted with) the fs can be grown to match the new size (e.g. with xfs_growfs or resize2fs), but the procedure is moderately complicated, and shrinking (where the fs supports it at all) can't be done while the fs is mounted and in use. Practically, you can increase the size of an LV, but shrinking it is best done by backing up, deleting the LV, recreating it and restoring.

With ZFS, the analogous concept is a sub-volume or filesystem. You can create and change a filesystem at any time, and you can resize it while it is in use (including shrinking it to any size >= the currently used space). In fact, you don't even have to set a quota or a reservation on it if you don't want to - its size will be limited by the total size of the pool (shared with all other sub-volumes).

(FYI a quota sets a limit on the filesystem's maximum size but does not reserve space for that fs. a reservation guarantees that space in the pool WILL be available / reserved for that filesystem: http://docs.oracle.com/cd/E23823_01/html/819-5461/gazvb.html)

e.g. if i have a zpool called "tank" and want to create a filesystem (aka sub-volume) to be mounted as /home with a quota of 100G and compression enabled:

  zfs create tank/home
  zfs set quota=100G tank/home
  zfs set compression=on tank/home
  zfs set mountpoint=/home tank/home

if i start running out of space in my 100G /home, it is trivial to change the quota:

  zfs set quota=200G tank/home

i don't need to unmount it (a PITA if i have open files on it, as is extremely likely with /home) or run xfs_growfs on it or do anything else. from memory, it's just as easy to do the same thing with btrfs. similarly, if i've reserved way too much space for e.g. /var and urgently need more space in /home, i can shrink /var's reservation and increase /home's quota.

back in the bad old days of small disks, allocating too much space for one partition and not enough for another used to be extremely common, and solving it involved time-consuming and tedious amounts of downtime for filesystem juggling (backup, repartition, format, restore)... which is pretty much why the idea of "one big root filesystem" took over from the idea of lots of separate small partitions for /, /home, /var, /usr, and so on. btrfs and zfs give us back the benefits of separating filesystems like that, but without the drawbacks (LVM did too, but it was much more difficult to use, so most people didn't unless they had a good reason to).

BTW, you can also use zfs sub-volumes for container-style virtualisation (e.g. Solaris Containers, FreeBSD Jails, OpenVZ on Linux, and the like), and apparently it works quite well for saving disk space with de-duping if you have hundreds of very similar VMs (with the caveat that de-duping takes shitloads of RAM, and disk space is much cheaper than RAM. OTOH de-duping can offer significant performance benefits due to disk caching of the duped blocks/files).

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #431:

Borg implants are failing