
On Fri, Apr 12, 2013 at 03:31:20PM +1000, Kevin wrote:
On Fri, Apr 12, 2013 at 3:17 PM, James Harper <james.harper@bendigoit.com.au> wrote:
This is where a lot of people get this wrong. Once the BIOS has succeeded in reading the bootsector from a boot disk, it's committed. If the bootsector reads okay (even after a long time on a failing disk) but anything between the bootsector and the OS fails to read, your boot has failed. This 'anything between' includes the grub bootstrap, the xen hypervisor, the linux kernel, and the initramfs, so it's a substantial amount of data to read from a disk that may be on its last legs. A good hardware RAID will have long since failed the disk by this point, and booting will succeed.
i think we're talking about different things here. if you can tell the BIOS "don't boot from sda, boot from sdb instead", then it really doesn't matter how messed up sda is: the system isn't going to use it; it's going to boot from sdb like you told it to.
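of course that only works if sdb has its own copy of the bootloader. a rough sketch of what that looks like, assuming grub2 on top of an mdadm RAID1 mirror (device names are illustrative, and update-grub is the Debian-style helper):

    # put the boot loader on the MBR of *both* members of the mirror,
    # so the BIOS can hand off to whichever disk you point it at
    grub-install /dev/sda
    grub-install /dev/sdb
    update-grub    # regenerate grub.cfg after kernel/xen changes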
My last remaining reservation about going ahead with some testing is: is there an equivalent of clvm for zfs? Or is that even the right approach for zfs? My main server cluster is:
- 2 machines, each running 2 x 2TB disks with DRBD, with the primary exporting the whole disk as an iSCSI volume
- 2 machines, each importing the iSCSI volume, running lvm (clvm) on top, and using the LVs as backing stores for xen VMs.
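Roughly, the clvm side of that looks like the following (a sketch only; the VG name, device node and sizes are made up for illustration):

    # on one of the VM hosts, after logging in to the iSCSI volume
    # (shown here as /dev/sdc):
    vgcreate -c y vg_san /dev/sdc          # -c y marks the VG as clustered (clvm)
    lvcreate -L 20G -n vm1-disk vg_san     # one LV per VM disk

    # the xen domU config then points straight at the LV, e.g.
    # disk = [ 'phy:/dev/vg_san/vm1-disk,xvda,w' ]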
interesting. that's kind of the opposite of how google's ganeti works, where each node exports small LVs which are combined with DRBD on the host that actually runs the VM to provide a "disk" for a particular VM. ganeti scales to multiple machines (up to 40 according to the docs), because a VM's DRBD volume can be constructed from LVs exported by any two machines, whereas this sounds like it's limited to two and only two machines as the storage servers. (in theory you could make ganeti work with ZFS, ZVOLs and iscsi, but i don't think anyone's actually done it.)
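for comparison, creating an instance with that DRBD layout in ganeti is something like the following (node names, OS definition and size are placeholders, assuming ganeti 2.x with the debootstrap OS scripts installed):

    # ganeti picks (or is told) a primary and secondary node and builds a
    # DRBD mirror between LVs it creates on each of them
    gnt-instance add -t drbd -n node1.example.com:node2.example.com \
        -o debootstrap+default -s 20G vm1.example.com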
How would this best be done using zfs?
short answer: zfs doesn't do that. in theory you could export each disk individually with iscsi and build ZFS pools on the client side (two mirrored pools). if that actually worked, you'd have to do a lot of manual stuffing around to make sure the pools were only in use on one machine at a time, and more drudgery to handle fail-over events. seems like a fragile PITA and not worth the bother, even if it could be made to work.

i can think of a few other ugly, kludgy ways you could emulate something like clvm (e.g. iscsi-export ZVOLs from each server and combine them with drbd), but they would just be shoe-horning the wrong technology into a particular model. better to look around for alternatives actually designed to do the job.
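just to show how much manual plumbing that would involve, a rough sketch of the iscsi + mirrored-pool version (pool, zvol and device names invented, and to be clear this is the approach i'm recommending against):

    # on whichever head node is currently active: build one mirrored pool
    # from the two iscsi-attached disks (shown here as sdc/sdd), then carve
    # out a zvol per VM to use as a xen backing store
    zpool create tank mirror /dev/sdc /dev/sdd
    zfs create -V 20G tank/vm1-disk

    # fail-over is entirely manual: export on the node giving up the pool,
    # import on the node taking over, and nothing stops you from importing
    # it on both at once and corrupting the lot
    zpool export tank        # on the old node, if it's still alive
    zpool import -f tank     # on the take-over node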
If i were building new infrastructure today with 2 or more machines hosting VMs, i would probably look at using CEPH as the storage layer for the virtual machines. this would provide distributed, mirrored storage that is accessible from all machines, and all machines could then be both storage and VM hosts.
ref: http://www.slideshare.net/xen_com_mgr/block-storage-for-vms-with-ceph
     http://ceph.com/
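for a flavour of the ceph/rbd path, something like this (a sketch, assuming an already-working ceph cluster and a qemu built with rbd support; the pool and image names are made up):

    # create a 20GB RADOS block device image in the default 'rbd' pool
    rbd create vm1-disk --size 20480

    # qemu/kvm can then attach it directly, e.g.
    #   -drive file=rbd:rbd/vm1-disk
    # or via libvirt:
    #   <source protocol='rbd' name='rbd/vm1-disk'/>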
i'd agree with this - CEPH is cool. in fact, i'd also be inclined to use CEPH as the object store with OpenStack instead of Swift - ceph's object store does everything that swift does and also offers the distributed block storage layer on top of that, thus avoiding the need for QCOW2 over NFS (yuk!) or a dedicated netapp server or similar for shared VM images. (apparently ceph's distributed filesystem layer isn't ready for production use yet, but the object store and block storage are.)

craig

--
craig sanders <cas@taz.net.au>