
On Fri, Apr 12, 2013 at 03:31:20PM +1000, Kevin wrote:
On Fri, Apr 12, 2013 at 3:17 PM, James Harper <james.harper@bendigoit.com.au> wrote:
This is where a lot of people get this wrong. Once the BIOS has succeeded in reading the bootsector from a boot disk, it's committed. If the bootsector reads okay (even after a long time on a failing disk) but anything between the bootsector and the OS fails to read, your boot has failed. This 'anything between' includes the grub bootstrap, the xen hypervisor, the linux kernel, and the initramfs, so it's a substantial amount of data to read from a disk that may be on its last legs. A good hardware RAID will have long since failed the disk by this point, and booting will succeed.
i think we're talking about different things here. if you can tell the BIOS "don't boot from sda, boot from sdb instead", then it really doesn't matter how messed up sda is: the system isn't going to use it; it's going to boot from sdb like you told it to.
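of course that only works if sdb has its own copy of the bootloader. a rough sketch of what that looks like, assuming grub2 on top of an mdadm RAID1 mirror (device names are illustrative, and update-grub is the Debian-style helper):

    # put the boot loader on the MBR of *both* members of the mirror,
    # so the BIOS can hand off to whichever disk you point it at
    grub-install /dev/sda
    grub-install /dev/sdb
    update-grub    # regenerate grub.cfg after kernel/xen changes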
My last remaining reservation about going ahead with some testing is: is there an equivalent of clvm for zfs? Or is that even the right approach for zfs? My main server cluster is:
- 2 machines, each running 2 x 2TB disks with DRBD, with the primary exporting the whole disk as an iSCSI volume
- 2 machines, each importing the iSCSI volume, running lvm (clvm) on top, and using the LVs as backing stores for xen VMs.
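Roughly, the clvm side of that looks like the following (a sketch only; the VG name, device node and sizes are made up for illustration):

    # on one of the VM hosts, after logging in to the iSCSI volume
    # (shown here as /dev/sdc):
    vgcreate -c y vg_san /dev/sdc          # -c y marks the VG as clustered (clvm)
    lvcreate -L 20G -n vm1-disk vg_san     # one LV per VM disk

    # the xen domU config then points straight at the LV, e.g.
    # disk = [ 'phy:/dev/vg_san/vm1-disk,xvda,w' ]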
interesting. that's kind of the opposite of how google's ganeti works, where each node exports small LVs which are combined with DRBD on the host that actually runs the VM to provide a "disk" for a particular VM. ganeti scales to multiple machines (up to 40 according to the docs), because a VM's DRBD volume can be constructed from LVs exported by any two machines, whereas this sounds like it's limited to two and only two machines as the storage servers. (in theory you could make ganeti work with ZFS, ZVOLs and iscsi, but i don't think anyone's actually done it.)
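for comparison, creating an instance with that DRBD layout in ganeti is something like the following (node names, OS definition and size are placeholders, assuming ganeti 2.x with the debootstrap OS scripts installed):

    # ganeti picks (or is told) a primary and secondary node and builds a
    # DRBD mirror between LVs it creates on each of them
    gnt-instance add -t drbd -n node1.example.com:node2.example.com \
        -o debootstrap+default -s 20G vm1.example.com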
How would this best be done using zfs?
short answer: zfs doesn't do that. in theory you could export each disk individually with iscsi and build ZFS pools on the client side (two mirrored pools). if that actually worked, you'd have to do a lot of manual stuffing around to make sure the pools were only in use on one machine at a time, and more drudgery to handle fail-over events. seems like a fragile PITA and not worth the bother, even if it could be made to work.

i can think of a few other ugly, kludgy ways you could emulate something like clvm (e.g. iscsi-export ZVOLs from each server and combine them with drbd), but they would just be shoe-horning the wrong technology into a particular model. better to look around for alternatives actually designed to do the job.
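just to show how much manual plumbing that would involve, a rough sketch of the iscsi + mirrored-pool version (pool, zvol and device names invented, and to be clear this is the approach i'm recommending against):

    # on whichever head node is currently active: build one mirrored pool
    # from the two iscsi-attached disks (shown here as sdc/sdd), then carve
    # out a zvol per VM to use as a xen backing store
    zpool create tank mirror /dev/sdc /dev/sdd
    zfs create -V 20G tank/vm1-disk

    # fail-over is entirely manual: export on the node giving up the pool,
    # import on the node taking over, and nothing stops you from importing
    # it on both at once and corrupting the lot
    zpool export tank        # on the old node, if it's still alive
    zpool import -f tank     # on the take-over node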
If i were building new infrastructure today with 2 or more machines hosting VMs, i would probably look at using CEPH as the storage layer for the virtual machines. this would provide distributed, mirrored storage that is accessible from all machines, and all machines could then be both storage and VM hosts.
ref: http://www.slideshare.net/xen_com_mgr/block-storage-for-vms-with-ceph
     http://ceph.com/
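for a flavour of the ceph/rbd path, something like this (a sketch, assuming an already-working ceph cluster and a qemu built with rbd support; the pool and image names are made up):

    # create a 20GB RADOS block device image in the default 'rbd' pool
    rbd create vm1-disk --size 20480

    # qemu/kvm can then attach it directly, e.g.
    #   -drive file=rbd:rbd/vm1-disk
    # or via libvirt:
    #   <source protocol='rbd' name='rbd/vm1-disk'/>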
i'd agree with this - CEPH is cool. in fact, i'd also be inclined to use CEPH as the object store with OpenStack instead of Swift - ceph's object store does everything that swift does and also offers the distributed block storage layer on top of that, thus avoiding the need for QCOW2 over NFS (yuk!) or a dedicated netapp server or similar for shared VM images. (apparently ceph's distributed filesystem layer isn't ready for production use yet, but the object store and block storage are.)

craig

--
craig sanders <cas@taz.net.au>