
On Sun, Oct 14, 2012 at 09:01:49PM -0700, Daniel Pittman wrote:
On Sun, Oct 14, 2012 at 7:25 PM, Russell Coker <russell@coker.com.au> wrote:
I'm looking at converting some Xen servers to ZFS. This includes a couple of servers for a reasonable size mail store (8,000,000 files and 600G of Maildir storage).
For much of the Xen on ZFS stuff I'll just use zvols for block devices and then use regular Linux filesystems such as Ext3 inside them. This isn't particularly efficient but for most DomUs it doesn't matter at all. Most of the DomUs have little disk access as they don't do much writing and have enough cache to cover most reads.
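For illustration only, the zvol-plus-ext3/4 arrangement described above might look roughly like this - 'tank' and 'domu1-root' are made-up names, not from the actual setup:

  # on the dom0/ZFS host: create a 20G zvol to use as the domU's disk
  zfs create -V 20G tank/domu1-root

  # in the Xen domU config the zvol is passed through like any other block device:
  #   disk = [ 'phy:/dev/zvol/tank/domu1-root,xvda,w' ]

  # inside the domU it is just a plain disk, so a normal mkfs works
  mkfs.ext4 /dev/xvda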
For the mail spool a zvol would be a bad idea, fsck on a 400G Ext3/4 filesystem is a bad thing and having the double filesystem overhead of Ext3/4 on top of a zvol is going to suck for the most disk intensive filesystem.
zvol is more like an LVM logical volume than a filesystem, so the overhead isn't nearly as much as this comment suggests.
yep.
That said, running ext3 (especially) or ext4 on top of it is going to be slower, and means you can't use the RAID style features of ZFS, and you give up object level checksums.
That's not exactly true - the guest won't know anything about the ZFS features, but the ZFS file server certainly will. The zvol is a chunk of allocated space from one of the zpools on the system. It can optionally be sparse-allocated (for thin provisioning, which greatly reduces space used, but performance can suffer).

The zvol has all the benefits of the zfs pool, including snapshotting and cloning, COW, error checking and recovery, and SSD read and write caching. The zvol can be backed up (or moved to another ZFS server) with 'zfs send' & 'zfs receive'. It can also be exported as an iscsi volume (e.g. so that a remote virtualisation cpu node can access the volume storage on the zfs file server).

Cloning is particularly useful for VMs - in short, set up a 'template' VM image, clean it up (e.g. run 'apt-get clean', delete /etc/udev/rules.d/70-persistent-net.rules, and so on), snapshot it, and then clone the snapshot whenever you need a new VM. You could even, for example, build a squeeze 6.0 VM template, snapshot it, then later boot it up and upgrade to 6.0.1, 6.0.2, ..., 6.0.6, and have a cleaned-up snapshot of each point release, any of which could be cloned into a new VM at any time.
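A rough sketch of that template/clone workflow - the pool and dataset names here are made up for illustration:

  # snapshot the cleaned-up template zvol ('tank/squeeze-template' is a placeholder name)
  zfs snapshot tank/squeeze-template@clean-6.0.1

  # clone it whenever a new VM is needed - the clone is created instantly and,
  # being copy-on-write, initially consumes almost no extra space
  zfs clone tank/squeeze-template@clean-6.0.1 tank/newvm-root

  # a zvol can also be replicated to another ZFS server for backup or migration
  zfs snapshot tank/newvm-root@backup1
  zfs send tank/newvm-root@backup1 | ssh otherhost zfs receive backuppool/newvm-root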
From the guest VM's point-of-view, it's just a disk with nothing special about it.
ext3 or ext4 performance in the guest will be similar to what you'd get if the guest were given an LVM lv. I haven't done any benchmarking to compare zvol with lv (mostly because I can't afford to add 4 drives to my ZFS server just to test LVM lv vs ZFS zvol performance), but I can give a subjective anecdote: the performance improvement from using a ZFS zvol instead of a qcow2 disk image is about the same as using an LVM lv instead of a qcow2 file, i.e. *much* faster.

If I had to guess, I'd say that there are probably some cases where LVM (with its nearly direct raw access to the underlying disks) would be faster than ZFS zvols, but in most cases ZFS's caching, compression, COW and so on would give the performance advantage to ZFS. ZFS's other advantages, especially lightweight and unlimited snapshots, make it worth using over LVM anyway.

FYI, here are the details on one of several zvols of various sizes that I have on my home ZFS server. They're all used by KVM virtual machines.

# zfs get all export/sid
NAME        PROPERTY              VALUE                  SOURCE
export/sid  type                  volume                 -
export/sid  creation              Sun Mar 25 14:19 2012  -
export/sid  used                  5.16G                  -
export/sid  available             694G                   -
export/sid  referenced            1.91G                  -
export/sid  compressratio         1.69x                  -
export/sid  reservation           none                   default
export/sid  volsize               5G                     local
export/sid  volblocksize          8K                     -
export/sid  checksum              on                     default
export/sid  compression           on                     inherited from export
export/sid  readonly              off                    default
export/sid  copies                1                      default
export/sid  refreservation        5.16G                  local
export/sid  primarycache          all                    default
export/sid  secondarycache        all                    default
export/sid  usedbysnapshots       0                      -
export/sid  usedbydataset         1.91G                  -
export/sid  usedbychildren        0                      -
export/sid  usedbyrefreservation  3.25G                  -
export/sid  logbias               latency                default
export/sid  dedup                 off                    default
export/sid  mlslabel              none                   default
export/sid  sync                  standard               default
export/sid  refcompressratio      1.69x                  -
export/sid  written               1.91G                  -

Note that this zvol has compression enabled - that would be a good choice for a mail server's storage disk, as mail is highly compressible. Depending on available RAM in the server and the kind of mail typically received (e.g. multiple copies of the same email), de-duping the zvol may also be worthwhile.
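As a concrete (hypothetical) example of that, a compressed, sparse zvol for the mail spool could be created along these lines - 'export' matches the pool above, but 'mailstore' and the size are placeholders:

  # create a 400G sparse (thin-provisioned), compressed zvol for the mail spool
  zfs create -s -V 400G -o compression=on export/mailstore

  # dedup can optionally be enabled per dataset, but the dedup table needs plenty of RAM
  zfs set dedup=on export/mailstore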
Any suggestions?
I would aim to run ZFS in the mail domU, and treat the zvol as a "logical volume" block device. You will have some overhead from the double checksums, but you get robust performance. It treats the underlying dom0 ZFS as a fancy LVM, essentially. You probably also need to allocate substantially more memory to the domU than you would otherwise.
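As a rough illustration of that suggestion (every pool, volume and device name below is made up), the setup would look something like:

  # on the dom0: carve a zvol out of the pool for the mail domU
  zfs create -V 600G tank/mailstore

  # pass it to the domU in the Xen config, e.g.
  #   disk = [ 'phy:/dev/zvol/tank/mailstore,xvdb,w' ]

  # inside the mail domU: build a single-device pool on top of that zvol
  zpool create mailpool /dev/xvdb
  zfs create -o compression=on mailpool/maildir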
That's really not needed. Most VMs just need fast, reliable storage, and don't know or care exactly what the underlying storage is (nor should they have to) - it's abstracted away as a virtio disk, /dev/vda or /dev/vdb, or as an iscsi disk.

There may be some exceptions where the VM needs to run ZFS itself on a bunch of zvols, but the only real use-case I've found is for experimenting with and testing zfs itself (e.g. I've created numerous zvols of a few hundred MB each and used them in a VM to create a zpool from them). Being able to snapshot and zfs send from within the VM itself could be useful; OTOH rsync provides similar incremental backups.

craig

-- 
craig sanders <cas@taz.net.au>