Re: cache, Xen, and zfsonlinux

On Mon, 15 Oct 2012, Daniel Pittman <daniel@rimspace.net> wrote:
For the mail spool a zvol would be a bad idea: fsck on a 400G ext3/4 filesystem is a bad thing, and the double filesystem overhead of ext3/4 on top of a zvol is going to hurt on the most disk-intensive filesystem.
A zvol is more like an LVM logical volume than a filesystem, so the overhead isn't nearly as high as this comment suggests. That said, running ext3 (especially) or ext4 on top of one is going to be slower, means you can't use the RAID-style features of ZFS, and gives up object-level checksums.
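As a sketch of how a zvol is used like an LVM logical volume (the pool name `tank`, the dataset name, and the 400G size are assumptions for illustration):

```shell
# Create a 400G zvol on an existing pool ("tank" is hypothetical).
# -s makes it sparse, so space is only consumed as blocks are written.
zfs create -s -V 400G tank/mailspool

# The zvol appears as an ordinary block device, much like an LVM LV:
ls -l /dev/zvol/tank/mailspool

# You could then put ext4 on it directly (with the caveats discussed above):
mkfs.ext4 /dev/zvol/tank/mailspool
```

These are privileged admin commands against real pool hardware, so treat them as a sketch rather than something to paste verbatim.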
I had the impression that a zvol gets all the consistency benefits of ZFS but for a block device. ZFS on top of a zvol may not be a good idea as ZFS is fairly heavy, BTRFS on top of a zvol might be an option though.
Any suggestions?
I would aim to run ZFS in the mail domU and treat the zvol as a "logical volume" block device. You will have some overhead from the double checksums, but performance should still be robust. This essentially treats the underlying dom0 ZFS as a fancy LVM. You will probably also need to allocate substantially more memory to the domU than you would otherwise.
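For example, exposing a dom0 zvol to the mail domU so it can run its own pool might look like the following sketch (the pool name, domU volume name, and device name are all assumptions, not taken from the thread):

```shell
# In dom0: create a zvol to back the domU's pool (names are hypothetical).
zfs create -V 400G tank/mail-domu

# In the domU's Xen config, hand the zvol through as a block device:
#   disk = [ 'phy:/dev/zvol/tank/mail-domu,xvdb,w' ]

# Inside the domU: build a single-device pool on the passed-through disk.
zpool create mailpool /dev/xvdb
```

The domU's pool gets the checksums and snapshots of ZFS, while redundancy is handled by whatever vdev layout backs the zvol in dom0.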
Yes, lots more RAM in both Dom0 and DomU. The problem is that I have servers with 16G of RAM that I'd prefer not to upgrade.
My fallback would be ZFS in the dom0, NFS-shared to the domU - much cheaper, because you only have one ARC, one set of checksums, etc., to manage, but with the added bonus of running NFS between the domU and dom0. Fun times.
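A minimal sketch of that fallback, assuming a dom0-side dataset `tank/mail` and illustrative addresses (none of these names come from the thread):

```shell
# In dom0: let ZFS manage the NFS export itself (dataset and subnet are
# hypothetical; sharenfs options are passed through to exportfs).
zfs set sharenfs='rw=@10.0.0.0/24' tank/mail

# Inside the domU: mount the share (dom0's address is an assumption).
mount -t nfs 10.0.0.1:/tank/mail /var/mail
```

This keeps a single ARC and one set of checksums in dom0, at the cost of putting NFS latency between the MTA and its spool.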
Are you sure that LXC or OpenVZ wouldn't better fit your needs than Xen? You trade off marginally less isolation between containers for the simplicity of having a single kernel image - so native ZFS performance.
That's a possibility. What is the support for them like in Debian/Wheezy?

On Mon, 15 Oct 2012, Jason White <jason@jasonjgw.net> wrote:
XFS doesn't have the fsck problem, but it isn't optimized for large numbers of small files, as I recall. I can't comment on reliability/performance. I don't know much about JFS either - XFS seems to be receiving more development attention from Red Hat and elsewhere at the moment. I don't think Reiser 3 is seeing much work anymore either.
XFS has had fsck problems in the past. I think that JFS is as good as dead. Even before Hans Reiser was arrested there were some issues with ReiserFS, such as the fact that hostile data written to a file by a user could cause corruption at fsck time. If you use ReiserFS on any server that allows users to store binary data then you risk security problems.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

Russell Coker <russell@coker.com.au> wrote:
XFS has had fsck problems in the past. I think that JFS is as good as dead.
JFS seems mostly to have been regarded as a compatibility option for IBM systems anyway.
Even before Hans Reiser was arrested there were some issues with ReiserFS, such as the fact that hostile data written to a file by a user could cause corruption at fsck time. If you use ReiserFS on any server that allows users to store binary data then you risk security problems.
And SUSE used it as the default file system for a while, but I would be very surprised if that's still so.

Russell Coker <russell@coker.com.au> writes:
My fallback would be ZFS in the dom0, NFS-shared to the domU - much cheaper, because you only have one ARC, one set of checksums, etc., to manage, but with the added bonus of running NFS between the domU and dom0. Fun times.
Are you sure that LXC or OpenVZ wouldn't better fit your needs than Xen? You trade off marginally less isolation between containers for the simplicity of having a single kernel image - so native ZFS performance.
That's a possibility. What is the support for them like in Debian/Wheezy?
I am currently running LXC on 2.6.32 to separate services (e.g. apache, postfix/dovecot, nsd3) on consolidated hardware. IME LXC as at 2.6.32 was inadequate for this task -- in particular, per-container resource allocation and encapsulation (e.g. of /sys) was not ready.

Wheezy will be 3.2. I have not done any significant work on LXC on a post-2.6.32 system, but AFAICT it has vastly improved, especially in the areas that annoyed me. IMO LXC on 3.2 is definitely worth at least considering.

Important things to note:

- The "lxc" and "libvirtd" packages provide COMPETING implementations of the LXC userland tools; you only need one. The former had much more functionality last time I looked; the latter might be useful if you need to give partial privileges to other users (e.g. to manage their own containers).

- By default LXC containers are not very secure; if you care about security it is definitely worth spending time dropping as many capabilities as you can (esp. CAP_SYS_ADMIN and mount privileges) and limiting whatever resources you meaningfully can. This *will* break containers that are stock debootstrap, and you will need to fiddle with init scripts increasingly as you lock things down. http://cyber.com.au/~twb/snarf/lxc-create may be helpful there.

- Obviously, since the kernel is shared and you (probably) drop modprobe privileges within the containers, tools like iptables-restore can't implicitly load new modules; they have to be modprobed in advance in the host OS. Likewise, exposing hotplug equipment to a container is extremely nontrivial.

- OTOH, you can choose which parts to containerize; e.g. you can have a separate filesystem but the same network interfaces, or vice versa. I'm not sure how useful this is in practice.

Regarding OpenVZ, AFAIK it was dropped by both Debian and Ubuntu years ago, so the only reason I can think of to run it is if you're already invested and LXC doesn't yet do what you need.
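To make the capability-dropping and resource-limiting advice concrete, an LXC container config fragment along these lines might be a starting point (a sketch for the wheezy-era lxc userland; the capability set, device list, and memory limit are illustrative, and what you can safely drop depends on the workload):

```
# Drop capabilities the container shouldn't need (illustrative set).
lxc.cap.drop = sys_admin sys_module sys_time mknod

# Deny all device access, then whitelist a few harmless character devices.
lxc.cgroup.devices.deny = a
lxc.cgroup.devices.allow = c 1:3 rwm   # /dev/null
lxc.cgroup.devices.allow = c 1:8 rwm   # /dev/random
lxc.cgroup.devices.allow = c 1:9 rwm   # /dev/urandom

# Cap the container's memory use via the memory cgroup.
lxc.cgroup.memory.limit_in_bytes = 2G
```

Expect to iterate: as the message above says, tightening these settings will break stock debootstrap containers until their init scripts are adjusted.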

On 16/10/12 12:49, Trent W. Buck wrote:
Russell Coker <russell@coker.com.au> writes:
My fallback would be ZFS in the dom0, NFS-shared to the domU - much cheaper, because you only have one ARC, one set of checksums, etc., to manage, but with the added bonus of running NFS between the domU and dom0. Fun times.
Are you sure that LXC or OpenVZ wouldn't better fit your needs than Xen? You trade off marginally less isolation between containers for the simplicity of having a single kernel image - so native ZFS performance.
That's a possibility. What is the support for them like in Debian/Wheezy?
I am currently running LXC on 2.6.32, to separate services (e.g. apache, postfix/dovecot, nsd3), on consolidated hardware. IME LXC as at 2.6.32 was inadequate for this task -- in particular, per-container resource allocation and encapsulation (e.g. of /sys) was not ready.
Wheezy will be 3.2. I have not done any significant work on LXC on a post-2.6.32 system, but AFAICT it has vastly improved, especially in the areas that annoyed me. IMO LXC on 3.2 is definitely worth at least considering.
Just chipping in with a quick "yes, LXC is good". I've been running it for a while now and it's been great. As Trent suspected, I can confirm it has improved vastly by the era of 3.2 kernels. You really want an up-to-date lxc userland. A lot of things used to be annoying, fiddly, and not-quite-right, but they seem to have been pretty good for the past year. The defaults for new containers are more sensible. I haven't used the libvirtd userland stuff at all; I had a quick look at it once and it seemed very limited/incomplete. I don't know if it's improved much yet. It would be nice if the other lxc userland had a GUI and easier tools for people to use, though. (I don't mind doing it on the command line myself.)

Toby Corkindale writes:
I haven't used the libvirtd userland stuff at all; I had a quick look at it once and it seemed very limited/incomplete. I don't know if it's improved much yet. It would be nice if the other lxc userland had a GUI and easier tools for people to use, though. (I don't mind doing it on the command line myself.)
Last time I looked, virt-manager was still officially stamped "beta quality, do not use", despite everyone using it in production...
participants (4)
- Jason White
- Russell Coker
- Toby Corkindale
- trentbuck@gmail.com