cache, Xen, and zfsonlinux

I'm looking at converting some Xen servers to ZFS. This includes a couple of servers for a reasonable size mail store (8,000,000 files and 600G of Maildir storage).

For much of the Xen on ZFS stuff I'll just use zvols for block devices and then use regular Linux filesystems such as Ext3 inside them. This isn't particularly efficient but for most DomUs it doesn't matter at all. Most of the DomUs have little disk access as they don't do much writing and have enough cache to cover most reads.

For the mail spool a zvol would be a bad idea, fsck on a 400G Ext3/4 filesystem is a bad thing and having the double filesystem overhead of Ext3/4 on top of a zvol is going to suck for the most disk intensive filesystem.

So it seems that the correct solution is to do one of the following:

1) Run the mail store in the Dom0 which will be good for performance at the cost of management. A server which has direct user access in any form (including POP and IMAP) needs to be running all the latest security patches while a Dom0 can have patches delayed if they don't seem relevant to network issues or virtualisation.

2) Use NFS to mount a ZFS filesystem from the Dom0. This will be good for management but there's the problem of caching. I don't think that NFS caches that aggressively so I'd need to give more RAM to the Dom0 for ZFS caching and I'd probably still lose some read performance.

3) Run ZFS in the DomU separately from the Dom0. This will work well for the DomU as long as there is enough RAM. But having a ZFS filesystem in the Dom0 as well as a separate one in the DomU (which would use different partitions of the same disks) would be difficult (ZFS in the Dom0 will probably want to grab all zpools). Also write performance will take a hit if there are two separate zpools on the same disks as there will be seeks between writes - this will be particularly bad for mail delivery where the message and the log entry will be written to different parts of the disk.

4) Run ZFS for mail storage in the DomU and use something other than ZFS for the Dom0. This has the same performance problems as 3) but without the issue of different ZFS instances fighting about it. Also I would lose the support for hashes on the zvol data, I could use files on BTRFS for similar data integrity (I'm using RAID-1 so RAID-Z isn't an option and therefore the benefits of ZFS over BTRFS are fewer) but that would still give performance issues.

Any suggestions?

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
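For reference, a minimal sketch of the zvol-per-DomU approach described above; the pool name (tank), volume name and size are made up for illustration:

  # create a volume to back a DomU's root disk
  zfs create -V 20G tank/xen/mail-root

  # hand it to Xen as an ordinary block device in the DomU config:
  # disk = [ 'phy:/dev/zvol/tank/xen/mail-root,xvda,w' ]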

Russell Coker <russell@coker.com.au> wrote:
4) Run ZFS for mail storage in the DomU and use something other than ZFS for the Dom0. This has the same performance problems as 3) but without the issue of different ZFS instances fighting about it. Also I would lose the support for hashes on the zvol data, I could use files on BTRFS for similar data integrity (I'm using RAID-1 so RAID-Z isn't an option and therefore the benefits of ZFS over BTRFS are fewer) but that would still give performance issues.
XFS doesn't have the fsck problem, but it isn't optimized for large numbers of small files, as I recall. I can't comment on reliability/performance. I don't know much about JFS either - XFS seems to be receiving more development attention from Red Hat and elsewhere at the moment. I don't think Reiser 3 is seeing much work anymore either.

Russell Coker <russell@coker.com.au> writes:
2) Use NFS to mount a ZFS filesystem from the Dom0. This will be good for management but there's the problem of caching. I don't think that NFS caches that aggressively so I'd need to give more RAM to the Dom0 for ZFS caching and I'd probably still lose some read performance.
$coworker was smoking more crack than usual recently, and he suggested we try p9 instead of NFS. There's a userland server and an in-kernel client. Access control is normal DAC -- the p9 server just accesses files as the user it was run as. I assume there's some kind of SSH-style crypto handshake to then allow the client to talk to the server; I didn't look.

Anyway, in your case in the dom0 you'd create a mail user, sudo -u mail p9thingo /srv/mail, then have the domU mount it. And probably a firewall to ensure the bound port isn't visible to the rest of the world. If you are crazy enough to try it, do let me know how it turns out :-)

Oh, and if you do go NFSv3, don't forget to do the usual things like cranking the read/write block sizes up and telling portmapper to assign fixed ports. Of course, maildir is about the worst thing to do on NFS, short of perhaps tar x.
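A rough sketch of the NFSv3 tuning being referred to; hostnames, paths and sizes are examples only:

  # on the Dom0: export the mail filesystem (plain /etc/exports shown here;
  # the ZFS sharenfs property is another way to do it)
  /tank/mail   domu-mail(rw,no_root_squash,no_subtree_check)

  # on the DomU: mount with larger read/write sizes
  mount -t nfs -o vers=3,rsize=32768,wsize=32768,hard,intr dom0:/tank/mail /var/spool/mail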

Hi Russell,

I have a similar problem on FreeBSD, where I am using ZFS. Most of the instances are running in jails, a concept similar to Linux containers. Jails have been an integrated part of the FreeBSD architecture for years, so I trust them security-wise. BTW: "version 2" (VIMAGE, VNET) is similar to the Crossbow architecture found on Solaris (every jail has its own network stack).

Linux containers are not that new either; AFAIK there are webhosting providers offering VPS based on them. It just feels more like an "add-on".. You may use your SE Linux wizardry to increase security if you don't trust it enough.

I also have a (commercially licensed) Zimbra server which needs Linux. I am running it in VirtualBox. The disks are files on a "normal" ZFS filesystem so they benefit from snapshotting, the zfs send/receive mechanism for off-site backup etc. Inside it is Ubuntu on ext3. It is the mail server for 50 users (and probably 10% of your mentioned mail storage), and works without problems as long as the zpool is not short of space (I think I mentioned here the "stand-still" if a 1 TB zpool goes below 50 GB free space).

But I don't think it is a really good setup, it is just "good enough" here, and as I have all the other stuff running in jails natively on FreeBSD, I keep it. (FreeBSD offers the Linux kernel ABI [with a few limitations] so one day I might try to run Ubuntu on it, in a jail.)

I don't think you win much if you use NFS over ZFS instead.

You may increase performance if you use a "raw zpool" underneath, but then you don't have the "cool stuff" (snapshots, cloning etc.) that makes you want to use ZFS in the first place.

I could imagine using LVM on the Dom0, giving partitions to the DomUs and running ZFS inside. That way you can snapshot the partitions with LVM outside (to get "disk images") and have ZFS management inside.

Regards
Peter

On Mon, 15 Oct 2012, Russell Coker wrote:
I'm looking at converting some Xen servers to ZFS. This includes a couple of servers for a reasonable size mail store (8,000,000 files and 600G of Maildir storage).
For much of the Xen on ZFS stuff I'll just use zvols for block devices and then use regular Linux filesystems such as Ext3 inside them. This isn't particularly efficient but for most DomUs it doesn't matter at all. Most of the DomUs have little disk access as they don't do much writing and have enough cache to cover most reads.
For the mail spool a zvol would be a bad idea, fsck on a 400G Ext3/4 filesystem is a bad thing and having the double filesystem overhead of Ext3/4 on top of a zvol is going to suck for the most disk intensive filesystem.
So it seems that the correct solution is to do one of the following:
1) Run the mail store in the Dom0 which will be good for performance at the cost of management. A server which has direct user access in any form (including POP and IMAP) needs to be running all the latest security patches while a Dom0 can have patches delayed if they don't seem relevant to network issues or virtualisation.
2) Use NFS to mount a ZFS filesystem from the Dom0. This will be good for management but there's the problem of caching. I don't think that NFS caches that aggressively so I'd need to give more RAM to the Dom0 for ZFS caching and I'd probably still lose some read performance.
3) Run ZFS in the DomU separately from the Dom0. This will work well for the DomU as long as there is enough RAM. But having a ZFS filesystem in the Dom0 as well as a separate one in the DomU (which would use different partitions of the same disks) would be difficult (ZFS in the Dom0 will probably want to grab all zpools). Also write performance will take a hit if there are two separate zpools on the same disks as there will be seeks between writes - this will be particularly bad for mail delivery where the message and the log entry will be written to different parts of the disk.
4) Run ZFS for mail storage in the DomU and use something other than ZFS for the Dom0. This has the same performance problems as 3) but without the issue of different ZFS instances fighting about it. Also I would lose the support for hashes on the zvol data, I could use files on BTRFS for similar data integrity (I'm using RAID-1 so RAID-Z isn't an option and therefore the benefits of ZFS over BTRFS are fewer) but that would still give performance issues.
Any suggestions?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On Wed, 17 Oct 2012, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
It just feels more like an "add-on".. You may use your SE Linux wizardry to increase security if you don't trust it enough.
AFAIK none of the LXC type things support different SE Linux policies on a per-jail basis. But in the past I've written policy for chroot environments and could use that. Another option is to use different MCS categories for the various jails. But for the case where I'm running a single set of applications (mail delivery, IMAP, and POP servers) I can just use the same policy. It would only be if I had entirely different mail servers in different chroots that there would be a need for different policies.
The disks are files on a "normal" ZFS filesystem so they benefit from snapshotting, the zfs send/receive mechanism for off-site backup etc.
So you can't do that with a zvol?
But I don't think it is a really good setup, it is just "good enough" here, and as I have all the other stuff running in jails natively on FreeBSD, I keep it.
In what way isn't it "really good"?
I don't think you win much if you use NFS over ZFS instead.
Yes, there are significant issues, but it's a matter of whether the other issues are worse.
You may increase performance if you use a "raw zpool" underneath, but then you don't have the "cool stuff" (snapshots, cloning etc.) that makes you want to use ZFS in the first place.
What is a "raw zpool"? Is that a zvol?
I could imagine using LVM on Dom0 and giving partitions to the DomUs and running ZFS inside.
That means you lose the contiguous write feature of ZFS which is essential to good performance. Ext3/4 on LVM volumes gives somewhat contiguous reads where possible, ZFS when it owns the disks gives contiguous writes, but ZFS on multiple LVM volumes gives neither.
That way you can snapshot the partitions with LVM outside (to get "disk images") and ZFS management inside.
Why would you want to do that? As ZFS owns the devices and the mount points it's surely not going to be easy to have multiple snapshots of a ZFS filesystem active at once. It would probably be like trying to take a snapshot of a PV that's used for LVM - something that is theoretically usable if you take the snapshot to another system, but otherwise will be a massive PITA and probably cause data loss.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

Hi Russell,

http://en.wikipedia.org/wiki/ZFS#Storage_pools says: "ZFS filesystems are built on top of virtual storage pools called zpools."

The management commands are zfs and zpool. The man pages don't mention "zvols" anywhere, but Wikipedia's article does.

Ah, http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html
"Block Devices (Are Gone)" ;-)

Okay, "your" zvols are the block devices related to a ZFS filesystem under Linux. I don't have them.. That's changing the picture slightly.

On Wed, 17 Oct 2012, Russell Coker wrote:
The disks are files on a "normal" ZFS filesystem so they benefit from snapshotting, the zfs send/receive mechanism for off-site backup etc.
So you can't do that with a zvol?
The commands are at the zfs level (zfs snapshot, zfs send, zfs receive) and are not available via zpool commands. As to zvols..
You may increase performance if you use a "raw zpool" underneath, but then you don't have the "cool stuff" (snapshots, cloning etc.) that makes you want to use ZFS in the first place.
What is a "raw zpool"? Is that a zvol?
Reading http://zfsonlinux.org/example-zvol.html

I never tried (or even thought of trying) to partition "something" created with "zfs create" under FreeBSD.. I don't think I can do that - I don't have block devices..

Anyway, would a zvol be significantly better than a file in the zfs as I do now? I actually thought of giving a dedicated zpool to the guest.
But I don't think it is a really good setup, it is just "good enough" here, and as I have all the other stuff running in jails natively on FreeBSD, I keep it.
In what way isn't it "really good"?
You mention them: inside the VirtualBox I don't have continuous reads/writes etc. It is layering for the convenience of easy administration - not for high performance. But it is just one VirtualBox - all other services are running in jails.

I am using zfs snapshots/send/receive to mirror all services to other boxes, so a machine failing is not the end of the world (also "good enough" here - it does not happen frequently, in fact it did not happen over the two years I have been using this setup, and the business could tolerate one day's data loss - and look at what others do.. see the cloud disasters over the last two years ;-)

Not having to deal with distinct layers (e.g. LVM and ZFS) has the advantage of not having to maintain two sets of administration tools.
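(A minimal sketch of that snapshot/send/receive mirroring; the pool, dataset and host names are hypothetical.)

  zfs snapshot tank/vm/zimbra@2012-10-18
  zfs send -i tank/vm/zimbra@2012-10-17 tank/vm/zimbra@2012-10-18 | \
      ssh backup-host zfs receive -F tank/vm/zimbra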
I could imagine using LVM on Dom0 and giving partitions to the DomUs and running ZFS inside.
That means you lose the contiguous write feature of ZFS which is essential to good performance. Ext3/4 on LVM volumes gives somewhat contiguous reads where possible, ZFS when it owns the disks gives contiguous writes, but ZFS on multiple LVM volumes gives neither.
Agreed. Easy administration vs. performance.
That way you can snapshot the partitions with LVM outside (to get "disk images") and ZFS management inside.
Why would you want to do that?
As ZFS owns the devices and the mount points it's surely not going to be easy to have multiple snapshots of a ZFS filesystem active at once. It would probably be like trying to take a snapshot of a PV that's used for LVM - something that can theoretically be usable if you take the snapshot to another system but otherwise will be a massive PITA and probably cause data loss.
I don't understand what you mean here.

Outside you have LVM and can do snapshots - if you force the ZFS inside to write everything out beforehand, and suspend the guest for the snapshot (it does not take that long, and for a mail server, e.g., that is acceptable).

Inside you can do snapshots that don't have to know whether the data has been written physically, as long as the data is written to the virtual disk inside the guest system.

Regards
Peter
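(Roughly what that could look like; the guest, volume group and LV names are made up, and the exact pause command depends on the hypervisor.)

  # pause the guest (after syncing inside it), snapshot its backing LV, resume
  xl pause mail-domu              # or: VBoxManage controlvm zimbra pause
  lvcreate --snapshot --size 5G --name mail-disk-snap /dev/vg0/mail-disk
  xl unpause mail-domu            # or: VBoxManage controlvm zimbra resume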

On Wed, 17 Oct 2012, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html
"Block Devices (Are Gone)" ;-)
That sounds bogus to me. Despite the lack of the BSD split device model we manage to not lose data on Linux. I think that they are advocating a BSD design flaw as a feature.
Anyway, would a zvol be significantly better than a file in the zfs as I do now?
It would probably be much the same in that they are both strings of bytes managed by the same ZFS code. Of course a file has mtime and atime fields while a block device probably doesn't.
I could imagine using LVM on Dom0 and giving partitions to the DomUs and running ZFS inside.
That means you lose the contiguous write feature of ZFS which is essential to good performance. Ext3/4 on LVM volumes gives somewhat contiguous reads where possible, ZFS when it owns the disks gives contiguous writes, but ZFS on multiple LVM volumes gives neither.
Agreed. Easy administration vs. performance.
Multiple ZFS instances are not "easy administration" IMHO. ZFS is a bit of a pain to set up; once it's going it makes some things easier, but it's not as easy as other filesystems. It's not something where you just do mkfs ...; mount ...
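For comparison, the two workflows side by side (device, pool and mountpoint names are just examples):

  # a conventional filesystem:
  mkfs.ext4 /dev/vg0/mail
  mount /dev/vg0/mail /var/spool/mail

  # ZFS: create a pool first, then datasets; the mountpoint is a dataset property
  zpool create tank mirror /dev/sdb /dev/sdc
  zfs create -o mountpoint=/var/spool/mail tank/mail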
As ZFS owns the devices and the mount points it's surely not going to be easy to have multiple snapshots of a ZFS filesystem active at once. It would probably be like trying to take a snapshot of a PV that's used for LVM - something that can theoretically be usable if you take the snapshot to another system but otherwise will be a massive PITA and probably cause data loss.
I don't understand what you mean here.
With a filesystem like Ext3 you can umount it from a DomU, shut down the DomU, and then mount it in the Dom0. It's no big deal at all. Ext3/4 with mount by UUID gets a little more complex as snapshots can result in mounting the wrong one, so you just don't use UUID mounting with LVM and similar things (LVM gives you a persistent name anyway).

With something heavy like LVM and ZFS (which has similar functionality to LVM in some ways) you can't just freely mount snapshots etc. You need to scan for the devices, and then the kernel keeps its own list mapping names to devices. So if you have a second snapshot of the same device it's going to go badly wrong.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
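(To make that contrast concrete - hypothetical names throughout, and the ZFS case assumes the pool was built on LVM volumes as discussed above:)

  # Ext3/4 in an LVM volume: trivial to inspect from the Dom0
  xl shutdown mail-domu
  mount /dev/vg0/mail-disk /mnt/inspect

  # ZFS inside the DomU: the Dom0 has to scan for and import the pool,
  # and a second snapshot of the same pool means duplicate pool names/GUIDs
  zpool import -d /dev/vg0 -R /mnt/inspect mailpool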

On Wed, 17 Oct 2012, Russell Coker wrote:
On Wed, 17 Oct 2012, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html
"Block Devices (Are Gone)" ;-)
That sounds bogus to me. Despite the lack of the BSD split device model we manage to not lose data on Linux. I think that they are advocating a BSD design flaw as a feature.
I think the main reason is KISS. FreeBSD does not have block devices, /proc, /sys, does not rely on hald and dbus and whatever was invented on Linux - and I don't miss it.. Security-wise it makes a difference if you have multiple entry points to whatever it is.
It would probably be much the same in that they are both strings of bytes managed by the same ZFS code. Of course a file has mtime and atime fields while a block device probably doesn't.
There is "zfs set atime=off" but I don't think you can disable changing mtime.
I could imagine using LVM on Dom0 and giving partitions to the DomUs and running ZFS inside.
That means you lose the contiguous write feature of ZFS which is essential to good performance. Ext3/4 on LVM volumes gives somewhat contiguous reads where possible, ZFS when it owns the disks gives contiguous writes, but ZFS on multiple LVM volumes gives neither.
Agreed. Easy administration vs. performance.
Multiple ZFS instances is not "easy administration" IMHO. ZFS is a bit of a pain to setup, once it's going it makes some things easier, but it's not as easy as other filesystems. It's not something you do mkfs ...; mount ...
I don't know. I am doing "zfs create" and "zfs set mountpoint" all the time, and the number of machine tweaks isn't that high - mainly restricting the ARC size or disabling prefetch.

But: ZFS is memory-hungry, and if you have multiple DomUs, you would have all of them sharing the physical RAM, with next to nothing shared amongst them. So you probably don't want many DomUs with ZFS on the machine.

Regards
Peter
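(The usual knobs for that; the 2 GB limit below is just an example value.)

  # zfsonlinux - /etc/modprobe.d/zfs.conf:
  options zfs zfs_arc_max=2147483648 zfs_prefetch_disable=1

  # FreeBSD - /boot/loader.conf:
  vfs.zfs.arc_max="2G"
  vfs.zfs.prefetch_disable="1"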

Peter Ross writes:
I think the main reason is KISS. FreeBSD does not have block devices, /proc, /sys, does not rely on hald and dbus and whatever was invented on Linux - and I don't miss it..
Sheesh, even Solaris has /proc. I guess fbsd isn't interested in stealing good ideas from plan9? ;-P hald has been gone for twelve months, it was replaced by (more) udev, and udisks and upower. The latter two are just as horrible as hald was. dbus is a stupid GUI thing -- and who cares about GUIs? (I was recently informed that its use in dnsmasq is a stupid Ubuntuism that upstream rejects, yay.) All I can say for /sys is at least it moves non-procs out of /proc. FWIW my containers don't mount /sys (for security reasons) and they have never complained...

On Thu, 18 Oct 2012, Trent W. Buck wrote:
Peter Ross writes:
I think the main reason is KISS. FreeBSD does not have block devices, /proc, /sys, does not rely on hald and dbus and whatever was invented on Linux - and I don't miss it..
Sheesh, even Solaris has /proc. I guess fbsd isn't interested in stealing good ideas from plan9? ;-P
Well, if only it stayed with /proc/$pid.. but even then, in the Linux way at least (sorry, my last look at Plan9 was too long ago to remember), it isn't very efficient to open a dozen files in a directory to get all the process-relevant information. http://en.wikipedia.org/wiki/Sysctl#Performance_considerations describes the dilemma.
All I can say for /sys is at least it moves non-procs out of /proc.
It moves.. slowly. My Ubuntu desktop here, a 3.2 kernel, has at least 40 or 50 more entries in /proc that aren't process related.
FWIW my containers don't mount /sys (for security reasons) and they have never complained...
Yep, it would be good, e.g. for security, if all the non-process info finally moved there.

Regards
Peter

On Thu, 18 Oct 2012, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
Well, if it would stay with /proc/$pid.. but even then, in the Linux way at least (sorry, last look at Plan9 was too long ago to remember) it isn't very efficient to open a dozen files in a directory to get all process relevant information.
http://en.wikipedia.org/wiki/Sysctl#Performance_considerations
describes the dilemma.
I tested the example given of running "top" and holding down the space-bar. That gave about 30% system CPU across both cores of my system, i.e. 60% of one core. Of course that included the X overhead; presumably less CPU time would have been used on a virtual console.

Also, any test that involves polling something at maximum speed can use 100% of a CPU core. The real question is whether some useful task takes that much CPU time. Opening files that are entirely RAM-based is still a reasonably fast operation.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
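(A crude way to put a number on the "opening RAM-based files is fast" claim above - entirely unscientific, and results will vary a lot between systems:)

  # time 1000 reads of a purely RAM-backed /proc file
  time sh -c 'for i in $(seq 1 1000); do cat /proc/self/status > /dev/null; done'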

On Thu, 18 Oct 2012, Russell Coker wrote:
On Thu, 18 Oct 2012, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
Well, if it would stay with /proc/$pid.. but even then, in the Linux way at least (sorry, last look at Plan9 was too long ago to remember) it isn't very efficient to open a dozen files in a directory to get all process relevant information.
http://en.wikipedia.org/wiki/Sysctl#Performance_considerations
describes the dilemma.
I tested the example given of running "top" and holding down the space-bar. That gave about 30% system CPU across both cores of my system, i.e. 60% of one core. Of course that included the X overhead; presumably less CPU time would have been used on a virtual console.
I did it on both (X and text console); the result is ca. 25% in both cases. A FreeBSD system I pushed to ca. 1.6% CPU time that way (note: they are not exactly comparable systems, but not so far apart that I would expect a 1:15 result. They both have 4GB RAM, and the FreeBSD system runs 125 processes, the Linux one 165).
Also, any test that involves polling something at maximum speed can use 100% of a CPU core. The real question is whether some useful task takes that much CPU time. Opening files that are entirely RAM-based is still a reasonably fast operation.
My /proc has ca. 165 process directories with 43 entries in each. Regards Peter

On Wed, Oct 17, 2012 at 01:51:35PM +1100, Peter Ross wrote:
Hi Russell,
http://en.wikipedia.org/wiki/ZFS#Storage_pools
"ZFS filesystems are built on top of virtual storage pools called zpools."
The management commands are zfs and zpool. The man pages don't mention "zvols" anywhere, but Wikipedia's article does.
zfs(8) on my linux system mentions mostly "volumes" but also "zvols" aka "zfs volumes". almost all commands look like:

  zfs create [-ps] [-b blocksize] [-o property=value] ... -V size volume
  zfs destroy [-fnpRrv] filesystem|volume
  zfs clone [-p] [-o property=value] ... snapshot filesystem|volume
  zfs rename filesystem|volume|snapshot
  zfs set property=value filesystem|volume|snapshot ...
  zfs inherit [-r] property filesystem|volume|snapshot ...
  zfs receive | recv [-vnFu] filesystem|volume|snapshot

  # zfs create -V 2g pool/volumes/vol1
  # zfs set shareiscsi=on pool/volumes/vol1

the page has zfsonlinux specific info (like device nodes under /dev/zvol) but the generic "volumes" info should be relevant to all current ports of zfs.

  zfs create [-ps] [-b blocksize] [-o property=value] ... -V size volume

      Creates a volume of the given size. The volume is exported as a
      block device in /dev/zvol/path, where path is the name of the
      volume in the ZFS namespace. The size represents the logical size
      as exported by the device. By default, a reservation of equal
      size is created.

The FreeBSD man page at the following URL mentions volumes, same as the zfsonlinux man page.

http://www.freebsd.org/cgi/man.cgi?query=zfs&manpath=FreeBSD+9.0-RELEASE

the text for 'zfs create' is almost identical to the linux version.

  zfs create [-ps] [-b blocksize] [-o property=value] ... -V size volume

      Creates a volume of the given size. The volume is exported as a
      block device in /dev/zvol/{dsk,rdsk}/path, where path is the name
      of the volume in the ZFS namespace. The size represents the
      logical size as exported by the device. By default, a reservation
      of equal size is created.
Reading http://zfsonlinux.org/example-zvol.html
Never tried (or even thought of trying) to partition "something" created with "zfs create" under FreeBSD.. Don't think I can do that - I don't have block devices..
A VM given a zvol can partition it just like any other disk. dunno about freebsd, but on linux i can also access the partitions from the ZFS server (although i wouldn't want to mount one RW while the VM was running).

  # ls -l /dev/zvol/export/{sid,sid-*}
  brw-rw---T 1 libvirt-qemu kvm  230, 144 Oct 17 15:46 /dev/zvol/export/sid
  brw-rw---T 1 root         disk 230, 145 Sep  8 17:10 /dev/zvol/export/sid-part1
  brw-rw---T 1 root         disk 230, 146 Sep  8 17:10 /dev/zvol/export/sid-part2
Anyway, would a zvol be significantly better than a file in the zfs as I do now?
performance wise? yes, definitely.

management convenience? almost certainly. e.g. zfs snapshot instead of qemu-img snapshot.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #372: Forced to support NT servers; sysadmins quit.
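For example, using the zvol from the listing above (the snapshot names are made up):

  zfs snapshot export/sid@before-upgrade                  # instant, space-efficient snapshot
  zfs clone export/sid@before-upgrade export/sid-test     # writable copy for a test VM
  zfs rollback export/sid@before-upgrade                  # throw away changes since the snapshot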

On Wed, 17 Oct 2012, Craig Sanders wrote:
The FreeBSD man page at the following URL mentions volumes, same as the zfsonlinux man page.
http://www.freebsd.org/cgi/man.cgi?query=zfs&manpath=FreeBSD+9.0-RELEASE
the text for 'zfs create' is almost identical to the linux version.
zfs create [-ps] [-b blocksize] [-o property=value] ... -V size volume
Creates a volume of the given size. The volume is exported as a block device in /dev/zvol/{dsk,rdsk}/path, where path is the name of the volume in the ZFS namespace. The size represents the logical size as exported by the device. By default, a reservation of equal size is created.
Yes, it works, and gives me a (character) device:

  # zfs create -b 512 -V 512 zpool/testvol
  # ls -l /dev/zvol/zpool/testvol
  crw-r----- 1 root operator 0, 204 Oct 17 16:41 /dev/zvol/zpool/testvol
Anyway, would a zvol be significantly better than a file in the zfs as I do now?
performance wise? yes, definitely
management convenience? almost certainly. e.g. zfs snapshot instead of qemu-img snapshot.
I have the disks for the VirtualBox in one directory and snapshot it. Thanks for finding "my zvols" :-)

Peter

Peter Ross <Peter.Ross@bogen.in-berlin.de> writes:
Linux containers are not that new either [...] It just feels more like an "add-on".. You may use your SE Linux wizardry to increase security if you don't trust it enough.
I'm not sure where you get that impression. AFAICT, there was OpenVZ, maintained as a third-party fork of linux because it changed lots of little bits all over the shop, and it did a few pragmatic hacks to solve problems. Then there was LXC, which is basically the OpenVZ work cleaned up and integrated back into the mainline kernel, becoming a first-class part of the kernel like sysfs or the tcp stack or whatever.

Incidentally, I was talking to an fbsd user a while back and he gave me the impression that lxc "containerized" more resources than fbsd jails -- I forget which ones, though.

Hi Trent, On Wed, 17 Oct 2012, Trent W. Buck wrote:
Peter Ross <Peter.Ross@bogen.in-berlin.de> writes:
Linux containers are not that new either [...] It just feels more like an "add-on".. You may use your SE Linux wizardry to increase security if you don't trust it enough.
I'm not sure where you get that impression. AFAICT, there was OpenVZ, maintained as a third-party fork of linux because it changed lots of little bits all over the shop, and it did a few pragmatic hacks to solve problems. Then there was LXC, which is basically where OpenVZ work is cleaned up and integrated back into the mainline kernel, and becomes a first-class part of the kernel like sysfs or the tcp stack or whatever.
AFAIK it was integrated into the mainline in bits and pieces, starting with cgroups and namespaces in 2.6.29 (2009). FreeBSD jails have been around since 1999.
Incidentally, I was talking to an fbsd user a while back and he gave me the impression that lxc "containerized" more resources than fbsd jails -- I forget which ones, though.
http://en.wikipedia.org/wiki/Operating_system-level_virtualization#Implement... gives an overview. FreeBSD does not have I/O resource limiting, as far as I know. Regards Peter
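(On the Linux side that limiting is done via the blkio cgroup controller; a sketch, assuming the controller is mounted at /sys/fs/cgroup/blkio and that LXC puts each container in a cgroup named after it - the container name "mail" is hypothetical:)

  # relative I/O weight for the container
  echo 500 > /sys/fs/cgroup/blkio/lxc/mail/blkio.weight

  # or persistently in the container's LXC config:
  # lxc.cgroup.blkio.weight = 500
  # lxc.cgroup.blkio.throttle.read_bps_device = 8:0 10485760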
Participants (5): Craig Sanders, Jason White, Peter Ross, Russell Coker, trentbuck@gmail.com