
The server which, among other things, holds my email runs Debian/Wheezy for the Dom0 and most DomUs. It's running Xen for virtualisation and ZFS for data storage.

Currently I have some problems: the hardware is defective and has started spontaneously rebooting. This is really bad, but a combination of ECC RAM and ZFS means that the risk of data loss is low even though uptime isn't good. The server needs to be replaced, so now is a good time to consider other changes.

ZFS is not well supported on Linux due to license issues. There are two aspects of this which are particularly important to me. One is that I've found ZFS to cause the kernel to OOM in situations where I think it shouldn't (EG a server with 4G of RAM doing a light Samba load). The other is that ZFS doesn't always start properly on boot, so I need to hack the init scripts to make sure it's mounted before starting daemons that depend on it (which doesn't always work, see my message about MySQL).

I believe that my coding skills are up to the task of making ZFS work as I desire. But no-one is going to pay me to do that and I don't feel inclined to waste hobby time on work that won't benefit the FOSS community.

For reliable data storage on Linux the options are ZFS and BTRFS. The downside of BTRFS is that it is new and most distributions don't support it as well as one wants for production code. But Oracle apparently support it well. Oracle are also really into Xen and should be able to give better MySQL support than anyone else.

The options I'm considering at the moment are using Oracle Linux for the Dom0 and using Debian for the Dom0 with an Oracle kernel. Both of those should be good for running BTRFS and Xen. Debian with an Oracle kernel will make kernel upgrades a little painful but it means I don't have to change the rest of the OS configuration (I've run Debian systems with CentOS kernels before).

A complete Oracle OS will work well together but it means I have to change everything to a different OS. Admittedly that won't be so hard, as the Dom0 does little other than managing DomUs and running MySQL.

Any suggestions?

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
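The init-script workaround described above (making sure ZFS is mounted before starting daemons that depend on it) could be approximated with a small polling helper. This is only a sketch, not the actual hack used on this server: the dataset path /tank/mysql and all function names are made up for illustration, and it assumes a Linux-style /proc/mounts.

```python
import time


def is_mounted(mount_point, mounts_text):
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # The second field of each /proc/mounts line is the mount point.
        # Paths containing spaces are escaped there (\040) and are not handled here.
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def wait_for_mount(mount_point, read_mounts, timeout=60.0, interval=1.0,
                   sleep=time.sleep):
    """Poll until mount_point is mounted; give up after roughly timeout seconds."""
    waited = 0.0
    while True:
        if is_mounted(mount_point, read_mounts()):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def read_proc_mounts():
    """Read the kernel's current mount table (Linux only)."""
    with open("/proc/mounts") as f:
        return f.read()
```

A wrapper script would call wait_for_mount("/tank/mysql", read_proc_mounts) and only start the daemon if it returns True. Refusing to start on False is the safer choice, since a database started on an empty mount point is worse than one that is down.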

Hey Russell,

On 8 Nov 2013, at 6:17 pm, Russell Coker <russell@coker.com.au> wrote:
> The options I'm considering at the moment are using Oracle Linux for the Dom0 and using Debian for the Dom0 with an Oracle kernel.

Oracle Linux itself doesn’t run Xen well, as it’s a (mostly) pure derivative of Red Hat Enterprise Linux and they removed Xen Dom0 support from their latest release. I would recommend/suggest trying out the combination of Oracle VM 3.2 (which is Xen 4.2) as your Dom0 with Oracle Linux as the DomUs. This would give you an OCFS2-based repository which could be shared across multiple systems.

Note that Oracle VM and Oracle Linux are both free to download, distribute, use and update. Oracle VM also includes a full web-based management tool that configures the server(s), networking, storage and guests, including live migration and HA support out-of-the-box.

Feel free to ping me directly for more info.

Cheers,
Avi

On Fri, 8 Nov 2013, Avi Miller <avi.miller@gmail.com> wrote:
>> The options I'm considering at the moment are using Oracle Linux for the Dom0 and using Debian for the Dom0 with an Oracle kernel.
>
> Oracle Linux itself doesn’t run Xen well, as it’s a (mostly) pure derivative of Red Hat Enterprise Linux and they removed Xen Dom0 support from their latest release. I would recommend/suggest trying out the combination of Oracle VM 3.2 (which is Xen 4.2) as your Dom0 with Oracle Linux as the DomUs. This would give you an OCFS2-based repository which could be shared across multiple systems.
>
> Note that Oracle VM and Oracle Linux are both free to download, distribute, use and update. Oracle VM also includes a full web-based management tool that configures the server(s), networking, storage and guests, including live migration and HA support out-of-the-box.

What exactly is Oracle VM? Just a Linux distribution specifically for Xen?

Does it have the full features? IE can I have the Dom0 running as a full router and use shell access as a supported feature? Or is it expected to be only run via web access?

Is BTRFS supported in Oracle VM? OCFS2 doesn't seem to give any benefits over Ext4 for a non-clustered filesystem.

HA support out of the box sounds interesting.

Hi,

On 11 Nov 2013, at 10:25 am, Russell Coker <russell@coker.com.au> wrote:
> What exactly is Oracle VM? Just a Linux distribution specifically for Xen?

Oracle VM is Oracle's Xen-based virtualization product.

> Does it have the full features? IE can I have the Dom0 running as a full router and use shell access as a supported feature? Or is it expected to be only run via web access?

Shell access is not a supported feature, but I suspect you're not going to be paying for support anyway. :) Oracle VM is designed to be used only with the web-based management tool, which configures and manages one or more Oracle VM Servers to create clustered or non-clustered pools. All aspects of management are available via the web UI, including server configuration (storage and network), VM creation, template creation, HA/migration, etc.

> Is BTRFS supported in Oracle VM? OCFS2 doesn't seem to give any benefits over Ext4 for a non-clustered filesystem.

Neither btrfs nor ext4 is supported for local/non-clustered repositories, only OCFS2. This gives us consistency in filesystem between local and clustered operation. OCFS2 is also tuned for VM storage (i.e. large file sizes) and allows for reflinking, which provides for hot snapshots of running VMs. For clustered storage, we support OCFS2 via iSCSI/FC SAN or NFS.

> HA support out of the box sounds interesting.

HA requires a clustered pool, so you will need a small iSCSI or NFS mount (12GB) to provide the pool filesystem and a virtual IP. Once the cluster is enabled, you can flag any VM for HA. This means the VM will be restarted if it crashes, or automatically started on another server if the physical server hosting the VM dies.

Cheers,
Avi

Quoting Russell Coker (russell@coker.com.au):
> For reliable data storage on Linux the options are ZFS and BTRFS.

I'm really curious: Are you quite sure those are more reliable than ext4 or ext3 (in their default metadata journaling/write modes)?

For performance, especially on very large volumes, there I'm sure you are right, if only on account of excessive fsck times. (I notice you said the machine that handles your mail, which seems to imply modest filesystem size.) However, that was not the criterion you mentioned, but rather reliability.

I have laughably little data, but my naive suspicion is that both ext3 and ext4 are significantly more reliable than the other two.

Rick Moen <rick@linuxmafia.com> wrote:
> Quoting Russell Coker (russell@coker.com.au):
>> For reliable data storage on Linux the options are ZFS and BTRFS.
>
> I'm really curious: Are you quite sure those are more reliable than ext4 or ext3 (in their default metadata journaling/write modes)?

I suspect "reliable" here includes a requirement for data checksums, which only ZFS and BTRFS can satisfy. As I recall, XFS has (or is about to receive) support for metadata checksums/CRCs, but not for full data checking.
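To make the data checksum requirement concrete, here is a rough sketch of what per-block checksumming buys you. It is not how ZFS or BTRFS implement it: zlib.crc32 merely stands in for whatever checksum a real filesystem uses, and the 4096-byte block size and function names are assumptions for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def checksum_blocks(data, block_size=BLOCK_SIZE):
    """Return one CRC-32 per block of data, as would be stored at write time."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def find_corrupt_blocks(data, expected, block_size=BLOCK_SIZE):
    """Return the indexes of blocks whose current checksum differs from the stored one."""
    actual = checksum_blocks(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

If a single bit flips on disk after checksum_blocks() has recorded the checksums, find_corrupt_blocks() names the damaged block on the next read. A filesystem that only journals metadata would hand the bad data back without noticing.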

Quoting Jason White (jason@jasonjgw.net):
> I suspect "reliable" here includes a requirement for data check sums, which only ZFS and BTRFS can satisfy.

Yes, that makes sense. (It was late here, and I'd forgotten that key difference.) And there is also scrubbing.

And certainly Oracle Linux gives you the best access to Chris Mason's btrfs work in its most polished form, for now.

Hi,

On 9 Nov 2013, at 3:32 am, Rick Moen <rick@linuxmafia.com> wrote:
> And certainly Oracle Linux gives you the best access to Chris Mason's btrfs work in its most polished form, for now.

As much as I wish this were still 100% true, I should point out that Chris Mason left Oracle for Fusion-IO a while ago. However, Oracle does have Liu Bo and Anand Jain on our mainline kernel development team and both of them work on btrfs full time.

We've also just released our Unbreakable Enterprise Kernel Release 3 for Oracle Linux 6 (x86_64), which is based on the 3.8 mainline and contains significant btrfs improvements over the previous 3.0-based UEK2.

Cheers,
Avi

On Sat, 9 Nov 2013 07:32:43 AM Avi Miller wrote:
> We've also just released our Unbreakable Enterprise Kernel Release 3 for Oracle Linux 6 (x86_64), which is based on the 3.8 mainline and contains significant btrfs improvements over the previous 3.0-based UEK2.

You don't want to be using anything earlier than 3.10 if you are using snapshots and defrag, as that's when the snapshot-aware defrag code landed.

Also note there is some angst about occasional 3.11 and 3.12 btrfs filesystem corruption due to a bug relating to doing a rebalance when defrag is happening. The patch you would need is from Liu Bo and is called:

Btrfs: fix a crash when running balance and defrag concurrently

It's apparently patched in the Fedora kernel (as it causes corruption of systemd logs) but it's not hit mainline or stable kernels yet (despite Greg K-H saying he'll take stable patches before they hit the mainline whilst Linus is off-net).

It's labelled an experimental filesystem for a reason.. ;-)

cheers!
Chris

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
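The 3.10 threshold above can be checked mechanically before enabling snapshots plus defrag on a box. The helper below is a hedged sketch: the function names are invented here, and it only compares the upstream version number in the release string. A distribution kernel with backported btrfs patches may be better (or worse) than its version suggests, so treat the result as a hint and not a guarantee.

```python
import re


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a string such as '3.10.11-1-amd64'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '3.8') is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def kernel_at_least(release, minimum):
    """Return True if the release string is at or above the minimum version tuple."""
    return parse_kernel_release(release) >= tuple(minimum)
```

On a running system, kernel_at_least(platform.release(), (3, 10)) (after import platform) applies the check to the current kernel.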

On Sat, 9 Nov 2013, Chris Samuel <chris@csamuel.org> wrote:
> On Sat, 9 Nov 2013 07:32:43 AM Avi Miller wrote:
>> We've also just released our Unbreakable Enterprise Kernel Release 3 for Oracle Linux 6 (x86_64), which is based on the 3.8 mainline and contains significant btrfs improvements over the previous 3.0-based UEK2.
>
> You don't want to be using anything earlier than 3.10 if you are using snapshots and defrag as that's when the snapshot aware defrag code landed.

Fortunately I have no great interest in defrag. Currently the only systems I run which have disk performance problems are mail servers, and with an average message size of 60K I don't think that fragmentation is going to be a big problem.

> Also note there is some angst about occasional 3.11 and 3.12 btrfs filesystem corruption due to a bug relating to doing a rebalance when defrag is happening, the patch you would need is from Liu Bo and is called:

Also a rebalance when running systemd causes problems with all the recent versions, something about the way it pre-allocates space for its log files.

3.11 has problems with removing dozens of snapshots at the same time.

> It's labelled an experimental filesystem for a reason.. ;-)

Yes, but unfortunately the non-experimental filesystems either lack the data integrity features or are "experimental" in other ways (IE ZFS).

On Sat, 9 Nov 2013 12:16:50 PM Russell Coker wrote:
> Also a rebalance when running systemd causes problems with all the recent versions, something about the way it pre-allocates space for its log files.

I *think* that's the same bug, at least it's the only outstanding balance bug with a patch that I've seen (and on the btrfs list it's described that way in response to your systemd bug report).

> 3.11 has problems with removing dozens of snapshots at the same time.

I think that's meant to be fixed as of 3.11.6 and 3.12 (again as mentioned by someone on the btrfs list in response to your report).

The sort of testing & reporting you're doing is really important!

I've not managed to break it in any of these ways yet, but then I'm just being (deliberately) really boring with my use of it (for now). :-)

cheers!
Chris

On Sat, 9 Nov 2013, Chris Samuel <chris@csamuel.org> wrote:
> On Sat, 9 Nov 2013 12:16:50 PM Russell Coker wrote:
>> Also a rebalance when running systemd causes problems with all the recent versions, something about the way it pre-allocates space for its log files.
>
> I *think* that's the same bug, at least it's the only outstanding balance bug with a patch that I've seen (and on the btrfs list it's described that way in response to your systemd bug report).

Yes. It's an unfortunate combination given that there's a strong correlation between people who are willing to test a new init and people who are willing to test a new filesystem...

>> 3.11 has problems with removing dozens of snapshots at the same time.
>
> I think that's meant to be fixed as of 3.11.6 and 3.12 (again as mentioned by someone on the btrfs list in response to your report).

I've just tested that on one of my less important systems and it seems OK. But that system didn't crash so much anyway. I'll probably wait a week before I try 3.11.6 on my file server.

My file server is currently running 3.10.11, which is OK as long as I don't run a balance (systemd issue) or a scrub (kernel OOM and lockup).

> The sort of testing & reporting you're doing is really important!
>
> I've not managed to break it in any of these ways yet, but then I'm just being (deliberately) really boring with my use of it (for now). :-)

I think I'm being boring too. No quotas; the only extra features I'm using are scrubbing (which you should do with a RAID-1 no matter how it's run) and snapshots.
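For anyone wondering why scrubbing matters on a RAID-1: a plain mirror that finds two differing copies cannot tell which one is right, whereas a checksumming filesystem can pick the good copy and rewrite the bad one. The toy model below illustrates that idea only. It is not the btrfs or ZFS scrub code, the in-memory lists stand in for the two disks, and zlib.crc32 is an assumed stand-in for the real checksum.

```python
import zlib


def scrub_mirror(copy_a, copy_b, checksums):
    """Check every block on both mirror halves against its stored checksum.

    copy_a and copy_b are lists of bytes objects (one entry per block) and are
    repaired in place. Returns a list of (block_index, outcome) for bad blocks.
    """
    report = []
    for i, expected in enumerate(checksums):
        ok_a = zlib.crc32(copy_a[i]) == expected
        ok_b = zlib.crc32(copy_b[i]) == expected
        if ok_a and ok_b:
            continue  # both halves are good, nothing to do
        if ok_a:
            copy_b[i] = copy_a[i]  # rewrite the bad half from the good one
            report.append((i, "repaired B from A"))
        elif ok_b:
            copy_a[i] = copy_b[i]
            report.append((i, "repaired A from B"))
        else:
            # Both copies fail their checksum, so the data is lost.
            report.append((i, "unrecoverable"))
    return report
```

Running this regularly is the point: a block that has silently gone bad on one disk gets repaired while the other copy is still good, instead of being discovered only after the second disk fails.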

On Fri, 8 Nov 2013, Rick Moen <rick@linuxmafia.com> wrote:
> Quoting Russell Coker (russell@coker.com.au):
>> For reliable data storage on Linux the options are ZFS and BTRFS.
>
> I'm really curious: Are you quite sure those are more reliable than ext4 or ext3 (in their default metadata journaling/write modes)?
>
> For performance, especially on very large volumes, there I'm sure you are right, if only on account of excessive fsck times. (I notice you said the machine that handles your mail, which seems to imply modest filesystem size.) However, that was not the criterion you mentioned, but rather reliability.
>
> I have laughably little data, but my naive suspicion is that both ext3 and ext4 are significantly more reliable than the other two.

In terms of having a system reliably boot up and just start working, zfsonlinux performs poorly. My experience is that I had to hack the system start scripts every time because they just don't work properly. As ZFS isn't properly free software I'm not going to spend the extra effort in fixing it properly and sending patches upstream.

ZFS also isn't so good for long term upgrades. I'm not looking forward to upgrading a file server running Debian/Squeeze to Jessie. I'm anticipating that the downtime and hassle of the upgrade makes it not worth upgrading to Wheezy unless I rebuild it with bigger disks before Jessie comes out.

But one of the biggest benefits of BTRFS and ZFS is the fact that they have checksums on everything. With the volumes of data that everyone has it's not practical to do manual checks. You have to have a filesystem that does checks or accept the fact that eventually data will just disappear.

On 8 November 2013 18:17, Russell Coker <russell@coker.com.au> wrote:
> The options I'm considering at the moment are using Oracle Linux for the Dom0 and using Debian for the Dom0 with an Oracle kernel. Both of those should be good for running BTRFS and Xen. Debian with an Oracle kernel will make kernel upgrades a little painful but it means I don't have to change the rest of the OS configuration (I've run Debian systems with CentOS kernels before). A complete Oracle OS will work well together but it means I have to change everything to a different OS, admittedly that won't be so hard as the Dom0 does little other than managing DomUs and running MySQL.
>
> Any suggestions?

Consider Ubuntu Server?

It's similar to Debian, so you'll find it easier to get the hang of it; however it tends to have more active maintenance on the stable releases and tracks closer to current versions of software and the Linux kernel.

You'd probably want the LTS (Long Term Support) version, but more recent kernels are backported for it in the official repos, so you can have a recent kernel version for good btrfs support. Patches are backported by the Ubuntu team into the kernels too.

Ubuntu has supported btrfs for quite a while now, and it seems to work well - at least I've been running btrfs filesystems for a while with no issues. I've tried ZFS for a while as well, but came to the same conclusion as you have -- that it's just a bit too flaky and a bit too much bother if the subset of features you actually want are available in btrfs.

tjc
participants (6)
- Avi Miller
- Chris Samuel
- Jason White
- Rick Moen
- Russell Coker
- Toby Corkindale