
So it seems the best way of getting Raid1, for the purpose of protecting your data if one hard disk fails, is to use Software Raid, and then put bttrfs on top? Is this correct? Is it ok to run bttrfs in a VM using virtualized disk? I have heard people say you shouldn't do this, because there is no guarantee that data has been written to physical disk when requested by the VM. Is this correct? Overly paranoid?

On 08/07/2015 3:40 PM, "Brian May" <brian@microcomaustralia.com.au> wrote:
So it seems the best way of getting Raid1, for the purpose of protecting
your data if one hard disk fails, is to use Software Raid, and then put bttrfs on top? Is this correct?

No. Others will give a more comprehensive answer but you use btrfs (or for that matter zfs) instead of software raid and let it manage the raid and file system. BTRFS and Zfs work from the block level up. That is what I've done successfully on my media server using zfs.

On Wed, 8 Jul 2015 at 17:18 Colin Fee <tfeccles@gmail.com> wrote:
Others will give a more comprehensive answer but you use btrfs (or for that matter zfs) instead of software raid and let it manage the raid and file system. BTRFS and Zfs work from the block level up.
That is what I've done successfully on my media server using zfs.
My understanding though (from what was said yesterday) is that BTRFS with "RAID" support (currently) offers no guarantee that two copies of everything will be on separate disks. So if one disk fails, you could lose both copies of your data.

On 08/07/2015 5:39 PM, "Brian May" <brian@microcomaustralia.com.au> wrote:
On Wed, 8 Jul 2015 at 17:18 Colin Fee <tfeccles@gmail.com> wrote:
Others will give a more comprehensive answer but you use btrfs (or for
that matter zfs) instead of software raid and let it manage the raid and file system. BTRFS and Zfs work from the block level up.
That is what I've done successfully on my media server using zfs.
My understanding though (from what was said yesterday) is that BTRFS with "RAID" support (currently) offers no guarantee that two copies of everything will be on separate disks. So if one disk fails, you could lose both copies of your data.
I can't speak for the case with btrfs, but with ZFS the RAID component works as expected. Last year when I had a disk failure ZFS effectively kicked the disk out of the array but my data was still there as expected. The replacement process was very easy to handle as well with ZFS's well thought-out commands.

On Wed, 8 Jul 2015 07:39:40 AM Brian May wrote:
My understanding though (from what was said yesterday) is that BTRFS with "RAID" support (currently) offers no guarantee that two copies of everything will be on separate disks. So if one disk fails, you could lose both copies of your data.
No, what happens with BTRFS RAID-1 is that you only get 2 copies of data on different devices, no matter how many devices you have. So if you have a 3 drive RAID-1 using BTRFS then you will NOT have 3 identical mirrors; some data will be on devices 0 & 1, some data will be on devices 1 & 2 and some will be on devices 0 & 2.

https://btrfs.wiki.kernel.org/index.php/SysadminGuide#RAID_and_data_replicat...

# With RAID-1 and RAID-10, only two copies of each byte of data are written,
# regardless of how many block devices are actually in use on the filesystem.

All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
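To illustrate the behaviour described above, a minimal sketch of creating a three-device RAID-1 filesystem and checking how the space is reported (the device names and mount point are made up):

  # all three devices in one filesystem, data and metadata both RAID-1
  mkfs.btrfs -d raid1 -m raid1 /dev/vdb /dev/vdc /dev/vdd
  mount /dev/vdb /mnt
  # shows a single Data,RAID1 pool rather than three identical mirrors
  btrfs filesystem df /mnt

Each chunk ends up on exactly two of the three devices, so no single device holds a complete copy of everything.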

On Wed, 8 Jul 2015 at 21:50 Chris Samuel <chris@csamuel.org> wrote:
No, what happens with BTRFS RAID-1 is that you only get 2 copies of data on different device, no matter how many devices you have.
Oh, OK, so I misunderstood. So in other words, using BTRFS RAID-1 is fine if you have exactly two disks?

On Thu, 9 Jul 2015 01:35:40 AM Brian May wrote:
So in other words, using BTRFS RAID-1 is fine if you have exactly two disks?
It's OK if you have more than 2 disks as long as you understand the difference in behaviour and that difference isn't important to you. cheers! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

On Wed, 8 Jul 2015 03:40:29 PM Brian May wrote:
So it seems the best way of getting Raid1, for the purpose of protecting your data if one hard disk fails, is to use Software Raid, and then put bttrfs on top? Is this correct?
No. A scrub of Linux software RAID-1 will result in both disks having identical contents; it can copy bad data to the good disk to make it so. A scrub of BTRFS (only one T) will determine which data is correct based on checksums and copy good data over bad.
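A minimal sketch of the two operations being compared (the md device and mount point are made up):

  # Linux software RAID: check the mirror for mismatches (no data checksums involved)
  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt
  # BTRFS: verify data against checksums and repair from the good copy
  btrfs scrub start /mnt
  btrfs scrub status /mnt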
Is it ok to run bttrfs in a VM using virtualized disk? I have heard people say you shouldn't do this, because there is no guarantee that data has been written to physical disk when requested by the VM. Is this correct? Overly paranoid?
I am not aware of any reason why BTRFS might handle that case any worse than other filesystems.

One thing that BTRFS doesn't handle well is snapshots of devices. For example if you use LVM to snapshot devices you don't want to use BTRFS in that situation. So if your VMs use LVM volumes for storage then BTRFS isn't a good choice.

For some of my virtual servers I create a BTRFS subvolume named /xenstore (or whatever seems suitable) and then use files in that for block devices in virtual machines (both Xen and KVM handle files as virtual block devices). Then I can use BTRFS snapshots of /xenstore to back it up. In that case I COULD use BTRFS for the virtual machines and only make a loopback device of one of those files at a time, but I haven't felt the need to. I think that using BTRFS to manage the storage in Dom0 gives adequate data protection and snapshot features and Ext3/4 is sufficient for running in the DomU.
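As a rough illustration of the /xenstore arrangement described above, a minimal sketch (the paths, image size and snapshot name are made up):

  # subvolume holding the VM disk images
  btrfs subvolume create /xenstore
  # sparse file used as a virtual block device by Xen or KVM
  truncate -s 20G /xenstore/vm1-root.img
  # read-only snapshot of the whole subvolume, taken for backup purposes
  btrfs subvolume snapshot -r /xenstore /xenstore-backup-20150709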
On Thu, 9 Jul 2015 11:35:40 AM Brian May wrote:
So in other words, using BTRFS RAID-1 is fine if you have exactly two disks?
BTRFS RAID-1 is fine if you have more than 1 disk. But if you have more than 2 disks then it operates like Linux software RAID-10. The difference is that if you have disks that are 1TB, 2TB, and 3TB running BTRFS "RAID-1" you will get 2TB of mirrored storage. Then if you add another 1TB disk and do a balance you will get another 500G of free space.

If you lose a single disk then it's no big deal. If you lose 2 disks then you have a problem. The more disks there are in the array the greater the probability of losing 2 of them. RAID-6 is a good option to solve such problems, but I don't think the BTRFS implementation is ready for important data. The ZFS RAID-6 (AKA RAID-Z2) is very solid, but you need all disks to be the same size.

-- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

Hi,

The attached email might be useful for your understanding. I haven't gotten around to watching both videos myself yet.

I've always thought that ZFS would be better than btrfs myself, but now I'm not as convinced of that. ZFS on BSD for sure, but btrfs on Linux ... probably better and will be much better yet. The big risk, as far as I am concerned, is that it relies too much on specific kernel versions and of course it relies upon Linux too. For a number of reasons, I would be happier if this was also available in the BSD world.

Cheers
A.

On Thu, 16 Jul 2015 at 18:08 Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
I've always thought that ZFS would be better than btrfs myself, but now I'm not as convinced of that.
I have always heard people say ZFS is better, Oracle should support ZFS not btrfs, etc. Don't know what to think myself.
ZFS on BSD for sure, but btrfs on Linux ... probably better and will be much better yet. The big risk, as far as I am concerned, is that it relies too much on specific kernel versions and of course it relies upon Linux too. For a number of reasons, I would be happier if this was also available in the BSD world.
CoreOS stopped using btrfs by default; they say they encountered too many problems with it. https://groups.google.com/forum/m/#!topic/coreos-dev/NDEOXchAbuU

On Thu, 16 Jul 2015 10:28:40 AM Brian May wrote:
I have always heard people say ZFS is better, Oracle should support ZFS not btrfs, etc. Don't know what to think myself.
Oracle now own the copyrights and the IP and they could dual license ZFS if they wanted it to be available for Linux. But they don't. Chris Mason left Oracle years ago and is now at Facebook (after a brief stint at Fusion-IO). cheers! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Hi,

On 16/07/2015 8:28 PM, Brian May wrote:
On Thu, 16 Jul 2015 at 18:08 Andrew McGlashan <andrew.mcglashan@affinityvision.com.au <mailto:andrew.mcglashan@affinityvision.com.au>> wrote:
I've always thought that ZFS would be better than btrfs myself, but now I'm not as convinced of that.
I have always heard people say ZFS is better, Oracle should support ZFS not btrfs, etc. Don't know what to think myself.
Oracle inherited ZFS; it wasn't their own product. They were already working on btrfs and they also have OCFS2 (Oracle Cluster File System 2). Consequently it isn't surprising to see that Oracle has all but abandoned ZFS; there are other reasons too, and it's unfortunate. It /may/ be that ZFS will be the Betamax of file systems; but perhaps not, perhaps it just hasn't been given enough love to shine out and become the winner.
ZFS on BSD for sure, but btrfs on Linux ... probably better and will be much better yet. The big risk, as far as I am concerned, is that it relies too much on specific kernel versions and of course it relies upon Linux too. For a number of reasons, I would be happier if this was also available in the BSD world.
CoreOS stopped using btrfs by default; they say they encountered too many problems with it.
https://groups.google.com/forum/m/#!topic/coreos-dev/NDEOXchAbuU
That forum thread was from over 6 months ago; that's a long time in btrfs terms.

I'm not sure, but I think we need appliances that serve btrfs file systems via iSCSI or other methods. An appliance limited to handling the file system side could keep its kernel static (as much as possible), or at least have it managed independently of the requirements of other servers such as web, mail and DNS.

A.

On Fri, 17 Jul 2015 07:25:08 AM Andrew McGlashan wrote:
I'm not sure, but I think we need appliances that server btrfs file systems via iSCSI or other methods.
My understanding is that you use iSCSI to provide block devices to run filesystems on; what would match your idea would be something like exporting the filesystem via NFS (or some other distributed filesystem).

I guess what you lose there is the chance for the clients to do things like move files between subvolumes via reflink (which the "mv" command in the latest coreutils will try as its first step now), which means the data blocks can be untouched and just the metadata in the filesystem needs to be updated; or for clients to set various attributes like nocow on a file where copy-on-write semantics could result in poor behaviour. But it's certainly a possibility for simpler use cases.

All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
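For anyone unfamiliar with the two features mentioned above, a minimal sketch (the file names are made up):

  # reflink copy: the new file shares data blocks with the original until either is modified
  cp --reflink=always /srv/subvol-a/disk.img /srv/subvol-b/disk.img
  # mark a file nocow; the attribute must be set while the file is still empty
  touch /srv/vm.img
  chattr +C /srv/vm.img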

On Thu, 16 Jul 2015 at 18:08 Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
I haven't gotten around to watching both videos myself yet.
I had problems looking at that email; it appears Google Mail on Android can handle the embedded message, but Google Mail on Firefox can't. As such I looked at the file in a text editor, which gave incorrect links because - I suspect - of quoted-printable encoding.

The LCA one is https://www.youtube.com/watch?v=6DplcPrQjvA

Just watching it now; I note it says to avoid kernels 3.15 to 3.16.1 - Debian Jessie has 3.16.0...

Will watch the NYLUG one next - seems to be at https://www.youtube.com/watch?v=W3QRWUfBua8

On Sat, Jul 18, 2015 at 01:08:31AM +0000, Brian May wrote:
On Thu, 16 Jul 2015 at 18:08 Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
I haven't gotten around to watching both videos myself yet.
I had problems looking at that email, it appears Google Mail on android can handle the embedded message, but Google Mail on Firefox can't. As such I looked at the file in a text editor, which gave incorrect links because - I suspect - of quoted-printable encoding.
the QP in that message isn't capable of messing up the URLs; there isn't any QP-stuff anywhere near the URLs. it's probably because the original poster put full stops at the end of some of the URLs... arguably correct English, but terrible URL formatting.

btw, the Chris Mason talk doesn't start until 6:00 into that video, so if you want to skip 6 minutes of NYLUG administrivia, go to: https://youtu.be/W3QRWUfBua8?t=360

craig -- craig sanders <cas@taz.net.au>

On Sat, 18 Jul 2015 at 11:05 Brian May <brian@microcomaustralia.com.au> wrote:
Will watch the NYLUG one next - seems to be at https://www.youtube.com/watch?v=W3QRWUfBua8
At 26:25 he said quota support has had significant improvement and "quotas is really something that you can use". Interesting.

On Sat, 18 Jul 2015 at 11:05 Brian May <brian@microcomaustralia.com.au> wrote:
Just watching it now; I note it says to avoid kernels 3.15 to 3.16.1 - Debian Jessie has 3.16.0...
Just tried installing Linux-4.1 from experimental on my desktop Jessie system to try btrfs. Unfortunately, the nvidia-kernel-dkms package from Jessie will not compile against Linux-4.1, and the version from experimental requires the nvidia-driver from experimental also, which pulls in a dependency that cannot be satisfied in Jessie+experimental.

So I tried switching to nouveau. This appears to work, but instead of starting gdm I get a big graphical screen saying "Oh no! Something has gone wrong. A problem has occurred and the system can't recover. Please log out and try again." The X log files show nothing seriously wrong:

[ 10.342] (II) This device may have been added with another device file.
[ 10.343] (EE) FBDEV(0): FBIOBLANK: Invalid argument

So I suspect if I want to test btrfs on this system, I really should upgrade to Debian testing first...

On Sat, 18 Jul 2015 at 15:11 Brian May <brian@microcomaustralia.com.au> wrote:
Just tried installing Linux-4.1 from experimental on my desktop Jessie system to try btrfs.
Got this working. I had to backport nvidia-graphics-driver (from experimental) and libvdpau (from unstable) to Debian Jessie. It seems to work fine.

Are there any resources for recommended setup details, e.g. recommended cron jobs for scrubbing btrfs, etc? Anything else recommended with Linux-4.1? Apparently, from the videos, manual balancing shouldn't be required any more?

Thanks
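As a rough sketch of the kind of cron job being asked about (the schedule and mount point are assumptions, not a recommendation from the list):

  # /etc/cron.d/btrfs-scrub - scrub the root filesystem once a month
  0 3 1 * * root /sbin/btrfs scrub start -Bq /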

On CentOS7 I get "No space left on device errors" just now.. [root@m-admin03 ~]# btrfs scrub start / WARNING: failed to write the progress status file: No space left on device. Status recording disabled scrub started on /, fsid 25f5d19c-a4d3-4f28-9abb-699362765879 (pid=2833) [root@m-admin03 ~]# btrfs scrub status / scrub status for 25f5d19c-a4d3-4f28-9abb-699362765879 no stats available total bytes scrubbed: 0.00 with 0 errors [root@m-admin03 ~]# btrfs filesystem df / Data, single: total=5.97GiB, used=4.45GiB System, DUP: total=8.00MiB, used=16.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=341.38MiB, used=242.31MiB Metadata, single: total=8.00MiB, used=0.00 GlobalReserve, single: total=96.00MiB, used=0.00 Sorry, I don't understand this.. In general, after a few weeks back in "Linuxland" (after more than 4 years FreeBSD/ZFS/jails).. Linux feels just sticky. FreeBSD is well-designed, secure and reliable compared to this. Seen from FreeBSD, Linux is what Windows is to Linux: Just more "main stream" so everybody is scared of using "the exotic OS". So I am quite ineffective at the moment, driving with the handbrake on. As for the licensing: I could not be more bored. ZFS has a life outside Oracle. If you need a storage appliance, use FreeNAS, I think. It beats Linux-based stuff by a mile. Regards Peter On Thu, Jul 23, 2015 at 9:05 AM, Brian May <brian@microcomaustralia.com.au> wrote:

On Mon, 27 Jul 2015 06:00:40 PM Peter Ross wrote:
Sorry, I don't understand this..
"BTRFS is a Technology Preview in Red Hat Enterprise Linux 7." You're on an ancient kernel (based on 3.10). Pretty sure what you're seeing has been fixed since then. -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

On Mon, 27 Jul 2015 06:00:40 PM Peter Ross wrote:
On CentOS7 I get "No space left on device errors" just now..
[root@m-admin03 ~]# btrfs scrub start /
WARNING: failed to write the progress status file: No space left on device. Status recording disabled
scrub started on /, fsid 25f5d19c-a4d3-4f28-9abb-699362765879 (pid=2833)
[root@m-admin03 ~]# btrfs scrub status /
scrub status for 25f5d19c-a4d3-4f28-9abb-699362765879
no stats available
Some of the btrfs commands write a file to disk to track the status of things; it's not unlike the way LVM and mdadm use files under /etc. When there is no space for a small file you have bigger problems than the filesystem not being scrubbed. So the first thing to do is to solve that.
total bytes scrubbed: 0.00 with 0 errors
[root@m-admin03 ~]# btrfs filesystem df /
Data, single: total=5.97GiB, used=4.45GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=341.38MiB, used=242.31MiB
Metadata, single: total=8.00MiB, used=0.00
GlobalReserve, single: total=96.00MiB, used=0.00
Sorry, I don't understand this..
Looks like you are running low on metadata space. Run a balance to free a data chunk.
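One commonly suggested way to do that, given here only as a sketch (it assumes a btrfs-progs version new enough to support balance filters), is a filtered balance that rewrites only nearly-empty data chunks and so needs very little free space to run:

  # rewrite only data chunks that are at most 5% used, returning them to the unallocated pool
  btrfs balance start -dusage=5 /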
As for the licensing: I could not be more bored. ZFS has a life outside Oracle.
The GPL vs BSD license debate has died down. Most people who matter prefer GPL while most companies prefer BSD so they can take free software and make it non-free. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

Quoting Russell Coker (russell@coker.com.au):
On Mon, 27 Jul 2015 06:00:40 PM Peter Ross wrote:
As for the licensing: I could not be more bored. ZFS has a life outside Oracle.
The GPL vs BSD license debate has died down. Most people who matter prefer GPL while most companies prefer BSD so they can take free software and make it non-free.
Just a reminder: ZFS code is licensed under CDDL, which is a weak-copyleft variant of MPL. It is not under BSD or any other permissive licence. (Some permissive-licensing ideologues don't like ZFS in FreeBSD for the same reason they don't like any other copylefted code in that project.)

On Mon, Jul 27, 2015 at 09:38:09AM -0700, Rick Moen wrote:
Just a reminder: ZFS code is licensed under CDDL, which is a weak-copyleft variant of MPL. It is not under BSD or any other permissive licence.
which is a shame, because if it was there would be no licensing problem preventing it from being merged with the linux kernel. BSD is compatible with GPL, CDDL isn't. Sun did that deliberately so that GPL projects like linux couldn't use their code, negating most of the good will (and code contributions) they could have achieved from switching to an open source license.
(Some permissive-licensing ideologues don't like ZFS in FreeBSD for the same reason they don't like any other copylefted code in that project.)
yeah, well, BSD license zealots are weird. they're perfectly OK with proprietary software taking their code but spit the dummy when a GPL project does the same ("it's not fair", they say). and even with LGPL and other weak copyleft licenses they think it's a matter of principle to refuse to use it. craig -- craig sanders <cas@taz.net.au>

Quoting Craig Sanders (cas@taz.net.au): [ZFS code and other parts of Solaris being under CDDL:]
which is a shame, because if it was there would be no licensing problem preventing it from being merged with the linux kernel.
BSD is compatible with GPL, CDDL isn't.
Sun did that deliberately so that GPL projects like linux couldn't use their code, negating most of the good will (and code contributions) they could have achieved from switching to an open source license.
I'm honestly not sure there wasn't a reason derived from Sun's relations with third-party stakeholders -- mostly because that was Netscape Communications's reason for crafting MPL to apply reciprocal obligation or not on a code-module by code-module basis. However, I think we're departing from the scope of luv-main, so I won't elaborate.
yeah, well, BSD license zealots are weird. they're perfectly OK with proprietary software taking their code but spit the dummy when a GPL project does the same ("it's not fair", they say). and even with LGPL and other weak copyleft licenses they think it's a matter of principle to refuse to use it.
Yeah, boy, I've sure seen this, and been mystified.

Russell Coker wrote:
Looks like you are running low on metadata space. Run a balance to free a data chunk.
# btrfs balance /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
[root@m-admin03 ~]# dmesg | tail
..
[57720.728021] BTRFS info (device sda3): relocating block group 12582912 flags 1
[57721.462525] BTRFS info (device sda3): relocating block group 4194304 flags 4
[57722.195723] BTRFS info (device sda3): relocating block group 0 flags 2
[57723.302220] BTRFS info (device sda3): 14 enospc errors during balance
The GPL vs BSD license debate has died down. Most people who matter prefer GPL while most companies prefer BSD so they can take free software and make it non-free.
Well, I am a sysadmin and simply interested in technology that works.

Yes, Chris, CentOS7 is based on 3.10, a two-year-old kernel, but CentOS7 is a distribution I use now.

I happily used ZFS and jails 2 years ago, and see nothing but pale imitations and "magic" in the system I am using now. Obviously we cannot even predict anymore when our filesystems are mounted, as the systemd/btrfs thread illustrates.

We muddle our way through, that's all. There is no beauty in it.

Regards
Peter

P.S. I added another virtual disk to make it happy.

The problem is that I cannot see anything critically close to being exhausted. So I have no reliable way of monitoring and predicting when the problem will appear.

Thanks
Peter

On Tue, 28 Jul 2015, Peter Ross <petrosssit@gmail.com> wrote:
Russell Coker wrote:
Looks like you are running low on metadata space. Run a balance to free a data chunk.
# btrfs balance /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
[root@m-admin03 ~]# dmesg | tail
..
[57720.728021] BTRFS info (device sda3): relocating block group 12582912 flags 1
[57721.462525] BTRFS info (device sda3): relocating block group 4194304 flags 4
[57722.195723] BTRFS info (device sda3): relocating block group 0 flags 2
[57723.302220] BTRFS info (device sda3): 14 enospc errors during balance
Recent kernels have fixed most of those issues. But the old kernel should be OK as long as you have plenty of free space all the time. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

In the meantime, I experimented with btrfs device add/delete.

btrfs device delete was the end of it. The system was stuck, and after a reset it did not boot anymore.

Regards
Peter

On Tue, 28 Jul 2015 01:53:10 PM Peter Ross wrote:
In the meantime, I experimented with btrfs add/delete.
btrfs delete was the end of it. The system was stuck and after reset it did not boot anymore.
You need to make sure there's enough space free before doing that. If you want your data back then try mounting with a newer kernel.

In future don't use small filesystems with BTRFS; it's not designed to have lots of small partitions the way that Ext* is.

-- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/
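As a rough illustration of the kind of check being suggested before removing a device (a sketch only; the mount point and device name are made up, and "btrfs filesystem usage" needs a reasonably recent btrfs-progs):

  # confirm the remaining devices have enough unallocated space to absorb the data
  btrfs filesystem show /mnt
  btrfs filesystem usage /mnt
  # then remove the device; this migrates its chunks onto the remaining devices
  btrfs device delete /dev/sdb /mnt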

Hi Russell,

there was enough space (in theory) to keep all data on one disk (less than 5GB on an 8GB partition, from memory), and the "btrfs delete" seems to have succeeded. "btrfs filesystem df /" and the device stats looked fine and did not show any reference to the second disk. The trouble only started when I removed the second disk (physically, as in removing the virtual disk from the virtual machine on ESXi).

Using a newer kernel is out of the question here - I have to use the latest "Enterprise Linux" (CentOS 7) without patches. It looks as if I have to abandon all plans to use btrfs and am back in the stone age.

Thanks for the comments and suggestions
Peter

On 29 July 2015 at 11:41, Peter Ross <petrosssit@gmail.com> wrote:
Using a newer kernel is out of question here - I have to use the latest "Enterprise Linux" (CentOS 7) without patches.
Does that include using kernels from other CentOS repos?

http://wiki.centos.org/AdditionalResources/Repositories

The elrepo repository listed above has a mainline kernel that follows upstream stable releases, so no need for patching.

In case anyone is interested, openSUSE also has the same type of stable kernel repo here:
http://download.opensuse.org/repositories/Kernel:/stable/standard/

and Ubuntu has its mainline kernel PPA:
http://kernel.ubuntu.com/~kernel-ppa/mainline/

Regards, Marcus. -- Marcus Furlong
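A minimal sketch of how the ELRepo mainline kernel is typically pulled in on CentOS 7 (it assumes the elrepo-release package is already set up as per the wiki page above; check ELRepo's own documentation rather than treating this as list advice):

  # install the mainline ("ml") kernel from the elrepo-kernel repository
  yum --enablerepo=elrepo-kernel install kernel-ml
  # then reboot and select the new kernel at the boot menu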

Marcus Furlong wrote:
On 29 July 2015 at 11:41, Peter Ross <petrosssit@gmail.com> wrote:
Using a newer kernel is out of question here - I have to use the latest "Enterprise Linux" (CentOS 7) without patches.
Does that include using kernels from other CentOS repos?
The elrepo repository listed above has a mainline kernel that follows upstream stable releases, so no need for patching.
I wonder how safe it is to use these kernels, and whether it breaks the userland.
From the policy view, it somehow defeats the purpose of choosing an Enterprise Linux (with well-tested software in their own "kernel/userland universe") and then throwing out crucial parts of it.
I could put a FreeBSD kernel underneath then - it has a Linux kernel ABI - and start deploying CentOS jails on ZFS ;-)

https://wiki.freebsd.org/VIMAGE/Linux/CentOS55

I don't think I'd get away with this either ;-)

Thanks
Peter

On Thu, 30 Jul 2015 11:47:34 AM Peter Ross wrote:
Marcus Furlong wrote:
On 29 July 2015 at 11:41, Peter Ross <petrosssit@gmail.com> wrote:
Using a newer kernel is out of question here - I have to use the latest "Enterprise Linux" (CentOS 7) without patches.
Does that include using kernels from other CentOS repos?
http://wiki.centos.org/AdditionalResources/Repositories
The elrepo repository listed above has a mainline kernel that follows upstream stable releases, so no need for patching.
I wonder how safe it is to use these kernels, and whether it breaks the userland.
The older BTRFS utilities won't support the new features, but they should work OK for the previous functionality.
From the policy view, it somehow defeats the purpose of choosing an Enterprise Linux (with well-tested software in their own "kernel/userland universe") and then throwing out crucial parts of it.
The maintainers of the enterprise distribution should have been back-porting the BTRFS kernel code. But if you wanted to run the RHEL kernel it would probably have been a better idea to not use BTRFS. While I don't agree with people giving "always use the latest kernel for BTRFS" advice, I think that the minimum kernel version should be something a lot newer than that. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Thu, 30 Jul 2015 11:47:34 AM Peter Ross wrote:
From the policy view, it somehow defeats the purpose of choosing an Enterprise Linux (with well-tested software in their own "kernel/userland universe") and then throwing out crucial parts of it.
In my bitter experience RHEL has a nasty habit of breaking things on their "minor" point releases. I've seen rsync over ssh broken (no idea how they missed that in testing), at least 5 different Mellanox InfiniBand & 10gigE kernel bugs introduced (we had to use RHEL 6.2 kernels on one of our systems until RHEL 6.6 finally cleared things up, and the RHEL 5 bug that delivered incoming packets to the wrong interface took about a year to fix, by which stage we'd reinstalled with RHEL 6), and I suspect there are a few others I've forgotten...

The best thing I can say about RHEL is that at least it's not SLES.

cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

On Tue, 28 Jul 2015 09:54:00 AM Peter Ross wrote:
Well, I am a sysadmin and simply interested in technology that works.
Yes, Chris, CentOS7 is based on 3.10, a 2 years old kernel, but CentOS7 is a distribution I use now.
Don't be surprised when a kernel from back when btrfs was still marked as experimental sets your machine on fire; the experimental warning was only removed in 3.13. The Kconfig for 3.10 said:

    help
      Btrfs is a new filesystem with extents, writable snapshotting,
      support for multiple devices and many more features.

      Btrfs is highly experimental, and THE DISK FORMAT IS NOT YET
      FINALIZED. You should say N here unless you are interested in
      testing Btrfs with non-critical data.

      To compile this file system support as a module, choose M here.
      The module will be called btrfs. If unsure, say N.

I think the "enterprise" distros have done their users a huge disservice by shipping it enabled before it was ready.. :-(

All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

On Thu, 16 Jul 2015 06:08:26 PM Andrew McGlashan wrote:
ZFS on BSD for sure, but btrfs on Linux ... probably better and will be much better yet. The big risk, as far as I am concerned, is that it relies too much on specific kernel versions and of course it relies upon Linux too. For a number of reasons, I would be happier if this was also available in the BSD world.
BTRFS doesn't inherently rely on kernel versions (as opposed to things that require binary kernel modules). It is however a project that's in active development with new features being added. If you need a new feature (such as RAID-5/6) then you need a kernel new enough to support it. If you make a filesystem with support for recent features then it can't be mounted on an older kernel (EG a filesystem made with mkfs.btrfs from Jessie can't be mounted with the Wheezy kernel).

BTRFS isn't alone in this regard; we had the same issues when Ext3 and Ext4 were released. The difference is that it was always possible to make a filesystem in the older format. I can run mkfs.ext3 on a Debian/Jessie system and mount the filesystem on fairly old kernels without problem. I presume that BTRFS will get similar backwards compatibility at some future time. But at the moment the focus is on features.

ZFS on Linux has DKMS modules which makes upgrading kernels painful and risky. But it does have better forwards and backwards compatibility in other ways.
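As a rough illustration of that feature/compatibility point, a sketch (the device name is made up, and the available flags depend on the btrfs-progs version):

  # list the optional filesystem features this mkfs.btrfs knows about,
  # along with the kernel version each one needs
  mkfs.btrfs -O list-all
  # create a filesystem with a newer feature disabled so an older kernel can still mount it
  mkfs.btrfs -O ^extref /dev/sdb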
On Thu, 16 Jul 2015 08:28:40 PM Brian May wrote:
I have always heard people say ZFS is better, Oracle should support ZFS not btrfs, etc. Don't know what to think myself.
ZFS is a more mature product, but it's just a matter of time before BTRFS matches that. BTRFS supports more flexibility in devices, EG "RAID-1" arrays of varying numbers of devices of varying sizes. But ZFS has more features for high-end use such as L2ARC and ZIL. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Sat, 18 Jul 2015 at 16:15 Russell Coker <russell@coker.com.au> wrote:
BTRFS doesn't inherently rely on kernel versions (as opposed to things that require binary kernel modules). It is however a project that's in active development with new features being added.
According to https://www.youtube.com/watch?v=6DplcPrQjvA it is recommended that you use a recent kernel version due to bugs fixed.
From slide: "... use recent kernels if you can, but consider staying a kernel or two behind for stability."

Brian May <brian@microcomaustralia.com.au> wrote:
From slide: "... use recent kernels if you can, but consider staying a kernel or two behind for stability."
I hope BTRFS is starting to get the kind of deployment that will enable reliability bugs to be found and fixed. XFS developers have always been very good at running regression tests. If I remember rightly, parts of their test suite were generic and there were plans to apply it to other file systems - exactly what is needed.

On 23 Jul 2015, at 8:56 am, Jason White <jason@jasonjgw.net> wrote:
Brian May <brian@microcomaustralia.com.au> wrote:
From slide: "... use recent kernels if you can, but consider staying a kernel or two behind for stability."
I hope BTRFS is starting to get the kind of deployment that will enable reliability bugs to be found and fixed. XFS developers have always been very good at running regression tests.
FYI, Oracle runs the full XFS test suite on btrfs on all releases of the Oracle UEK and some mainline releases. We've also submitted fixes upstream to the XFS suite to address specific btrfs tests. </Oracle hat> Cheers, Avi

On Thu, 23 Jul 2015 08:56:45 AM Jason White wrote:
Brian May <brian@microcomaustralia.com.au> wrote:
From slide: "... use recent kernels if you can, but consider staying a kernel or two behind for stability."
I hope BTRFS is starting to get the kind of deployment that will enable reliability bugs to be found and fixed. XFS developers have always been very good at running regression tests. If I remember rightly, parts of their test suite were generic and there were plans to apply it to other file systems - exactly what is needed.
The XFS test suite is being extended to cover bugs in BTRFS too. I think it's more of a generic filesystem test suite than an XFS-specific thing nowadays; there was even talk of renaming it.

This idea that BTRFS users should upgrade to the latest kernels is a bit silly. If you have a specific bug that is known to be fixed then upgrading is the right thing to do. But don't expect that newer kernels will always be better. New kernels have new bugs and also expose new bugs by fixing old bugs; in particular performance improvements often expose bugs in older code.

I have Debian/Wheezy systems that are running BTRFS without problems and I don't think there's any need for an upgrade. While Debian/Jessie is running a kernel that isn't going to have upstream support for backporting fixes to BTRFS, it's also running quite well (surprisingly well really) and again I don't feel a great need to upgrade.

My laptop will get kernel 4.0.0 next time I reboot it. This isn't because I expect a benefit from a new kernel (I don't) but because as a DD it's part of my job to try the new versions.

-- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Thu, 23 Jul 2015 at 12:20 Russell Coker <russell@coker.com.au> wrote:
I have Debian/Wheezy systems that are running BTRFS without problems and I don't think there's any need for an upgrade.
Like I mentioned before, in Marc Merlin's talk at LCA2015 he specifically recommends against using kernels 3.15 to 3.16.1 - Debian Jessie has 3.16.0...

I personally would be nervous about going against Marc's recommendations here. Unless of course the Debian kernel has fixes back ported.

Like I mentioned before, in Marc Merlin's talk at LCA2015 he specifically recommends against using kernels 3.15 to 3.16.1 - Debian Jessie has 3.16.0...
I personally would be nervous about going against Marc's recommendations here.
Unless of course the Debian kernel has fixes back ported.
I enabled quotas on the default Jessie kernel and it all blew up, so at least that is broken. Disabling quotas was sufficient to put it right again.

Running low on space (>10%) also really hurt performance badly (my limited experience with zfs is that it hangs badly when free space gets under about 20%). I upgraded to 4.x at about the same time as I resolved the disk space problem so I can't say if 4.x improves the low-space performance issues though.

The problem I had is that I had mythtv storage in a subvolume, and the only control you have over mythtv is that you can tell it to leave a certain amount of GB free, and the maximum that can be is 200GB, so obviously that's a problem on all but the smallest installations. I ended up creating a 1TB file (with nocow enabled) and used ext4 on loopback as my mythtv store. Performance is probably badly impacted but I don't notice it.

James
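A minimal sketch of the loopback workaround described above (the file name, size and mount point are made up):

  # empty file with copy-on-write disabled (the attribute must be set before the file has data)
  touch /data/mythtv.img
  chattr +C /data/mythtv.img
  fallocate -l 1T /data/mythtv.img
  # put ext4 on it and mount it via loopback as the recording store
  mkfs.ext4 -F /data/mythtv.img
  mount -o loop /data/mythtv.img /srv/mythtv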

On Thu, 23 Jul 2015 04:28:44 PM Brian May wrote:
I personally would be nervous about going against Marc's recommendations here.
I don't believe that Marc knows more about BTRFS than I do. Marc also tends to compile his own kernels and I don't recall him mentioning any tests of Debian kernels. I think that my opinion of Debian kernels is more relevant than Marc's opinion; I've used them a lot and I don't know if Marc has ever used them.

On Thu, 23 Jul 2015 08:32:45 PM James Harper wrote:
I enabled quotas on the default Jessie kernel and it all blew up, so at least that is broken. Disabling quotas was sufficient to put it right again.
Quota support is something you shouldn't expect to be reliable any time soon. Most people don't need it, and many of the people who do need it have been scared off testing it because of past issues. For a long time the developers paid little attention to it due to the large number of more serious issues; even now I don't think it gets that much attention.
Running low on space (>10%) also really hurt performance badly (my limited experience with zfs is that it hangs badly when free space gets under about 20%). I upgraded to 4.x at about the same time as I resolved the disk space problem so I can't say if 4.x improves the low-space performance issues though.
I haven't noticed such problems on either filesystem. I think that it depends on what your usage is, maybe my usage happens to miss the corner cases where lack of space causes performance problems.
The problem I had is that I had mythtv storage in a subvolume, and the only control you have over mythtv is that you can tell it to leave a certain amount of GB free, and the maximum that can be is 200GB, so obviously that's a problem on all but the smallest installations. I ended up creating a 1TB file (with nocow enabled) and used ext4 on loopback as my mythtv store. Performance is probably badly impacted but I don't notice it.
If you had a 5TB RAID-1 array (the smallest I would consider buying for home use nowadays) then 200G would be 4%. While there are plenty of "rules of thumb" about how much space should be free on a filesystem I really doubt that they continue to that size. On a 10G BTRFS filesystem 10% free space would be a single 1G data chunk free, while on a 1TB filesystem it would be 100 data chunks free; I don't think that considering those cases to be the same makes sense.

I have had serious metadata performance issues with BTRFS on my 4TB RAID-1 array, such as a "ls -l" taking many seconds to complete. For that array I can just wait for those cases; for that system all the data which needs good performance is stored on a SSD.

If I wanted good performance on a BTRFS array I would make the filesystem as a RAID-1 array of SSDs. Then I would create a huge number of small files to allocate many gigs of metadata space. A 4TB array can have 120G of metadata so I might use 150G of metadata space. Then I'd add 2 big disks to the array which would get used for data chunks and delete all the small files. Then as long as I never did a balance all the metadata chunks would stay on the SSD and the big disks would get used for data. I expect that performance would be great for such an array.

-- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

Running low on space (>10%) also really hurt performance badly (my limited experience with zfs is that it hangs badly when free space gets under about 20%). I upgraded to 4.x at about the same time as I resolved the disk space problem so I can't say if 4.x improves the low-space performance issues though.
I haven't noticed such problems on either filesystem. I think that it depends on what your usage is, maybe my usage happens to miss the corner cases where lack of space causes performance problems.
Perhaps. Definitely noticed it on a 26GB ZFS system. As free space got under about 20% there started to be odd problems that I couldn't put my finger on. As free space crept towards 10% it got to the point where deleting a snapshot took a whole day or just would never complete. Deleting files to bring the free space back towards 80% made all the problems go away. That was on a fairly old FreeBSD install though (8.4 maybe).
The problem I had is that I had mythtv storage in a subvolume, and the only control you have over mythtv is that you can tell it to leave a certain amount of GB free, and the maximum that can be is 200GB, so obviously that's a problem on all but the smallest installations. I ended up creating a 1TB file (with nocow enabled) and used ext4 on loopback as my mythtv store. Performance is probably badly impacted but I don't notice it.
If you had a 5TB RAID-1 array (the smallest I would consider buying for home use nowadays) then 200G would be 4%. While there are plenty of "rules of thumb" about how much space should be free on a filesystem I really doubt that they continue to that size. On a 10G BTRFS filesystem if you had 10% free space that would be a single 1G data chunk free while on a 1TB filesystem that would be 100 data chunks free, I don't think that considering those cases to be the same makes sense.
Yes this is true. My two experiences are on my router with a single 60GB SSD (with probably 40GB actually allocated to the filesystem), and my server (2 x 1.5TB, 2 x 2TB). I don't remember what the poor performance threshold was on the former, but on the latter, 200GB of free space caused serious problems. NFS clients would see very frequent kernel messages about delays. As soon as I got back up to about 1TB free all the problems went away, although no measurements were really taken between 200GB and 1TB of free space.
I have had serious metadata performance issues with BTRFS on my 4TB RAID- 1 array, such as a "ls -l" taking many seconds to complete. For that array I can just wait for those cases, for that system all the data which needs good performance is stored on a SSD.
If I wanted good performance on a BTRFS array I would make the filesystem as a RAID-1 array of SSDs. Then I would create a huge number of small files to allocate many gigs of metadata space. A 4TB array can have 120G of metadata so I might use 150G of metadata space. Then I'd add 2 big disks to the array which would get used for data chunks and delete all the small files. Then as long as I never did a balance all the metadata chunks would stay on the SSD and the big disks would get used for data. I expect that performance would be great for such an array.
That sounds unreasonably fragile. Especially if you are unable to ever do a balance.

My first testing of btrfs was on top of bcache, and performance was awesome. I went back to entirely rotating media for production though as I only had a single SSD at my disposal, didn't really need the extreme performance, and had other things to spend money on. Also at the time there were reports of incompatibilities between btrfs and bcache. I expect bcache would out-perform the hot-relocation project, for most workloads. For my server which does lots of streaming writes (mythtv) and lots of random io (other stuff), it would balance things nicely.

This guy claims success with bcache + btrfs http://www.spinics.net/lists/linux-btrfs/msg42125.html and raises some interesting points (interesting to me, at least).

Btw, when you say 5TB RAID1, what exactly do you mean? Is the 5TB referring to the raw disks or the usable redundant space? I'm never quite sure.

James

On Fri, 24 Jul 2015 10:12:04 PM James Harper wrote:
I have had serious metadata performance issues with BTRFS on my 4TB RAID- 1 array, such as a "ls -l" taking many seconds to complete. For that array I can just wait for those cases, for that system all the data which needs good performance is stored on a SSD.
If I wanted good performance on a BTRFS array I would make the filesystem as a RAID-1 array of SSDs. Then I would create a huge number of small files to allocate many gigs of metadata space. A 4TB array can have 120G of metadata so I might use 150G of metadata space. Then I'd add 2 big disks to the array which would get used for data chunks and delete all the small files. Then as long as I never did a balance all the metadata chunks would stay on the SSD and the big disks would get used for data. I expect that performance would be great for such an array.
That sounds unreasonably fragile. Especially if you are unable to ever do a balance.
You should only ever need to do a balance if you have too much space allocated to one of data/metadata and need to free some for the other or when you are doing things like changing RAID levels. In normal use you shouldn't need to do it. The fact that it is sometimes needed in normal use is due to deficiencies in BTRFS that might have been fixed now.
My first testing of btrfs was on top of bcache, and performance was awesome. I went back to entirely rotating media for production though as I only had a single SSD at my disposal, didn't really need the extreme performance, and had other things to spend money on. Also at the time there were reports of incompatibilities between btrfs and bcache. I expect bcache would out-perform the hot-relocation project, for most workloads. For my server which does lots of streaming writes (mythtv) and lots of random io (other stuff), it would balance things nicely.
The concept of Bcache sounds good, but the bug reports are concerning.
This guy claims success with bcache + btrfs http://www.spinics.net/lists/linux-btrfs/msg42125.html and raises some interesting points (interesting to me, at least).
Very impressive.
Btw, when you say 5TB RAID1, what exactly do you mean? Is the 5TB referring to the raw disks or the usable redundant space? I'm never quite sure.
5TB disks are quite affordable nowadays. 6TB is still a little expensive. So a RAID-1 array of 5TB disks is a good option. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

That sounds unreasonably fragile. Especially if you are unable to ever do a balance.
You should only ever need to do a balance if you have too much space allocated to one of data/metadata and need to free some for the other or when you are doing things like changing RAID levels. In normal use you shouldn't need to do it. The fact that it is sometimes needed in normal use is due to deficiencies in BTRFS that might have been fixed now.
I thought you needed to run it after adding an additional disk, so that you wouldn't get into a situation where the new disk had space, but all the other available disks were full so there was no second disk to write the copy to?
Btw, when you say 5TB RAID1, what exactly do you mean? Is the 5TB referring to the raw disks or the usable redundant space? I'm never quite sure.
5TB disks are quite affordable nowadays. 6TB is still a little expensive. So a RAID-1 array of 5TB disks is a good option.
I guess it depends on your performance vs capacity vs power requirements. For performance, you want as many spindles as possible (and preferably 2.5", all other things being equal); for power greenness, you want as few as possible (and again, preferably 2.5"); and for capacity you want a number of disks where $/GB is the best to give you the capacity you want in the number of drive bays you have. Maybe the spindle count isn't as important if you use bcache though, assuming typical access patterns.

3TB still appears to be the best $/GB right now, but obviously that's not the only factor. But if I didn't have a use for 5TB in the next few years, I'd be going with the cheaper disks and buying more of them to fill all my bays. With BTRFS, I just get whatever disks I can get my hands on when I need them, or when surplus disks come my way, and cram them into my server. BTRFS knows what to do with them :)

Has anyone ever put the numbers together for the optimal buying strategy? Eg I need x storage now, and my projected storage growth is yTB/year; should I buy the bigger (more expensive) disks now that will last me longer, or smaller (cheaper) disks now and bigger disks later as I need them and when they are cheaper?

James

On Sun, 26 Jul 2015 10:24:54 PM James Harper wrote:
That sounds unreasonably fragile. Especially if you are unable to ever do a balance.
You should only ever need to do a balance if you have too much space allocated to one of data/metadata and need to free some for the other or when you are doing things like changing RAID levels. In normal use you shouldn't need to do it. The fact that it is sometimes needed in normal use is due to deficiencies in BTRFS that might have been fixed now.
I thought you needed to run it after adding an additional disk, so that you wouldn't get into a situation where the new disk had space, but all the other available disks were full so there was no second disk to write the copy to?
If you had a RAID-1 array where the size of the new disk is less than the sum of the unallocated chunks on both the old disks then no balance should be needed. Also in the case of an array that had been constructed with 2*SSD for metadata and 2*HDD for data if you added a new HDD you would only want data on it so you could do a balance of only data. Also you can specify which devid to balance - see btrfs-balance(8) for details.
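For example (the mount point and devid here are made up; check btrfs-balance(8) before copying anything):

    # balance only data chunks, leaving metadata where it is
    btrfs balance start -d /mnt/pool
    # balance only data chunks that touch device 3 (devids are shown by "btrfs filesystem show")
    btrfs balance start -ddevid=3 /mnt/pool
    # reclaim mostly-empty data chunks without rewriting everything
    btrfs balance start -dusage=50 /mnt/pool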
5TB disks are quite affordable nowadays. 6TB is still a little expensive. So a RAID-1 array of 5TB disks is a good option.
I guess it depends on your performance vs capacity vs power requirements. For performance, you want as many spindles as possible (and preferably 2.5", all other things being equal), for power greenness, you want as few as possible (and again, preferably 2.5"), and for capacity you want a number of disks where $/GB is the best to give you the capacity you want in the number of drive bays you have.
Maybe the spindle count isn't as important if you use bcache though, assuming typical access patterns.
For both performance and energy efficiency you want SSD for as many operations as possible.
3TB still appears to be the best $/GB right now, but obviously that's not the only factor. But if I didn't have a use for 5TB in the next few years, I'd be going with the cheaper disks and buying more of them to fill all my bays (With BTRFS, I just get whatever disks I can get my hands on when I need them, or when surplus disks come my way, and cram them into my server. BTRFS knows what to do with them :)
3TB is better value for money if 3TB is enough space to last you a while. Adding more disks means more noise, power use, and sysadmin work. Also most affordable systems can't handle more than 4 disks so if you use 3TB disks with RAID-1 (I wouldn't trust RAID-5/6 on BTRFS any time soon) then you are limited to 6TB of RAID storage. 5TB disks are a little less value for money but will probably last longer with less hassle.
Has anyone ever put the numbers together for the optimal buying strategy? Eg I need x storage now, and my projected storage growth is yTB/year, should I buy the bigger (more expensive) disks now that will last me longer, or smaller (cheaper) disks now and bigger disks later as I need them and when they are cheaper?
My observation is that there are 2 broad categories of systems. One is systems that don't need that much storage; for example, my laptop mostly has work stuff and email so I use a fraction of its 320G disk, and my desktop has 120G of SSD because the big files it uses are NFS mounted from the server. For such systems you just buy what you need, as the needs don't change fast.

The other is systems that need serious amounts of storage, which is mostly servers but for some people it would be laptops and desktops. For those systems it's a PITA to take them apart to add new disks, and copying data takes a lot of time. So if you assign any reasonable dollar value to your time then it's better to just buy disks that are close to the largest available, to avoid wasting time on upgrades. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Fri, 24 Jul 2015 at 21:36 Russell Coker <russell@coker.com.au> wrote:
Quotas is something you shouldn't expect to be reliable any time soon.
That isn't what Chris Mason says: https://youtu.be/W3QRWUfBua8?t=1586 Do you claim to know more than Chris Mason?

On Sun, 26 Jul 2015 10:39:32 AM Brian May wrote:
On Fri, 24 Jul 2015 at 21:36 Russell Coker <russell@coker.com.au> wrote:
Quotas is something you shouldn't expect to be reliable any time soon.
That isn't what Chris Mason says:
https://youtu.be/W3QRWUfBua8?t=1586
Do you claim to know more than Chris Mason?
Why don't you test out quotas on some of your systems with real data and tell us how it goes? I've observed the development of BTRFS core features and had downtime of my own systems along the way. With the amount of time required for BTRFS to be stable for the most basic operations that everyone uses, I don't expect the less used features to suddenly become reliable.

http://etbe.coker.com.au/tag/btrfs/

The above URL has a fairly complete summary of my experience with BTRFS, including data loss, kernel panics, and filesystem corruption dating back to 2012. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Sun, 26 Jul 2015 12:39:32 AM Brian May wrote:
On Fri, 24 Jul 2015 at 21:36 Russell Coker <russell@coker.com.au> wrote:
Quotas is something you shouldn't expect to be reliable any time soon.
That isn't what Chris Mason says:
They are still problematic for some people. For instance, there is an unfortunate interaction between quotas and snapshots; there was someone having issues on the btrfs list recently where the btrfs-cleaner kernel thread would consume a whole core whilst quotas were enabled: http://www.spinics.net/lists/linux-btrfs/msg45652.html

# I can confirm that getting rid of the quotas fixed the issue for me.
# Just disabling quotas wasn't enough, I had to enable, delete all
# qgroups, reboot because disable was hung on one of the filesystems,
# then disable quotas. Now when btrfs-cleaner runs it doesn't
# completely consume a core, I can see corresponding disk i/o, and the
# process goes away after a reasonable amount of time.

To give some context, this person had 92 subvolumes, including snapshots. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
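For reference, the workaround described in that quote corresponds roughly to the following commands; the mount point is an example and the qgroup ids will differ on every filesystem:

    btrfs quota enable /mnt/pool            # re-enable so the qgroups can be removed
    btrfs qgroup show /mnt/pool             # list the qgroup ids
    btrfs qgroup destroy 0/257 /mnt/pool    # repeat for each qgroup listed
    # reboot if "btrfs quota disable" hangs, then:
    btrfs quota disable /mnt/pool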

On Mon, 27 Jul 2015 09:07:21 PM Chris Samuel wrote:
To give some context this person had 92 subvolumes, including snapshots.
Which isn't a lot really; I just checked a few of my systems, which range from 67 to 235 subvols, and 235 isn't anywhere near the most I've had. I have cron jobs snapshotting subvols for backups, and at times when the subvol removal scripts didn't work properly I've ended up with thousands. Don't use thousands of snapshots, as it causes performance problems, but it is fairly common for people to accidentally or deliberately accumulate thousands of snapshots from cron jobs. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/
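The cron jobs are nothing fancy; something along these lines, with the paths and naming being examples only:

    # crontab entry: read-only snapshot of the home subvolume every night at 1am
    0 1 * * * /bin/btrfs subvolume snapshot -r /mnt/pool/home /mnt/pool/snapshots/home-$(date +\%Y\%m\%d)
    # the matching cleanup script has to delete old ones, e.g.
    #   btrfs subvolume delete /mnt/pool/snapshots/home-20150101
    # otherwise you accumulate the thousands mentioned above; count them with:
    #   btrfs subvolume list /mnt/pool | wc -l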

On Fri, Jul 24, 2015 at 09:35:51PM +1000, Russell Coker wrote:
Running low on space (<10% free) also really hurt performance badly (my limited experience with zfs is that it hangs badly when free space gets under about 20%) [...]
I haven't noticed such problems on either filesystem. I think that it depends on what your usage is; maybe my usage happens to miss the corner cases where lack of space causes performance problems.
i've used zfs extensively since at least 2011 and i've run into this several times. When the pool gets over about 80% full, performance turns to shit... really, awfully, abysmally bad for both reads and writes.

in fact, i ran into it again this week with my backup pool here at home. i'd foolishly allowed it to get to 87% full and performance was abysmal, on the order of kilobytes/second rather than 200+MB/s. i had to replace the 4x1TB drives (in RAIDZ1 configuration) in my backup pool with 4x4TB (configured as two mirrored pairs).

i've found that once the pool gets over 80%, it gets slower the more it's used; even fairly light usage over a few hours will make it slow to a crawl... and you can forget about rsync to or from a fs > 80% full (oddly, a zfs send to another pool runs reasonably fast), and daily cron jobs like updating the mlocate db tend to just hang before completion. the only solution is to add more vdevs to the pool, replace all the disks in one or more vdevs with larger disks, or create a new pool with larger/more disks and replicate to it with 'zfs send -R'.

btrfs probably handles this slightly better because it's easier to add a single disk or two to increase capacity...and, of course, you can rebalance to redistribute your data evenly over all disks in the pool.

btw, 'zfs send' is so awesome that i'm seriously considering converting all my systems here to having root on zfs - i backup some filesystems with zfs send and some with rsync, and backing up a system with zfs send is many orders of magnitude faster than rsync because zfs knows *exactly* which blocks have changed between any two snapshots without having to stat or compare any files on source or destination. the same goes for btrfs too (in fact, from what i've read btrfs has more flexible snapshot handling for send/receive) but since my backup pool is zfs, btrfs isn't really an option for me. i could convert to btrfs but i can't afford the extra disks or the time or hassle it would take - rebalancing and resizing are tempting features but not tempting enough.

anyway: IMO, snapshots, error detection & correction, and send/receive capability are more than enough reasons to use either btrfs or zfs. if you're not already using one of them, you should seriously consider switching.
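For anyone who hasn't used it, an incremental backup with zfs send looks roughly like this; pool, dataset, snapshot and host names are all made up for the example:

    # check how full the pool is (the >80% problem above)
    zpool list tank
    # snapshot today, then send only the blocks that changed since yesterday's snapshot
    zfs snapshot tank/home@2015-07-27
    zfs send -i tank/home@2015-07-26 tank/home@2015-07-27 | ssh backuphost zfs receive -F backup/home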
The problem I had is that I had mythtv storage in a subvolume, and the only control you have over mythtv is that you can tell it to leave a certain amount of GB free, and the maximum that can be is 200GB, so obviously that's a problem on all but the smallest installations. I ended up creating a 1TB file (with nocow enabled) and used ext4 on loopback as my mythtv store. Performance is probably badly impacted but I don't notice it.
actually, i was wrong earlier when i said there were only three options for fixing a fs over 80% full: you can also delete files to get it back well below 80% again. i had to do this on my mythtv zpool because it got to about 85% full with recordings i either hadn't got around to watching or had little intention of ever watching again. i deleted enough to get it below 60% and performance went back to normal. i've since done a ruthless purge and got it down to 35% full.
note: it will be painfully slow if you try to cp or rsync the files to somewhere else before deleting them. zfs send is still fast (and if you send to another pool on the same system you can set the mountpoint of the destination fs so that it mounts in the same place as the source fs used to be). the downside is that send only sends snapshots of entire filesystems - you can't pick and choose which files to move... so, useless for my myth situation but useful for a pool with multiple filesystems on it.
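Spelled out, that local move is roughly the following; pool and dataset names are invented for the example:

    zfs snapshot tank/mythtv@move
    zfs send -R tank/mythtv@move | zfs receive backup/mythtv
    # make the copy mount where the old filesystem used to live
    zfs set mountpoint=none tank/mythtv
    zfs set mountpoint=/srv/mythtv backup/mythtv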
If you had a 5TB RAID-1 array (the smallest I would consider buying for home use nowadays) then 200G would be 4%.
and using all but 4% of the pool would mean using 96% of it. the free space issue on zfs is not a "rule of thumb" but a well known fact about it - getting over 80% full is really bad for performance, and just about every tuning or setup guide will mention it. i don't know btrfs as well, but it wouldn't surprise me if the <10% free issue that was mentioned is also a hard rule (as in "don't do it").
my understanding is that at >80%, zfs changes the algorithm it uses to allocate space to files from "best fit" to "wherever there's some space". this tends to cause massive fragmentation (even more than is common on COW filesystems).
If I wanted good performance on a BTRFS array I would make the filesystem as a RAID-1 array of SSDs. Then I would create a huge number of small files to allocate many gigs of metadata space. A 4TB array can have 120G of metadata so I might use 150G of metadata space. Then I'd add 2 big disks to the array which would get used for data chunks and delete all the small files. Then as long as I never did a balance all the metadata chunks would stay on the SSD and the big disks would get used for data. I expect that performance would be great for such an array.
with zfs, you'd just do:

    zfs set secondarycache=metadata <fs>

that tells it to use the L2ARC (e.g. ssd or other fast block device cache) to only cache metadata. this can be set on a per-filesystem ("subvolume" in btrfs terms) basis. you can also set primarycache (i.e. ARC in RAM) to the same values - all, none, or metadata - with the default being all (actually, inherit from parent, with the ultimate parent's default being all).

zfs doesn't do rebalancing or resizing (unfortunately - they're the key features that btrfs has that zfs doesn't), but if it did you wouldn't have to avoid using them so that a kludge like that keeps working. clever tricks can be cool but designed reliable features are better. craig -- craig sanders <cas@taz.net.au>

On Mon, 27 Jul 2015 09:46:41 AM Craig Sanders wrote:
btrfs probably handles this slightly better because it's easier to add a single disk or two to increase capacity...and, of course, you can rebalance to redistribute your data evenly over all disks in the pool.
The most significant thing about how BTRFS handles this better is that BTRFS allows removing disks and changing RAID configuration on the fly. Admittedly while RAID-5/6 hasn't been tested enough there is a limit to what can be done here, but in the future a live migration from RAID-1 to RAID-6 after adding a few more disks will be an option. ZFS only allows replacing disks and adding pools and vdevs. AFAIK it's impossible to create a ZFS array with 4*1TB disks and migrate it to 2*4TB without send/recv and downtime.
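As a rough illustration of the kind of on-line reshaping meant here (device names and mount point are invented):

    btrfs device add /dev/sde /dev/sdf /mnt/pool                      # grow the array while it's mounted and in use
    btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/pool   # change RAID profiles live
    btrfs device delete /dev/sda /mnt/pool                            # shrink it again, also on-line

A convert to raid5/raid6 works the same way syntactically, but as noted it isn't something to trust yet.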
the free space issue on zfs is not a "rule of thumb" but a well known fact about it - getting over 80% full is really bad for performance, and just about every tuning or setup guide will mention it. i don't know btrfs as well but it wouldn't surprise me if the <10% free issue that was mentioned is also a hard rule (as in "don't do it")
In BTRFS it used to be that the filesystem could deadlock if you ran out of space. But I think that adding GlobalReserve has solved that. I don't recall seeing performance problems with BTRFS filesystems that were near full, but the history of BTRFS has led me to take more effort to avoid them getting full than I did with Ext* filesystems.
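You can see how much space is held back for that on a mounted filesystem (the mount point is an example); recent btrfs-progs report it as a GlobalReserve line:

    btrfs filesystem df /mnt/pool
    # output includes a line like: GlobalReserve, single: total=512.00MiB, used=0.00B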
If I wanted good performance on a BTRFS array I would make the filesystem as a RAID-1 array of SSDs. Then I would create a huge number of small files to allocate many gigs of metadata space. A 4TB array can have 120G of metadata so I might use 150G of metadata space. Then I'd add 2 big disks to the array which would get used for data chunks and delete all the small files. Then as long as I never did a balance all the metadata chunks would stay on the SSD and the big disks would get used for data. I expect that performance would be great for such an array.
with zfs, you'd just do:
zfs set secondarycache=metadata <fs>
that tells it to use the L2ARC (e.g. ssd or other fast block device cache) to only cache metadata. this can be set on a per-filesystem ("subvolume" in btrfs terms) basis.
you can also set primarycache (i.e. ARC in RAM) to the same values - all, none, or metadata with the default being all (actually, inherit from parent with the ultimate parent's default being all) .
zfs doesn't do rebalancing or resizing (unfortunately - they're the key features that btrfs has that zfs doesn't) but if it did you wouldn't have to avoid using them so that a kludge like that keeps working. clever tricks can be cool but designed reliable features are better.
Yes. BTRFS is lacking in that regard, but it's not surprising given that they haven't even written code to optimise reading for both disks in a RAID-1 last time I checked. There hasn't been much work done to give BTRFS good performance. Unfortunately the BTRFS developers seem to have little interest in features like the copies= feature of ZFS. I hope they will change their minds about that when BTRFS has the more common reliability issues solved. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/
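For anyone unfamiliar with the ZFS feature being referred to, it's just a per-dataset property (the dataset name is invented here):

    zfs set copies=2 tank/important    # store two copies of every block, even on a single-disk pool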
participants (12):
- Andrew McGlashan
- Avi Miller
- Brian May
- Chris Samuel
- Colin Fee
- Craig Sanders
- James Harper
- Jason White
- Marcus Furlong
- Peter Ross
- Rick Moen
- Russell Coker