
I want to create a filesystem to store my on-disk backups (from Bacula) on a new server. These backup files will be few (less than 10000) and mostly huge (>1GB). Because I will have multiple files being written out at once, a large data-per-inode ratio seems to make sense, as it will greatly reduce fragmentation, and wasted space would be low because of the small number of files. Also, because the write pattern is exclusively streaming writes, I can go against my normal rule and use RAID5.

I've chosen a ratio of 4MB of data per inode based on some rough calculations, but while my mkfs.ext3 <dev> -i 4194304 raced through initially, when it got to "Writing superblocks and filesystem accounting information:" it just seemed to hang. Strace says it's doing seek, write 4k, seek, write 4k, over and over again. I hit ^C and the process is now [mkfs.ext3], but the system is still pegged at 100% disk utilisation.

Any suggestions as to how I could make this go faster? The filesystem is around 8TB (RAID5 of 4 x 3TB disks), so it's not exactly small, and the disks are only 7200RPM SATA, but I know xfs would complete pretty quickly. I'd use xfs, but over the years I've used xfs and ext3 in roughly equal proportions and I've lost 3 xfs filesystems and no ext3 filesystems, so I'm a little reluctant to commit to it.

Thanks
James

On 27 September 2013 22:13, James Harper <james.harper@bendigoit.com.au> wrote:
I've chosen a 4MB of data per inode ratio based on some rough calculations, but while my mkfs.ext3 <dev> -i 4194304 just raced through initially, when it got to "Writing superblocks and filesystem accounting information:" it just seemed to hang. Strace says it's doing seek, write 4k, seek, write 4k, over and over again. I hit ^C and the process is now [mkfs.ext3], but the system is still pegged at 100% disk utilisation.
(Sorry James for the double reply.)

Hi, I am no expert, but in case it helps: I have used mkfs.ext3 -T largefile4 in the past without problems on a single 1TB drive, and it was a little faster, not slower, than other mkfs runs. According to /etc/mke2fs.conf that is equivalent to:

    inode_ratio = 4194304
    blocksize = -1

so if I were seeing your symptoms I wouldn't blame that option. Maybe test it on an individual drive first?
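Following that suggestion, a quick way to compare the two invocations without touching a real disk is to run mkfs against a sparse image file (the path and size here are purely illustrative):

```shell
# Create a small sparse image so mkfs can be tested without a spare drive
truncate -s 1G /tmp/mkfs-test.img

# -T largefile4 pulls inode_ratio = 4194304 from /etc/mke2fs.conf,
# so it should behave the same as passing -i 4194304 directly
mkfs.ext3 -F -q -T largefile4 /tmp/mkfs-test.img

# Inspect the result: roughly one inode per 4MB of data
tune2fs -l /tmp/mkfs-test.img | grep -i 'inode count'
```

Note that -F is needed to make mkfs accept a regular file instead of a block device.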

On Fri, 27 Sep 2013, James Harper <james.harper@bendigoit.com.au> wrote:
I want to create a filesystem to store my on-disk backups (from Bacula) on a new server. These backup files will be few (less than 10000) and mostly huge (>1GB). Because I will have multiple files being written out at once, a large data per inode ratio seems to make sense as it will greatly reduce fragmentation, and wasted space would be low because of the small number of files. Also because the write pattern is exclusively streaming writes, I can go against my normal rule and use RAID5.
I've chosen a 4MB of data per inode ratio based on some rough calculations, but while my mkfs.ext3 <dev> -i 4194304 just raced through initially, when it got to "Writing superblocks and filesystem accounting information:" it just seemed to hang. Strace says it's doing seek, write 4k, seek, write 4k, over and over again. I hit ^C and the process is now [mkfs.ext3], but the system is still pegged at 100% disk utilisation.
One of the features of Ext4 is uninit_bg, which, if enabled at mkfs time, should reduce creation time. I don't have a convenient large storage device to test this with.
Any suggestions as to how I could make this go faster? The filesystem is around 8TB (RAID5 of 4 x 3TB disks), so it's not exactly small, and the disks are only 7200RPM SATA, but I know xfs would complete pretty quick. I'd use xfs but over the years I've used xfs and ext3 in roughly equal proportions, and I've lost 3 xfs filesystems and no ext3 filesystems, so I'm a little reluctant to commit to it.
I wouldn't use Ext3 for a device that big. Why can't you use ZFS?

I'm using BTRFS for all my backups nowadays so any data corruption will be flagged.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

I'm using BTRFS for all my backups nowadays so any data corruption will be flagged.
And I'm planning on getting familiar with BTRFS at some point.
Actually, revisiting the feature list of BTRFS, the 'some point' might be now. Compression seems particularly attractive, as that would take some load off the backup client machines.

I'm running Debian Wheezy, so I assume I want the 3.10 kernel from backports.

Do you have any pointers to a good getting-started guide?

Thanks
James

On Sun, 29 Sep 2013, James Harper <james.harper@bendigoit.com.au> wrote:
Actually, revisiting the feature list of BTRFS the 'some point' might be now. Compression seems particularly attractive as that would take some load off of the backup client machines
I'm running Debian Wheezy, so I assume I want the 3.10 kernel from backports.
I don't think that you can ever expect filesystem compression to compete well with application-level compression. Filesystem compression is based around the expectation of random access, while much application compression is based around entirely compressing large files, the application/user knowing that random access isn't required. But if you want random access and compression, then filesystem compression apparently works well.

I'm not even sure if I have compression enabled on my systems; the data I'm storing on BTRFS isn't going to compress well anyway.

For Debian you want a newer kernel than is available in Wheezy. The Wheezy kernel does work, and if you avoid getting the filesystem anywhere near full then it will work well. But later kernels have many bug fixes.

Don't use BTRFS RAID-5, that's nowhere near usable. If you want data integrity and RAID-5 then there is no option other than ZFS at this time.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On Sun, 29 Sep 2013, James Harper <james.harper@bendigoit.com.au> wrote:
Actually, revisiting the feature list of BTRFS the 'some point' might be now. Compression seems particularly attractive as that would take some load off of the backup client machines
I'm running Debian Wheezy, so I assume I want the 3.10 kernel from backports.
I don't think that you can ever expect filesystem compression to compete well with application-level compression. Filesystem compression is based around the expectation of random access, while much application compression is based around entirely compressing large files, the application/user knowing that random access isn't required. But if you want random access and compression, then filesystem compression apparently works well.
The reason I want to do it on the filesystem is that Bacula's compression sucks the performance out of the machine being backed up. It has LZO now, which is better, but in the little testing I've done btrfs is faster still. If the end result is a compression ratio that really sucks then I'll revisit it.
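For reference, filesystem-level compression on btrfs is just a mount option; the device and mount point below are invented for the example, not taken from the thread:

```shell
# Mount the backup filesystem with LZO compression; new writes are
# compressed transparently, so Bacula can skip its own compression
mount -o compress=lzo /dev/sdX /srv/backups

# Or make it permanent with an /etc/fstab entry along the lines of:
#   /dev/sdX  /srv/backups  btrfs  compress=lzo  0  0

# Data written before the option was set can be recompressed in place:
btrfs filesystem defragment -r -clzo /srv/backups
```

These commands need root and a real btrfs volume, so treat them as a sketch of the shape of the setup rather than something to paste verbatim.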
I'm not even sure if I have compression enabled on my systems, the data I'm storing on BTRFS isn't going to compress well anyway.
For Debian you want a newer kernel than is available in Wheezy. The Wheezy kernel does work and if you avoid getting the filesystem anywhere near full then it will work well. But later kernels have many bug fixes.
wheezy-backports has 3.10
Don't use BTRFS RAID-5, that's nowhere near usable. If you want data integrity and RAID-5 then there is no option other than ZFS at this time.
I'm testing RAID5 now, but I can't find anything about its stability beyond the original post announcing it in 3.8. I'm hoping it is now considered more stable, but I haven't found anything to back that up yet, so I may be switching to RAID10, then upgrading to RAID5 when it is considered stable. I'll be pulling the power and pulling a disk out on the machine soon anyway to see what happens.

Thanks
James

On Sat, Sep 28, 2013 at 06:15:02PM +1000, Russell Coker wrote:
I wouldn't use Ext3 for a device that big. Why can't you use ZFS?
yep, for disks and arrays that size, error-detection and correction are essential. which means either zfs or btrfs.
I'm using BTRFS for all my backups nowadays so any data corruption will be flagged.
how's the raid5 support in btrfs now? last i read (back in june, iirc) it was still very experimental. one of the problems i read about was that you couldn't replace a failed drive in a raid5 btrfs.

btrfs raid-0/1/10 modes work fine - and btrfs does offer a way of auto-converting an array from raid1 to raid5/6, so starting with raid10 and converting to raid5 when it's "ready" may be a safer choice.

or zfs: with the -dkms packages available for most distros, installing zfs and setting it up is easy.

craig

--
craig sanders <cas@taz.net.au>
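The auto-conversion mentioned above is done with btrfs balance filters; a hypothetical invocation (the mount point is invented for this example) would look like:

```shell
# Convert data chunks from RAID-10 to RAID-5 (metadata kept as RAID-1),
# rewriting every chunk while the filesystem stays mounted and in use
btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt/backup

# Check progress and the resulting chunk profiles
btrfs balance status /mnt/backup
btrfs filesystem df /mnt/backup
```

This needs root and a mounted multi-device btrfs volume, so it is a sketch of the command shape rather than something verified here.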

Craig Sanders <cas@taz.net.au> writes:
On Sat, Sep 28, 2013 at 06:15:02PM +1000, Russell Coker wrote:
I wouldn't use Ext3 for a device that big. Why can't you use ZFS?
or zfs, with the -dkms packages available for most distros, installing zfs and setting it up is easy.
Some fedora-using guy on #btrfs was running btrfs as a DKMS, with his stock stable fedora kernel. Dunno if that was a hand-rolled DKMS; I haven't seen standard btrfs-dkms packages in Debian experimental.

Craig Sanders <cas@taz.net.au> writes:
On Sat, Sep 28, 2013 at 06:15:02PM +1000, Russell Coker wrote:
I wouldn't use Ext3 for a device that big. Why can't you use ZFS?
or zfs, with the -dkms packages available for most distros, installing zfs and setting it up is easy.
Some fedora-using guy on #btrfs was running btrfs as a DKMS, with his stock stable fedora kernel. Dunno if that was a hand-rolled DKMS; I haven't seen standard btrfs-dkms packages in Debian experimental.
I've done the same thing before when backporting other things (normally Xen-related). It normally requires very little effort, but assumes that the code is entirely self-contained and doesn't require patches to other areas of the kernel. Given the size and complexity of btrfs, I suspect this might be a bit trickier, and probably impossible to backport too far from the kernel of origin. Fortunately wheezy-backports has a 3.10 kernel, which should be new enough.

James

On Tue, Oct 01, 2013 at 12:25:37PM +1000, Trent W. Buck wrote:
Craig Sanders <cas@taz.net.au> writes:
On Sat, Sep 28, 2013 at 06:15:02PM +1000, Russell Coker wrote:
I wouldn't use Ext3 for a device that big. Why can't you use ZFS?
or zfs, with the -dkms packages available for most distros, installing zfs and setting it up is easy.
Some fedora-using guy on #btrfs was running btrfs as a DKMS, with his stock stable fedora kernel. Dunno if that was a hand-rolled DKMS; I haven't seen standard btrfs-dkms packages in Debian experimental.
huh? you don't need a dkms package for btrfs - it's built-in to the mainline kernel, and has been for quite a while.

zfs, though, is not part of the standard linux kernel (and probably never will be, due to license incompatibility between the CDDL and GPL), so you do need to compile an extra module. installing a zfs-dkms package is the easiest (and completely automated) way to do that.

Oracle, as the copyright holder on the Sun-developed ZFS, could solve the license problem by re-licensing it as GPL or BSD (preferably BSD so that FreeBSD and Illumos etc could use it too) but that's extremely unlikely to happen.

They're the main copyright holder for both of the current modern/advanced filesystems for unix & linux - zfs and btrfs - and seem to have lost interest in both of them. btrfs is GPL and development is within the linux kernel (and Fusion-io, where the btrfs author now works). zfs is CDDL and illumos etc have forked it.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #218: The UPS doesn't have a battery backup.

Hi, On 30/09/2013, at 9:37 PM, Craig Sanders <cas@taz.net.au> wrote:
They're the main copyright holder for both of the current modern/advanced filesystems for unix & linux - zfs and btrfs - and seem to have lost interest in both of them.
This is completely untrue. Both ZFS and btrfs remain at the top of our development priorities. Oracle employs several developers for both, including several of the top btrfs developers (who were hired by Chris Mason before he moved to Fusion-io).

I can't speak for the ZFS development, as that's not part of my sphere of responsibilities, but btrfs was the single largest piece of development done for our new Unbreakable Enterprise Kernel Release 2 (which is currently in beta).

Just wanted to clear that up.

Cheers,
Avi

On 30/09/2013, at 9:43 PM, Avi Miller <avi.miller@gmail.com> wrote:
I can't speak for the ZFS development as that's not part of my sphere of responsibilities, but btrfs was the single largest piece of development done for our new Unbreakable Enterprise Kernel Release 2 (which is currently in beta).
Typo: should've been Release 3.

On Mon, Sep 30, 2013 at 09:43:53PM -0700, Avi Miller wrote:
On 30/09/2013, at 9:37 PM, Craig Sanders <cas@taz.net.au> wrote:
They're the main copyright holder for both of the current modern/advanced filesystems for unix & linux - zfs and btrfs - and seem to have lost interest in both of them.
This is completely untrue. Both ZFS and btrfs remain at the top of our development priorities.
then they're doing a damn good job of making sure that they have little control or even influence over the direction that future development takes - oracle's actions have pretty much forced illumos to fork zfs, and much of btrfs development seems to be taking place outside of oracle too. which is a shame, because funding and supporting the work on btrfs is one of the really good things that oracle has done.

my suspicion is that oracle execs have no idea how to monetise either zfs or btrfs or use them as leverage to control linux, so don't see either as a priority and don't have a clue what to do with either of them.

oracle geeks are probably different, but geeks don't make important decisions in corporates like oracle. suits do.

craig

--
craig sanders <cas@taz.net.au>

On 01/10/2013, at 6:10 AM, Craig Sanders <cas@taz.net.au> wrote:
my suspicion is that oracle execs have no idea how to monetise either zfs or btrfs or use them as leverage to control linux, so don't see either as a priority and don't have a clue what to do with either of them.
Or, perhaps Oracle execs don't wish to monetise either or use them to leverage Linux and would rather just pay for the work to be done without exerting what others see as excessive control?
oracle geeks are probably different, but geeks don't make important decisions in corporates like oracle. suits do.
Actually, the Senior Vice-President of Linux Engineering at Oracle is a massive geek. He hacks on the kernel in his spare time (under a pseudonym) and has code in almost every product he manages. He's also 2-down from Larry and reports directly to our Chief Corporate Architect. So we don't have a suit in charge of Oracle Linux, we have a geek.

Hence, we have no desire to monetise our contributions. Hell, we're the only enterprise Linux that gives away the ISO, the updates, the errata, the bug fixes and the feature development for free.

Say what you will about Oracle as a whole (and I've heard most of it all), but the Oracle Linux/VM product teams and associated upstream mainline developers are not typical of the rest of the organisation at all.

Cheers,
Avi

On Tue, 1 Oct 2013 07:24:56 Avi Miller wrote:
oracle geeks are probably different, but geeks don't make important decisions in corporates like oracle. suits do.
Actually, the Senior Vice-President of Linux Engineering at Oracle is a massive geek. He hacks on the kernel in his spare time (under a pseudonym) and has code in almost every product he manages. He's also 2-down from Larry and reports directly to our Chief Corporate Architect. So, we don't have a suit in charge of Oracle Linux, we have a geek. Hence, we have no desire to monetise our contributions. Hell, we're the only enterprise Linux that gives away the ISO, the updates, the errata, the bug fixes and the feature development for free.
Say what you will about Oracle as a whole (and I've heard most of it all), but the Oracle Linux/VM product teams and associated upstream mainline developers are not typical of the rest of the organisation at all.
You should probably give a LUV talk about how these things work.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

Craig Sanders <cas@taz.net.au> writes:
On Tue, Oct 01, 2013 at 12:25:37PM +1000, Trent W. Buck wrote:
Some fedora-using guy on #btrfs was running btrfs as a DKMS, with his stock stable fedora kernel. Dunno if that was a hand-rolled DKMS; I haven't seen standard btrfs-dkms packages in Debian experimental.
huh? you don't need a dkms package for btrfs - it's built-in to the mainline kernel, and has been for quite a while.
Except then if you have an old (stable) kernel, you also have an old btrfs.

On Wed, Oct 02, 2013 at 10:50:34AM +1000, Trent W. Buck wrote:
Craig Sanders <cas@taz.net.au> writes:
huh? you don't need a dkms package for btrfs - it's built-in to the mainline kernel, and has been for quite a while.
Except then if you have an old (stable) kernel, you also have an old btrfs.
that's a problem with an easy answer: "Don't Do That, Then!"

really, just compile the latest kernel or use a backported kernel package for the distro of your choice.

messing around with dkms just to avoid doing that strikes me as being somewhat insane (perhaps due to excessive exposure to overly bureaucratic change management processes - you're not allowed to change the kernel, so you have to pretend you're not actually changing the kernel, you're just updating one filesystem driver in it, and then pretending that nothing else could possibly be affected).

craig

--
craig sanders <cas@taz.net.au>

On Wed, Oct 02, 2013 at 10:50:34AM +1000, Trent W. Buck wrote:
Craig Sanders <cas@taz.net.au> writes:
huh? you don't need a dkms package for btrfs - it's built-in to the mainline kernel, and has been for quite a while.
Except then if you have an old (stable) kernel, you also have an old btrfs.
that's a problem with an easy answer: "Don't Do That, Then!"
really, just compile the latest kernel or use a backported kernel package for the distro of your choice.
messing around with dkms just to avoid doing that strikes me as being somewhat insane (perhaps due to excessive exposure to overly bureaucratic change management processes - you're not allowed to change the kernel, so you have to pretend you're not actually changing the kernel, you're just updating one filesystem driver in it, and then pretending that nothing else could possibly be affected).
Using a kernel that you know works perfectly well and backporting a few specific changes/fixes into it is an entirely reasonable thing to do. James

On Wed, Oct 02, 2013 at 02:42:54AM +0000, James Harper wrote:
Using a kernel that you know works perfectly well and backporting a few specific changes/fixes into it is an entirely reasonable thing to do.
replacing an entire filesystem driver with one that's about 5 or 6 kernel versions newer is not "backporting a few specific changes/fixes". it's not a trivial change, and there's a good chance that fixes/changes in the new btrfs rely on other fixes and changes in the rest of the kernel.

worse, you have a kernel+btrfs combo that's unique - which means it's difficult for anyone else to help you debug or even duplicate any problem, because nobody else has the exact same combination of code, so reporting bugs is almost useless.

this is not just a btrfs issue, it's a general problem with backporting instead of upgrading. sometimes it's worth doing anyway - but as a last resort, when there are no other good or reasonable options.

craig

--
craig sanders <cas@taz.net.au>

Russell Coker <russell@coker.com.au> writes:
One of the features of Ext4 is uninit_bg which if enabled at mkfs time should reduce creation time.
This can be turned on for ext3: see "man mke2fs.conf" and /etc/mke2fs.conf for the default profiles, or use mke2fs -O to override it for a single run. I did this at least once for a large filesystem, either with ext3 or with ext4 before lazy init became the default on ext4. Worked for me.
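As a concrete (illustrative) example, the override form can be tried safely against a sparse image file rather than a real device:

```shell
# Sparse image so the mkfs option can be tested without a real device
truncate -s 1G /tmp/uninit-test.img

# -O uninit_bg marks block groups as uninitialised, so mkfs does not
# have to write out every inode table up front
mkfs.ext3 -F -q -O uninit_bg /tmp/uninit-test.img

# The feature should now appear in the superblock's feature list
tune2fs -l /tmp/uninit-test.img | grep -o uninit_bg
```

On a multi-terabyte array this is the difference between mkfs seeking and zeroing every inode table at creation time and deferring that work.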

On 27 September 2013 22:13, James Harper <james.harper@bendigoit.com.au> wrote:
I want to create a filesystem to store my on-disk backups (from Bacula) on a new server. These backup files will be few (less than 10000) and mostly huge (>1GB). Because I will have multiple files being written out at once, a large data per inode ratio seems to make sense as it will greatly reduce fragmentation, and wasted space would be low because of the small number of files. Also because the write pattern is exclusively streaming writes, I can go against my normal rule and use RAID5.
I've chosen a 4MB of data per inode ratio based on some rough calculations, but while my mkfs.ext3 <dev> -i 4194304 just raced through initially, when it got to "Writing superblocks and filesystem accounting information:" it just seemed to hang. Strace says it's doing seek, write 4k, seek, write 4k, over and over again. I hit ^C and the process is now [mkfs.ext3], but the system is still pegged at 100% disk utilisation.
Why aren't you using ext4? It has improvements for handling large files (extents), among other things. Although I would have chosen zfs or btrfs for that task myself, unless I was stuck on something like RHEL :/
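Combining the suggestions in the thread, an ext4 filesystem with the large-file inode ratio and deferred inode-table initialisation could be made like this (again shown against a throwaway sparse image; the path is illustrative):

```shell
truncate -s 1G /tmp/ext4-test.img

# largefile4 profile (1 inode per 4MB) plus lazy inode table init,
# so mkfs finishes quickly; extents handle the huge backup files
mkfs.ext4 -F -q -T largefile4 -E lazy_itable_init=1 /tmp/ext4-test.img

# extents should be in the feature list
tune2fs -l /tmp/ext4-test.img | grep -o extent
```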
participants (7)
- Avi Miller
- Craig Sanders
- David
- James Harper
- Russell Coker
- Toby Corkindale
- trentbuck@gmail.com