
James Harper <james.harper@bendigoit.com.au> wrote:
Consider virtual hosting where disk size matters a bit more, because you pay for what you use. To take an example of one of my spam filtering mail relays which sees around 500MB of email a day, and on which I keep 365 days of logs, the logfiles take 380MB compressed but would take 2.3GB uncompressed. If they weren't compressed I'd have to buy more disk than the 10GB I have now.
True, but Russell's comparison was between compressing the logs with, e.g., gzip or xz, and relying on a file system such as Btrfs to perform the compression (albeit less efficiently). Thus the strength of the above example depends on the size difference between these two cases. The conclusion might still hold, of course, but this isn't obvious without some assumptions and calculations.
His opening paragraph was "Does it make sense to compress log files nowadays?", which I took to mean "compress at all, by any means", so I answered that. Following that were some theoretical musings about not-yet-ready-for-production-use[1] and not-license-compatible-with-Linux[2] filesystems, but those apply more to a possible future, not "nowadays".

Filesystem compression is interesting, but as I also commented, the biggest problem with logs on an SSD is the continual appending, which is terrible for SSD wear. In theory it should be possible to write to an SSD in such a way that you fill up the "still erased" tail of a sector with the additional log data without allocating a new sector, because the unwritten part of the sector is still writable[3]. I doubt this is possible unless the SSDs support it, though, and they mask so much behind their I/O frontend that it would be impossible to tell. It would be completely impossible with a compressed filesystem, I think (I'm assuming a compressed filesystem compresses in per-inode chunks... I don't really know how they work).

To put it another way: an example logfile on my system covers a 24-hour period in 34,000 lines, each 117 characters long on average. That logfile is an exim4 mainlog, so let's say we are writing 4 lines every 10 seconds on average (4 lines constitute one email's worth of logs). To append to the data in a sector on a hard disk you read in the existing data, modify it (stick your new data on the end), then write it back. On a dumb flash device you would have to read in the entire erase block (~1MB, I think), erase it, then write the whole block back, including your modified data. SSDs are much smarter than that and do all sorts of funky things to ease the erase requirements, but you still end up continually writing a block's worth of sectors each time you modify the file.

In any case, by my (very rough) calculations, every 10 seconds a 4K (8-sector) block is re-written to append the new data, so ~8640 block re-writes over the course of the 24-hour period (I know I've overlooked a lot of things here and made a lot of assumptions, but the order of magnitude should be about right). When the logfile is zipped up it consumes 179 blocks. Using a compressed filesystem wouldn't save anything on the daily re-write churn, which happens per write, but it would save the one-off write of those 179 blocks of zipped data, or about 2% of the total (see the rough sanity check below my signature).

James

[1] https://btrfs.wiki.kernel.org/index.php/FAQ#Is_btrfs_stable.3F
"Is btrfs stable? Short answer: No, it's still considered experimental."

[2] http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue
"In a nutshell, the issue is that the Linux kernel which is licensed under the GNU General Public License is incompatible with ZFS which is licensed under the Sun CDDL"

[3] No reference here; I'm basing this on my experience with writing to flash memory. The erased state is all binary 1s. Writing can turn a 1 into a 0, but the only way to go from a 0 back to a 1 is to erase the entire block.
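PS: since [3] has no reference, here's a toy model in Python of what I mean. The tiny block size and the function names are made up for illustration; real flash programs whole pages, but the 1-to-0 rule is the same.

# Toy model of flash write semantics (footnote [3]). Block size and
# names are invented for illustration only.

ERASED = 0xFF  # erased flash cells read back as all binary 1s

def erase(block: bytearray) -> None:
    """Erasing is the only way to turn 0 bits back into 1s, and it
    works on the whole block at once."""
    for i in range(len(block)):
        block[i] = ERASED

def program(block: bytearray, offset: int, data: bytes) -> None:
    """Programming can only clear bits (1 -> 0): the new value is
    ANDed with whatever is already in the cells."""
    for i, b in enumerate(data):
        block[offset + i] &= b

block = bytearray([ERASED] * 16)   # one freshly erased (tiny) block

# Appending into never-written cells works without an erase, because
# those cells are still all 1s...
program(block, 0, b"line one\n")
# ...and so does a second append into the still-erased tail.
program(block, 9, b"line 2\n")
print(block)        # bytearray(b'line one\nline 2\n')

# But overwriting already-programmed cells without erasing first just
# clears more bits and corrupts the data.
program(block, 0, b"12345678\n")
print(block[:9])    # garbage: neither the old data nor the new

That's why appending to the erased tail of a sector is cheap in principle, while modifying data that's already there forces an erase cycle.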
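PPS: for anyone who wants to check my back-of-envelope arithmetic, here's a rough sketch. The figures are the ones from my example above; the 4K block size and the compression ratio implied by the 179 blocks are assumptions.

# Rough sanity check of the log-append arithmetic in the mail.

LINES_PER_DAY = 34_000      # observed lines in the example exim4 mainlog
BYTES_PER_LINE = 117 + 1    # average line length, plus the newline
BLOCK_SIZE = 4096           # assumed 4K (8 x 512-byte sector) block
SECONDS_PER_DAY = 24 * 60 * 60

# ~4 lines every 10 seconds, i.e. one email's worth of logs.
intervals = SECONDS_PER_DAY / 10
print(f"lines per 10s: {LINES_PER_DAY / intervals:.1f}")     # ~3.9

# Each 10-second append re-writes the block holding the file's tail,
# so roughly one block re-write per interval.
print(f"block re-writes per day: {intervals:.0f}")           # 8640

# A day's log, uncompressed.
raw_bytes = LINES_PER_DAY * BYTES_PER_LINE
raw_blocks = -(-raw_bytes // BLOCK_SIZE)                     # ceiling
print(f"uncompressed: {raw_bytes / 1e6:.1f} MB = {raw_blocks} blocks")

# The zipped logfile was observed to take 179 blocks, which implies
# roughly 5.5:1 compression.
zipped_blocks = 179
print(f"compression: {raw_blocks / zipped_blocks:.1f}:1")

# Filesystem compression can't reduce the append churn, only the
# one-off write of the zipped data at rotation time.
saving = zipped_blocks / (intervals + zipped_blocks)
print(f"saving: {saving:.1%} of the day's block writes")     # ~2%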