James Harper <james.harper(a)bendigoit.com.au> wrote:
Consider virtual hosting, where disk size matters a bit more because you pay for what you use. To take an example of one of my spam filtering mail relays, which sees around 500MB of email a day and on which I keep 365 days of logs: the logfiles take 380MB compressed but would take 2.3GB uncompressed. If they weren't compressed I'd have to buy more disk than the 10GB I have now.
True, but Russell's comparison was between compressing the logs with, e.g.,
gzip or xz, and relying on a file system such as Btrfs to perform the
compression (albeit less efficiently). Thus the strength of the above example
depends on the size difference between these two cases. The conclusion
might still hold, of course, but this isn't obvious without some assumptions
and calculations.
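For instance, a rough sketch (the raw and gzip sizes are from the example above; the 3:1 filesystem-compression ratio is an assumed figure for illustration, not a measured Btrfs number):

    # Back-of-envelope comparison; raw and gzip sizes come from the example
    # above, the filesystem compression ratio is an assumption.
    raw_mb = 2300                 # uncompressed logs
    gzip_mb = 380                 # gzip-compressed logs (roughly 6:1)
    fs_ratio = 3.0                # assumed filesystem compression ratio
    fs_mb = raw_mb / fs_ratio     # ~767MB under filesystem compression
    print(f"gzip: {gzip_mb}MB, fs compression: {fs_mb:.0f}MB, "
          f"difference: {fs_mb - gzip_mb:.0f}MB")

On those assumed numbers both approaches would still fit in the 10GB, which is exactly why the conclusion needs the actual ratios.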
His opening paragraph was "Does it make sense to compress log files nowadays?", which I took to mean "compress at all, by any means", so I answered that. Following that were some theoretical musings about not-yet-ready-for-production-use[1] and not-license-compatible-with-Linux[2] filesystems, but they apply more to a possible future, not "nowadays".
Filesystem compression is interesting, but as I also commented, the biggest problem with logs on an SSD is going to be the continual appending, which is going to be terrible for SSD wear. In theory it should be possible to write to an SSD in such a way that you fill up the "still erased" tail of the sector with the additional logs without allocating a new sector, because the unwritten part of the sector is still writable[3], but I doubt this is possible unless the SSDs support it, and they mask so much behind their I/O frontend channel that it would be impossible to tell. This would be completely impossible with a compressed filesystem though, I think (I'm assuming a compressed filesystem compresses in per-inode chunks... I don't really know how they work).
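As a toy model of what I mean (purely illustrative; a real SSD hides all of this behind its controller), programming flash can only clear bits, so appending into the still-erased tail of a page needs no erase:

    # Toy model of a flash page: erased state is all 1s (0xFF); programming
    # can only clear bits (1 -> 0), i.e. it ANDs data into the page.
    PAGE_SIZE = 4096
    page = bytearray([0xFF] * PAGE_SIZE)   # freshly erased page
    used = 0                               # bytes written so far

    def flash_program(offset, data):
        # ANDing into 0xFF just stores the data; going 0 -> 1 would
        # require erasing the whole block first.
        for i, b in enumerate(data):
            page[offset + i] &= b

    def append(data):
        # Appending lands in the erased tail, so no erase is needed.
        global used
        flash_program(used, data)
        used += len(data)

    append(b"log line 1\n")
    append(b"log line 2\n")   # no erase, no block rewrite

Changing an already-written byte, on the other hand, means erasing the whole block first, and that is the churn I'm worried about.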
To put it another way, an example logfile on my system over a 24 hour period has 34000 lines (each line has, on average, 117 characters). That logfile is an exim4 mainlog, so let's say we are writing 4 lines every 10 seconds on average (4 lines constitute one email's worth of logs). To append to the data on a sector on a hard disk you read in the existing data, modify it (stick your new data on the end), then write it back. For a dumb flash device you would have to read in the entire block (~1MB, I think), erase it, then write back the block, including your modified data. SSDs are much smarter than that and do all sorts of funky things to ease the erase requirements, but you still need to continually write an inode's worth of sectors each time you modify it.

In any case, by my (very rough) calculations, every 10 seconds a new 4K (8 sector) inode of data is re-written to append the data, so ~8640 inode re-writes over the course of the 24 hour period (I know I've overlooked a lot of things here and made a lot of assumptions, but the order of magnitude should be about right). When the logfile is zipped up, it consumes 179 inodes. Using a compressed filesystem wouldn't save anything on the daily re-write churn, which is per-write based, but would save the one-off write of those 179 inodes of zipped data, or about 2% of the total.
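Spelled out, that rough arithmetic is (the 179-inode figure is the measured size of the zipped log):

    # The same back-of-envelope numbers, spelled out.
    seconds_per_day = 24 * 60 * 60
    append_interval = 10                            # seconds between appends
    rewrites = seconds_per_day // append_interval   # 8640 inode re-writes/day
    lines_per_day = 34000
    chars_per_line = 117
    raw_bytes = lines_per_day * chars_per_line      # ~3.98MB of raw log
    zipped_inodes = 179                             # 4K inodes in the zipped logfile
    print(rewrites, raw_bytes, f"saving: {zipped_inodes / rewrites:.1%}")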
James
[1] https://btrfs.wiki.kernel.org/index.php/FAQ#Is_btrfs_stable.3F "Is btrfs stable? Short answer: No, it's still considered experimental."
[2] http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue "In a nutshell, the issue is that the Linux kernel which is licensed under the GNU General Public License is incompatible with ZFS which is licensed under the Sun CDDL"
[3] No reference here; I'm basing this on my experience with writing to flash memory. The erased state is all binary 1s. Writing can turn a 1 into a 0, but the only way to go from a 0 back to a 1 is to erase the entire block.