
On Wed, May 23, 2018 at 10:42:01PM +1000, russell@coker.com.au wrote:
On Wednesday, 23 May 2018 1:10:08 PM AEST Craig Sanders via luv-main wrote: http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-siz...
Some Google results suggest it's up to 5G of RAM per TB of storage, while the above URL seems to suggest 2.4G/TB.
I've seen estimates ranging from 1GB per TB to around 8GB per TB. Nobody really seems to know for sure (and, like most things, it will vary greatly depending on the nature of the data being de-duped). Even at the most optimistic rate of 1GB/TB, it's not worth doing except perhaps in some very specialised circumstances. For the average home or small-to-medium business user, adding more disk space is much easier and much cheaper. And there are better uses for any extra RAM than de-duping, like ARC or other disk caching/buffering, or just running programs.
At your prices 2.4G of RAM costs $36, so if it could save you 600G of disk space (i.e. 1.6TB of regular storage deduped to 1TB of disk space, meaning about 38% of blocks are duplicates) it would save money in theory.
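The break-even arithmetic above can be checked with a few lines of shell and awk. This is a rough sketch, and every figure in it is an assumption taken from this thread: RAM at $15/GB, disk at roughly $60/TB (a pair of 1TB drives for about $120), and 2.4GB of dedup RAM per TB of on-disk data (the Oracle estimate, applied to the deduplicated size as in the example). `dedup_breakeven` is a hypothetical helper; its optional third argument lets you try other GB/TB estimates.

```shell
# Rough dedup break-even check using figures quoted in this thread (assumptions):
#   RAM at $15/GB, disk at ~$60/TB (two 1TB drives for ~$120),
#   dedup RAM of 2.4GB per TB of on-disk data unless a third argument overrides it.
dedup_breakeven() {
    logical_tb=$1          # data before dedup, in TB
    physical_tb=$2         # disk space actually used after dedup, in TB
    ram_per_tb=${3:-2.4}   # GB of RAM per TB; estimates range from 1 to 8
    awk -v logical="$logical_tb" -v physical="$physical_tb" -v ram_per_tb="$ram_per_tb" \
        -v ram_price=15 -v disk_price=60 'BEGIN {
        ram_gb   = ram_per_tb * physical        # RAM needed for the dedup table
        ram_cost = ram_gb * ram_price
        saved_tb = logical - physical           # disk space avoided by dedup
        saved    = saved_tb * disk_price
        dup_pct  = 100 * saved_tb / logical     # share of blocks that were duplicates
        printf "dup=%.1f%% ram=%.1fGB ram_cost=$%.0f disk_saved=$%.0f\n", dup_pct, ram_gb, ram_cost, saved
    }'
}

dedup_breakeven 1.6 1.0
```

With these assumed prices the 1.6TB to 1TB case comes out at $36 of RAM against $36 of disk, so it is exactly break-even. Running `dedup_breakeven 1.6 1.0 8` (the 8GB/TB upper estimate) raises the RAM cost to $120 for the same $36 disk saving.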
In theory, maybe it could. In practice, it probably wouldn't. You don't buy RAM in 2.4GB DIMMs. You buy RAM in 2(*), 4, 8, 16, 32, 64, etc. GB sizes and usually install them in pairs (or fours or eights) depending on whether you have dual-, quad- or eight-channel memory. So probably a minimum of two 4GB DIMMs @ $15/GB = $120 (or more for ECC RAM). That's the price of a pair of 1TB drives.

(*) I'm not even sure 2GB DIMMs are still available new anywhere. You probably wouldn't waste a server's DIMM sockets on anything less than a 4 or 8GB DIMM anyway, and at the scale where de-duping might be worth it, probably not less than 32 or 64GB. I use 4 & GB DIMMs in my DDR3 machines here, and 16GB DIMMs in my new DDR4 box. To me, that was part of the benefit of moving to that platform: DDR3 is effectively obsolete AND DDR4 is cheaper than DDR3 in large sizes.

Adding more RAM is still one of the best ways to improve performance on Linux boxes if the bottleneck is mostly disk I/O rather than CPU.
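The DIMM-granularity point can be sketched in shell. The assumptions are simplifications of the paragraph above, not a real board's rules: DIMMs come in power-of-two sizes starting at 4GB, a purchase is one DIMM per memory channel, and RAM costs $15/GB. `dimm_purchase` is a hypothetical helper; real boards add constraints such as slots per channel, ranks and ECC.

```shell
# Hypothetical helper: smallest RAM purchase covering a requirement, assuming
# power-of-two DIMM sizes (4GB minimum), one DIMM per channel, and $15/GB.
dimm_purchase() {
    need_gb=$1     # RAM actually needed, in whole GB (round up first)
    channels=$2    # DIMMs per set: 2 for dual-channel, 4 for quad, 8 for eight
    size=4         # smallest DIMM worth buying, per the footnote
    # double the DIMM size until one DIMM per channel covers the requirement
    while [ $((size * channels)) -lt "$need_gb" ]; do
        size=$((size * 2))
    done
    total=$((size * channels))
    echo "${channels}x${size}GB = ${total}GB, \$$((total * 15))"
}

dimm_purchase 3 2
```

Needing only 2.4GB (rounded up to 3) on a dual-channel board still means buying 2x4GB for $120, which is why the $36 figure does not hold in practice.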
In practice it's probably more about which resource you run out of and which you can easily increase. Buying bigger disks generally seems to be easier than buying more RAM due to the limited number of DIMM slots and unreasonable prices for the larger DIMMs.
Yeah, disk is cheap and easy to expand. RAM, significantly less so on both counts.
Strangely, I never saw such good compression when storing email on ZFS. One would expect email to compress well (for starters, anything like Huffman coding will give significant benefits) but it seems not.
Logs almost certainly compress better than mail: lots more repeated text "phrases". I enabled gzip on /var/log last night and remembered to disable logrotate compression on this new machine. I'll check what the compression ratio looks like in a few days.
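The "repeated phrases" point can be illustrated with plain gzip outside ZFS. On ZFS itself the equivalent is the dataset `compression` property (e.g. `zfs set compression=gzip` on the dataset holding /var/log), and `zfs get compressratio` on that dataset reports the ratio actually achieved. The sketch below is only an illustration, not a ZFS measurement: it compares synthetic syslog-style lines (made up for the example) with the same number of random bytes. `gzip_ratio` is a hypothetical helper, and the script assumes a Linux system with `/dev/urandom`.

```shell
# Illustration only (plain gzip, not a ZFS measurement): repetitive log-style
# text versus the same number of random bytes. gzip_ratio is a hypothetical helper.
gzip_ratio() {
    # prints original size / gzip-compressed size, to one decimal place
    orig=$(wc -c < "$1")
    comp=$(gzip -c < "$1" | wc -c)
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.1f\n", o / c }'
}

tmp=$(mktemp -d)
i=0
while [ "$i" -lt 2000 ]; do
    # the same few "phrases" on every line, as in typical syslog output
    echo "May 23 22:42:01 host CRON[$i]: pam_unix(cron:session): session opened for user root" >> "$tmp/log"
    i=$((i + 1))
done
# random data of the same size: essentially incompressible
head -c "$(wc -c < "$tmp/log")" /dev/urandom > "$tmp/random"
echo "log-like: $(gzip_ratio "$tmp/log")  random: $(gzip_ratio "$tmp/random")"
rm -r "$tmp"
```

The log-like file should shrink by a large factor while the random file stays at a ratio of about 1.0. Real mail, which mixes prose with already-encoded attachments, would fall somewhere in between.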
If you are storing logs on a filesystem that supports compression, you should turn off your distribution's support for compressing logs. That feature reads and rewrites the log files from a cron job and ends up providing little benefit in size.
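Before turning compression off, it helps to see which logrotate files actually enable it. This is a minimal sketch, assuming GNU grep and the usual one-directive-per-line layout of logrotate configs. `compressing_confs` is a hypothetical helper. Its pattern matches only a bare `compress` directive at the start of a line (after optional whitespace), so `nocompress`, `delaycompress`, `compresscmd` and commented-out lines are not reported.

```shell
# Hypothetical helper: list logrotate config files that still switch compression on.
# Matches a bare "compress" directive only; "nocompress", "delaycompress",
# "compresscmd" and commented-out lines are deliberately not reported.
compressing_confs() {
    # $@: config files to check
    grep -lE '^[[:space:]]*compress([[:space:]]|$)' "$@"
}

# Example: compressing_confs /etc/logrotate.conf /etc/logrotate.d/*
```

Re-running it after installing or upgrading packages shows whether a packaged conf file has switched compression back on.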
I was going to mention that but thought I'd already written more than enough :)

It's even worse than what you say: not only does gzipping the log files create entirely new files, but you'll still have the uncompressed versions of the logs in your snapshots until the snapshots expire or you delete them. Using the "compress" option in logrotate's conf files actually uses MORE space, not less.

Fixable with:

    sed -E -i 's/^\s*.*compress/#&/' /etc/logrotate.d/*

Also make sure that "compress" isn't the global default in /etc/logrotate.conf; comment it out there too. Optionally, explicitly set "nocompress" as the global default instead.

.deb packages with support for logrotate (most service/daemon type packages) typically have compress enabled by default. You'll need to remember to fix that if/when you install a new package that does that, or if you let dpkg/apt replace your modified conf files with the packaged ones on an upgrade.

craig

--
craig sanders <cas@taz.net.au>