
On Wednesday, 23 May 2018 1:10:08 PM AEST Craig Sanders via luv-main wrote:
> far too much RAM to be worth doing. It's a great way to minimise use of cheap disks ($60 per TB or less) by using lots of very expensive RAM ($15 per GB or more).
>
> A very rough rule of thumb is that de-duplication uses around 1GB of RAM per TB of storage. Definitely not worth it.
>
> About the only good use case I've seen for de-duping is a server with hundreds of GBs of RAM providing storage for lots of mostly-duplicate clone VMs, like at an ISP or other hosting provider. It's only worthwhile there because of the performance improvement that comes from NOT having multiple copies of the same data-blocks (taking more space in the ARC & L2ARC caches, and causing more seek time delays if using spinning rust rather than SSDs). Even then, it's debatable whether just adding more disk would be better.
http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-siz...

Some Google results suggest it's up to 5GB of RAM per TB of storage, while the above URL seems to suggest 2.4GB/TB. At your prices 2.4GB of RAM costs $36, so if it could save you 600GB of disk space (i.e. 1.6TB of regular storage deduped down to 1TB of disk, which means about 38% of blocks being duplicates) it would save money in theory. In practice it's probably more about which resource you run out of and which you can easily increase. Buying bigger disks generally seems to be easier than buying more RAM, due to the limited number of DIMM slots and the unreasonable prices of the larger DIMMs.
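To put rough numbers on that break-even point (a back of the envelope sketch using the assumed prices above, $15/GB for RAM and $60/TB for disk, plus the 2.4GB/TB figure from the Oracle article):

awk 'BEGIN {
  ram_cost = 2.4 * 15            # dollars of RAM needed per TB of deduped pool
  disk_saved = ram_cost / 60     # TB of disk you have to save to break even
  printf "break even at %.1fTB saved per TB stored (%.0f%% duplicate blocks)\n",
      disk_saved, 100 * disk_saved / (1 + disk_saved)
}'

That prints 0.6TB and 38%, the same numbers as above.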
> Compression's worth doing on most filesystems, though. lz4 is a very fast, very low cpu usage algorithm, and (depending on what kind of data) on average you'll probably get about 1/3rd to 1/2 reduction of space used by compressible files. e.g. some of the datasets on the machine I just built (called "hex"):
> # zfs get compressratio hex hex/home hex/var/log hex/var/cache
> NAME           PROPERTY       VALUE  SOURCE
> hex            compressratio  1.88x  -
> hex/home       compressratio  2.00x  -
> hex/var/cache  compressratio  1.09x  -
> hex/var/log    compressratio  4.44x  -
> The first entry is the overall compression ratio for the entire pool, a 1.88:1 ratio, so compression is currently saving me nearly half of my disk usage. It's a new machine, so there's not much on it at the moment.
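If I understand the zfs docs correctly, turning lz4 on for a whole pool (so every new dataset inherits it) is just a property setting, and comparing used with logicalused shows the saving in absolute terms rather than as a ratio. Something like the following, using the pool name "hex" from the quoted example (substitute your own pool name):

zfs set compression=lz4 hex
zfs get compression,compressratio,used,logicalused hex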
Strangely I never saw such good compression when storing email on ZFS. One would expect email to compress well (for starters, anything like Huffman coding should give significant benefits on text) but it didn't seem to work out that way.
> I'd probably get even better compression on the logs (at least 6x, probably more) if I set it to use gzip for that dataset with:
>
> zfs set compression=gzip hex/var/log
I never knew about that; it would probably have helped the mail store a lot.
> (note that won't re-compress existing data, only new data will be compressed with the new algorithm)
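Presumably rewriting the files would get the old data stored with the new compression setting. A rough sketch (untested, and only for files that nothing is currently writing to) would be something like:

cd /var/log
for f in *.log; do
    # the copy is written as new (gzip-compressed) blocks, then replaces the original
    cp -p "$f" "$f.tmp" && mv "$f.tmp" "$f"
done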
If you are storing logs on a filesystem that supports compression you should turn off your distribution's own log compression (eg the compress option in logrotate). Otherwise a cron job will read and rewrite the log files and end up not providing much benefit in size.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/