
On Wednesday, 23 May 2018 1:10:08 PM AEST Craig Sanders via luv-main wrote:
far too much RAM to be worth doing. It's a great way to minimise use of cheap disks ($60 per TB or less) by using lots of very expensive RAM ($15 per GB or more).
A very rough rule of thumb is that de-duplication uses around 1GB of RAM per TB of storage. Definitely not worth it. About the only good use case I've seen for de-duping is a server with hundreds of GBs of RAM providing storage for lots of mostly-duplicate clone VMs, like at an ISP or other hosting provider. It's only worthwhile there because of the performance improvement that comes from NOT having multiple copies of the same data-blocks (taking more space in the ARC & L2ARC caches, and causing more seek time delays if using spinning rust rather than SSDs). Even then, it's debatable whether just adding more disk would be better.
http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-siz...

Some Google results suggest it's up to 5G of RAM per TB of storage, the above URL seems to suggest 2.4G/TB. At your prices 2.4G of RAM costs $36 so if it could save you 600G of disk space (i.e. 1.6TB of regular storage deduped to 1TB of disk space, which means 38% of blocks being duplicates) it would save money in theory. In practice it's probably more about which resource you run out of and which you can easily increase. Buying bigger disks generally seems to be easier than buying more RAM due to the limited number of DIMM slots and unreasonable prices for the larger DIMMs.
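The break-even arithmetic above can be written out explicitly. A minimal sketch using the figures quoted in the thread (2.4GB of RAM per TB of deduped pool, $15/GB RAM, $60/TB disk); it is not part of the original posts:

awk 'BEGIN {
    ram_gb_per_tb = 2.4    # rough DDT RAM estimate per TB of deduped pool
    ram_price     = 15     # $ per GB of RAM
    disk_price    = 60     # $ per TB of disk
    ram_cost = ram_gb_per_tb * ram_price            # = $36 per deduped TB
    breakeven = ram_cost / (ram_cost + disk_price)  # duplicate fraction needed
    printf "RAM cost per deduped TB: $%.0f\n", ram_cost
    printf "break-even duplicate fraction: %.1f%%\n", breakeven * 100
}'

This prints a break-even point of about 37.5%, matching the ~38% of duplicate blocks mentioned above.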
Compression's worth doing on most filesystems, though. lz4 is a very fast, very low cpu usage algorithm, and (depending on what kind of data) on average you'll probably get about 1/3rd to 1/2 reduction of space used by compressible files. e.g. some of the datasets on the machine I just built (called "hex"):
# zfs get compressratio hex hex/home hex/var/log hex/var/cache
NAME           PROPERTY       VALUE  SOURCE
hex            compressratio  1.88x  -
hex/home       compressratio  2.00x  -
hex/var/cache  compressratio  1.09x  -
hex/var/log    compressratio  4.44x  -
The first entry is the overall compression ratio for the entire pool: 1.88:1. So compression is currently saving me nearly half of my disk usage. It's a new machine, so there's not much on it at the moment.
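For anyone reading along, turning this on is a single property on the pool root, which child datasets inherit. A generic sketch using the pool name "hex" from the example above (not commands taken from the original message):

zfs set compression=lz4 hex
zfs get -r compression hex    # confirm the children inherit the setting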
Strangely I never saw such good compression when storing email on ZFS. One would expect email to compress well (for starters anything like Huffman coding will give significant benefits) but it seems not.
I'd probably get even better compression on the logs (at least 6x, probably more) if I set it to use gzip for that dataset with:
zfs set compression=gzip hex/var/log
I never knew about that, it would probably have helped the mail store a lot.
(note that this won't re-compress existing data; only new data will be compressed with the new algorithm)
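If you did want the existing files stored with the new algorithm, one crude approach (not suggested anywhere in this thread, and only safe while nothing is writing to the files) is to rewrite them in place so the current compression property applies:

cd /var/log
for f in *; do
    [ -f "$f" ] || continue                        # skip directories etc.
    cp -a -- "$f" "$f.tmp" && mv -- "$f.tmp" "$f"  # rewrite = re-compress
done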
If you are storing logs on a filesystem that supports compression you should turn off your distribution's support for compressing logs. That will read and rewrite the log files from a cron job without saving much additional space.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On Wed, May 23, 2018 at 10:42:01PM +1000, russell@coker.com.au wrote:
On Wednesday, 23 May 2018 1:10:08 PM AEST Craig Sanders via luv-main wrote:

http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-siz...
Some Google results suggest it's up to 5G of RAM per TB of storage, the above URL seems to suggest 2.4G/TB.
I've seen estimates ranging from 1GB per TB to around 8GB per TB. Nobody really seems to know for sure (and, like most things, it will vary greatly depending on the nature of the data being de-duped). Even at the most optimistic rate of 1GB/TB, it's not worth doing except perhaps in some very specialised circumstances. For the average home or small-medium business user, adding more disk space is much easier and much cheaper. And there are better uses for any extra RAM than de-duping - like ARC or other disk caching/buffering, or just running programs.
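For a number based on your own data rather than a rule of thumb, zdb can simulate de-duplication on an existing pool. A sketch only (it is slow, since it reads the whole pool, and the per-entry size is approximate):

zdb -S hex
# prints a simulated DDT histogram and an estimated dedup ratio; the total
# number of DDT entries times roughly 320 bytes gives a ballpark figure for
# the in-core dedup table size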
At your prices 2.4G of RAM costs $36 so if it could save you 600G of disk space (i.e. 1.6TB of regular storage deduped to 1TB of disk space, which means 38% of blocks being duplicates) it would save money in theory.
In theory, maybe it could. In practice, it probably wouldn't.

You don't buy RAM in 2.4GB DIMMs. You buy RAM in 2(*), 4, 8, 16, 32, 64, etc GB sizes and usually install them in pairs (or fours or eights) depending on whether you have dual-, quad-, or eight-channel memory. So probably a minimum of two 4GB DIMMs @ $15/GB = $120 (or more for ECC RAM). That's the price of a pair of 1TB drives.

(*) I'm not even sure 2 GB DIMMs are still available new anywhere. You probably wouldn't waste a server's DIMM sockets on anything less than a 4 or 8 GB DIMM anyway, and at the scale where de-duping might be worth it, probably not less than 32 or 64 GB.

I use 4 & 8 GB DIMMs in my DDR3 machines here, and 16 GB DIMMs in my new DDR4 box - to me, that was part of the benefit of moving to the platform: DDR3 is effectively obsolete AND DDR4 is cheaper than DDR3 in large sizes. Adding more RAM is still one of the best ways to improve performance on Linux boxes if the bottleneck is mostly disk I/O rather than CPU.
In practice it's probably more about which resource you run out of and which you can easily increase. Buying bigger disks generally seems to be easier than buying more RAM due to the limited number of DIMM slots and unreasonable prices for the larger DIMMs.
Yeah, disk is cheap and easy to expand. RAM, significantly less so on both counts.
Strangely I never saw such good compression when storing email on ZFS. One would expect email to compress well (for starters anything like Huffman coding will give significant benefits) but it seems not.
Logs almost certainly compress better than mail: lots more repeated text "phrases". I enabled gzip on /var/log last night and remembered to disable logrotate compression on this new machine. I'll check what the compression ratio looks like in a few days.
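Checking later is just the same property query as shown earlier, e.g.:

zfs get compressratio hex/var/log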
If you are storing logs on a filesystem that supports compression you should turn off your distribution's support for compressing logs. That will read and rewrite the log files from a cron job without saving much additional space.
I was going to mention that but thought I'd already written more than enough :)

It's even worse than what you say - not only does gzipping the log files create entirely new files, but you'll still have the uncompressed versions of the logs in your snapshots until the snapshots are expired or you delete them. Using the "compress" option in logrotate's conf files actually uses MORE space, not less. Fixable with:

sed -E -i 's/^\s*.*compress/#&/' /etc/logrotate.d/*

Also make sure that "compress" isn't the global default in /etc/logrotate.conf -- comment it out, or optionally set "nocompress" explicitly as the global default instead.

.deb packages with support for logrotate (most service/daemon type packages) typically have compress enabled by default. You'll need to remember to fix that if/when you install a new package which does that, or if you let dpkg/apt replace your modified conf files with the packaged ones on an upgrade.

craig

--
craig sanders <cas@taz.net.au>
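For reference, the global part of the logrotate config mentioned above looks roughly like this on a stock Debian-style system; a generic sketch of /etc/logrotate.conf, not Craig's actual file:

# /etc/logrotate.conf (excerpt)
weekly
rotate 4
create
# comment out the global default...
#compress
# ...or state the opposite explicitly:
nocompress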