
On Tue, Feb 07, 2012 at 10:36:27AM +1100, Toby Corkindale wrote:
> or I could buy a bigger hard drive and put up with it for now, until
> someone implements a stable block-based deduplication in an open-source
> filesystem.. :)
>
> We already have it in the kernel for memory (as KSM) so I'm surprised
> we haven't seen something for ext4 already.
it's in ZFS. the catch is that it takes enormous amounts of memory.... about
1GB of RAM per TB of disk space, IIRC, to store the hashes for each block. I
suspect the same catch would apply to other implementations.

that's mitigated somewhat by the fact that it also uses your L2ARC (cache)
for de-duping, so if you have a large fast SSD cache device on your zpool,
you can get away with less RAM.

I don't run enough VMs on my home machine to bother with de-duping. I've got
terabytes of disk and only 16GB RAM.

I do use compression on the ZFS filesystems, though. it's effectively free
(in fact, for most workloads it's a performance boost because it's faster to
load and decompress fewer blocks than it is to load more uncompressed
blocks).

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #258:

That's easy to fix, but I can't be bothered.
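
As a rough sketch of where a GB-per-TB figure like that comes from: the dedup
table needs roughly one in-core entry per unique block, so the RAM estimate is
(pool size / average block size) * (bytes per entry). The per-entry size and
the 128K average block size below are illustrative assumptions, not
authoritative numbers; the real cost depends on your pool's actual average
block size and the ZFS version in use.

# Back-of-the-envelope estimate of ZFS dedup table (DDT) RAM needs.
# The per-entry and average block sizes are assumptions for illustration.

DDT_ENTRY_BYTES = 320          # assumed in-core size of one dedup table entry
AVG_BLOCK_BYTES = 128 * 1024   # assumed average block size (128K recordsize)

def ddt_ram_estimate(pool_bytes: int) -> int:
    """Rough RAM needed to hold the whole dedup table in core."""
    n_blocks = pool_bytes // AVG_BLOCK_BYTES
    return n_blocks * DDT_ENTRY_BYTES

if __name__ == "__main__":
    tib = 1024 ** 4
    for pool_tib in (1, 4, 16):
        ram = ddt_ram_estimate(pool_tib * tib)
        print(f"{pool_tib} TiB pool -> ~{ram / 1024**3:.1f} GiB of DDT")

With smaller average blocks (lots of small files, or VM images on zvols with a
small volblocksize) the entry count, and hence the RAM estimate, goes up
accordingly, which is why the quoted rules of thumb vary so much.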