
On Wed, Jan 07, 2015 at 11:07:52AM +1100, Trent W. Buck wrote:
> Russell Coker <russell@coker.com.au> writes:
>> If you want decent performance with IMAP then just don't use NFS. The
>> write pattern of mail stores is a poor match for the way NFS works and
>> the large number of files doesn't work too well for read caching.
>
> Nitpick: NFS doesn't play nice with lots of small files, i.e. maildir
> and MH storage formats. An IMAP server need not use those formats
> (e.g. domino).
indeed. no filesystem likes a lot of small files. the old adage "a
filesystem is not a database" springs to mind...

somewhat tangential to slow t-bird, but this discussion got me thinking -
what does a "small" file really mean? 1k? 10 bytes? 100MB? can it be
defined? (TL;DR - put your email on a SSD)

assuming that linux and glibc etc. are perfect and filesystems have zero
overhead(1), a workable definition of a "small" block operation is one
where the latency of the storage media dominates over the bandwidth.

for a standard local disk (sata, ~= sas, fc, whatever) that is roughly
when you can read a whole track in less than the time it takes to seek to
it. a seek takes >= 1/120th sec (assume 7200rpm), and at typical disk
speeds of 120 MB/s you can read 1MB in 1/120th of a second. so by this
definition files < 1MB are "small" for normal storage media, and any i/o
operations of less than that size will tend to be iops dominated.

note that 1MB is not the threshold at which performance is _good_, it's
only where it stops sucking terribly. something more like >> 10MB is a
"good" file size for disk based storage.

what about network filesystems? a few * 30 to 50 micro-seconds of GigE
network latency doesn't really affect the above calculation, so +/-
software(2), "small" should be roughly the same over NFS, and indeed over
any other network filesystem that is ultimately backed by spinning rust.
NFS's design and software overhead undoubtedly slows things down somewhat
(with locking etc., 10x slower wouldn't surprise me), but ultimately
small files on spinning disks are just slow.

so how about SSDs? assuming 100k iops & 500 MB/s, that works out at a
"small" file size of maybe ~5kB or a bit less, which is impressive.
however, small random write i/o is the absolute worst thing you can do to
these things, so be sure to buy a good one (ie. intel only IMHO).

taking things to the extreme, how about i/o to dram? ie. tmpfs. server
ram is maybe ~70ns latency(3) and ballpark 30GB/s, so surprisingly about
2KB is still a "small" file even for a blindingly fast filesystem
completely in ram. however at this level, software overheads (glibc, VM,
VFS, a slow and simplistic filesystem, writes passing through multiple
caching levels etc.) dominate over raw media speeds, so this isn't really
a useful analysis for something so fast.

so is mbox or anything else much better than mh? probably not from this
raw i/o perspective. the same small email messages have to go somewhere,
even if with mbox most operations are appends and with mh most are new
file creations. the occasional large read-modify-write in the middle of
an mbox is probably "free" compared to the iops (except for its effect on
flushing caches), so I doubt mbox would be much worse than mh.

mh also tends towards zillions of files in a dir, and some fs's don't
deal with that well. XFS used to handle 100k+ files/dir a lot better than
ext[34]. dunno if it still does. many files in a dir is not a great idea
with any fs.

cheers,
robin

(1) the spherical cow approximation. soooo not true, but probably a good
enough approximation in this case as long as your filesystem runs at less
than a few GB/s.

(2) 'man nfs' tells me that nfs should negotiate upwards to 1MB rpcs
these days, which sounds ok. if rpcs were still 4k or 8k like in the old
days then it would definitely suck.

(3) http://sites.utexas.edu/jdm4372/files/2012/03/RangerLatencyChart.jpg
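
p.s. to make the latency-vs-bandwidth arithmetic above concrete, here's a
tiny python sketch of the threshold calculation. the figures are the same
ballpark numbers quoted in this mail (assumptions, not measurements), and
the "small file" threshold is simply latency * bandwidth:

    # "small" = the size below which media latency dominates transfer
    # time, i.e. the operation is iops-bound rather than bandwidth-bound
    def small_file_threshold(latency_s, bandwidth_bytes_per_s):
        return latency_s * bandwidth_bytes_per_s

    MB = 1000 ** 2
    GB = 1000 ** 3

    media = {
        # 7200rpm disk: one rotation = 1/120 s, ~120 MB/s sequential
        "spinning disk":      (1 / 120, 120 * MB),
        # same disk behind NFS: a few GigE round trips (~50 us each)
        "disk over GigE NFS": (1 / 120 + 3 * 50e-6, 120 * MB),
        # decent SSD: 100k iops -> ~10 us per op, ~500 MB/s
        "ssd":                (1 / 100_000, 500 * MB),
        # server dram (tmpfs): ~70 ns, ~30 GB/s
        "dram / tmpfs":       (70e-9, 30 * GB),
    }

    for name, (lat, bw) in media.items():
        print(f"{name:20s} ~{small_file_threshold(lat, bw) / 1000:7.1f} kB")

which prints roughly 1000 kB for the disk (with or without the network
hop), 5 kB for the ssd and 2.1 kB for dram - matching the hand-waving
above.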
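
p.p.s. on the mbox-vs-mh point, a hypothetical micro-benchmark sketch
(message size and count made up) showing the two write patterns - same
total bytes either way, but mbox appends to one file while mh pays a file
creation (inode + dirent) per message:

    import os, time, tempfile

    MSG = b"x" * 4096           # a typical small email, ~4kB
    N = 1000                    # messages to "deliver"

    def mbox_style(dirname):
        # one growing file, every delivery is an append
        with open(os.path.join(dirname, "mbox"), "ab") as f:
            for _ in range(N):
                f.write(MSG)

    def mh_style(dirname):
        # one new file per message, like mh/maildir
        for i in range(N):
            with open(os.path.join(dirname, str(i)), "wb") as f:
                f.write(MSG)

    for fn in (mbox_style, mh_style):
        with tempfile.TemporaryDirectory() as d:
            t0 = time.monotonic()
            fn(d)
            print(f"{fn.__name__}: {time.monotonic() - t0:.3f}s")

without fsync() everything lands in the page cache, so treat it as an
illustration of the i/o shape rather than a real benchmark.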