
On 14/09/11 16:38, Craig Sanders wrote:
On Wed, Sep 14, 2011 at 04:00:54PM +1000, Toby Corkindale wrote:
ZFS (defaults): natty: 171 oneiric: 996
did you tune the fs for database use? e.g. make a separate zfs for ~postgres and give it an 8K block size.
All tests were performed with the /var/lib/postgresql directory on a separate partition, with a freshly-created filesystem for each test. I didn't do any tuning beyond the mount options mentioned in my post, so no, Pg didn't get an 8K block size. I'm not convinced about the 8K vs 4K block size though - some benchmarks I've seen indicate that smaller sizes actually perform better, and that swapping 4K for 8K doesn't make a whole lot of difference on the benchmark I'm using. E.g.: http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-write-pgbench/
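If I do re-run with a dedicated dataset, my understanding is it's just something along these lines (pool and dataset names here are placeholders, not what I actually used):

    # dedicated dataset for the postgres cluster with an 8K record size
    zfs create -o recordsize=8k -o mountpoint=/var/lib/postgresql tank/pgdata

    # recordsize only applies to files written after it's set, so do this
    # before initdb / restoring the database
    zfs get recordsize,mountpoint tank/pgdata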
apparently it's also a good idea to reduce/limit the size of the ARC (ZFS's equivalent to Linux's disk buffering & caching) because, by default, it will use up all available unused memory less 1GB. The db server will want some or all of that RAM, and db servers often have their own built-in app-specific caching too (mysql certainly does).
That sounds much like Linux's own VM system then, in which case, it's fine to do that. Without going into details, if you've tuned postgres correctly, then this is actually the optimal setup.
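For anyone who does want to cap the ARC on the Linux port, I gather it's a module parameter, roughly like this (the 2GB figure is just an example, not a recommendation):

    # /etc/modprobe.d/zfs.conf -- cap the ARC at 2GB (value is in bytes)
    options zfs zfs_arc_max=2147483648

    # or adjust it on a running system
    echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max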
i haven't really looked into tuning ZFS yet, but i have run across some web pages with useful info.
http://www.solarisinternals.com/wiki/index.php/ZFS_for_Databases#PostgreSQL_... http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_...
also, ZFS has built-in support for a write intent log (ZIL) and caching (L2ARC) on a separate block device (i.e. a disk or partition), e.g. on an SSD or other fast disk. a couple of GB of ZIL will really speed up random write performance (similar to having a battery-backed or SSD non-volatile write cache on a raid controller), and a separate cache device will speed up reads.
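from memory, adding them to an existing pool is roughly this (pool and device names are just examples):

    # dedicated intent log (slog) on a fast device -- a few GB is plenty
    zpool add tank log /dev/sdb1

    # L2ARC read cache on another fast device
    zpool add tank cache /dev/sdc1

    # check the resulting layout
    zpool status tank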
I noticed that.. I'd like to experiment with those again some time, but as it stands, just having three fast drives striped seems to work pretty well anyway. Looking at disk stats as I ran the benchmarks, I noticed that the one that managed to generate the most disk I/O (in MB/sec) was ext4-nobarrier, followed by ZFS. Everything else was quite a way behind. ZFS appeared to be using a lot more CPU than ext4 though.. not sure what to make of that.. I guess the extra complexity in the FS has to cause something!
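(For the record, the disk stats were nothing fancy - just watching something like the following in another terminal while the benchmark ran:)

    # per-device throughput (MB/s) and utilisation, refreshed every 5 seconds
    iostat -mx 5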