
On Wed, Sep 14, 2011 at 05:06:01PM +1000, Toby Corkindale wrote:
All tests were performed with the /var/lib/postgresql directory being a separate partition, with a freshly-created filesystem for the test.
'zpool create ...' followed by 'zfs create -o mountpoint=/var/lib/postgresql postgresql' or similar?
I didn't do any tuning beyond the mount options mentioned in my post, so no, Pg didn't get an 8K block size.
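(For reference, giving postgres an 8K block size on ZFS is just a property on the dataset - something along these lines, with the pool name and device names invented purely for the example:

    # recordsize=8k matches postgres' default 8K block size
    zpool create tank /dev/sdb /dev/sdc /dev/sdd
    zfs create -o mountpoint=/var/lib/postgresql -o recordsize=8k tank/postgresql
)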
I'm not convinced on the 8K vs 4K block size though - some benchmarks I've seen actually indicate that smaller sizes perform better, but that swapping 4 for 8 doesn't make a whole lot of difference on the benchmark I'm using. E.g.: http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-write-pgbench/

thanks for the link.

i have no opinion one way or the other yet - i haven't done the testing myself, just reporting what i'd read.

4K block sizes (i.e. ashift=12 rather than the default ashift=9 when you create the zpool), on the other hand, are really important - especially if you have some "Advanced Format" 4K-sector drives, or there's a chance you'll be adding some to the pool in the future, which is pretty much a certainty with most 2TB and (i think) all 3TB drives being 4K. i've even got a few 1TB drives that are 4K sectors (two of them are in my zpool). apparently that's also a reason to use whole disks rather than partitions when creating or adding to a zpool, so zfs can autodetect whether the drive uses 512-byte or 4KB sectors. (of course, i didn't know any of this when i created my test zpool - didn't know about the ashift=12 option either. no problem, i was expecting to blow it away and start from scratch once i'd played with it enough to know how it worked.)

i didn't realise what i had when i saw my first 4K-sector drive, so i just went ahead and formatted it as usual with 512-byte alignment. performance really sucked, and i had no idea why until i googled the model number. had to back up, reformat, and restore.
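As a concrete sketch of that (pool and device names below are placeholders, not anything from this thread), creating the pool 4K-aligned from the start looks roughly like:

    # ashift=12 forces 4KB (2^12) allocation alignment at pool creation time;
    # giving zfs whole disks rather than partitions also lets it probe the sector size itself
    zpool create -o ashift=12 tank /dev/sdb /dev/sdc /dev/sdd

    # check what the pool actually ended up with
    zdb -C tank | grep ashift

ashift is fixed per vdev at creation time, so getting it wrong means rebuilding the pool, as described above.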
apparently it's also a good idea to reduce/limit the size of the ARC (ZFS's equivalent to Linux's disk buffering & caching) because [...]
That sounds much like Linux's own VM system then, in which case, it's fine to do that. Without going into details, if you've tuned postgres correctly, then this is actually the optimal setup.
very much like it. AIUI, the effort to hack zfsonlinux to use linux's own caching probably isn't worth it, and would make keeping up with ZFS on Illumos, FreeBSD, etc. much more difficult. it's not a huge problem unless you're running other stuff (like a standard desktop) on the same system and don't have enough memory.
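On zfsonlinux the ARC ceiling is just a module parameter, so capping it is a one-liner - a sketch, with the 2GB figure picked arbitrarily for the example:

    # /etc/modprobe.d/zfs.conf - cap the ARC at 2GB (value is in bytes)
    options zfs zfs_arc_max=2147483648

    # or adjust it on a running system
    echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max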
I noticed that. I'd like to experiment with those again some time, but as it stands, just having three fast drives striped seems to work pretty well anyway.
just striped... scary for anything but "I really don't care if i lose it all" data. zfs is good, but it's not magic. it can't recover data when a disk dies if there isn't another copy on the other disks in the pool. for safety, stick another drive in and use raidz - it's like raid-5 but without (most of) the raid-5 write performance problems.
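For instance (again, pool and device names are placeholders), a four-disk raidz pool instead of a plain three-disk stripe:

    # single-parity raidz: survives one disk failure,
    # gives roughly three disks' worth of usable capacity from four
    zpool create tank raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd

Note that an existing striped pool can't be converted in place; it has to be destroyed and recreated (or the data migrated to a new pool).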
Looking at disk stats as I ran the benchmarks, I noticed that the one that managed to generate the most disk I/O (in MB/sec) was ext4-nobarrier, followed by ZFS. Everything else was quite a way behind. ZFS appeared to be using a lot more CPU than ext4, though... not sure what to make of that. I guess the extra complexity in the FS has to cause something!
compression enabled? ZFS does use a fair amount of CPU power - it does a lot more than most filesystems. also, given that it's designed for "Enterprise" environments, its developers have been able to make stronger assumptions about hardware capabilities and performance than more consumer-oriented dev teams can get away with. i.e. modern desktop machines won't be troubled by it; aging ones will, especially if they don't have enough RAM.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #28:

CPU radiator broken