
So what does this all mean? Is it a regression in Linux 3?
Or were previous versions not actually blocking while sync was called?
I think the latter. Probably the filesystems didn't sync as expected in previous versions of Linux.

I was surprised by the numbers, but according to Wikipedia a modern 7200RPM SATA disk maxes out at around 75-100 IOPS. An average run of my dd script is:

    # dd if=/dev/zero of=test.bin bs=512 count=128 oflag=sync
    128+0 records in
    128+0 records out
    65536 bytes (66 kB) copied, 4.19463 s, 15.6 kB/s

So that's 128 sync writes in about 4 seconds, or roughly 32 writes per second. Every time you write a 512 byte chunk to the test.bin file, there will also be a metadata update, and even if that was only a single write operation we are now at around 64 IOPS, which is within the same order of magnitude as the 75-100 figure given by Wikipedia.

Small synchronous writes suck. I think the suckiness increases even more with 512 byte writes on 4k sectors because of the read-modify-write situation: the drive has to read the whole 4k sector, modify 512 bytes of it, and write it back.

I'm hoping that ceph with the journal on an SSD will improve my situation somewhat... my SSDs arrive tomorrow.

James
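
P.S. For anyone who wants to redo the back-of-the-envelope arithmetic above with their own dd numbers, here is a rough sketch in Python. The function name and the ios_per_write parameter are made up for illustration, and the factor of 2 (one data write plus one metadata write per sync write) is the same guess as in the text, not something measured. It uses the exact elapsed time from the dd output rather than rounding to 4 seconds, so the result comes out a little lower than 32 and 64.

```python
def sync_write_estimate(count, elapsed_s, ios_per_write=2):
    """Return (sync writes per second, estimated device IOPS).

    ios_per_write is an assumption: one data write plus one
    metadata update for every synchronous write.
    """
    writes_per_s = count / elapsed_s
    return writes_per_s, writes_per_s * ios_per_write


# Figures taken from the dd run: count=128, elapsed 4.19463 s
writes_per_s, iops = sync_write_estimate(count=128, elapsed_s=4.19463)
print(f"{writes_per_s:.1f} sync writes/s, roughly {iops:.0f} IOPS")
```

With the run above this works out to about 30 sync writes per second and about 61 IOPS, still in the same ballpark as the 75-100 figure.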