
On Tue, Apr 30, 2013 at 09:09:14AM +1000, hannah commodore wrote:
> So what does this all mean? Is it a regression in Linux 3?
> Or were previous versions not actually blocking while sync was called?

i think Toby indicated the source of the problem with his comment about
barriers.

from kernel 2.6.28 onwards, write barriers are turned on by default in
ext4[1]. i'm not sure what version they got added for XFS but they've
definitely been on by default for a few years now, and are known to
have a massive performance penalty for mysql and innodb[2].

similarly, LVM got full write barrier support in 2.6.33. mdadm RAID0/1
has had write barriers for several years, and RAID5/6 got them in late
2009/early 2010 IIRC.

it's only safe to turn barriers off if you disable any write-caching in
the drive OR if you have a non-volatile write cache (e.g. hardware raid,
something like bcache with an SSD, or ZFS with an SSD for ZIL).
otherwise you risk data loss and filesystem corruption in the event of
a crash or power failure.

[1] http://kernelnewbies.org/Ext4#head-25c0a1275a571f7332fa196d4437c38e79f39f63

    this also links to a May 2008 article on write barriers:
    http://lwn.net/Articles/283161/

    (see also http://lwn.net/Articles/349970/ "Ext3 and RAID: silent
    data killers?" which inspired mdadm's author to add write barriers
    for RAID5)

[2] https://pracops.com/wiki/index.php/Write_barriers

    this one links to useful info at:
    http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache...

craig

ps: there's also the fact that dd just isn't a very good tool for
performance benchmarking, especially with such small files.
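
to see the cost of flushes directly rather than through dd, a rough
sketch like the one below times the same writes with and without an
fsync() after each block. the fsync-per-write case is roughly what a
database does on commit, and it is where barrier/cache-flush overhead
shows up. the function names, block size and block count are made up
for illustration, and this is not a real benchmark: point `directory`
at the filesystem you care about, since the default temp dir may be
tmpfs.

```python
import os
import tempfile
import time


def time_writes(path, count=200, size=16 * 1024, sync_each=False):
    """Write `count` blocks of `size` bytes to `path`; return elapsed seconds.

    With sync_each=True, os.fsync() is called after every block, so each
    write has to reach stable storage before the next one starts.
    """
    block = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            if sync_each:
                os.fsync(fd)
        if not sync_each:
            os.fsync(fd)  # single flush at the end so both runs hit the disk
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory=None):
    """Return (buffered_seconds, fsync_seconds) for a scratch file.

    `directory` is a hypothetical knob: pass a path on the filesystem
    under test, otherwise the system temp dir is used.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        buffered = time_writes(path, sync_each=False)
        synced = time_writes(path, sync_each=True)
        return buffered, synced
    finally:
        os.unlink(path)


if __name__ == "__main__":
    buffered, synced = compare()
    print(f"buffered: {buffered:.4f}s  fsync per write: {synced:.4f}s")
```

running it once with barriers on and once on a mount with them off
(only where that is safe, see above) should give a feel for how much of
the difference is flush cost.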

pps: my suggestion would be to run mysql VMs on ZFS volumes (rather
than LVM+mdadm LVs) with 16KB record size, an SSD for L2ARC and ZIL,
and "skip-innodb_doublewrite" in my.cnf as suggested here:

https://blogs.oracle.com/realneel/entry/mysql_innodb_zfs_best_practices

and here:

http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/mysql-zfs.html

for non-VM mysql, use a zfs filesystem for /var/lib/mysql with 16KB
record size, SSD caching/ZIL, and skip-innodb_doublewrite.

similar for postgresql, although the tuning recommendation there is for
8K recordsize for pgsql zfs filesystems/volumes.

also, enabling zfs compression has been shown to improve performance
with some kinds of data and IO loads. I don't know if anyone has done
similar testing with mysql.

-- 
craig sanders <cas@taz.net.au>
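
a minimal sketch of the non-VM zfs setup from the pps, written as a
helper that only builds the command lines and does not run them. the
pool name "tank", the dataset names and the device paths are
placeholders, not anything from this thread. it covers filesystems
only; a zvol for a VM takes its block size through a different property
at creation time, which is not shown here.

```python
import shlex

# Record sizes mentioned in the post: 16K for InnoDB, 8K for PostgreSQL.
RECORDSIZE = {"mysql": "16K", "postgresql": "8K"}


def zfs_create_command(dataset, engine, mountpoint=None, compression=False):
    """Return the argv list that creates a ZFS filesystem tuned for `engine`."""
    args = ["zfs", "create", "-o", f"recordsize={RECORDSIZE[engine]}"]
    if compression:
        args += ["-o", "compression=on"]
    if mountpoint:
        args += ["-o", f"mountpoint={mountpoint}"]
    args.append(dataset)
    return args


def zpool_add_ssd_commands(pool, log_dev, cache_dev):
    """Return argv lists that add an SSD log (ZIL) and cache (L2ARC) device."""
    return [
        ["zpool", "add", pool, "log", log_dev],
        ["zpool", "add", pool, "cache", cache_dev],
    ]


if __name__ == "__main__":
    # Placeholder pool, dataset and device names; adjust before use.
    commands = zpool_add_ssd_commands("tank", "/dev/sdX1", "/dev/sdX2")
    commands.append(zfs_create_command("tank/mysql", "mysql", "/var/lib/mysql"))
    commands.append(zfs_create_command("tank/pgsql", "postgresql"))
    for argv in commands:
        print(shlex.join(argv))
```

printing the commands instead of executing them makes it easy to review
them before touching a pool.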