
On Thursday, 17 August 2017 3:16:49 PM AEST Craig Sanders via luv-main wrote:
> On Thu, Aug 17, 2017 at 01:37:24PM +1000, Tim Connors wrote:
> > Both XFS and btrfs enthusiastically like to silently throw any data written in the past 5 days on the floor when there's a power failure/kernel panic, so there's that commonality.
>
> That's always been a false claim about XFS.

That's assuming he really meant 5 seconds, not 5 days. If he really meant 5 days then I've never seen evidence to support such a claim.

> If there's a power failure or similar crash *while there is unsynced data in the write cache*, then after a reboot, if the crash circumstances were just right (or maybe "just wrong") then XFS can return a block of NUL bytes rather than whatever random garbage might have been in that unwritten block at the time.
> This confuses people because they see all those ugly NULs (e.g. embedded in their log file) and wonder WTF they're doing there.
Of course, the real issue if they have such problems is that an application didn't call fsync() or fdatasync() when it should have, OR the application isn't designed for data to be synchronised. For some tasks, such as compiling source code, you don't want the overhead of fdatasync(), and you can just run "make clean ; make all" if you had a power failure. With the recent work on reproducible builds you should even get a binary-identical result. MTAs are pretty good about calling the sync family of syscalls, and you shouldn't expect problems there. I've seen MySQL have problems on all filesystems, but if you use MySQL you really should have good backups.

In 2007 I reported a bug against dpkg because it wasn't calling fsync() or fdatasync() and was subject to data loss (I had proven that it had the same bug that was in rpm and had caused data loss on SLES clusters). I've given links to some of the bug reports linked to this:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430958
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588254
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578635
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=577756

I think that fixing #430958 has involved more developer time and user complaints than any other bug I have ever filed in any software. It has also probably caused more user problems while it was unfixed than any other bug I reported, but this isn't obvious because the result is just applications crashing for no good reason.

When an application writes a new file and doesn't call sync, a crash after the write can in practice give the same result as a crash immediately before the write: you lose the data. When an application appends to a file just before a crash, on some filesystems you can end up with the file longer but with zeros at the end (I think it's just Ext* and XFS, NOT BTRFS). But again, having a few NULs in your log file isn't a major problem; you just lose the last write.
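A minimal C sketch of the append case, assuming a POSIX system. The helper name `append_durable()` is hypothetical (it is not taken from dpkg, rpm or any program mentioned here). It opens the file with `O_APPEND`, writes the whole record, and calls `fdatasync()` before reporting success, so a caller that waits for it to return should find the record intact after a crash rather than a run of zeros.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to path and flush them to stable
 * storage before returning. Returns 0 on success, -1 on error (errno set). */
static int append_durable(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {                 /* write() may be partial; loop */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal; retry */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fdatasync() flushes the data plus the metadata needed to read it
     * back, which includes the new file size. */
    if (fdatasync(fd) < 0)
        goto fail;

    return close(fd);

fail:
    {
        int saved = errno;            /* close() must not clobber errno */
        close(fd);
        errno = saved;
    }
    return -1;
}
```

One caveat: if `O_CREAT` actually created the file, the Linux fsync(2) man page notes that the directory entry is not guaranteed to be on disk until you also call `fsync()` on a descriptor for the containing directory.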
The real problem is where an application overwrites an existing file before a crash: you can end up with a merge of the data from the 2 versions of the file, and if that's a compressed file (like a LibreOffice file) you won't get much back. I would hope that applications like LibreOffice write a new temporary file, call fdatasync(), and then rename the temporary file over the old file. I'm sure that lots of programs don't do that, and I could probably file a dozen bug reports in a day if I wanted to test things out.

The real benefit of BTRFS in this regard is that it allows easy snapshotting. If LibreOffice does the wrong thing and one of my workstations crashes at an inconvenient time, I can get the old version of the file from a snapshot that cron made. ZFS offers the same snapshot functionality, but it's a bit harder to manage IMHO.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/