
Can anyone comment (anecdotal will do) on the most stable filesystem for USB backup disks? I just had a server stop responding (no keyboard, no ping, etc.) because I attached a USB disk with an ext3 filesystem that had gone bad. The event log was being hit so hard with messages that the server couldn't do anything else. Unplugging the disk made the server responsive again, but obviously this isn't the behaviour I was expecting. A filesystem with better-quality kernel code (e.g. rate limiting in kernel logging would be a good start!) would be preferable to one that might perform a few percent better. I've gone with xfs for now but changing is easy enough. (The disk itself could be bad, but a SMART self-test doesn't indicate that this is the case.)
Thanks
James
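For readers unfamiliar with the idea, the rate limiting James asks for can be sketched in a few lines. This is only an illustration of the general "at most N messages per interval, count the rest" scheme, not the kernel's actual code; the class name, the defaults and the injectable clock are all assumptions made for the example.

```python
import time


class RateLimiter:
    """Allow at most `burst` events per `interval` seconds; count the rest.

    Hypothetical helper for illustration only, not a kernel or library API.
    """

    def __init__(self, interval=5.0, burst=10, clock=time.monotonic):
        self.interval = interval
        self.burst = burst
        self.clock = clock          # injectable so the behaviour can be tested
        self.window_start = None
        self.emitted = 0            # events allowed in the current window
        self.suppressed = 0         # total events dropped so far

    def allow(self):
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.interval:
            # A new window begins: the burst allowance is available again.
            self.window_start = now
            self.emitted = 0
        if self.emitted < self.burst:
            self.emitted += 1
            return True
        self.suppressed += 1
        return False


def log_error(limiter, message):
    """Print the message only if the limiter permits it."""
    if limiter.allow():
        print(message)
```

A failing device that reports thousands of errors per second would then produce at most `burst` lines per window, with the remainder only counted in `suppressed`.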

James Harper <james.harper@bendigoit.com.au> wrote:
I've gone with xfs for now but changing is easy enough.
I've been using XFS, trouble-free, on my desktop and laptop for years now. If you want to detect possible problems early, you can run xfs_check after unmounting. Also, metadata checksums are coming to XFS (I think it's in 3.10). My backup drive uses Btrfs, which is possibly not a good idea. I always run btrfsck after unmounting, with no problems so far; the most recent backup was a few days ago with a 3.9.4 Debian kernel.

James Harper <james.harper@bendigoit.com.au> writes:
I've gone with xfs for now but changing is easy enough.
I've found XFS pretty good in just being rather terse that your disk is, in fact, corrupt (usually the result of some USB disk deciding not to actually ever flush blocks to actual disk). It's still my go-to for removable (okay, all) media for data I actually care about.
-- Stewart Smith

Hi James, I'm no expert in these things but I wonder if it might not be caused by an electrical/USB problem rather than a filesystem one. So either a controller inside the disk, or in the caddy/enclosure, a cable, or the usb system in your server, or any intervening usb hub. Did your system crash when you physically plugged in the usb disk, or when you mounted the disk? I'm not sure how you'd rule this in or out, other than trying to reproduce the fault with a lot of plugging/unplugging, something you might not want to do if that server is of some importance. Maybe you could also try the same disk in a different caddy.

Hi James,
I'm no expert in these things but I wonder if it might not be caused by an electrical/USB problem rather than a filesystem one. So either a controller inside the disk, or in the caddy/enclosure, a cable, or the usb system in your server, or any intervening usb hub.
Could be. The cause doesn't necessarily matter to me, I know these things happen, what matters is whether my server stays up when a non-essential filesystem has some errors.
Did your system crash when you physically plugged in the usb disk, or when you mounted the disk?
During access. I think I was deleting some files at the time then it just froze.
I'm not sure how you'd rule this in or out, other than trying to reproduce the fault with a lot of plugging/unplugging, something you might not want to do if that server is of some importance. Maybe you could also try the same disk in a different caddy.
I wiped the disk (reformatted as xfs) and it appears to be fine so far. I was checking the disks to go into backup rotation (had been used previously at another site that had outgrown them). This disk will be used early next week so if there are any actual problems with the media I should find out then. Thanks James

On Fri, 14 Jun 2013, James Harper <james.harper@bendigoit.com.au> wrote:
Could be. The cause doesn't necessarily matter to me, I know these things happen, what matters is whether my server stays up when a non-essential filesystem has some errors.
Unix just doesn't seem to be designed that way. Consider the case of NFS mounts which block everything on any network outage. When running the latest KDE if you have an NFS server become unresponsive then it causes most of the desktop environment to become unusable too, even if the NFS mount was under /mnt (IE not in the path and not used by most programs). Then there are lots of other programs which take note of mount points and do unexpected things. For example Dovecot wants you to run a doveadm command when you change mounted filesystems. Can Ext3/4 be run as a FUSE filesystem? -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

Russell Coker <russell@coker.com.au> wrote:
Unix just doesn't seem to be designed that way. Consider the case of NFS mounts which block everything on any network outage. When running the latest KDE if you have an NFS server become unresponsive then it causes most of the desktop environment to become unusable too, even if the NFS mount was under /mnt (IE not in the path and not used by most programs).
Is this a bug or a consequence of design choices? What's the cause?

Russell Coker <russell@coker.com.au> writes:
Unix just doesn't seem to be designed that way. Consider the case of NFS mounts which block everything on any network outage. When running the latest KDE if you have an NFS server become unresponsive then it causes most of the desktop environment to become unusable too, even if the NFS mount was under /mnt (IE not in the path and not used by most programs).
Um, that's by design. If you want applications to receive an error instead of blocking until NFS becomes available again, mount with -o soft instead of -o hard. The same soft-vs-hard binding decision is available in PADL libpam-ldap and libnss-ldap, amongst other things. I don't know why your KDE system would be reading a filesystem in /mnt, though, if you didn't tell it to -- that sounds like a "feature" you should ask the KDE community how to disable. It's probably something trivial and silly like the KDE filesystem abstraction library checking how full each fs is every tenth of a second. I know GNOME does that even for NFS filesystems -- I find it a bit annoying that as soon as /srv/share/read-only fills above X%, a popup appears on every prisoner desktop at once :-/
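As an aside on the polling behaviour described above: a program that checks how full a filesystem is can avoid freezing its own main loop on a hung hard mount by doing the check in a helper thread and giving up after a timeout. The following is a minimal sketch under that assumption; it is not how KDE or GNOME implement it, and the function name and default timeout are made up for the example.

```python
import os
import threading


def fs_usage_with_timeout(path, timeout=2.0):
    """Return the used fraction (0.0 to 1.0) of the filesystem at `path`.

    Returns None if the check fails, the filesystem reports no blocks,
    or the call does not finish within `timeout` seconds (for example
    because a hard-mounted NFS server is unreachable).
    """
    result = []

    def worker():
        try:
            st = os.statvfs(path)   # may block indefinitely on a hung mount
        except OSError:
            return
        if st.f_blocks:
            result.append(1.0 - st.f_bavail / st.f_blocks)

    # daemon=True so a stuck worker does not stop the interpreter
    # from starting its normal shutdown.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    return result[0] if result else None
```

The stuck thread itself stays blocked until the server comes back; the sketch only keeps the caller responsive. It does not replace choosing between -o soft and -o hard.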

On 14/06/13 16:41, James Harper wrote:
Hi James,
I'm no expert in these things but I wonder if it might not be caused by an electrical/USB problem rather than a filesystem one. So either a controller inside the disk, or in the caddy/enclosure, a cable, or the usb system in your server, or any intervening usb hub.
Could be. The cause doesn't necessarily matter to me, I know these things happen, what matters is whether my server stays up when a non-essential filesystem has some errors.
I would have thought it does matter. If the problem was USB causing an interrupt storm (and that was also my first thought from your description) then no amount of changing the file system is going to help.
I'm not sure how you'd rule this in or out, other than trying to reproduce the fault with a lot of plugging/unplugging, something you might not want to do if that server is of some importance. Maybe you could also try the same disk in a different caddy.
I forget the details, but when I had trouble with an interrupt storm on a server's USB, I found a command that allowed me to see the number of interrupts generated so far for the USB device, so I could see how fast that figure was climbing. It was supposedly fixable with a firmware update, but my only use of USB was to occasionally plug in a keyboard, so I just removed USB and used PS2.
I wiped the disk (reformatted as xfs) and it appears to be fine so far. I was checking the disks to go into backup rotation (had been used previously at another site that had outgrown them). This disk will be used early next week so if there are any actual problems with the media I should find out then.
So far so good then. It doesn't really tell you much about the cause, but if the problem's gone... I'd be just a bit wary though of a disk that's failed before for unknown reasons. Test your backups now and then, and don't use this for the only copy of anything important. Regards, Andrew McNaughton
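Andrew does not name the command he used. On Linux the per-IRQ counters are exposed in /proc/interrupts, so one plausible way to watch how fast the USB figure climbs is to read that file twice and compare. The sketch below assumes the usual layout (a header row of CPU names, then one row per IRQ) and that USB host controllers show up with "usb" or "hci_hcd" in the description; the function names are hypothetical.

```python
import time


def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():           # some rows have fewer counters
                break
            counts.append(int(tok))
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def usb_interrupt_totals(text):
    """Keep only rows whose description looks like a USB host controller."""
    return {
        irq: total
        for irq, (total, desc) in parse_interrupts(text).items()
        if "usb" in desc.lower() or "hci_hcd" in desc.lower()
    }


def usb_interrupt_rate(interval=1.0, path="/proc/interrupts"):
    """Interrupts per second for each USB-related IRQ over `interval`."""
    with open(path) as f:
        before = usb_interrupt_totals(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = usb_interrupt_totals(f.read())
    return {irq: (after[irq] - before.get(irq, 0)) / interval for irq in after}
```

Calling usb_interrupt_rate() while the suspect disk is attached and again after unplugging it would show whether the controller is generating an unusual number of interrupts.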

On Mon, 17 Jun 2013, Andrew McNaughton <andrewmcnnz@gmail.com> wrote:
I would have thought it does matter. If the problem was USB causing an interrupt storm (and that was also my first thought from your description) then no amount of changing the file system is going to help.
http://en.wikipedia.org/wiki/Hurd http://en.wikipedia.org/wiki/Microkernel I think that the real solution to that class of problems is to use GNU HURD. A monolithic kernel system such as Linux will always be susceptible to bugs in one area affecting others. The bug of unregulated log messages is annoying (and in this case disruptive) but things could be worse. There have been attacks developed to hack a PC via a hostile USB stick that exploits a driver bug. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/
participants (7): Andrew McNaughton, Andrew Spiers, James Harper, Jason White, Russell Coker, Stewart Smith, trentbuck@gmail.com