
On Fri, Apr 06, 2012 at 05:38:28PM +1000, Craig Sanders wrote:
> On Fri, Apr 06, 2012 at 02:47:32AM -0400, Robin Humble wrote:
> > what about sector re-writes - have you seen ZFS do any of those?
>
> I presume ZFS does this, but again, I haven't seen it yet in my
> testing. Whether the sector(s) actually got remapped by the drive is
> hard to tell. I'd assume so.
fair enough. I guess if the remapped sector counts in the drives'
SMART data are incrementing then it's probably working, but if the
drives are just timing out SCSI commands at random (should they really
do that?!) then that wouldn't show up.
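
for anyone wanting to watch for that, here's a rough sketch of the
kind of check I mean - it just shells out to smartmontools' smartctl
and pulls the remap-related attributes so you can diff the raw counts
over time. (assumes smartctl is installed, the drives report the usual
ATA attribute names, and the device list is just a placeholder -
substitute your pool's drives. needs root to talk to the disks.)

  #!/usr/bin/env python
  # quick sketch: pull the sector remap counters out of 'smartctl -A'
  # so the raw values can be watched over time. assumes smartmontools
  # is installed and the drives report the standard ATA attribute
  # names. run as root.
  import subprocess

  ATTRS = ("Reallocated_Sector_Ct", "Reallocated_Event_Count",
           "Current_Pending_Sector")

  def remap_counts(dev):
      out = subprocess.check_output(["smartctl", "-A", dev])
      counts = {}
      for line in out.decode("ascii", "replace").splitlines():
          fields = line.split()
          # attribute table rows have 10+ columns; the attribute name
          # is column 2 and the raw value starts at column 10
          if len(fields) >= 10 and fields[1] in ATTRS:
              counts[fields[1]] = fields[9]
      return counts

  if __name__ == "__main__":
      for dev in ("/dev/sda", "/dev/sdb"):  # example devices only
          print("%s: %s" % (dev, remap_counts(dev)))
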
> any idea what's causing the deadlocks?
the traces, and some builds back and forward through git commits, give
some idea. I'm guessing that Lustre's attempts to send data via the
zero-copy write hooks in ZFS are racing with (non-zero-copy?) metadata
(attr) updates. I'll email Brian and see if he can suggest something,
and/or which mailing list, Jira, or GitHub issue to post to. I assume
regular ZFS is OK and stable because it doesn't attempt zero-copy
writes.
> only when writing, or reading too? random or sequential writes?
just writes. sometimes sequential and sometimes random. always with at
least 32, and often 128, 1M I/Os in flight from clients.

> so I guess you're running with ashift=12 and a limit on zfs_arc_max?

I'm also using zfs_prefetch_disable=1 (helps Lustre reads), but apart
from that no other ZFS tweaks - no L2ARC SSDs yet, etc.

cheers,
robin
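
PS. if you want to double-check what the running module actually
picked up, ZFS-on-Linux exposes its tunables under
/sys/module/zfs/parameters - a quick sketch below (parameter names as
above; note ashift isn't a module knob, it's per-vdev, so look for it
in 'zdb -C <pool>' output instead):

  #!/usr/bin/env python
  # read back the zfs module tunables mentioned above from sysfs, to
  # confirm what the loaded module is actually using (0 generally
  # means "use the built-in default"). assumes ZFS-on-Linux with the
  # zfs module loaded.
  import os

  for name in ("zfs_arc_max", "zfs_prefetch_disable"):
      path = os.path.join("/sys/module/zfs/parameters", name)
      try:
          with open(path) as f:
              print("%s = %s" % (name, f.read().strip()))
      except IOError:
          print("%s: not found (module not loaded?)" % name)
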