Hi Russell,

I would assume that the resilvering is related to the checksum errors. From the zpool(8) manpage:

Scrubbing and resilvering are very similar operations. The difference
is that resilvering only examines data that ZFS knows to be out of
date (for example, when attaching a new device to a mirror or
replacing an existing device), whereas scrubbing examines all data to
discover silent errors due to hardware faults or disk failure.

As for the messages: FreeBSD has a sysctl, vfs.zfs.debug. My Google 'research' (e.g. http://askubuntu.com/questions/228386/how-do-you-apply-performance-tuning-settings-for-native-zfs) indicates that these tunables were ported to Linux as ZFS kernel module parameters, so you may be able to use something similar under Linux too.
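
On Linux, as far as I know, the ZFS module tunables show up as files under /sys/module/zfs/parameters. A minimal Python sketch to list them follows; the function name and the "dbg" filter are only my illustrative choices, and I have not checked which debug-related parameters your ZFS version actually provides:

```python
from pathlib import Path

# Assumption: ZFS on Linux exposes its tunables as files in this directory.
DEFAULT_PARAM_DIR = Path("/sys/module/zfs/parameters")


def read_zfs_params(param_dir=DEFAULT_PARAM_DIR, name_filter=""):
    """Return {parameter name: current value} for names containing name_filter."""
    params = {}
    param_dir = Path(param_dir)
    if not param_dir.is_dir():
        # Module not loaded, or not a Linux system with ZFS.
        return params
    for entry in sorted(param_dir.iterdir()):
        if name_filter in entry.name and entry.is_file():
            try:
                params[entry.name] = entry.read_text().strip()
            except OSError:
                # Some parameters may not be readable by this user.
                params[entry.name] = None
    return params
```

Calling read_zfs_params(name_filter="dbg") would return only the parameters whose name contains "dbg", or an empty dict if the directory does not exist.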

BTW: There is a Nagios/Icinga check_zfs plugin.
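
I have not looked at how check_zfs is implemented, but the basic idea of such a check can be sketched in a few lines of Python. This is only a rough illustration, assuming the "config:" table layout shown in your "zpool status" output below; the function name is made up, and rows whose counters are abbreviated (e.g. with a K suffix) would be skipped by this simple pattern:

```python
import re

# One row of the "config:" table: NAME STATE READ WRITE CKSUM
# Assumption: the three counters are plain integers, as in the quoted output.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s|$)")


def devices_with_errors(status_text):
    """Return [(name, read, write, cksum)] for rows with a nonzero counter."""
    bad = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header line, status text, blank line, ...
        name, _state, read, write, cksum = m.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            bad.append((name, *counts))
    return bad


# Shortened sample taken from the output quoted in this thread.
SAMPLE = """
        NAME           STATE     READ WRITE CKSUM
        server         ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            sdq        ONLINE       0     0     0
            sdr        ONLINE       0     0   121
"""

print(devices_with_errors(SAMPLE))  # prints [('sdr', 0, 0, 121)]
```

A real check would feed the output of "zpool status" into the function and turn a non-empty result into a warning.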

I did not know about "mon" before... How does it compare to Nagios/Icinga?

Regards
Peter


On Thu, Sep 22, 2016 at 10:54 PM, Russell Coker via luv-main <luv-main@luv.asn.au> wrote:
Below is part of the output of "zpool status".  It seems that sdr is
defective, it has a steadily increasing number of checksum errors.

Would the "resilvered 763M" part be about the 121 checksum errors?  If so does
that mean each checksum error required resilvering on average 6M of data?

The kernel message log has NOTHING about this.  I'm used to Ext* and BTRFS
which give kernel message log entries about filesystem errors.  Can ZFS be
configured to give similar logging?

As an aside I've written a mon module for monitoring for such ZFS errors.
I'll release it sometime soon.  But I'd be happy to give a version that's
quite usable although not ready for full release to anyone who wants it.

status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 763M in 0h0m with 0 errors on Thu Aug 18 14:48:53 2016
config:

        NAME           STATE     READ WRITE CKSUM
        server         ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            sdj        ONLINE       0     0     0
            sdk        ONLINE       0     0     0
            sdl        ONLINE       0     0     0
            sdm        ONLINE       0     0     0
            sdn        ONLINE       0     0     0
            sdo        ONLINE       0     0     0
            sdp        ONLINE       0     0     0
            sdq        ONLINE       0     0     0
            sdr        ONLINE       0     0   121

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
