Hi Russell,

I would assume that the resilvering is related to the checksum errors. From the zpool(8) manpage:

    Scrubbing and resilvering are very similar operations. The difference
    is that resilvering only examines data that ZFS knows to be out of
    date (for example, when attaching a new device to a mirror or
    replacing an existing device), whereas scrubbing examines all data to
    discover silent errors due to hardware faults or disk failure.

Regarding the kernel messages: FreeBSD has a sysctl, vfs.zfs.debug. My Google 'research' (e.g. http://askubuntu.com/questions/228386/how-do-you-apply-performance-tuning-settings-for-native-zfs) indicates that this sysctl approach was ported to Linux, so you may be able to use it under Linux too.

BTW, there is a check_zfs plugin for Nagios/Icinga.

I hadn't heard of "mon" before. How does it compare to Nagios/Icinga?

Regards

Peter

On Thu, Sep 22, 2016 at 10:54 PM, Russell Coker via luv-main <luv-main@luv.asn.au> wrote:

Below is part of the output of "zpool status". It seems that sdr is
defective: it has a steadily increasing number of checksum errors.

Would the "resilvered 763M" part be about the 121 checksum errors? If so,
does that mean each checksum error required resilvering 6M of data on
average?
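The average asked about works out as below. This is only a back-of-envelope sketch: it assumes (unconfirmed) that all 763M of resilvered data relates to sdr's checksum errors. Also, the scan line is dated Aug 18 while the CKSUM count is described as steadily increasing, so the count at resilver time may have been lower than 121.

```python
# Back-of-envelope average, ASSUMING (not confirmed by zpool) that all of the
# resilvered data is attributable to sdr's checksum errors.
resilvered_m = 763   # from "resilvered 763M" on the scan: line
cksum_errors = 121   # sdr's CKSUM counter when the output was captured

avg_m_per_error = resilvered_m / cksum_errors
print(f"about {avg_m_per_error:.1f}M per checksum error")  # prints "about 6.3M per checksum error"
```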

The kernel message log has NOTHING about this. I'm used to Ext* and BTRFS,
which write kernel message log entries about filesystem errors. Can ZFS be
configured to log errors in a similar way?

As an aside, I've written a mon module for monitoring for such ZFS errors.
I'll release it soon, but in the meantime I'd be happy to give anyone who
wants it a version that's quite usable, although not ready for full release.

status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 763M in 0h0m with 0 errors on Thu Aug 18 14:48:53 2016
config:

        NAME        STATE     READ WRITE CKSUM
        server      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
            sdl     ONLINE       0     0     0
            sdm     ONLINE       0     0     0
            sdn     ONLINE       0     0     0
            sdo     ONLINE       0     0     0
            sdp     ONLINE       0     0     0
            sdq     ONLINE       0     0     0
            sdr     ONLINE       0     0   121
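For anyone wanting a similar check before the mon module is released, here is a minimal Python sketch of the core idea (not Russell's module): scan the device table of "zpool status" and flag any row whose READ, WRITE or CKSUM counter is non-zero. The names find_erroring_devices and read_zpool_status are hypothetical, and the parser assumes the table layout shown above (a "NAME STATE READ WRITE CKSUM" header, ended by a blank line).

```python
import subprocess

# Abridged copy of the "zpool status" output quoted in this thread
# (disks sdj..sdp omitted); the trailing "errors:" line is typical output.
SAMPLE_STATUS = """\
  scan: resilvered 763M in 0h0m with 0 errors on Thu Aug 18 14:48:53 2016
config:

        NAME        STATE     READ WRITE CKSUM
        server      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdq     ONLINE       0     0     0
            sdr     ONLINE       0     0   121

errors: No known data errors
"""

HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def find_erroring_devices(status_text):
    """Return {name: (read, write, cksum)} for table rows with a non-zero counter."""
    flagged = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not in_table:
            # A device table starts at the "NAME STATE READ WRITE CKSUM" header.
            in_table = fields[:5] == HEADER
            continue
        if not fields:
            # A blank line ends this table; keep scanning in case of more pools.
            in_table = False
            continue
        if len(fields) < 5:
            # Rows without all three counters (e.g. spares) are skipped.
            continue
        name, counters = fields[0], tuple(fields[2:5])
        # Large counts may be abbreviated (e.g. "1.2K"), so compare as strings.
        if any(c != "0" for c in counters):
            flagged[name] = counters
    return flagged


def read_zpool_status():
    """Run "zpool status" and return its text output (needs ZFS installed)."""
    result = subprocess.run(["zpool", "status"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(find_erroring_devices(SAMPLE_STATUS))  # prints {'sdr': ('0', '0', '121')}
```

On a real system you would pass read_zpool_status() instead of the sample text. Comparing counters as strings rather than integers is a deliberate choice, so that an abbreviated count is still treated as an error instead of failing to parse.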

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

_______________________________________________
luv-main mailing list
luv-main@luv.asn.au
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main