
On Fri, Jul 26, 2013 at 02:39:37PM +1000, Russell Coker wrote:
On Fri, 26 Jul 2013 14:18:44 +1000 Craig Sanders <cas@taz.net.au> wrote:
On Fri, Jul 26, 2013 at 01:00:30PM +1000, Russell Coker wrote:
also the numbers in the READ WRITE and CKSUM columns will show you the number of errors detected for each drive.
However those numbers are all 0 for me.
as i said, i interpret that as indicating that there's no real problem with the drive - unless the kernel is retrying successfully before zfs notices the drive is having problems? is that the case?
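if you want to check whether the kernel is quietly retrying, you can compare zfs's per-device counters with what the kernel and the drive itself report. e.g. something like this (sdd is just a placeholder for whichever disk you suspect):

    zpool status -v tank        # READ/WRITE/CKSUM error counts per device
    dmesg | grep -i sdd         # kernel-level i/o errors and resets
    smartctl -a /dev/sdd | grep -iE 'reallocated|pending|uncorrect'   # the drive's own error counters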
No, the very first message in this thread included the zpool status output which stated that 1.4M of data had been regenerated.
see previous message for why that does not necessarily indicate a failed disk.
I'm now replacing the defective disk. I've attached a sample of iostat output; it seems to be reading from all disks and then reconstructing the parity for the new disk, which is surprising. I had expected it to just read the old disk and write to the new disk.
there's (at least) two reasons for that.
first is that raidz is only similar to raid5/6, not exactly the same. the data and parity for a given block can live anywhere on any of the drives in the vdev, so it's not just a straight dd-style copy from the old drive to the new.
the second is that when you're replacing a drive, the old one may not be reliable or trustworthy, or may even be absent from the system.
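that's also why the same replace mechanism works even when the old disk is completely dead or has been pulled - the resilver just rebuilds the new disk from the data and parity on the surviving drives. roughly (device names here are just examples):

    zpool offline tank sdd                            # optional: take the suspect disk out of service first
    zpool replace tank sdd /dev/disk/by-id/NEW-DISK   # NEW-DISK is obviously a placeholder
    zpool status tank                                 # watch the resilver progress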
zpool replace tank \
    sdd /dev/disk/by-id/ata-ST4000DM000-1F2168_Z300MHWF-part2
In this case the old disk was online; I ran the above replace command, so ZFS should know that the new disk needs to be an exact copy of the old.
1. you're still thinking in raid or mdadm terms. zfs doesn't do exact copies of disks, it does exact copies of the data on disks. a data block with redundant copies on multiple disks WILL NOT BE IN THE SAME SECTOR on the different disks, it will be wherever zfs saw fit to put it at the time it was writing it.

this also means that it's not copying unused/empty sectors on the disk, it's only copying data in use... so the replace will likely be finished a lot sooner than you expect, and sooner than 'zpool status' estimates.

i've also read somewhere that it reads the blocks in the order that they were written, so if you've created and deleted lots of files or snapshots, fragmentation will cause the disk to thrash and slow down reads. i'm not 100% sure if this is - or even was - true, just something i've read.

2. if, as you say, the drive has read errors then that will dramatically slow down the read performance of the drive due to retries.

3. you're replacing an entire disk with a partition? is the start of the partition 4k-aligned? if not, that could make a huge performance difference on writing to the replacement disk (and once the buffers are filled, slow down reading to match - no point reading faster than you can write). see below for a quick way to check.
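fwiw, a quick way to check the alignment, using the by-id path and partition number from your replace command ('optimal' checks against the alignment the device reports):

    parted /dev/disk/by-id/ata-ST4000DM000-1F2168_Z300MHWF align-check optimal 2
    parted /dev/disk/by-id/ata-ST4000DM000-1F2168_Z300MHWF unit s print   # start sector should be divisible by 8 (8 x 512 = 4096)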
but instead I get a scrub as well as the "resilver".
that's odd. what makes you say that?
I've attached the zpool status output. It shows the disk as being replaced but is accessing all disks according to the iostat output I attached previously.
i still don't see a scrub happening as well as a resilver. the zpool iostat output is showing about 6MB/s of other usage, as well as just under 39MB/s resilvering the replacement drive. that seems reasonable overhead for doing parity checks on the data as it's reading it.
I've attached the zpool iostat output and it claims that the only real activity is reading from the old disk at 38MB/s and writing to the new disk at the same speed. I've attached the iostat -m output which shows that all disks are being accessed at a speed just over 45MB/s. I guess that the difference between 38 and 45 would be due to some random variation,
the 6M/s of other usage seems to roughly make up the difference. if i wanted to be more precise, i'd say: "38-ish plus 6-ish approximately equals 45-ish. roughly speaking" :)
zpool gives an instant response based on past data while iostat runs in real time and gives more current data.
FYI you can also use 'zpool iostat -v tank nnnn', with nnnn in seconds, similar to /usr/bin/iostat. as with iostat, ignore the first set of output and watch it for a while.

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #368:

Failure to adjust for daylight savings time.