
On Fri, Jul 26, 2013 at 01:00:30PM +1000, Russell Coker wrote:
> > also the numbers in the READ, WRITE and CKSUM columns will show you the
> > number of errors detected for each drive.
>
> However those numbers are all 0 for me.
as i said, i interpret that as indicating that there's no real problem with the drive - unless the kernel is retrying successfully before zfs notices the drive is having problems? is that the case?
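
btw, if you want to check whether the kernel is quietly retrying underneath
zfs, the drive's SMART counters and the kernel log are the places to look -
something like this (the device name is just a placeholder, use whichever
disk you suspect):

    # SMART attributes - reallocated/pending sectors or CRC errors mean the
    # drive (or its cable) is having trouble even if zfs hasn't noticed yet
    smartctl -A /dev/sdX

    # kernel log - look for ata/scsi errors, resets and retries
    dmesg | grep -iE 'ata[0-9]|i/o error|reset'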
> I'm now replacing the defective disk. I've attached a sample of iostat
> output. It seems to be reading from all disks and then reconstructing the
> parity for the new disk, which is surprising - I had expected it to just
> read the old disk and write to the new disk
there are (at least) two reasons for that. the first is that raidz is only
similar to raid5/6, not exactly the same - the data and parity blocks for a
record can be laid out anywhere on the drives in the vdev, so a resilver
isn't a straight dd-style copy from the old drive to the new. the second is
that when you're replacing a drive, the old one may not be reliable or
trustworthy, or may even be absent from the system, so the resilver
reconstructs each block from whatever redundancy is available rather than
relying on the old disk alone.

also note that with a raidz vdev you get roughly the IOPS of a single drive -
this is why for large pools it is better to have multiple smaller raidz vdevs
than one large vdev (e.g. 3 x 5-drive raidz vdevs rather than 1 x 15-drive
raidz vdev).
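
in zpool terms that layout would look something like this (pool and device
names are just placeholders, obviously):

    # three 5-drive raidz1 vdevs in one pool - roughly 3x the IOPS of a
    # single 15-drive raidz vdev, at the cost of 3 drives of parity rather than 1
    zpool create tank \
        raidz sda sdb sdc sdd sde \
        raidz sdf sdg sdh sdi sdj \
        raidz sdk sdl sdm sdn sdo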
> but instead I get a scrub as well as the "resilver".
that's odd. what makes you say that?
> So the rate of rebuild is considerably less than half what I had hoped for.
> I had expected something like 130MB/s for contiguous reads and writes, and
> instead each of the 5 disks is doing about 45MB/s.
which is about 180MB/s total read (4 surviving disks x ~45MB/s each, while
the new disk writes at roughly the same rate).

btw, from what I understand of the way it works, ZFS avoids the performance
penalty on raid writes (the read-modify-write cycle) by always writing the
entire (variable-width) stripe. this can result in writes being faster than
reads in some cases, and is the main reason for the recommendation to use a
power-of-two number of DATA disks (2, 4, 8) plus however many parity disks
(1 for raidz1, 2 for raidz2, etc) in a raidz vdev. and, of course, with raidz
(or raid5) writes are always going to be limited to the speed of, at best, a
single drive.

the SATA controller is also a factor - many (most?) aren't capable of running
four or more drives at full speed simultaneously. even a cheap-but-midrange
SAS card like my LSI cards couldn't run 6Gbps SSDs flat out on all 8 ports at
once (since i'm only running hard disks and not SSDs on them, i will never be
limited by that so don't care).
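
to put numbers on the power-of-two thing (as i understand it, with the
default 128K recordsize):

    128KiB record / 4 data disks = 32KiB per disk   - an even multiple of 4KiB sectors
    128KiB record / 5 data disks = 25.6KiB per disk - doesn't line up with 4KiB sectors,
                                                      so zfs pads the allocation and some
                                                      space is wasted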
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.69    0.00   20.23    3.79    0.00   74.29
>
> Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
> sda      373.90    0.40  298.30    5.80  92344.00     75.20   303.91     1.36   4.48   2.23  67.96
> sdb      195.90    0.40  502.70    5.80  89902.40     75.20   176.95     1.66   3.27   1.25  63.72
> sdc      374.20    0.60  300.30    6.00  92286.40     76.80   301.54     1.41   4.59   2.38  72.84
> sdd      175.10    0.60  539.30    6.00  89230.40     76.80   163.78     1.78   3.27   1.24  67.76
> sdl        0.00  174.30    0.00  681.10      0.00  88107.10   129.36     6.40   9.39   1.32  89.72
hmm. does iostat know about 4K sectors yet? maybe try that with -m for
megabytes/sec rather than rsec/s. also, what does 'zpool iostat' (or
'zpool iostat -v') and 'zpool status' say?

craig

--
craig sanders <cas@taz.net.au>