
On Fri, Jul 26, 2013 at 01:00:30PM +1000, Russell Coker wrote:
> > also the numbers in the READ, WRITE and CKSUM columns will show you the
> > number of errors detected for each drive.
>
> However those numbers are all 0 for me.
as i said, i interpret that as indicating that there's no real problem with the drive - unless the kernel is retrying successfully before zfs notices the drive is having problems? is that the case?
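
btw, if you want to check whether the kernel is quietly retrying underneath
zfs, the drive's SMART counters and the kernel log are the places to look -
something like this (the device name is just a placeholder, use whichever
disk you suspect):

    # SMART attributes - reallocated/pending sectors or CRC errors mean the
    # drive (or its cable) is having trouble even if zfs hasn't noticed yet
    smartctl -A /dev/sdX

    # kernel log - look for ata/scsi errors, resets and retries
    dmesg | grep -iE 'ata[0-9]|i/o error|reset'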
> I'm now replacing the defective disk. I've attached a sample of iostat
> output. It seems to be reading from all disks and then reconstructing the
> parity for the new disk, which is surprising - I had expected it to just
> read the old disk and write to the new disk
there are (at least) two reasons for that. the first is that raidz is only
similar to raid5/6, not exactly the same - the data and parity blocks for a
record can be laid out anywhere on the drives in the vdev, so a resilver
isn't a straight dd-style copy from the old drive to the new. the second is
that when you're replacing a drive, the old one may not be reliable or
trustworthy, or may even be absent from the system, so the resilver
reconstructs each block from whatever redundancy is available rather than
relying on the old disk alone.

also note that with a raidz vdev you get roughly the IOPS of a single drive -
this is why for large pools it is better to have multiple smaller raidz vdevs
than one large vdev (e.g. 3 x 5-drive raidz vdevs rather than 1 x 15-drive
raidz vdev).
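
in zpool terms that layout would look something like this (pool and device
names are just placeholders, obviously):

    # three 5-drive raidz1 vdevs in one pool - roughly 3x the IOPS of a
    # single 15-drive raidz vdev, at the cost of 3 drives of parity rather than 1
    zpool create tank \
        raidz sda sdb sdc sdd sde \
        raidz sdf sdg sdh sdi sdj \
        raidz sdk sdl sdm sdn sdo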
> but instead I get a scrub as well as the "resilver".
that's odd. what makes you say that?
> So the rate of rebuild is considerably less than half what I had hoped for.
> I had expected something like 130MB/s for contiguous reads and writes, and
> instead each of the 5 disks is doing about 45MB/s.
which is about 180MB/s total read (4 surviving disks x ~45MB/s each, while
the new disk writes at roughly the same rate).

btw, from what I understand of the way it works, ZFS avoids the performance
penalty on raid writes (the read-modify-write cycle) by always writing the
entire (variable-width) stripe. this can result in writes being faster than
reads in some cases, and is the main reason for the recommendation to use a
power-of-two number of DATA disks (2, 4, 8) plus however many parity disks
(1 for raidz1, 2 for raidz2, etc) in a raidz vdev. and, of course, with raidz
(or raid5) writes are always going to be limited to the speed of, at best, a
single drive.

the SATA controller is also a factor - many (most?) aren't capable of running
four or more drives at full speed simultaneously. even a cheap-but-midrange
SAS card like my LSI cards couldn't run 6Gbps SSDs flat out on all 8 ports at
once (since i'm only running hard disks and not SSDs on them, i will never be
limited by that so don't care).
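
to put numbers on the power-of-two thing (as i understand it, with the
default 128K recordsize):

    128KiB record / 4 data disks = 32KiB per disk   - an even multiple of 4KiB sectors
    128KiB record / 5 data disks = 25.6KiB per disk - doesn't line up with 4KiB sectors,
                                                      so zfs pads the allocation and some
                                                      space is wasted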
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.69    0.00   20.23    3.79    0.00   74.29
>
> Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
> sda      373.90    0.40  298.30    5.80  92344.00     75.20   303.91     1.36   4.48   2.23  67.96
> sdb      195.90    0.40  502.70    5.80  89902.40     75.20   176.95     1.66   3.27   1.25  63.72
> sdc      374.20    0.60  300.30    6.00  92286.40     76.80   301.54     1.41   4.59   2.38  72.84
> sdd      175.10    0.60  539.30    6.00  89230.40     76.80   163.78     1.78   3.27   1.24  67.76
> sdl        0.00  174.30    0.00  681.10      0.00  88107.10   129.36     6.40   9.39   1.32  89.72
hmm. does iostat know about 4K sectors yet? maybe try that with -m for
megabytes/sec rather than rsec/s. also, what does 'zpool iostat' (or
'zpool iostat -v') and 'zpool status' say?

craig

--
craig sanders <cas@taz.net.au>