
On Fri, 26 Jul 2013 14:18:44 +1000 Craig Sanders <cas@taz.net.au> wrote:
On Fri, Jul 26, 2013 at 01:00:30PM +1000, Russell Coker wrote:
also the numbers in the READ WRITE and CKSUM columns will show you the number of errors detected for each drive.
However those numbers are all 0 for me.
as i said, i interpret that as indicating that there's no real problem with the drive - unless the kernel is retrying successfully before zfs notices the drive is having problems? is that the case?
No, the very first message in this thread included the zpool status output which stated that 1.4M of data had been regenerated.
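Both figures come from zpool status: the per-device READ/WRITE/CKSUM counters and the repaired total on the scan/scrub line. Something like the following (a sketch, using the pool name from the replace command quoted below) shows them, and zpool clear resets the counters once the cause has been dealt with:

    zpool status -v tank    # per-device error counters plus the amount of data repaired
    zpool clear tank        # reset the READ/WRITE/CKSUM counters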
I'm now replacing the defective disk. I've attached a sample of iostat output; it seems to be reading from all disks and then reconstructing the parity for the new disk, which is surprising. I had expected it to just read the old disk and write to the new disk.
there's (at least) two reasons for that.
the first is that raidz is only similar to raid5/6, not exactly the same. the data and parity for a block can be laid out anywhere on any of the drives in the vdev (the stripes are variable width), so it's not just a straight dd-style copy from the old drive to the new.
the second is that when you're replacing a drive, the old one may not be reliable or trustworthy, or may even be absent from the system.
zpool replace tank \
    sdd /dev/disk/by-id/ata-ST4000DM000-1F2168_Z300MHWF-part2

In this case the old disk was online. I ran the above replace command, so ZFS should know that the new disk needs to be an exact copy of the old, but instead I get a scrub as well as the "resilver".
that's odd. what makes you say that?
I've attached the zpool status output. It shows the disk as being replaced, but according to the iostat output I attached previously all the disks are being accessed.
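As an aside, resilver progress and the estimated completion time show up on the scan line of zpool status, so repeating something like this while the replace runs is an easy way to keep an eye on it (a sketch, with the pool name taken from the replace command above):

    watch -n 60 zpool status tank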
and, of course, with raidz (or raid5) writes are always going to be limited to the speed of, at best, a single drive.
Actually, for contiguous writes a RAID-5 array can be expected to exceed the performance of a single disk. It's not difficult to demonstrate this in real life.
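One way to demonstrate it is to time a large sequential write to the array and compare it with the same write aimed at a filesystem on a single disk. A sketch, with hypothetical mount points, conv=fdatasync so the timing includes flushing to disk, and /dev/zero only being a sensible data source if compression is off:

    dd if=/dev/zero of=/mnt/raid5/ddtest bs=1M count=8192 conv=fdatasync
    dd if=/dev/zero of=/mnt/single/ddtest bs=1M count=8192 conv=fdatasync

On an N-disk RAID-5 a full-stripe write lands on N-1 data disks in parallel and the parity can be computed without reading anything back, so the first number should come out well above the second.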
the SATA controller is also a factor; many (most?) aren't capable of running four or more drives at full speed simultaneously. even a cheap-but-midrange SAS card like my LSI cards couldn't run all 8 ports at full speed with 6Gbps SSDs going flat out (since i'm only running hard disks and not SSDs on them, i'll never be limited by that, so i don't care).
Yes, that's always been an issue, dating back to IDE days.
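A back-of-envelope check, using rough numbers and assuming a typical 8-port PCIe 2.0 x8 HBA such as the LSI SAS2008 boards:

    8 ports x ~600 MB/s usable per 6Gbps link   =  ~4.8 GB/s
    PCIe 2.0 x8 host interface                  =  ~4.0 GB/s

so eight SSDs running flat out could indeed saturate the card, while eight hard disks at roughly 150 MB/s each (about 1.2 GB/s total) come nowhere near it.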
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.69    0.00   20.23    3.79    0.00   74.29

Device:   rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda       373.90     0.40  298.30    5.80  92344.00     75.20   303.91     1.36    4.48   2.23  67.96
sdb       195.90     0.40  502.70    5.80  89902.40     75.20   176.95     1.66    3.27   1.25  63.72
sdc       374.20     0.60  300.30    6.00  92286.40     76.80   301.54     1.41    4.59   2.38  72.84
sdd       175.10     0.60  539.30    6.00  89230.40     76.80   163.78     1.78    3.27   1.24  67.76
sdl         0.00   174.30    0.00  681.10      0.00  88107.10   129.36     6.40    9.39   1.32  89.72
hmm. does iostat know about 4K sectors yet? maybe try that with -m for megabytes/sec rather than rsec/s.
also, what does 'zpool iostat' (or 'zpool iostat -v') and 'zpool status' say?
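For numbers that are directly comparable, both commands want the same sampling interval, so that neither is reporting a long-term average (since boot for iostat, since pool import for zpool iostat); a sketch, with the interval chosen arbitrarily:

    iostat -mx 10             # extended per-device stats in MB/s, 10 second samples
    zpool iostat -v tank 10   # per-vdev read/write bandwidth over the same 10 seconds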
I've attached the zpool iostat output, and it claims that the only real activity is reading from the old disk at 38MB/s and writing to the new disk at the same speed. I've also attached the iostat -m output, which shows that all disks are being accessed at a speed just over 45MB/s. I guess that the difference between 38 and 45 would be due to some random variation; zpool iostat gives an instant response based on past data while iostat runs in real time and gives more current data.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/