
[ 477.971022] ata1.00: exception Emask 0x50 SAct 0xc00 SErr 0x290b02 action 0xe frozen
[ 477.971107] ata1.00: irq_stat 0x01400000, PHY RDY changed
[ 477.971166] ata1: SError: { RecovComm UnrecovData Persist HostInt PHYRdyChg 10B8B BadCRC }
[ 477.971243] ata1.00: failed command: READ FPDMA QUEUED
[ 477.971305] ata1.00: cmd 60/a8:50:00:13:ed/00:00:05:00:00/40 tag 10 ncq 86016 in
[ 477.971308]          res 40/00:58:a8:13:ed/00:00:05:00:00/40 Emask 0x50 (ATA bus error)
[ 477.971449] ata1.00: status: { DRDY }
[ 477.971501] ata1.00: failed command: READ FPDMA QUEUED

I was using dd to copy /dev/sda3 to /dev/sdb3 on a system that is usually running Windows but doesn't appear to have hardware problems. Then I saw the above message about a READ error on ata1.00 in the kernel log followed immediately by the below message about a WRITE error on ata2.00. Any ideas about what might be happening here? I've attached the entire kernel message log.

[ 477.971763] ata1: hard resetting link
[ 477.971796] ata2.00: exception Emask 0x50 SAct 0x3fffe000 SErr 0x480800 action 0x6 frozen
[ 477.971877] ata2.00: irq_stat 0x08000000, interface fatal error
[ 477.971937] ata2: SError: { HostInt 10B8B Handshk }
[ 477.971994] ata2.00: failed command: WRITE FPDMA QUEUED
[ 477.972082] ata2.00: cmd 61/00:68:00:2c:e7/04:00:05:00:00/40 tag 13 ncq 524288 out
[ 477.972087]          res 40/00:68:00:2c:e7/00:00:05:00:00/40 Emask 0x50 (ATA bus error)
[ 477.972233] ata2.00: status: { DRDY }
[ 477.972287] ata2.00: failed command: WRITE FPDMA QUEUED

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
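The names inside the SError braces in the log above are a decoded form of the hexadecimal SErr value on the exception line. A minimal Python sketch of that decoding follows. The bit-to-name table is an assumption: it is written from memory of the short names the Linux libata driver prints, so check it against the kernel source before relying on it. The two SErr values from the log decode to the same names the kernel printed, which is some reassurance. The function name decode_serror is made up for this example.

```python
# Bit position -> short name, as recalled from the Linux libata driver.
# ASSUMPTION: this table is reconstructed from memory, not copied from
# the kernel source; verify before relying on it.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the bits set in an SErr value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    # The two SErr values reported in the log above.
    for port, serr in (("ata1", 0x290b02), ("ata2", 0x480800)):
        print(port, hex(serr), "{", " ".join(decode_serror(serr)), "}")
```

As far as I know, bits 16 and up in this register are link and PHY level diagnostics, so names such as 10B8B, BadCRC and Handshk would point at the interface (cable, port, controller) rather than the disk media. Treat that reading as a hint to be confirmed, not a diagnosis.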

[ 477.971022] ata1.00: exception Emask 0x50 SAct 0xc00 SErr 0x290b02 action 0xe frozen
[ 477.971107] ata1.00: irq_stat 0x01400000, PHY RDY changed
[ 477.971166] ata1: SError: { RecovComm UnrecovData Persist HostInt PHYRdyChg 10B8B BadCRC }
[ 477.971243] ata1.00: failed command: READ FPDMA QUEUED
[ 477.971305] ata1.00: cmd 60/a8:50:00:13:ed/00:00:05:00:00/40 tag 10 ncq 86016 in
[ 477.971308]          res 40/00:58:a8:13:ed/00:00:05:00:00/40 Emask 0x50 (ATA bus error)
[ 477.971449] ata1.00: status: { DRDY }
[ 477.971501] ata1.00: failed command: READ FPDMA QUEUED
I was using dd to copy /dev/sda3 to /dev/sdb3 on a system that is usually running Windows but doesn't appear to have hardware problems. Then I saw the above message about a READ error on ata1.00 in the kernel log followed immediately by the below message about a WRITE error on ata2.00. Any ideas about what might be happening here? I've attached the entire kernel message log.
Before you do anything else, do smartctl -H /dev/sda (and then repeat for /dev/sdb). If that tells you one of the disks is failing then that's probably your problem. I don't know if BadCRC above refers to a media or an interface error.

Also smartctl -t short /dev/sda (and again, same for /dev/sdb), then smartctl -l selftest after the prescribed amount of time to check the results. I've never seen a long selftest show an error that a short selftest didn't also pick up, but maybe run a long selftest in the absence of any other suggestions. Maybe also post the output of smartctl -a for each of the disks.

If it's working fine on Windows then it's probably not a hardware issue as you say, but SMART makes it so easy to do cursory checks, it doesn't make any sense not to start there.

James
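The SMART checks suggested above can be written out per disk. The sketch below only builds and prints the smartctl command lines and does not run them, because the self-test log is only meaningful once the short test has finished (smartctl reports the expected duration when the test is started). The helper name smart_check_commands is made up for this example, and the device paths are the ones from the thread. The selftest log command is given the device explicitly, which smartctl needs.

```python
def smart_check_commands(devices):
    """Return the smartctl invocations for each device as argv lists.

    Order per device: overall health, start a short self-test,
    read the self-test log, dump all SMART information.
    """
    commands = []
    for dev in devices:
        commands.append(["smartctl", "-H", dev])              # health verdict
        commands.append(["smartctl", "-t", "short", dev])     # start short test
        commands.append(["smartctl", "-l", "selftest", dev])  # read results later
        commands.append(["smartctl", "-a", dev])              # full report to post
    return commands


if __name__ == "__main__":
    # Dry run: print the commands so they can be run by hand as root,
    # waiting for the short test to finish before reading the log.
    for argv in smart_check_commands(["/dev/sda", "/dev/sdb"]):
        print(" ".join(argv))
```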

On Fri, 22 May 2015 11:11:24 PM James Harper wrote:
Before you do anything else, do smartctl -H /dev/sda (and then repeat for /dev/sdb). If that tells you one of the disks is failing then that's probably your problem. I don't know if BadCRC above refers to a media or an interface error.
Also smartctl -t short /dev/sda (and again, same for /dev/sdb), then smartctl -l selftest after the prescribed amount of time to check the results.
Thanks for the suggestion, I'll try that next time I get access to the system.
If it's working fine on Windows then it's probably not a hardware issue as you say, but SMART makes it so easy to do cursory checks, it doesn't make any sense not to start there.
The PC was working well on Windows, but not with those disks. One of the 2 SATA disks has been part of a RAID-Z array on another system for some time and the other is a new SATA disk to replace it.

The situation is that I have a server with 5*4TB disks in a RAID-Z array and I need to replace it with 5*6TB disks. As I have no system that can handle 10 disks I need to use several PCs to copy the data.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

Russell Coker <russell@coker.com.au> writes:
[ 477.971449] ata1.00: status: { DRDY }
IME this means either the disk or the controller is dying. Isolate the fault using SMART self-tests & juggling hardware to see which component the problem stays with. Then replace it.

PS: on Transcend MTS400 with link_power_management_policy = min_power, it's because the firmware is bugged. There's no upgrade because the vendor's an uncaring asshat. YMMV.
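To see whether the link_power_management_policy setting mentioned in the PS applies on a given machine, the current value can be read from sysfs. This is a minimal sketch, assuming the usual Linux location /sys/class/scsi_host/hostN/link_power_management_policy. The function name read_lpm_policies is made up for this example, and the hostN numbers are SCSI host numbers that do not necessarily match the ataN numbers in the kernel log.

```python
import glob
import os


def read_lpm_policies(sysfs_root="/sys/class/scsi_host"):
    """Map each SCSI host name to its link power management policy.

    Hosts without the attribute (or a missing sysfs_root) are skipped,
    so the result may be an empty dict.
    """
    policies = {}
    pattern = os.path.join(sysfs_root, "host*",
                           "link_power_management_policy")
    for path in sorted(glob.glob(pattern)):
        host = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            policies[host] = f.read().strip()
    return policies


if __name__ == "__main__":
    for host, policy in read_lpm_policies().items():
        # Flag the setting named in the message above.
        note = "  <- min_power" if policy == "min_power" else ""
        print(f"{host}: {policy}{note}")
```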

On Mon, 25 May 2015 11:15:39 AM Trent W. Buck wrote:
Russell Coker <russell@coker.com.au> writes:
[ 477.971449] ata1.00: status: { DRDY }
IME this means either the disk or the controller is dying.
Isolate the fault using SMART self-tests & juggling hardware to see which component the problem stays with.
Thanks for the suggestion. The new SATA disk in question is in the server and working well. The PC in question has Windows disks in it and Windows isn't having any apparent trouble. The old SATA disk is in a safe place just in case.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/