
[ 477.971022] ata1.00: exception Emask 0x50 SAct 0xc00 SErr 0x290b02 action 0xe frozen
[ 477.971107] ata1.00: irq_stat 0x01400000, PHY RDY changed
[ 477.971166] ata1: SError: { RecovComm UnrecovData Persist HostInt PHYRdyChg 10B8B BadCRC }
[ 477.971243] ata1.00: failed command: READ FPDMA QUEUED
[ 477.971305] ata1.00: cmd 60/a8:50:00:13:ed/00:00:05:00:00/40 tag 10 ncq 86016 in
[ 477.971308] res 40/00:58:a8:13:ed/00:00:05:00:00/40 Emask 0x50 (ATA bus error)
[ 477.971449] ata1.00: status: { DRDY }
[ 477.971501] ata1.00: failed command: READ FPDMA QUEUED

I was using dd to copy /dev/sda3 to /dev/sdb3 on a system that usually runs Windows and doesn't appear to have hardware problems. Then I saw the above message about a READ error on ata1.00 in the kernel log, followed immediately by the below message about a WRITE error on ata2.00. Any ideas about what might be happening here? I've attached the entire kernel message log.

Before you do anything else, run smartctl -H /dev/sda (and then repeat for /dev/sdb). If that tells you one of the disks is failing, then that's probably your problem. I don't know whether BadCRC above refers to a media error or an interface error.

Also run smartctl -t short /dev/sda (and again, same for /dev/sdb), then smartctl -l selftest /dev/sda after the prescribed amount of time to check the results. I've never seen a long selftest show an error that a short selftest didn't also pick up, but in the absence of any other suggestions it may be worth running a long selftest (smartctl -t long) as well. Maybe also post the output of smartctl -a for each of the disks.

If it's working fine on Windows then it's probably not a hardware issue, as you say, but SMART makes cursory checks so easy that it doesn't make sense not to start there.

James
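
On the BadCRC question: the SErr value in the log can be decoded by hand, which may help narrow things down. The following is a minimal sketch, assuming the standard SATA SError register bit layout and the short flag names the kernel prints; the table and the decode_serr helper are my own illustration, so check them against your kernel's libata source before relying on them.

```python
# Bit positions assume the SATA SError register layout; the short names
# mirror the ones printed in the kernel log (an assumption to verify).
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # SErr value taken from the first line of the log above.
    print(" ".join(decode_serr(0x290B02)))
```

For 0x290b02 this gives the same seven flags the kernel printed. As far as I know, the SError register reports events on the SATA link itself, so BadCRC here, together with 10B8B, PHYRdyChg and the "ATA bus error" text, would point at the interface (cable, connector, port or power) rather than the media. Treat that as a hint, not a diagnosis; the SMART checks are still worth running.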