Re: ata errors in dmesg/syslog - any pointers from the more ATA/AHCI literate?

21 Jan 2016

      It got worse.. writing this from webmail :)
...
...
It has been a while since I got this machine (April '10), though the 3TB
drive is a lot newer than the rest of it.
Today's blargh (turns out knowing basic SMTP is handy in these situations :)):
[200860.130029] ------------[ cut here ]------------
[200860.130035] WARNING: CPU: 3 PID: 26390 at /build/linux-AFqQDb/linux-4.2.0/fs/buffer.c:1160 mark_buffer_dirty+0xf3/0x100()
[200860.130036] Modules linked in: nls_utf8 btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs
[200860.130044] Buffer I/O error on dev sdb1, logical block 0, lost sync page write
[200860.130045]  xfs libcrc32c cpuid binfmt_misc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache bnep rfcomm bluetooth uas usb_storage pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nvidia(POE) coretemp kvm_intel mxm_wmi snd_hda_codec_realtek snd_hda_codec_generic i7core_edac kvm snd_hda_intel snd_hda_codec snd_hda_core gpio_ich snd_hwdep snd_pcm drm edac_core input_leds snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq serio_raw snd_seq_device snd_timer wmi snd 8250_fintek shpchp soundcore lpc_ich mac_hid sunrpc parport_pc ppdev lp parport autofs4 pata_acpi hid_generic usbhid hid firewire_ohci firewire_core r8169 pata_it8213 crc_itu_t mii ahci libahci
[200860.130079] CPU: 3 PID: 26390 Comm: Cache2 I/O Tainted: P           OE   4.2.0-23-generic #28-Ubuntu
[200860.130081] Hardware name: Gigabyte Technology Co., Ltd. P55A-UD4/P55A-UD4, BIOS F15 09/16/2010
[200860.130082]  0000000000000000 000000007621f8ae ffff8801486a7b48 ffffffff817e94c9
[200860.130084]  0000000000000000 0000000000000000 ffff8801486a7b88 ffffffff8107b3d6
[200860.130086]  ffffffff81ac2d38 ffffffff81d2a8a0 ffff880211ff80d0 00000000012e8320
[200860.130087] Call Trace:
...
Not good. There's something really messed up with your system, and my best
guess is that it's the motherboard... or, at least, the SATA controllers on it.
My computer auto-boots after it has been shut down, so if I turn it off
overnight and go to work, it'll be back online by the time I'm in the
office.

This morning, when I woke up, I heard the GPU fan running high (which
only happens when the GPU driver hasn't been loaded yet) and was greeted
by a disk error explosion on the 1TB drive.

Would I be right in thinking that this kind of SMART self-test failure
could not be triggered by the controller, and is instead a drive fault,
because the tests run wholly within the drive itself and all that passes
between drive and computer is the request to start the test and the
test results?

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black
Device Model:     WDC WD1002FAEX-00Z3A0
Serial Number:    WD-WCATR0562222
LU WWN Device Id: 5 0014ee 2044331c2
Firmware Version: 05.01D05
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s
Local Time is:    Thu Jan 21 04:23:23 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...
200 Multi_Zone_Error_Rate   0x0008   200   197   000    Old_age   Offline      -       8
...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     43110         486750687
# 2  Conveyance offline  Completed: read failure       90%     43110         486750687
# 3  Extended offline    Completed: read failure       90%     43110         486750687
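
For anyone who wants to check the self-test log above mechanically, here is a
rough Python sketch. It assumes the three rows are pasted with the mail
client's line wrap undone, and that the 512-byte sector size reported in the
information section applies. The names parse_selftest_log and failing_lbas are
made up for illustration; they are not part of smartmontools.

```python
import re

SECTOR_BYTES = 512  # from "Sector Size: 512 bytes logical/physical" above

# Self-test rows from the smartctl output above, one row per line.
LOG = """\
# 1  Short offline       Completed: read failure       90%     43110         486750687
# 2  Conveyance offline  Completed: read failure       90%     43110         486750687
# 3  Extended offline    Completed: read failure       90%     43110         486750687
"""

# Columns are separated by runs of two or more spaces; single spaces occur
# inside the test description and the status text.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\d+|-)\s*$"
)

def parse_selftest_log(text):
    """Return one dict per self-test row that matches the expected layout."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(match.groupdict())
    return rows

def failing_lbas(rows):
    """Collect the distinct first-error LBAs of tests that reported a failure."""
    return {
        int(row["lba"])
        for row in rows
        if "failure" in row["status"] and row["lba"] != "-"
    }

rows = parse_selftest_log(LOG)
for lba in sorted(failing_lbas(rows)):
    # prints: LBA 486750687 -> byte offset 249216351744
    print(f"LBA {lba} -> byte offset {lba * SECTOR_BYTES}")
```

All three tests stop at the same LBA, roughly a quarter of the way into the
1TB drive, which is what you would expect from one persistently unreadable
sector rather than from random errors.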

My thinking is that, apart from the bitching of the Marvell virtual ATA
device (which was perhaps passing stuff through to the 1TB device),
all the errors could be attributed to the now-failed drive. Does that
sound right?

Anthony