ata errors in dmesg/syslog - any pointers from the more ATA/AHCI literate?

Hey folks,

I have my system (Gigabyte P55A-UD4 r1, F15 firmware) configured in AHCI mode with 1TB and 3TB HDDs, and a DVD drive.

ata5 = 1TB SATA (msdos partition scheme)
ata9 = DVD SATA
ata10 = 3TB SATA (GPT scheme to use 3TB as one, but not a boot drive)
ata16 = Appears to be virtual "Marvell" device (I am not using hardware or fake RAID, and have disabled eSATA port)

I have two 6Gbps SATA ports, and the rest are 3Gbps I think. One of the 6Gbps ports goes to the 3TB drive, the other goes to a docking bay in the case that I use to do occasional offline backups.

The kernel seems to be bitching about the DVD drive all the time (and sometimes my 1TB drive), and I'm not quite sure why. Buggy AHCI? Crappy SATA cables? Something else? I've used AHCI mode for ages, but it's only since upgrading Ubuntu that I've noticed more bitching.

The other day, my 1TB drive got pushed into read-only mode on boot. I did several SMART tests and an fsck from a live environment and nothing out of the ordinary came up (I have recently installed Windows 10 in dual boot, but prior to that had Windows 7; Windows didn't touch grub at all).
Using search:

zegrep 'ata([59]|10|16)|status: \{' /var/log/syslog.3.gz

I've pulled out two examples of boot time fun:

[ 1.062473] ata5: SATA max UDMA/133 abar m2048@0xfbffd000 port 0xfbffd200 irq 37
[ 1.078468] ata9: SATA max UDMA/133 abar m2048@0xfbdff000 port 0xfbdff100 irq 39
[ 1.078470] ata10: SATA max UDMA/133 abar m2048@0xfbdff000 port 0xfbdff180 irq 39
[ 1.078480] ata16: SATA max UDMA/133 abar m2048@0xfbdff000 port 0xfbdff480 irq 39
[ 1.400286] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.400383] ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.400411] ata16.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
[ 1.400689] ata16.00: configured for UDMA/66
[ 1.401396] ata10.00: ATA-8: WDC WD3000FYYZ-01UL1B0, 01.01K01, max UDMA/133
[ 1.401401] ata10.00: 5860533168 sectors, multi 8: LBA48 NCQ (depth 31/32), AA
[ 1.403407] ata10.00: configured for UDMA/133
[ 1.404179] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.405880] ata9.00: ATAPI: ATAPI iHAS124 Y, BL0V, max UDMA/100
[ 1.408124] ata9.00: configured for UDMA/100
[ 2.192223] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.207881] ata5.00: ATA-8: WDC WD1002FAEX-00Z3A0, 05.01D05, max UDMA/133
[ 2.207887] ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 2.209995] ata5.00: configured for UDMA/133
[ 3.184967] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.185027] ata9.00: irq_stat 0x40000001
[ 3.200944] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.201002] ata9.00: irq_stat 0x40000001
[ 3.216173] ata16.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.216223] ata16.00: irq_stat 0x40000001
[ 3.216271] ata16.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 1 dma 16640 in
[ 3.216336] ata16.00: status: { DRDY }
[ 3.268939] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.268988] ata9.00: irq_stat 0x40000001
[ 3.284496] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.284545] ata9.00: irq_stat 0x40000001
[ 3.300450] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.300499] ata9.00: irq_stat 0x40000001
[ 3.324919] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.324967] ata9.00: irq_stat 0x40000001
[ 3.340548] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.340596] ata9.00: irq_stat 0x40000001
[ 3.356589] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.356636] ata9.00: irq_stat 0x40000001
[ 3.380945] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.380993] ata9.00: irq_stat 0x40000001
[ 10.885181] ata5.00: exception Emask 0x0 SAct 0x40000007 SErr 0x0 action 0x0
[ 10.885257] ata5.00: irq_stat 0x40000008
[ 10.885322] ata5.00: failed command: READ FPDMA QUEUED
[ 10.885382] ata5.00: cmd 60/00:f0:c0:3d:83/01:00:1d:00:00/40 tag 30 ncq 131072 in
[ 10.885490] ata5.00: status: { DRDY ERR }
[ 10.885545] ata5.00: error: { UNC }
[ 10.889441] ata5.00: configured for UDMA/133
[ 10.889571] ata5: EH complete
[ 13.185375] ata5.00: exception Emask 0x0 SAct 0x3000000 SErr 0x0 action 0x0
[ 13.185463] ata5.00: irq_stat 0x40000008
[ 13.185526] ata5.00: failed command: READ FPDMA QUEUED
[ 13.185585] ata5.00: cmd 60/08:c0:e0:3d:83/00:00:1d:00:00/40 tag 24 ncq 4096 in
[ 13.185693] ata5.00: status: { DRDY ERR }
[ 13.185749] ata5.00: error: { UNC }
[ 13.189954] ata5.00: configured for UDMA/133
[ 13.190146] ata5: EH complete
[ 15.037279] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 15.037342] ata9.00: irq_stat 0x40000001
[ 15.053304] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 15.053368] ata9.00: irq_stat 0x40000001
[ 15.073351] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 15.073423] ata9.00: irq_stat 0x40000001
[ 15.097661] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 15.097734] ata9.00: irq_stat 0x40000001
[ 15.117315] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 15.117378] ata9.00: irq_stat 0x40000001
[ 15.137616] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 15.137689] ata9.00: irq_stat 0x40000001
[ 139.870144] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 139.870148] ata9.00: irq_stat 0x40000001
[ 139.890320] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 139.890324] ata9.00: irq_stat 0x40000001
[ 139.897804] ata9: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
[ 139.897807] ata9: irq_stat 0x40000001

[ 1.063019] ata5: SATA max UDMA/133 abar m2048@0xfbffd000 port 0xfbffd200 irq 37
[ 1.078151] ata9: SATA max UDMA/133 abar m2048@0xfbdff000 port 0xfbdff100 irq 39
[ 1.078153] ata10: SATA max UDMA/133 abar m2048@0xfbdff000 port 0xfbdff180 irq 39
[ 1.078163] ata16: SATA max UDMA/133 abar m2048@0xfbdff000 port 0xfbdff480 irq 39
[ 1.396028] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.396046] ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.396200] ata16.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
[ 1.396470] ata16.00: configured for UDMA/66
[ 1.397147] ata10.00: ATA-8: WDC WD3000FYYZ-01UL1B0, 01.01K01, max UDMA/133
[ 1.397153] ata10.00: 5860533168 sectors, multi 8: LBA48 NCQ (depth 31/32), AA
[ 1.399072] ata10.00: configured for UDMA/133
[ 1.403953] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.405630] ata9.00: ATAPI: ATAPI iHAS124 Y, BL0V, max UDMA/100
[ 1.407746] ata9.00: configured for UDMA/100
[ 2.191935] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.204992] ata5.00: ATA-8: WDC WD1002FAEX-00Z3A0, 05.01D05, max UDMA/133
[ 2.204998] ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 2.207106] ata5.00: configured for UDMA/133
[ 3.184727] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.184787] ata9.00: irq_stat 0x40000001
[ 3.200768] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.200826] ata9.00: irq_stat 0x40000001
[ 3.215932] ata16.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.215991] ata16.00: irq_stat 0x40000001
[ 3.216044] ata16.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 1 dma 16640 in
[ 3.216111] ata16.00: status: { DRDY }
[ 3.276729] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.276788] ata9.00: irq_stat 0x40000001
[ 3.292414] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.292472] ata9.00: irq_stat 0x40000001
[ 3.308295] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.308343] ata9.00: irq_stat 0x40000001
[ 3.332740] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.332799] ata9.00: irq_stat 0x40000001
[ 3.352472] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.352530] ata9.00: irq_stat 0x40000001
[ 3.372280] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.372328] ata9.00: irq_stat 0x40000001
[ 3.396795] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 3.396854] ata9.00: irq_stat 0x40000001
[ 7.936707] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 7.936770] ata9.00: irq_stat 0x40000001
[ 7.984588] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 7.984652] ata9.00: irq_stat 0x40000001
[ 8.000647] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 8.000714] ata9.00: irq_stat 0x40000001
[ 8.024891] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 8.024958] ata9.00: irq_stat 0x40000001
[ 8.040638] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 8.040700] ata9.00: irq_stat 0x40000001
[ 8.060998] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 8.061060] ata9.00: irq_stat 0x40000001
[ 649.042550] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 649.042557] ata9.00: irq_stat 0x40000001
[ 649.062840] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 649.062846] ata9.00: irq_stat 0x40000001
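For anyone wanting to see where on the 1TB disk those failed READ FPDMA QUEUED commands were aimed, the "cmd" lines in the log above can be decoded by hand or with a few lines of Python. This is only a rough sketch, not something from the original post: the helper name decode_ncq_cmd is made up, it assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), and it assumes 512-byte logical sectors, which agrees with the "ncq 131072" byte count the kernel printed.

```python
import re

# Assumed libata "cmd" layout:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/"
    r"(?P<feat>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?P<hfeat>[0-9a-f]{2}):(?P<hnsect>[0-9a-f]{2}):"
    r"(?P<hlbal>[0-9a-f]{2}):(?P<hlbam>[0-9a-f]{2}):(?P<hlbah>[0-9a-f]{2})/"
    r"(?P<dev>[0-9a-f]{2})"
)

NCQ_OPCODES = (0x60, 0x61)  # READ / WRITE FPDMA QUEUED


def decode_ncq_cmd(line, sector_size=512):
    """Return LBA, length and tag for an NCQ 'cmd' line, or None otherwise.

    decode_ncq_cmd is a hypothetical helper; sector_size=512 is an assumption.
    """
    m = CMD_RE.search(line)
    if m is None:
        return None
    f = {name: int(value, 16) for name, value in m.groupdict().items()}
    if f["cmd"] not in NCQ_OPCODES:
        return None  # e.g. the ATAPI PACKET command (a0) on ata16
    # 48-bit LBA: the "hob" (high order) bytes are the upper three bytes.
    lba = (f["hlbah"] << 40 | f["hlbam"] << 32 | f["hlbal"] << 24
           | f["lbah"] << 16 | f["lbam"] << 8 | f["lbal"])
    # For NCQ the sector count travels in the feature fields (0 means 65536)
    # and the tag sits in bits 7:3 of the sector-count field.
    sectors = (f["hfeat"] << 8 | f["feat"]) or 65536
    return {
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "tag": f["nsect"] >> 3,
    }


if __name__ == "__main__":
    for sample in (
        "ata5.00: cmd 60/00:f0:c0:3d:83/01:00:1d:00:00/40 tag 30 ncq 131072 in",
        "ata5.00: cmd 60/08:c0:e0:3d:83/00:00:1d:00:00/40 tag 24 ncq 4096 in",
    ):
        print(decode_ncq_cmd(sample))
```

If the field layout assumed here is right, both failed reads land in the same small region of the disk (the second, 8-sector read sits inside the range covered by the first, 256-sector one), which is the sort of thing worth cross-checking against a SMART long test.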

On Tue, Jan 19, 2016 at 11:35:06PM +1100, Anthony Hogan wrote:
I have my system (Gigabyte P55A-UD4 r1 F15 firmware) configured in AHCI mode with a 1+3 TB HDDs, and a DVD drive.
ata5 = 1TB SATA (msdos partition scheme)
ata9 = DVD SATA
ata10 = 3TB SATA (GPT scheme to use 3TB as one, but not a boot drive)
ata16 = Appears to be virtual "Marvell" device
(I am not using hardware or fake RAID, and have disabled eSATA port)
I have two 6Gbps SATA ports, and the rest are 3 I think. One of the 6GB ports goes to the 3TB drive, the other goes to a docking bay in the case that I use to do occasional offline backups.
The kernel seems to be bitching about the DVD drive all the time (and sometimes my 1TB drive), and I'm not quite sure why. Buggy AHCI? Crappy SATA cables? Something else? Used AHCI mode for ages, but only with upgrade to Ubuntu version that I've noticed more bitching.
The other day, my 1TB drive got pushed into read only mode on boot - I did several SMART tests and an fsck from a live environment and nothing out of the ordinary came up (I have recently installed Windows 10 in dual boot, but prior to that had Windows 7 - Windows didn't touch grub at all).
some things to try:

0. stating the obvious, but maybe try a different kernel. you don't mention what version of ubuntu you're running (the latest?) but you could see if there's an updated kernel for it.

if there isn't one, you could use the liquorix kernel (latest liquorix version which runs on both debian and ubuntu is linux-image-4.3-3.dmz.6-liquorix-amd64. i've installed that but haven't got around to rebooting yet, so am still running linux-image-4.3-3.dmz.2-liquorix-amd64).

http://liquorix.net/

1. disable AHCI in the BIOS.

2. disable AHCI in the BIOS and move both SATA drives onto the Marvell 6Gbps ports and use the sata_mv driver (which is in the mainline kernel, has been for years). The driver has a few parms that might be worth reading about and experimenting with.

$ modinfo sata_mv | grep -v alias
filename: /lib/modules/4.3-3.dmz.2-liquorix-amd64/kernel/drivers/ata/sata_mv.ko
version: 1.28
license: GPL
description: SCSI low-level driver for Marvell SATA controllers
author: Brett Russ
srcversion: DD7FD903CFF2406E08B557D
depends: libata
intree: Y
vermagic: 4.3-3.dmz.2-liquorix-amd64 SMP preempt mod_unload modversions
parm: msi:Enable use of PCI MSI (0=off, 1=on) (int)
parm: irq_coalescing_io_count:IRQ coalescing I/O count threshold (0..255) (int)
parm: irq_coalescing_usecs:IRQ coalescing time threshold in usecs (int)

According to http://www.gigabyte.com.au/products/product-page.aspx?pid=3436#sp the marvell ports should be labelled GSATA3_6 and GSATA3_7

NOTE: I'm not a fan of marvell sata (i had problems with them in the distant past but the bugs have probably been fixed long ago, and they're not exactly high-performance controllers, they're decidedly low-budget stuff), but it's worth a try.

the DVD can stay where it is on one of the 3Gbps SATA ports.

3. Check all other settings in the BIOS and make sure they're reasonably sane. Since BIOS features and options tend to be extremely badly documented (if at all) this isn't as easy as it sounds. unless you have another computer with internet access nearby you won't be able to google any of the more obscure settings from the BIOS. and the "helpful descriptions" of the options in the bios screen are generally neither helpful nor descriptive.

5. Buy a 2 or 4 port PCI-e SATA 3 6Gbps card using a known good chipset. The trouble is that it's difficult to know what you're getting - many of the cheaper cards use marvell chips anyway (and probably older versions than what's on your m/b).

if you want to be certain you're getting something good and don't mind a bit of overkill for the task at hand, look for one of the LSI 2008 SAS controllers. there are many brands with re-badged versions, including the IBM M1015 which typically sells for around $100 on ebay for an 8-port card - I have three of these and they're great.

OTOH, $100-ish is not too far off what the cheapest new Haswell CPU + m/b would cost.

5. which brings us to the final option: replace the m/b and cpu. The cheapest possible replacement would be a G1840 CPU for around $57 and an Asrock H81M-DGS motherboard (with 2xSata 6Gbps and 2xSata 3Gbps ports) for $69. the LGA1150 and LGA1156 both use DDR3 so no need to get new ram. total would be around $126.

for $20 more you could get the Asrock B85M Pro3 which has 4 x Sata 6Gbps and 2 x Sata 3Gbps.

dunno what CPU you've got in your current board or how it compares to the G1840....but swapping the m/b should not only fix your current problem by getting rid of the problematic hardware, it would also give you a viable upgrade path for more RAM and better CPUs (all the way up to Xeon CPUs like the E3-1241V3)...whereas LGA1156 is dead and buried by now.

craig

--
craig sanders <cas@taz.net.au>
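Before shuffling drives between the 3Gbps and 6Gbps ports, it may help to confirm what each link actually negotiated. The boot log in the first message already carries that information in the "SStatus" value on each "SATA link up" line. The following is a minimal sketch, not anything posted in the thread: decode_sstatus and negotiated_speeds are made-up names, and the bit layout assumed is the standard SATA SStatus register (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11), with the kernel printing the value in hex.

```python
import re

# SPD field values from the SATA SStatus register definition.
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

LINK_RE = re.compile(r"(ata\d+): SATA link up .*?\(SStatus ([0-9A-Fa-f]+) ")


def decode_sstatus(value):
    """Split an SStatus value into its three 4-bit fields (hypothetical helper)."""
    det = value & 0xF          # device detection
    spd = (value >> 4) & 0xF   # negotiated interface speed
    ipm = (value >> 8) & 0xF   # interface power management state
    return {
        "device_present": det == 3,   # 3 = device present, phy link established
        "speed": SPEEDS.get(spd, "unknown"),
        "active": ipm == 1,           # 1 = interface in active state
    }


def negotiated_speeds(log_text):
    """Map each port seen in 'SATA link up' lines to its decoded speed."""
    return {
        port: decode_sstatus(int(raw, 16))["speed"]
        for port, raw in LINK_RE.findall(log_text)
    }


if __name__ == "__main__":
    sample = """\
[ 1.400286] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.400383] ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.404179] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.192223] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
"""
    for port, speed in negotiated_speeds(sample).items():
        print(port, speed)
```

On the log lines quoted earlier in the thread this simply agrees with what the kernel already printed in words (ata10 at 6.0 Gbps, ata5 at 3.0 Gbps, the two ATAPI devices at 1.5 Gbps), so its main use would be spotting a link that has dropped to a lower speed after error recovery.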

some things to try:
0. stating the obvious, but maybe try a different kernel. you don't mention what version of ubuntu you're running (the latest?) but you could see if there's an updated kernel for it.
Just upgraded to a new one today, but before rebooting into new kernel, everything got chucked into RO again (dmesg barf at the bottom).

Presently:

$ uname -a
Linux AH01 4.2.0-25-generic #30-Ubuntu SMP Mon Jan 18 12:31:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg --get-selections | grep linux-image
linux-image-4.2.0-23-generic install
linux-image-4.2.0-25-generic install
linux-image-extra-4.2.0-23-generic install
linux-image-extra-4.2.0-25-generic install
linux-image-generic install

I also have an nvidia graphics card running the binary driver. To date, I've found the nvidia + nouveau drivers crack the sads if KMS is enabled, so my grub config adds "nomodeset" to kernel boot parms.
if there isn't one, you could use the liquorix kernel (latest liquorix version which runs on both debian and ubuntu is linux-image-4.3-3.dmz.6-liquorix-amd64. i've installed that but haven't got around to rebooting yet, so am still running linux-image-4.3-3.dmz.2-liquorix-amd64).
Will have a squiz!
1. disable AHCI in the BIOS.
Done - dmesg still noisy as all buggery. I'll tackle Windows 10 (the lesser used, dual booted OS on the system) at another time (it will no doubt bitch and moan about the change from AHCI to "IDE" mode).
2. disable AHCI in the BIOS and move both SATA drives onto the Marvell 6Gbps ports and use the sata_mv driver (which is in the mainline kernel, has been for years). The driver has a few parms that might be worth reading about and experimenting with. ... marvell ports should be labelled GSATA3_6 and GSATA3_7
I'll give that a go next!
NOTE: I'm not a fan of marvell sata (i had problems with them in the distant past but the bugs have probably been fixed long ago, and they're not exactly high-performance controllers, they're decidedly low-budget stuff), but it's worth a try.
Heh, yeah, I'm getting that impression of it. The mobo also has an old NEC/Renesas USB3 chip which from what I'm told can also be problematic.
the DVD can stay where it is on one of the 3Gbps SATA ports.
I'll have to check if indeed that's where it is. The more I go through the screens, the less confident I feel about what's plugged where :) #unplugallthedrives :)
3. Check all other settings in the BIOS and make sure they're reasonably sane. Since BIOS features and options tend to be extremely badly documented (if at all) this isn't as easy as it sounds. unless you have another computer with internet access nearby you be able to google any of the more obscure settings from the BIOS. and the "helpful descriptions" of the options in the bios screen are generally neither helpful nor descriptive.
Perhaps it could be best described as BIOS Engrish? In terms of performance options, I tend to select the default and not overclock (I figure with only a stock heatsink on the CPU, it'd be bad to fiddle). I might give Google a whirl on my phone when next in the BIOS config.
5. Buy a 2 or 4 port PCI-e SATA 3 6Gbps card using a known good ... if you want to be certain you're getting something good and don't mind a bit of overkill for the task at hand, look for one of the LSI 2008 SAS controllers. there are many brands with re-badged versions, including the IBM M1015 which typically sell for around $100 on ebay for an 8-port card - I have three of these and they're great. OTOH, $100-ish is not too far off what the cheapest new Haswell CPU + m/b would cost.
I guess this is my primary desktop machine, so I'm not beyond making it more reliable. I mean, the 3TB "enterprise" drive was probably a bit of overkill in and of itself, but I figured it'd be running a lot of the time, and would hold a lot of my data.
5. which brings us to the final option: replace the m/b and cpu. The ... dunno what CPU you've got in your current board or how it compares to the G1840....but swapping the m/b should not only fix your current
Heh, yeah.. even when I bought the machine it wasn't blinged out - I just wanted something with the CPU and enough RAM to run a couple of VMs if and when required.

Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
NVIDIA Corporation G92 [GeForce GTS 250] (rev a2)
Seasonic 750W PSU

It has been awhile since I got this machine (April '10), though the 3TB drive is a lot newer than the rest of it.

Today's blargh (turns out knowing basic SMTP is handy in these situations :)):

[200860.130029] ------------[ cut here ]------------
[200860.130035] WARNING: CPU: 3 PID: 26390 at /build/linux-AFqQDb/linux-4.2.0/fs/buffer.c:1160 mark_buffer_dirty+0xf3/0x100()
[200860.130036] Modules linked in: nls_utf8 btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs
[200860.130044] Buffer I/O error on dev sdb1, logical block 0, lost sync page write
[200860.130045] xfs libcrc32c cpuid binfmt_misc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache bnep rfcomm bluetooth uas usb_storage pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nvidia(POE) coretemp kvm_intel mxm_wmi snd_hda_codec_realtek snd_hda_codec_generic i7core_edac kvm snd_hda_intel snd_hda_codec snd_hda_core gpio_ich snd_hwdep snd_pcm drm edac_core input_leds snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq serio_raw snd_seq_device snd_timer wmi snd 8250_fintek shpchp soundcore lpc_ich mac_hid sunrpc parport_pc ppdev lp parport autofs4 pata_acpi hid_generic usbhid hid firewire_ohci firewire_core r8169 pata_it8213 crc_itu_t mii ahci libahci
[200860.130079] CPU: 3 PID: 26390 Comm: Cache2 I/O Tainted: P OE 4.2.0-23-generic #28-Ubuntu
[200860.130081] Hardware name: Gigabyte Technology Co., Ltd. P55A-UD4/P55A-UD4, BIOS F15 09/16/2010
[200860.130082] 0000000000000000 000000007621f8ae ffff8801486a7b48 ffffffff817e94c9
[200860.130084] 0000000000000000 0000000000000000 ffff8801486a7b88 ffffffff8107b3d6
[200860.130086] ffffffff81ac2d38 ffffffff81d2a8a0 ffff880211ff80d0 00000000012e8320
[200860.130087] Call Trace:
[200860.130093] [<ffffffff817e94c9>] dump_stack+0x45/0x57
[200860.130097] [<ffffffff8107b3d6>] warn_slowpath_common+0x86/0xc0
[200860.130099] [<ffffffff8107b50a>] warn_slowpath_null+0x1a/0x20
[200860.130101] [<ffffffff812323b3>] mark_buffer_dirty+0xf3/0x100
[200860.130104] [<ffffffff812a4613>] ext4_commit_super+0x1a3/0x260
[200860.130106] [<ffffffff812a4ddd>] __ext4_std_error+0x6d/0xf0
[200860.130108] [<ffffffff812337bf>] ? __getblk_gfp+0x2f/0x60
[200860.130112] [<ffffffff81287fc0>] ext4_mark_iloc_dirty+0x4f0/0x710
[200860.130114] [<ffffffff81289764>] ? ext4_truncate+0x1c4/0x3e0
[200860.130116] [<ffffffff81288303>] ext4_mark_inode_dirty+0x83/0x200
[200860.130118] [<ffffffff81289764>] ext4_truncate+0x1c4/0x3e0
[200860.130120] [<ffffffff8128b2b4>] ext4_setattr+0x3f4/0x870
[200860.130122] [<ffffffff8123e8b6>] ? fsnotify+0x316/0x4a0
[200860.130125] [<ffffffff8121a165>] notify_change+0x235/0x360
[200860.130128] [<ffffffff811fa055>] do_truncate+0x75/0xc0
[200860.130131] [<ffffffff811fb049>] do_sys_ftruncate.constprop.13+0x119/0x170
[200860.130133] [<ffffffff811fe109>] ? SyS_write+0x79/0xc0
[200860.130135] [<ffffffff811fb25e>] SyS_ftruncate+0xe/0x10
[200860.130138] [<ffffffff817f02b2>] entry_SYSCALL_64_fastpath+0x16/0x75
[200860.130139] ---[ end trace 0603547224088737 ]---
[200860.130147] Buffer I/O error on dev sdb1, logical block 0, lost sync page write
[201106.718794] scsi_io_completion: 2 callbacks suppressed
[201106.718804] sd 9:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[201106.718810] sd 9:0:0:0: [sdb] tag#21 CDB: Read(16) 88 00 00 00 00 00 1e 34 b0 40 00 00 00 08 00 00
[201106.718812] blk_update_request: 2 callbacks suppressed
[201106.718815] blk_update_request: I/O error, dev sdb, sector 506769472
[201106.718838] sd 9:0:0:0: [sdb] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[201106.718842] sd 9:0:0:0: [sdb] tag#22 CDB: Read(16) 88 00 00 00 00 00 1e 34 b0 40 00 00 00 08 00 00
[201106.718845] blk_update_request: I/O error, dev sdb, sector 506769472
[202459.714979] sd 9:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[202459.714984] sd 9:0:0:0: [sdb] tag#27 CDB: Read(16) 88 00 00 00 00 00 1e 34 b0 40 00 00 00 08 00 00
[202459.714986] blk_update_request: I/O error, dev sdb, sector 506769472
[202459.760308] sd 9:0:0:0: [sdb] tag#28 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[202459.760330] sd 9:0:0:0: [sdb] tag#28 CDB: Read(16) 88 00 00 00 00 00 1e 34 b0 40 00 00 00 08 00 00
[202459.760337] blk_update_request: I/O error, dev sdb, sector 506769472
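As a cross-check on the SCSI side of that log, the "CDB: Read(16)" bytes can be unpacked to confirm they point at the same sector the block layer complains about. A small sketch, assuming the standard READ(16) layout (opcode 0x88, bytes 2-9 the big-endian LBA, bytes 10-13 the transfer length in blocks); the function name decode_read16 is invented for illustration.

```python
def decode_read16(cdb_hex):
    """Return (lba, blocks) from a space-separated READ(16) CDB hex dump.

    decode_read16 is a hypothetical helper, not a kernel or library API.
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


if __name__ == "__main__":
    lba, blocks = decode_read16("88 00 00 00 00 00 1e 34 b0 40 00 00 00 08 00 00")
    print(f"LBA {lba}, {blocks} blocks")
```

The decoded LBA matches the "sector 506769472" in the blk_update_request lines, and every failed tag in the excerpt carries the same CDB, so the same 8-block read is being retried and rejected each time.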

On Wed, Jan 20, 2016 at 06:07:04PM +1100, Anthony Hogan wrote:
some things to try:
0. stating the obvious, but maybe try a different kernel. you don't mention what version of ubuntu you're running (the latest?) but you could see if there's an updated kernel for it.
Just upgraded to a new one today, but before rebooting into new kernel, everything got chucked into RO again (dmesg barf at the bottom).
Presently:
$ uname -a
Linux AH01 4.2.0-25-generic #30-Ubuntu SMP Mon Jan 18 12:31:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg --get-selections | grep linux-image
try 'dlocate -k' to list kernel and related packages if you have my dlocate package installed. or '-K' for 'dpkg -l' style output.

$ dlocate -k
linux-headers-3.16.0-4-amd64
linux-headers-3.16.0-4-common
linux-headers-3.19-5.dmz.1-liquorix-amd64
linux-headers-4.3-3.dmz.2-liquorix-amd64
linux-headers-4.3-3.dmz.6-liquorix-amd64
linux-headers-amd64
linux-headers-liquorix-amd64
linux-image-3.16.0-4-amd64
linux-image-3.19-5.dmz.1-liquorix-amd64
linux-image-4.3-3.dmz.2-liquorix-amd64
linux-image-4.3-3.dmz.6-liquorix-amd64
linux-image-amd64
linux-image-liquorix-amd64
nvidia-kernel-dkms
spl-dkms
zfs-dkms
I also have an nvidia graphics card running the binary driver.
To date, I've found the nvidia + nouveau drivers crack the sads if KMS is enabled, so my grub config adds "nomodeset" to kernel boot parms.
KMS works on my systems (AMD 1090T, FX-8150 and FX-8320), so i'd guess that's another motherboard bug you have. or maybe the old GTS 250 card.
1. disable AHCI in the BIOS.
Done - dmesg still noisy as all buggery. I'll tackle Windows 10 (the lesser used, dual booted OS on the system) at another time (it will no doubt bitch and moan about the change from AHCI to "IDE" mode).
ok, that's a reason not to disable ahci. makes me wonder, though - does win10 have any problems with the drives? if it works OK on win, microsoft probably worked with gigabyte, or gigabyte provided their own driver to work around bugs in the sata implementation...while the generic linux driver assumes that the sata works as documented.
2. disable AHCI in the BIOS and move both SATA drives onto the Marvell 6Gbps ports and use the sata_mv driver (which is in the mainline kernel, has been for years). The driver has a few parms that might be worth reading about and experimenting with. ... marvell ports should be labelled GSATA3_6 and GSATA3_7
I'll give that a go next!
try moving them to the marvell and re-enable AHCI. assuming the implementation isn't buggy (which may not be / probably isn't the case on your m/b), running in ahci mode is always preferable to using specific vendor drivers.
3. Check all other settings in the BIOS and make sure they're [...]
Perhaps it could be best described as BIOS Engrish?
perhaps. but it's even less useful than that. a menu option labelled "foo" will usually have the descriptive text of "enable or disable foo" with not the slightest trace of anything resembling a definition of what it actually is or does. go ahead and guess, i dare ya.
In terms of performance options, I tend to select the default and not overclock (I figure with only a stock heatsink on the CPU, it'd be bad to fiddle). I might give Google a whirl on my phone when next in the BIOS config.
i don't bother overclocking either, but some options make a difference (like enabling IOMMU on AM3/AM3+ motherboards, or sleep states)
I guess this is my primary desktop machine, so I'm not beyond making it more reliable. I mean, the 3TB "enterprise" drive was probably a bit of overkill in and of itself, but I figured it'd be running a lot of the time, and would hold a lot of my data.
if the motherboard is failing or just a bad model, then adding a SATA or SAS card won't make it any more reliable. IMO upgrading the mb and cpu is a better option than a PCI-e card. OTOH if the m/b is mostly ok and it's just the sata controllers on it that are dodgy then if you can find a good 4-port SATA card for $30-$50 that's a lot cheaper than upgrading. your BIOS may even have an option to disable all the on-board SATA ports.
dunno what CPU you've got in your current board or how it compares to the G1840....but swapping the m/b should not only fix your current
Heh, yeah.. even when I bought the machine it wasn't blinged out - I just wanted something with the CPU and enough RAM to run a couple of VMs if and when required.
Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
NVIDIA Corporation G92 [GeForce GTS 250] (rev a2)
Seasonic 750W PSU
that was a pretty good CPU in its day.

http://cpuboss.com/cpus/Intel-Core-i7-860-vs-Intel-Celeron-G1840
http://www.cpu-world.com/Compare/237/Intel_Celeron_Dual-Core_G1840_vs_Intel_...

Note that the G1840 includes an Intel HD GPU which is more than adequate for basic 2D desktop graphics, so you could get rid of the nvidia card and save some power (88W vs 150W). also uses open source drivers in the mainline kernel if that's important to you (it is to me, but not important enough for me to give up the nvidia proprietary driver for nouveau. i'd prefer GPL driver, but not as much as i prefer fast, reliable graphics that just works).

The G1840 itself uses 43W vs 156W for the i7-860, and I *think* the 88W is for both the CPU+GPU when it's running at full throttle.

http://www.game-debate.com/gpu/index.php?gid=1438&gid2=711&compare=intel-hd-...

i'm not sure if that's comparing exactly the right model. The specs for the G1840 say "Intel HD Graphics", not "Intel HD Graphics 4600". They're different. but probably not different enough that you'd notice for basic 2D desktop stuff.

if you play games on win10 then keep the GTS 250. the windows desktop could probably use the extra GPU grunt too, with all its animations and other pointless bling.

if you want to run a few VMs then you probably want something a bit more powerful than the G1840 (which is only a dual-core CPU without HT, so two threads vs the 4 cores with HT, or 8 threads, of the i7-860). Maybe the dual-core (4 thread) i3-4170 for $165 or i5-4460 for $278...either of those would be good enough for running a few lightly used VMs and occasional VMs for testing/experimentation.

it really depends on how heavily it gets used or if you do any major number crunching or lots of compiling on it. if it's a lightly used home desktop / server / nas, then a g1840 is probably adequate.

if you really want quad-core with 8 execution threads, then maybe a Xeon e3-1241-v3 for $379. it's not as fast as the i7-4790K and it doesn't have a GPU built-in but it's about $150 cheaper.

but if you're spending that kind of money, get a 6-core FX-6300 ($155) or an 8-core AMD FX-8320 ($215) and an Asus M5A97-r2 motherboard ($135) instead. much better value for money. add 8 or 16GB of RAM (use DDR3-1866 for the FX CPUs, anything less reduces their performance - $75 for 8GB or $116 for 16GB). put any cheap old video card in it, and you've got a very nice linux server. your current system can be windows only (with putty installed to ssh to linux and a VNC client too if you want/need to run X, with vnc-server running on linux).

a second linux-only system is probably a good idea anyway because windows will bitch and moan and invalidate your license if you swap the motherboard.

BTW, the specs for the m5a97-r2 say it doesn't support DDR3-1866. it does. the bios detects and auto-configures them, no problem. i have one here, with an FX-8320 in it.

# dmigrep.pl base.board
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: M5A97 R2.0
Version: Rev 1.xx
Type: Motherboard

# list_dimms.sh
Handle 0x0030, DMI type 17, 34 bytes
Size: 8192 MB
Bank Locator: BANK1
Type: DDR3
Speed: 1866 MHz
Configured Clock Speed: 933 MHz
Handle 0x0034, DMI type 17, 34 bytes
Size: 8192 MB
Bank Locator: BANK3
Type: DDR3
Speed: 1866 MHz
Configured Clock Speed: 933 MHz

performance of the FX chips on windows is OK (lots of bad reviews and negative comments, but they're not as bad on Win as some make out), but they work really well on linux, especially if you're running lots of multi-threaded stuff or multitasking with lots of VMs and background daemons.

Useful tip when comparing CPUs and GPUs and motherboards: google for "product1 vs product2", e.g "i3-4170 vs i7-860". use product codes where possible for best results.
It has been awhile since I got this machine (April '10), though the 3TB drive is a lot newer than the rest of it.
Today's blargh (turns out knowing basic SMTP is handy in these situations :)):

[200860.130029] ------------[ cut here ]------------
[200860.130035] WARNING: CPU: 3 PID: 26390 at /build/linux-AFqQDb/linux-4.2.0/fs/buffer.c:1160 mark_buffer_dirty+0xf3/0x100()
[200860.130036] Modules linked in: nls_utf8 btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs
[200860.130044] Buffer I/O error on dev sdb1, logical block 0, lost sync page write
[200860.130045] xfs libcrc32c cpuid binfmt_misc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache bnep rfcomm bluetooth uas usb_storage pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nvidia(POE) coretemp kvm_intel mxm_wmi snd_hda_codec_realtek snd_hda_codec_generic i7core_edac kvm snd_hda_intel snd_hda_codec snd_hda_core gpio_ich snd_hwdep snd_pcm drm edac_core input_leds snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq serio_raw snd_seq_device snd_timer wmi snd 8250_fintek shpchp soundcore lpc_ich mac_hid sunrpc parport_pc ppdev lp parport autofs4 pata_acpi hid_generic usbhid hid firewire_ohci firewire_core r8169 pata_it8213 crc_itu_t mii ahci libahci
[200860.130079] CPU: 3 PID: 26390 Comm: Cache2 I/O Tainted: P OE 4.2.0-23-generic #28-Ubuntu
[200860.130081] Hardware name: Gigabyte Technology Co., Ltd. P55A-UD4/P55A-UD4, BIOS F15 09/16/2010
[200860.130082] 0000000000000000 000000007621f8ae ffff8801486a7b48 ffffffff817e94c9
[200860.130084] 0000000000000000 0000000000000000 ffff8801486a7b88 ffffffff8107b3d6
[200860.130086] ffffffff81ac2d38 ffffffff81d2a8a0 ffff880211ff80d0 00000000012e8320
[200860.130087] Call Trace:
...
not good. there's something really messed up with your system, and my best guess is that it's the motherboard... or, at least, the sata controllers on it.

have you got another machine to test those disks on? there's always the possibility that your drive or drives are failing - although it's a bit unlikely that both of them and the dvd would be failing all at once, more likely to be your motherboard.

craig

ps: and here we have a classic case of upgrade inflation. start off with something simple and cheap, and end up with something not so cheap after half a dozen "if i spent just a little bit more, i could...." expansions. even so, i think replacing a 6 year old CPU + m/b is a worthwhile thing to do, especially if it's your primary desktop machine.

--
craig sanders <cas@taz.net.au>

It got worse.. writing this from webmail :)
not good. there's something really messed up with your system, and my best guess is that it's the motherboard....or, at least, the sata controllers on it.
My computer auto-boots if it's shut down, so that if I go to work and I turned it off overnight, it'll be online by the time I'm in the office. This morning, when I woke up, I heard the GPU fan running high (which only happens when GPU driver hasn't been loaded yet) and was greeted by a disk error explosion on the 1TB drive.

Would I be right in thinking that this kind of smart failure could not be triggered by the controller, and rather it's a drive fault, because the tests are run wholly within the drive itself and all that goes between drive and computer is the request to start the test, and the test results?

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black
Device Model:     WDC WD1002FAEX-00Z3A0
Serial Number:    WD-WCATR0562222
LU WWN Device Id: 5 0014ee 2044331c2
Firmware Version: 05.01D05
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s
Local Time is:    Thu Jan 21 04:23:23 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...
200 Multi_Zone_Error_Rate   0x0008   200   197   000    Old_age   Offline      -       8
...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     43110         486750687
# 2  Conveyance offline  Completed: read failure       90%     43110         486750687
# 3  Extended offline    Completed: read failure       90%     43110         486750687

My thoughts are that besides the bitching of the Marvell virtual ATA device (which perhaps was passing stuff through to the 1TB device), all the errors could be attributed to the now failed drive?
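For anyone wanting to see where on the disk that read failure sits: all three self-tests stop at the same LBA, and with the 512-byte logical sectors reported above that is 486750687 * 512 = 249216351744 bytes in, i.e. roughly a quarter of the way into the 1TB drive. A small sketch of pulling this out of a saved self-test log follows. It assumes the log layout quoted above, with the LBA as the last column, and failed_selftest_offsets is just an illustrative name, not part of smartmontools.

```shell
#!/bin/sh
# Sketch only: read a smartctl self-test log on stdin and, for every entry
# that ended in a read failure, print the failing LBA and its byte offset.
# $1 is the logical sector size in bytes (defaults to 512, as reported above).
failed_selftest_offsets() {
    sector_size=${1:-512}
    awk -v ss="$sector_size" '
        /^# *[0-9]+ / && /read failure/ {
            lba = $NF                       # LBA_of_first_error is the last column
            printf "LBA %s = byte offset %.0f\n", lba, lba * ss
        }'
}

# Demo using the log lines quoted above.
failed_selftest_offsets 512 <<'EOF'
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     43110         486750687
# 2  Conveyance offline  Completed: read failure       90%     43110         486750687
# 3  Extended offline    Completed: read failure       90%     43110         486750687
EOF
```

On a live system the input would come from something like `smartctl -l selftest /dev/sdX`, with the device name adjusted to suit.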

On Thu, Jan 21, 2016 at 04:34:39AM +0000, Anthony wrote:
Would I be right in thinking that this kind of smart failure could not be triggered by the controller, and rather it's a drive fault, because the tests are run wholly within the drive itself and all that goes between drive and computer is the request to start the test, and the test results?
yep, the smart tests are run on the drive itself. hope you've got a backup.
My thoughts are that besides the bitching of the Marvell virtual ATA device (which perhaps was passing stuff through to the 1TB device), all the errors could be attributed to the now failed drive?
possibly. but IIRC you said it was whinging about the DVD drive for ages too.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #447: According to Microsoft, it's by design

Well.. the drive definitely met its maker.. but I've figured out where all the ATA error noise is coming from!

systemd-udevd invokes /lib/udev/rules.d/85-hdparm.rules
... which invokes /lib/udev/hdparm
... which imports /lib/hdparm/hdparm-functions
... which has a function hdparm_options
... which performs various checks etc. and determines that it'll try and set power management on anything even vaguely resembling a drive :-/
... which passes those options back to /lib/udev/hdparm
... which then tries to run hdparm with the options
... which fails because the drive doesn't like it

root@AH01:/lib/udev# . /lib/hdparm/hdparm-functions
root@AH01:/lib/udev# hdparm_options /dev/sda
-B254
root@AH01:/lib/udev# hdparm `hdparm_options /dev/sda` /dev/sda

/dev/sda:
 setting Advanced Power Management level to 0xfe (254)
 HDIO_DRIVE_CMD failed: Input/output error
 APM_level = not supported
root@AH01:/lib/udev# dmesg | grep ata9 | tail
...
[ 1397.244573] ata9.00: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0
[ 1397.244577] ata9.00: irq_stat 0x40000001
[ 1397.244579] ata9.00: failed command: SET FEATURES
[ 1397.244583] ata9.00: cmd ef/05:fe:00:00:00/00:00:00:00:00/40 tag 11
[ 1397.244584] ata9.00: status: { DRDY ERR }
[ 1397.244585] ata9.00: error: { ABRT }

What's the preferred method of telling hdparm to kindly bugger off in this context?

I feel like it's maybe a bug, because not all drives support all APM values consistently. I also suspect that, because the Marvell "Virtual" ATA device is in the mix, the hdparm scripts are blindly seeing it as a drive as well - and sending unsupported drive parameters to a given device is asking for trouble.
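The failed command in that dmesg output can be matched against the hdparm call byte for byte. libata prints the taskfile as cmd COMMAND/FEATURE:COUNT:LBA-low:LBA-mid:LBA-high/.../DEVICE, so ef/05:fe is SET FEATURES (0xEF) with subcommand 0x05 (enable Advanced Power Management) and a count of 0xfe, i.e. exactly the -B254 that hdparm_options produced. A rough decoder sketch, covering only the few commands that show up in this thread; decode_ata_cmd is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch only: decode the first three taskfile bytes of a libata "cmd" line.
# $1 is the token after "cmd", e.g. ef/05:fe:00:00:00/00:00:00:00:00/40
decode_ata_cmd() {
    tf=$1
    cmd=${tf%%/*}          # command register, e.g. "ef"
    rest=${tf#*/}          # everything after the first slash
    feature=${rest%%:*}    # feature register, e.g. "05"
    rest=${rest#*:}
    count=${rest%%:*}      # sector count register, e.g. "fe"

    case "$cmd/$feature" in
        ef/05) echo "SET FEATURES: enable APM, level $((0x$count))" ;;
        ef/85) echo "SET FEATURES: disable APM" ;;
        ef/*)  echo "SET FEATURES: subcommand 0x$feature" ;;
        a0/*)  echo "PACKET (ATAPI command)" ;;
        *)     echo "command 0x$cmd (not decoded by this sketch)" ;;
    esac
}

# The failing command from the dmesg output above.
decode_ata_cmd "ef/05:fe:00:00:00/00:00:00:00:00/40"
```

The ABRT in the error register is the device saying it does not implement that request, which fits the "APM_level = not supported" line from hdparm.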
participants (3)
- Anthony
- Anthony Hogan
- Craig Sanders