A20-OLinuXino-MICRO: SATA interface hardware failure?

Started by Nate, September 19, 2018, 03:54:00 PM

Nate

I have had an A20-OLinuXino-MICRO for almost a year.  Early on I set it up to run Devuan (a Debian derivative) with a 60 GB SATA SSD as the root file system.  After it had been up for several months with no apparent issues, I had to reboot it two weeks ago and saw the kernel reporting ATA errors of the form:


Sep  5 21:47:20 wxbox kernel: [    1.690757] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep  5 21:47:20 wxbox kernel: [    1.696591] ata1.00: configured for UDMA/133
Sep  5 21:47:20 wxbox kernel: [    1.701894] ata1: EH complete
Sep  5 21:47:20 wxbox kernel: [    1.719428] ata1.00: exception Emask 0x12 SAct 0x8000000 SErr 0xa80500 action 0x6 frozen
Sep  5 21:47:20 wxbox kernel: [    1.724798] ata1.00: irq_stat 0x08000000, interface fatal error
Sep  5 21:47:20 wxbox kernel: [    1.730118] ata1: SError: { UnrecovData Proto 10B8B BadCRC LinkSeq }
Sep  5 21:47:20 wxbox kernel: [    1.735406] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    1.740701] ata1.00: cmd 60/08:d8:88:0a:74/00:00:07:00:00/40 tag 27 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    1.740701]          res 40/00:dc:88:0a:74/00:00:07:00:00/40 Emask 0x12 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    1.751312] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    1.756516] ata1: hard resetting link
Sep  5 21:47:20 wxbox kernel: [    1.839341] usb 4-1: new full-speed USB device number 2 using ohci-platform
Sep  5 21:47:20 wxbox kernel: [    2.090743] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep  5 21:47:20 wxbox kernel: [    2.096356] ata1.00: configured for UDMA/133
Sep  5 21:47:20 wxbox kernel: [    2.101624] ata1: EH complete
Sep  5 21:47:20 wxbox kernel: [    2.107771]  sda: sda1 sda2
Sep  5 21:47:20 wxbox kernel: [    2.114784] sd 0:0:0:0: [sda] Attached SCSI disk
Sep  5 21:47:20 wxbox kernel: [    2.114835] sda: detected capacity change from 0 to 64023257088
Sep  5 21:47:20 wxbox kernel: [    2.125389] usb 4-1: New USB device found, idVendor=10c4, idProduct=ea60
Sep  5 21:47:20 wxbox kernel: [    2.130530] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Sep  5 21:47:20 wxbox kernel: [    2.135500] usb 4-1: Product: CP2102 USB to UART Bridge Controller
Sep  5 21:47:20 wxbox kernel: [    2.139401] EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
Sep  5 21:47:20 wxbox kernel: [    2.141003] EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
Sep  5 21:47:20 wxbox kernel: [    2.150405] usb 4-1: Manufacturer: Silicon Labs
Sep  5 21:47:20 wxbox kernel: [    2.155324] usb 4-1: SerialNumber: 0001
Sep  5 21:47:20 wxbox kernel: [    2.160734] ata1.00: exception Emask 0x12 SAct 0x20 SErr 0xa80500 action 0x6 frozen
Sep  5 21:47:20 wxbox kernel: [    2.165879] ata1.00: irq_stat 0x08000000, interface fatal error
Sep  5 21:47:20 wxbox kernel: [    2.171045] ata1: SError: { UnrecovData Proto 10B8B BadCRC LinkSeq }
Sep  5 21:47:20 wxbox kernel: [    2.176075] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.180977] ata1.00: cmd 60/08:28:00:08:40/00:00:00:00:00/40 tag 5 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.180977]          res 40/00:2c:00:08:40/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.190784] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.195644] ata1: hard resetting link
Sep  5 21:47:20 wxbox kernel: [    2.530742] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep  5 21:47:20 wxbox kernel: [    2.536122] ata1.00: configured for UDMA/133
Sep  5 21:47:20 wxbox kernel: [    2.541074] ata1: EH complete
Sep  5 21:47:20 wxbox kernel: [    2.559447] ata1: limiting SATA link speed to 1.5 Gbps
Sep  5 21:47:20 wxbox kernel: [    2.564406] ata1.00: exception Emask 0x10 SAct 0x7e000007 SErr 0x280100 action 0x6 frozen
Sep  5 21:47:20 wxbox kernel: [    2.569376] ata1.00: irq_stat 0x08000000, interface fatal error
Sep  5 21:47:20 wxbox kernel: [    2.574208] ata1: SError: { UnrecovData 10B8B BadCRC }
Sep  5 21:47:20 wxbox kernel: [    2.579077] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.584023] ata1.00: cmd 60/08:00:40:29:40/00:00:00:00:00/40 tag 0 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.584023]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.594105] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.599080] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.604128] ata1.00: cmd 60/08:08:48:29:40/00:00:00:00:00/40 tag 1 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.604128]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.614213] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.619175] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.624146] ata1.00: cmd 60/08:10:50:29:40/00:00:00:00:00/40 tag 2 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.624146]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.634182] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.639101] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.644099] ata1.00: cmd 60/08:c8:10:29:40/00:00:00:00:00/40 tag 25 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.644099]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.654121] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.659061] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.664066] ata1.00: cmd 60/08:d0:18:29:40/00:00:00:00:00/40 tag 26 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.664066]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.674337] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.679447] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.684524] ata1.00: cmd 60/08:d8:20:29:40/00:00:00:00:00/40 tag 27 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.684524]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.694831] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.699897] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.705067] ata1.00: cmd 60/08:e0:28:29:40/00:00:00:00:00/40 tag 28 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.705067]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.715498] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.720657] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.725792] ata1.00: cmd 60/08:e8:30:29:40/00:00:00:00:00/40 tag 29 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.725792]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.736190] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.741329] ata1.00: failed command: READ FPDMA QUEUED
Sep  5 21:47:20 wxbox kernel: [    2.746482] ata1.00: cmd 60/08:f0:38:29:40/00:00:00:00:00/40 tag 30 ncq dma 4096 in
Sep  5 21:47:20 wxbox kernel: [    2.746482]          res 40/00:0c:48:29:40/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Sep  5 21:47:20 wxbox kernel: [    2.756894] ata1.00: status: { DRDY }
Sep  5 21:47:20 wxbox kernel: [    2.762106] ata1: hard resetting link
Sep  5 21:47:20 wxbox kernel: [    3.090736] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep  5 21:47:20 wxbox kernel: [    3.096400] ata1.00: configured for UDMA/133
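
As a cross-check on the log above, the SErr hex values can be decoded by hand. The following is only a sketch and not part of the original post: it assumes the SError bit layout used by the Linux libata driver, and `decode_serr` is a hypothetical helper name. Run against the two values in the log (0xa80500 and 0x280100), it should give back the same flag names the kernel printed in braces.

```python
# Bit positions follow the SError layout used by the Linux libata driver
# (an assumption worth checking against your kernel source); the short
# names are the ones the kernel prints inside "SError: { ... }".
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # The two SErr values that appear in the log above.
    for raw in ("0xa80500", "0x280100"):
        print(raw, "->", " ".join(decode_serr(int(raw, 16))))
```

The flags that show up here (10B8B, BadCRC, LinkSeq) are raised on the SATA link itself rather than by the file system, which fits with the errors appearing even when no partition is mounted.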



After some more errors from sd 0:0:0:0:, things settled down for a few days and then it seemed to fail completely on the evening of Sep 8.

Suspecting the SSD, I temporarily installed it in my laptop, booted from a CD and ran fsck on it.  All seemed fine.  I tried the SSD with the MICRO again, this time with the rootfs on /dev/mmcblk0p2 and the drive merely attached, without even attempting to mount any of its partitions, and still saw a steady stream of ATA errors.  I had another small spinner drive lying around, so I set it up like the SSD, tried it, and still saw the ATA errors.  I tried a different SATA cable with both drives and still received errors.

I have since opted to run the root system on /dev/mmcblk0p2 and have connected the SSD through a SATA to USB adapter I had lying around.  Partitions on the SSD are mounted from /etc/fstab as /home and /var (where the majority of disk write operations take place in my application).  All has been working well for over a week without errors.

I am concerned that the ATA hardware is failing on my MICRO.  Should I seek a replacement, or is there something else that could cause the failure messages?  While my present setup using the SSD connected via USB is working, it's not as clean as I'd like, as the SSD is outside the case and both USB ports are constantly in use.  I am also concerned that there might be additional hardware failures in the future should this failure indeed be hardware related.


LubOlimex

I've seen these errors when the hard disk is not getting sufficient power. When using hard disks it is important to ensure the board is sufficiently powered. Try a power supply capable of providing more current while maintaining a stable voltage. It might also be a bad cable or a poor contact between the board and the PSU, or between the board and the disk (SATA or power cable). If you suspect that the board might suddenly lose power, consider adding a back-up battery.

> I am concerned that the ATA hardware is failing on my MICRO.  Should I seek a replacement, or is there something else that could cause the failure messages?

If it happens again, I would also suggest testing with the official Olimex images to check whether it might be a software issue. You can find these here: https://www.olimex.com/wiki/A20-OLinuXino-MICRO

Best regards,
Lub/OLIMEX
Technical support and documentation manager at Olimex

Nate

Thanks, Lub.

What I find interesting is that I'm using exactly the same hardware, only connected through one of the USB ports rather than the SATA port, and it is working without errors.  This includes the same power supply purchased with the MICRO from Olimex last year.

This does give me some ideas to work on when I get the time.

LubOlimex

> This includes the same power supply purchased with the MICRO from Olimex last year.

Oh, you got one of our supplies - then that is likely the problem. Our supplies are low amperage. They are sufficient for the A20 boards alone. A setup of an A20 board and a hard disk might draw more than these adapters are capable of providing. If possible, check the maximum current (or wattage) that your hard disk can consume (it should be mentioned in its datasheet).

Look for a similar adapter that can maintain 12 V while providing up to 2 A or 3 A of current.
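
To make the sizing argument concrete, here is a rough power-budget sketch. It is an illustration only: the load figures are placeholders rather than measured or datasheet values, the 20% reserve is an assumed rule of thumb, and `supply_headroom_watts` is a hypothetical helper. It also ignores regulator losses on the board, so treat the result as optimistic.

```python
def supply_headroom_watts(supply_volts, supply_amps, load_watts, margin=0.2):
    """Return spare watts after summing the loads and reserving a margin.

    margin is the fraction of the supply rating held back for start-up
    surges (an assumed rule of thumb, not an Olimex figure).
    """
    usable = supply_volts * supply_amps * (1.0 - margin)
    return usable - sum(load_watts)


if __name__ == "__main__":
    # Placeholder loads in watts; substitute the values from your board
    # documentation and your disk's datasheet.
    loads = {"board": 2.5, "disk_peak": 5.0}
    # Compare a hypothetical low-current 12 V adapter with a 2 A one.
    for amps in (0.5, 2.0):
        spare = supply_headroom_watts(12.0, amps, loads.values())
        print(f"12 V / {amps} A supply: {spare:+.1f} W headroom")
```

A negative result means the adapter cannot cover the combined peak load with the chosen reserve.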

Nate

I had a small linear power supply on hand that I had not used for years.  It is putting out 13.8 VDC and is rated at 6 amps.  I had a cord with the correct plug so I am now using that to power the MICRO with the SSD.  Testing with that supply showed no ATA errors.

I then put everything back together with the short SATA cable I bought with the MICRO and had ATA errors again.  As that cable must be twisted one half turn and then make a sharp U-turn after coming out of the case (I am using the BOX-MICRO-B-BLACK case), I suspected mechanical strain on the SATA connector to be the cause.  I had another SATA cable that is thinner and longer, so I was able to make a nice loop inside the case and then another gentle loop outside the case to plug into the connector, and so far there have been no ATA errors after three days.
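
For anyone wanting to confirm "no ATA errors" over several days without reading the log by eye, a small tally script is one option. This is a sketch, not something from the thread: the patterns are taken from the messages in the kernel log earlier in this topic, and `tally_ata_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Message fragments copied from the kernel log posted earlier in this thread.
PATTERNS = {
    "exception": re.compile(r"ata\d+(\.\d+)?: exception Emask"),
    "failed_command": re.compile(r"ata\d+\.\d+: failed command"),
    "hard_reset": re.compile(r"ata\d+: hard resetting link"),
    "speed_limited": re.compile(r"ata\d+: limiting SATA link speed"),
}


def tally_ata_events(lines):
    """Count how many log lines match each ATA error pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It accepts any iterable of lines, so it can be fed an open log file or the captured output of dmesg; which log file holds kernel messages depends on the system's syslog setup. An empty result after a few days of uptime is what a healthy link should look like.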

It is interesting that the original power supply didn't seem to have a problem powering the same drive via the USB port with a SATA to USB adapter.  But then USB is 5V, so that may have helped.

Thanks for the replies.