Frequent A20-Olinuxino-Lime2-eMMC board hangs

Started by chradev, July 23, 2016, 12:20:50 PM


chradev

Hi to All,

I have just received a third A20-Olinuxino-Lime2-eMMC board and mounted it in our prototype.
The system (HW and SW) is as described in my posts:
The latest FW is based on Armbian 5.17 customized by me, with U-Boot 2016.07, Kernel 4.6.4 and Debian Jessie.
The CPU / DRAM clocks and timings have not changed for months. The FW is also the same before and after mounting the new Lime2 board.

The 2 prototypes have worked perfectly for months without a single registered hang.

Unfortunately, the new board has hung 3 times in a few days, and the last 2 hangs happened within a 6-hour interval, which is not acceptable for the planned board usage.

The registered hangs have happened at:

  • relatively low workload (1.7-2.7),
  • normal consumption (1.5-1.7W reported by the PMU and 2.8-3.2W measured at the 5V/4A high quality AC-DC adapter) and
  • comfortable temperature ranges (Cooler: 27-32°C, CPU: 29-33°C and PMU: 33-39°C).

One difference from the previous boards is that the PMU temperature reported by RPI Monitor is more than 10°C higher.
This cannot be explained either by a PMU sensor error or by overheating, because all chips (PMU, CPU, DRAM and Flash) are effectively cooled:

  • The board is mounted in 160x165x51.5 mm aluminum box with 15mm thick aluminum thermo-conductor (cooler);
  • The cooler contacts to all chips via thin (less than 1.5 mm) thermal pads (6 W/mK) and to the box wall via ARCTIC MX-4 thermal compound (with carbon micro-particles);
I noted a similar effect in 2 A20-Olinuxino-Lime2-4GB (HW rev. C) boards and reported it to Olimex, but was left without an acceptable explanation.
The only plausible source of this problem is bad soldering of the PMU thermal pads, which are intended to take away almost all of the heat generated inside the chip.
In case of bad soldering, the DRAM and eMMC Flash (which are BGA) could also have problems.

I am not sure whether PMU over-temperature can explain the Lime2 board hangs, but it is definitely not a SW-provoked problem, so Olimex's opinion is more than welcome.


Best regards
Chris

chradev

#1
Hi to All,

Some update after more tests.

The Lime2-eMMC board behavior is consistent. It continues to hang at

  • 6-10 hour intervals if the workload is started immediately after resetting it;
  • 1-3 hour intervals if the workload is started some time (8/24 hours tested) after resetting it.

The only messages logged while under load are:

ieee80211 phy0: rt2x00usb_watchdog_tx_dma: Warning - TX queue 2 DMA timed out, invoke forced forced reset

which is very rare (down to a single message), and

ieee80211 phy0: rt2800usb_txdone: Warning - Data pending for entry 3 in queue 2

which appears many times (34 messages in 2.5 hours before the last hang) and is irregularly spread.
Such messages were observed with the previous Lime2-eMMC boards as well.
There is no correlation between the hangs and the above messages.
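
To make that check repeatable, the two warnings can be counted from a saved kernel log. This is only a minimal Python sketch: the pattern names, the `count_warnings` helper and the third sample line are illustrative assumptions, not part of the setup described above.

```python
import re
from collections import Counter

# Patterns built from the two warnings quoted above.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "tx_dma_timeout": re.compile(
        r"rt2x00usb_watchdog_tx_dma: Warning - TX queue \d+ DMA timed out"
    ),
    "txdone_pending": re.compile(
        r"rt2800usb_txdone: Warning - Data pending for entry \d+ in queue \d+"
    ),
}


def count_warnings(lines):
    """Count how many log lines match each known warning pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample input: the first two lines are the messages quoted above,
# the last two are made up for illustration.
SAMPLE_LINES = [
    "ieee80211 phy0: rt2x00usb_watchdog_tx_dma: Warning - TX queue 2 DMA timed out, invoke forced forced reset",
    "ieee80211 phy0: rt2800usb_txdone: Warning - Data pending for entry 3 in queue 2",
    "ieee80211 phy0: rt2800usb_txdone: Warning - Data pending for entry 5 in queue 2",
    "some unrelated kernel message",
]

if __name__ == "__main__":
    for name, count in count_warnings(SAMPLE_LINES).items():
        print(name, count)
```

Feeding it the log lines from the window before each hang, and from a window of the same length without a hang, would give comparable counts for both cases.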

Not mentioned in the previous post:

  • After a hang, the board consumption (~2.8W) stays at the same level as under workload, but with almost no variation (<0.5%).
  • The total network load is 320kBps (upload) as reported by RPI Monitor, spread between the WiFi and GBit Ethernet interfaces, each keeping 2 client connections.
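
For reference, one way to put a number on the "almost no variation" observation is the spread of the power readings relative to their mean. The sketch below is an assumption about how this could be computed, not the method used for the figures above, and the sample readings are hypothetical values chosen only to mirror the ranges mentioned in these posts.

```python
def relative_spread_percent(samples):
    """Return (max - min) / mean of the samples, in percent."""
    if not samples:
        raise ValueError("need at least one sample")
    mean = sum(samples) / len(samples)
    return (max(samples) - min(samples)) / mean * 100.0


# Hypothetical power readings in watts (not measured data).
under_load = [2.8, 3.0, 3.2]
after_hang = [2.80, 2.81, 2.80]

if __name__ == "__main__":
    print("under load: %.2f %%" % relative_spread_percent(under_load))
    print("after hang: %.2f %%" % relative_spread_percent(after_hang))
```

A flat trace with a spread below 0.5% right after a period of normal fluctuation would mark the moment of the hang on the consumption chart.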

I can post some RPI Monitor charts, but the attachment option is not visible/allowed for me.

EDIT: The RPI Monitor charts were published on the Armbian Forum:
  http://forum.armbian.com/index.php/topic/853-armbian-customization/page-5#entry13005

Best regards
Chris

chradev