Olimex Support Forum

OLinuXino Android / Linux boards and System On Modules => A20 => Topic started by: olHelp on July 26, 2015, 10:37:45 PM

Title: Lime2 instability
Post by: olHelp on July 26, 2015, 10:37:45 PM
Hello,

I am having stability problems with the Lime2. I intend to use it as a server, running a service that puts a threaded load of about 35% CPU on an A7 @ 1 GHz, with mild I/O on the storage and about 200Kb/s network load, and no planned downtime.
Connected to the board are the barrel connector, Ethernet, a USB disk for storage and the micro SD card containing Linux (I tried different distributions). I am using the black enclosure.


However, the Lime2 crashes for unknown reasons after 2-11 h. There is nothing in the logs, so I guessed it is related to a bad power supply or temperature problems. Measures taken:

1) tried different 5v 1A supply (using ubuntu and self compiled mainline kernel)
2) opened the enclosure
3) different usb stick
4) switched to 5v 2A supply
5) used cpufreq to limit frequency to ~720mhz and after that ~560mhz
6) different linux distribution on different sd card (arch linux for arm)
7) removed usb stick and moved storage on the sd card for testing

...however, it's still not stable.

Do you have any suggestions as to what else could be wrong, or how to diagnose the problem?
The Lime2 sits right by an old Pandaboard that has been running rock stable for years under the same conditions.

Title: Re: Lime2 instability
Post by: Gerrit on July 27, 2015, 12:26:09 AM
What current rating is printed on your HD?

Try using the 1 A supply exclusively for the board, and power the USB disk from an external power supply or, when that is not possible, through a powered USB hub.
Title: Re: Lime2 instability
Post by: soenke on July 27, 2015, 09:47:05 AM
Try to connect a serial console via UART0 (the 3 pins beside the Ethernet port) and see if there is any kernel output. If you see something like "mmcqd blocked ....", it is a half-broken controller on the SD card. I have had exactly the same issues with about 5 microSD cards (SanDisk Extreme PRO) until now.
Title: Re: Lime2 instability
Post by: JohnS on July 27, 2015, 09:56:12 AM
I'd start by seeing if there are kernel messages on the console uart as it crashes (or before that).

I like to leave the console open from another Linux system, logging the output, such as by screen -L

John
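
A minimal sketch of the same logging idea in Python, for cases where a timestamp on each console line helps pin down when the board stopped. It copies lines from any text stream and prefixes each with the logging host's wall-clock time. The function name stamp_lines and the log format are assumptions for illustration; they are not part of screen or minicom.

```python
import time


def stamp_lines(stream, out, clock=time.time):
    """Copy lines from stream to out, prefixing each with a timestamp.

    Returns the number of lines copied.
    """
    count = 0
    for line in stream:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        out.write(f"[{stamp}] {line.rstrip(chr(13) + chr(10))}\n")
        out.flush()  # keep the log current if the logging host is interrupted
        count += 1
    return count
```

Called as stamp_lines(sys.stdin, sys.stdout) at the end of a pipe from whatever program reads the serial port, the last stamped line shows roughly when the console went quiet.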
Title: Re: Lime2 instability
Post by: olHelp on July 28, 2015, 01:09:24 PM
Hey,

I am not even using a USB HDD, just a USB stick. Sorry if I wasn't clear.

I connected another board via UART0; however, minicom just switched to offline, and there is no message when the board stops working.
After that I moved the rootfs onto the USB stick and pulled the SD card after booting; it still stops working after a while. Same result when connecting the USB stick via a powered USB hub.

Maybe it is a temperature related issue? The load is not really high and without the case there should be enough airflow. Digging further..
Title: Re: Lime2 instability
Post by: JohnS on July 28, 2015, 03:30:29 PM
When does minicom do that?  Do you at least see the messages during boot?

It's not connected right if not.

John
Title: Re: Lime2 instability
Post by: olHelp on July 28, 2015, 04:28:46 PM
Quote from: JohnS on July 28, 2015, 03:30:29 PM
When does minicom do that?  Do you at least see the messages during boot?

It's not connected right if not.

John

The bootlog shows up, ditto for kernel messages like connected peripherals.
Title: Re: Lime2 instability
Post by: soenke on July 28, 2015, 05:26:14 PM
Have you tried something like stress -c 2 etc. to create different kinds of load (CPU/memory/IO/all at once)?
Title: Re: Lime2 instability
Post by: olHelp on July 28, 2015, 06:01:15 PM
I tried stress with some random loads, mostly all at once (4 threads, maxing out the RAM, I/O on all mounted filesystems etc.), but nothing systematic (and it did not crash).

That's a good suggestion, running stress right now.
Title: Re: Lime2 instability
Post by: JohnS on July 28, 2015, 07:55:54 PM
Something to check: all the voltages being set (dcdc2 etc) for the various parts of the board.  Wrong volts/clock (cpu speed) = liable to crash.

E.g. 1008MHz does not seem reliable except with over-voltages -- themselves a bad idea.

(A recent topic on linux-sunxi ML.)

John
Title: Re: Lime2 instability
Post by: olHelp on July 28, 2015, 09:54:54 PM
Quote from: JohnS on July 28, 2015, 07:55:54 PM
Something to check: all the voltages being set (dcdc2 etc) for the various parts of the board.  Wrong volts/clock (cpu speed) = liable to crash.

E.g. 1008MHz does not seem reliable except with over-voltages -- themselves a bad idea.

(A recent topic on linux-sunxi ML.)

John

I just ran cpufreq-ljt-stress-test as mentioned on http://linux-sunxi.org/Hardware_Reliability_Tests#Reliability_of_cpufreq_voltage.2Ffrequency_settings

and the system froze at the second core, even with the max. frequency at 528 MHz.

How can I check the specified voltages?
Title: Re: Lime2 instability
Post by: JohnS on July 28, 2015, 10:36:51 PM
Depends on kernel.  Fex and/or uboot and/or DT.  See linux-sunxi pages etc.

A voltmeter is useful with the chip datasheet!

John
Title: Re: Lime2 instability
Post by: Pawel_W on August 01, 2015, 01:46:46 AM
IMO the CPU overheats in this black enclosure under server workload.
I glued a heat sink bar to the CPU and memory by using thermopads
(warning - surfaces of the CPU and memory chips are not at the same level).
It helped, but the case still has no air circulation.
I'm going to connect UEXT ribbon cable, so I'll have to make a hole for it.
You can test Lime2 without the lid of this enclosure (or these yellow panels) and buy the original heat sink for the A20 CPU.
P.S.
Bad solder joints are also a possibility.
Title: Re: Lime2 instability
Post by: soenke on August 01, 2015, 08:42:52 AM
Thanks for the testing!
Maybe the next case should have some vents on the top side... The current solution involves a drill ;)

Does your A20 also overheat without a heat sink and with an open enclosure?

The temperature of the DRAM should not be an issue; normally the chips don't require additional cooling.
Title: Re: Lime2 instability
Post by: olHelp on August 01, 2015, 01:44:42 PM
Currently testing a custom-built kernel with increased voltage,

(as suggested by http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/334714.html)
Quote
         operating-points = <
            /* kHz     uV */
            1008000 1450000
            912000  1425000
            864000  1350000
            720000  1250000
            528000  1150000
            312000  1100000
            144000  1050000
            >;

instead of
Quote
         operating-points = <
            /* kHz    uV */
            960000  1400000
            912000  1400000
            864000  1300000
            720000  1200000
            528000  1100000
            312000  1000000
            144000  900000
            >;

with uptime well over 12 h now, with stress running for various tests!

http://postimg.org/image/vtdr22v6z/
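
To make the difference between the two operating-points tables above easier to see, here is a small sketch that lists the voltage change per frequency. The values are copied from the tables; the dictionary and function names are made up for illustration.

```python
# Operating points copied from the two tables above: kHz -> microvolts.
DEFAULT_OPP = {
    960000: 1400000,
    912000: 1400000,
    864000: 1300000,
    720000: 1200000,
    528000: 1100000,
    312000: 1000000,
    144000: 900000,
}
RAISED_OPP = {
    1008000: 1450000,
    912000: 1425000,
    864000: 1350000,
    720000: 1250000,
    528000: 1150000,
    312000: 1100000,
    144000: 1050000,
}


def voltage_deltas(old, new):
    """Return {kHz: new_uV - old_uV} for frequencies present in both tables."""
    shared = sorted(old.keys() & new.keys(), reverse=True)
    return {khz: new[khz] - old[khz] for khz in shared}


for khz, delta_uv in voltage_deltas(DEFAULT_OPP, RAISED_OPP).items():
    print(f"{khz // 1000:>4} MHz: +{delta_uv // 1000} mV")
```

The raise is smallest at 912 MHz (+25 mV) and largest at the lowest frequencies (+100 mV at 312 MHz, +150 mV at 144 MHz). The top entries (960 MHz in one table, 1008 MHz in the other) have no counterpart and are skipped.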
Title: Re: Lime2 instability
Post by: Pawel_W on August 01, 2015, 08:10:12 PM
I made some photos of my A20-Lime2 with heat sink:
http://www.fotosik.pl/pokaz_obrazek/68cbc3aa6d2c6170.html
http://www.fotosik.pl/pokaz_obrazek/a288ee10e356f77f.html
http://www.fotosik.pl/pokaz_obrazek/bc68c956f8cfc6e9.html

The heat sink is glued to the memory chips with an additional thermopad to offset the height difference between them and the CPU.
I tested the Lime2 without a heat sink, locked in the case: it sometimes hung under load (Android games).
Now it is much better, but without ventilation holes the heat transfer is still not sufficient for stable 24/7 100% load and the case gets warm.
Title: Re: Lime2 instability
Post by: soenke on August 01, 2015, 11:24:59 PM
Maybe you could also try increasing the core voltage and see if that solves your problem?

That huge heatsink seems a little bit overkill :)
Title: Re: Lime2 instability
Post by: JohnS on August 02, 2015, 03:54:59 PM
Sounds like you're running it too fast already.  More voltage = more heat!!

John
Title: Re: Lime2 instability
Post by: soenke on August 02, 2015, 05:06:29 PM
If it is running at standard clock speeds, I would not change it.

I don't think that increasing the max. voltage from 1.4 to 1.45 has a significant impact on the CPU temperature.

He is already using a huge heatsink, so I don't think his problem is related to the core temperature but rather to undervoltage at certain clock speeds.
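
As a rough check of the size of that effect, here is a sketch assuming the usual first-order CMOS model in which dynamic power scales with V^2 * f. Leakage and regulator efficiency are ignored, and the function name is illustrative only.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """First-order CMOS estimate: dynamic power is proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


ratio = dynamic_power_ratio(1.45, 1.40)
print(f"{(ratio - 1) * 100:.1f}% more dynamic power at the same clock")
```

Under that model the step from 1.40 V to 1.45 V comes to roughly 7% more dynamic power at an unchanged frequency. Whether that matters depends on how close the chip already is to its thermal limit.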
Title: Re: Lime2 instability
Post by: JohnS on August 02, 2015, 09:19:57 PM
The typical "standard" speed is often too fast due to being beyond the spec.  Some chips work, some don't.

Many Android tabs say they run at 1008MHz but actually don't.

He appears to have a heat problem now, as he pretty much says.

John
Title: Re: Lime2 instability
Post by: olHelp on August 02, 2015, 10:29:27 PM
By the way: my lime2 is over 24h uptime, with a further increase:

Quote
            912000  1430000
            864000  1375000
            720000  1275000
            528000  1175000
            312000  1125000
            144000  1075000

Running with the case open. It survived prolonged runs of stress with multiple I/O loads on all mounted filesystems. Max. frequency is 960 MHz.
Title: Re: Lime2 instability
Post by: JohnS on August 02, 2015, 11:28:32 PM
1.43V is out of spec isn't it?

Yours apparently works.  It's to be expected that some will not, as indeed others have found.

Each owner can choose what they do; it may or may not work reliably.  It may not be the problem here, but there again it may be.  I'm not seeing many other suggestions.

John
Title: Re: Lime2 instability
Post by: olHelp on August 03, 2015, 09:55:29 PM
Meh, I am apparently still able to crash the board, even if it's not reproducible. Given the increased voltage, it may be overheating now.
The last resort would be another high-quality power brick, even if 2.5A (by now) should be plenty for the board + USB stick.
Title: Re: Lime2 instability
Post by: JohnS on August 03, 2015, 10:13:06 PM
If the CPU volts are wrong, or the frequency too high, adding current will not help.

John
Title: Re: Lime2 instability
Post by: olHelp on August 03, 2015, 10:29:07 PM
Yes, I am aware of that. Over at the Arch Linux forums someone mentioned running into instabilities with a Cubieboard at default settings powered by a 2A plug, so maybe the small plugs can be unstable, or I simply tried 3 different, very low quality power adaptors.

Will report back when kernel 4.3 is out or if I feel like buying another power supply.
Title: Re: Lime2 instability
Post by: olHelp on October 18, 2015, 03:23:06 PM
Just to update this topic:
I updated the kernel to 4.3-rc5, limited the max. frequency to 864 MHz, used the default .dtb/voltages and got a new SD card. The board is running without attachments or case... but it still hangs.
There are two different cron jobs running: every full hour one starts stress -c 2 with a 1000 s timeout, and every half hour (8:30, 9:30..) the other runs stress with -d 2 and a 1000 s timeout.
Both scripts may crash the system, but it's not reproducible. Next stop: another board  :'(
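
A quick arithmetic sketch of that schedule, assuming each stress run lasts its full 1000 s timeout and that the jobs start exactly on the hour and half hour (the names below are illustrative):

```python
RUN_SECONDS = 1000      # timeout given to stress in both jobs
CPU_START = 0           # stress -c 2 starts on the full hour
DISK_START = 30 * 60    # stress -d 2 starts on the half hour


def windows_overlap(start_a, start_b, length):
    """True if two equal-length windows starting at start_a and start_b overlap."""
    return abs(start_a - start_b) < length


busy = 2 * RUN_SECONDS
print("overlap:", windows_overlap(CPU_START, DISK_START, RUN_SECONDS))
print(f"loaded {busy} s of every 3600 s ({busy / 3600:.0%})")
```

With those assumptions the CPU and disk runs never overlap, so each hang can be attributed to one kind of load at a time, and the board is under load for a little over half of every hour.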
Title: Re: Lime2 instability
Post by: JohnS on October 18, 2015, 07:42:38 PM
Your kernel may be setting any/some of the on-board supplies (dcdc2 etc) to bad values and/or setting bad RAM parameters.

It may be worth asking for help on the linux-sunxi list (or IRC).  They'll want the boot (dmesg or equivalent) logs and so on.

John
Title: Re: Lime2 instability
Post by: olHelp on October 19, 2015, 09:45:51 AM
Oh well, I did not mean to sound that negative. Currently downloading a20-lime2_debian_3.4.90_release_2.img. At least Arch Linux was using the mainline kernel, but there could still be something wrong in mainline. Will report back in a week or so.
Title: Re: Lime2 instability
Post by: olHelp on October 28, 2015, 10:49:01 PM
Hello again,

The default Debian 3.4 image seems to be pretty stable. The board is running without a case and crashes only under insane loads with multiple stress instances and a load > 12 or so. Maybe it's 100% safe with a heatsink. Now if only I could get 4.3 to be that solid.
Title: Re: Lime2 instability
Post by: JohnS on October 29, 2015, 12:13:57 AM
I think 4.3 is not stable.  If you haven't already, referring to linux-sunxi ML / IRC may be useful.

If you build from source you can compare the machine-dependent parts.  It's probably a speed/timing issue.

John
Title: Re: Lime2 instability
Post by: igorpec on October 29, 2015, 08:43:57 AM
4.3 may not be stable because it's not released yet and things are still being fixed.

But I have been running 4.2.3 on a Lime for two weeks now. It doesn't want to hang :P. Kernel 3.4 has been running on a Lime 2 for two months with a lot of stuff on it.

But it's not just the kernel. U-Boot can also be responsible for problems.