The latest "mainline" image provided by Olimex (http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20201207-193928.img.7z) ships the following kernel:
Linux debian10 5.8.18-olimex #190400 SMP Mon Dec 7 19:05:17 UTC 2020 aarch64 GNU/Linux
If I install this image on an A64-OLinuXino-2Ge8G-IND board, the board is very sluggish.
I'm monitoring my boards with the Prometheus node exporter, and it now takes 12 seconds to output ip:9100/metrics, where it previously took 1 to 2 seconds. As a result, Prometheus fails to monitor the board (timeout).
A simple "top" shows several kworker processes eating 20 to 50% of a CPU, even when the board is idle.
If I install (on the same board) the previous version of this image (A64-OLinuXino-buster-minimal-20201105-143953.img.7z), which comes with this kernel:
Linux a64-olinuxino 5.8.18-olimex #140443 SMP Thu Nov 5 14:08:32 UTC 2020 aarch64 GNU/Linux
everything is fine: no kworker processes eating CPU, no slowness.
If I then run "apt upgrade", it upgrades the kernel, and after a reboot the slowness and the kworker processes appear.
Steps to reproduce:
- install A64-OLinuXino-buster-minimal-20201207-193928.img on an A64-OLinuXino-2Ge8G-IND board (or upgrade the kernel on an older mainline image)
- run "top"
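To put rough numbers on the two symptoms described above, something like the following could be used. It is only a sketch: kworker_cpu is a hypothetical helper name, BOARD_IP is a placeholder, and it assumes that procps "ps", "awk" and "curl" are available. Note that "ps" reports CPU usage averaged over each thread's lifetime, so "top" remains the better view of the current load.

```shell
#!/bin/sh
# Hypothetical helper: sum the %CPU column of all kworker threads.
# Expects "COMMAND %CPU" lines on stdin, as printed by: ps -eo comm=,pcpu=
kworker_cpu() {
    # $NF is the last field (%CPU); only lines whose command starts with "kworker" count.
    awk '$1 ~ /^kworker/ { sum += $NF } END { printf "%.1f\n", sum }'
}

# On the board (not run automatically here):
#   ps -eo comm=,pcpu= | kworker_cpu
# From the monitoring host, time one scrape of the node exporter:
#   curl -s -o /dev/null -w '%{time_total}\n' http://BOARD_IP:9100/metrics
```

Running both commands before and after the kernel upgrade gives two figures that can be compared directly.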
We are investigating. Meanwhile, please try a different scaling governor. First try:
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Test.
Then try with:
echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Report back your findings. It seems others have experienced something similar:
https://forum.armbian.com/topic/13117-pine64-lts-high-kworker-load-with-cpufreq-problems-after-upgrade-to-20021/
With the "performance" governor, the problem seems to disappear: no more kworker processes eating CPU, and the Prometheus node exporter takes ~1.5 seconds to respond.
With the "powersave" governor, the problem also disappears. The node exporter takes ~2.2 seconds.
With the "ondemand" governor (the default), the problem reappears: kworker processes eat some CPU, and the node exporter takes ~11 seconds.
So switching to the performance or powersave governor is a workaround.
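For anyone applying this workaround, here is a minimal sketch that writes the governor to every CPU that exposes a cpufreq directory, rather than to cpu0 only. set_governor is a hypothetical helper name, and the optional second argument is an assumption added only so the function can be tried against a scratch directory instead of the real sysfs tree. It must be run as root, and the setting does not survive a reboot.

```shell
#!/bin/sh
# set_governor GOVERNOR [CPU_ROOT]
# Writes GOVERNOR to each cpuN/cpufreq/scaling_governor file under CPU_ROOT.
# CPU_ROOT defaults to the real sysfs location.
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue        # glob did not match: nothing to write
        echo "$gov" > "$f" || return 1 # stop on the first failed write
    done
}

# Example (as root):
#   set_governor performance
```

To make the choice persistent, the same call would have to be run at boot, for example from a startup script; how best to do that on these images is not covered here.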
Alright, thanks for your time and tests! We will revert it, but this comes at a cost, because the problem is more serious than it seems. The sluggish performance of the ondemand governor is caused by the workaround for the arch timer of the A64 chip: the A64 has known design problems with its timer, and the timer might jump forward in time. Please check this series of patches:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html
Quote from: LubOlimex on December 17, 2020, 09:06:22 AM
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html
This link returns a 404 Not Found error. But yes, I'm aware of the timer issue. I even reported it again recently: https://www.olimex.com/forum/index.php?topic=7238.msg29730#msg29730
I see that your latest image http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20201217-194545.img.7z ships an older version of the kernel:
Linux a64-olinuxino 5.8.18-olimex #122632 SMP Wed Dec 16 12:28:30 UTC 2020 aarch64 GNU/Linux
I suppose it's a workaround before finding a real fix?
I suppose this version #122632 still has this timer issue?
This should be the link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html
> I suppose it's a workaround before finding a real fix?
There are no real fixes for bugs in the chip's design and silicon. I don't know whether a better workaround will be proposed.
If the kernel has the timer issue, our images will have it too. But the governor sluggishness should be gone.
Quote from: LubOlimex on December 21, 2020, 08:25:45 AM
This should be the link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html
Your link is still not valid: the forum software adds "email" tags inside the URL. The following link looks OK (at least in the preview; I created a "url" tag manually instead of letting the forum do it automatically):
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html
I confirm that the timer issue is there with http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20201217-194545.img.7z
I just ran into it: one A64-OLinuXino-2Ge8G-IND board (installed yesterday) jumped to Thu 20 Feb 05:01:58 CET 2116 and (as usual) lost network access.
It's a blocker for me.
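For boards that have to stay in service in the meantime, a jump like the one above can at least be detected. The following is a rough sketch, not a fix: clock_is_plausible is a hypothetical helper name, and the cut-off year 2100 is an arbitrary assumption chosen only because it lies before the date the board jumped to.

```shell
#!/bin/sh
# clock_is_plausible [MAX_YEAR] [YEAR]
# Succeeds (exit status 0) if YEAR is not later than MAX_YEAR.
# YEAR defaults to the current system year, MAX_YEAR to 2100 (an assumption).
clock_is_plausible() {
    max_year=${1:-2100}
    year=${2:-$(date +%Y)}
    [ "$year" -le "$max_year" ]
}

# Example, e.g. from a periodic job on the board:
#   clock_is_plausible || logger "system clock jumped: $(date)"
```

What to do once a jump is detected (reboot, resync the time, alert) depends on the setup and is left open here.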