Olimex Support Forum

OLinuXino Android / Linux boards and System On Modules => A64 => Topic started by: mossroy on December 15, 2020, 05:43:07 PM

Title: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: mossroy on December 15, 2020, 05:43:07 PM
The latest "mainline" image provided by Olimex (http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20201207-193928.img.7z) ships the following kernel:
Quote: Linux debian10 5.8.18-olimex #190400 SMP Mon Dec 7 19:05:17 UTC 2020 aarch64 GNU/Linux

If I install this image on an A64-OLinuXino-2Ge8G-IND board, it's very sluggish.
I'm monitoring my boards with Prometheus Node exporter, and it now takes 12 seconds to output ip:9100/metrics, where it used to take 1 to 2 seconds. The result is that Prometheus fails to monitor the board (timeout).
A simple "top" shows several kworker processes eating 20 to 50% of a CPU, even when the board is idle.
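
A rough way to reproduce the measurement above from another machine is to time a single scrape with curl. This is only a sketch: the function names time_scrape and exceeds_timeout are made up for illustration, and the 10-second limit in the usage line assumes Prometheus's default scrape_timeout, which the post does not confirm.

```shell
#!/bin/sh
# time_scrape URL: print the total time (in seconds) curl needed to fetch URL.
# The response body is discarded; only the timing is printed.
time_scrape() {
    curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# exceeds_timeout SECONDS LIMIT: succeed (exit 0) if SECONDS > LIMIT.
# awk is used because POSIX sh cannot compare fractional numbers.
exceeds_timeout() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}
```

Possible usage, with BOARD_IP as a placeholder for the board's address: t=$(time_scrape http://BOARD_IP:9100/metrics); exceeds_timeout "$t" 10 && echo "scrape would time out".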


If I install (on the same board) the previous version of this image (A64-OLinuXino-buster-minimal-20201105-143953.img.7z), which comes with this kernel:
Quote: Linux a64-olinuxino 5.8.18-olimex #140443 SMP Thu Nov 5 14:08:32 UTC 2020 aarch64 GNU/Linux
everything is fine: no kworker processes eating CPU, no slowness.
If I then run "apt upgrade", it upgrades the kernel: after a reboot, the slowness and the kworker processes appear.

Steps to reproduce :
Title: Re: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: LubOlimex on December 16, 2020, 01:43:24 PM
We are investigating. Meanwhile, try a different scaling governor. First try:

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Test.

Then try with:

echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Report back your findings. It seems others have experienced something similar:

https://forum.armbian.com/topic/13117-pine64-lts-high-kworker-load-with-cpufreq-problems-after-upgrade-to-20021/
Title: Re: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: mossroy on December 16, 2020, 07:31:31 PM
With the "performance" governor, the problem seems to disappear: no more kworker processes eating CPU, and the Prometheus node exporter takes ~1.5 seconds to respond.
With the "powersave" governor, the problem also disappears. The node exporter takes ~2.2 seconds.
With the "ondemand" governor (default value), the problem reappears: kworker processes eat some CPU, and the node exporter takes ~11 seconds.

So switching to the performance or powersave governor is a workaround.
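
For anyone wanting to apply this workaround to every core rather than only cpu0, here is a minimal sketch. The function name set_governor and the SYSFS_CPU override are my own inventions (the override only exists so the loop can be tried against a scratch directory); the sysfs path is the one used in the commands earlier in this thread. On boards where all cores share one cpufreq policy, writing cpu0 alone may already be enough.

```shell
#!/bin/sh
# set_governor GOVERNOR: write GOVERNOR to every cpuN/cpufreq/scaling_governor.
# SYSFS_CPU is a hypothetical override for testing; by default the real
# sysfs location is used, which requires root to write.
set_governor() {
    governor="$1"
    root="${SYSFS_CPU:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern if no CPU exposes cpufreq.
        [ -e "$f" ] || continue
        echo "$governor" > "$f" || return 1
    done
}
```

Run as root, for example: set_governor performance. The setting is not persistent: it has to be reapplied after each reboot.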
Title: Re: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: LubOlimex on December 17, 2020, 09:06:22 AM
Alright, thanks for your time and tests! We will fix it, but this will come at a cost, because the issue is more serious than it seems. The sluggish performance of the ondemand governor is caused by the workaround for the arch timer of the A64 chip: the A64 has known design problems with its timer, and the timer might jump forward in time. Please check this series of patches:

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html
Title: Re: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: mossroy on December 19, 2020, 07:19:01 PM
Quote from: LubOlimex on December 17, 2020, 09:06:22 AM
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html

This link returns a 404 Not Found error. But, yes, I'm aware of the timer issue. I even reported it again recently: https://www.olimex.com/forum/index.php?topic=7238.msg29730#msg29730

I see that your latest image http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20201217-194545.img.7z ships an older version of the kernel:
Quote: Linux a64-olinuxino 5.8.18-olimex #122632 SMP Wed Dec 16 12:28:30 UTC 2020 aarch64 GNU/Linux

I suppose it's a workaround before finding a real fix?
I suppose this version #122632 still has this timer issue?
Title: Re: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: LubOlimex on December 21, 2020, 08:25:45 AM
This should be the link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html

>I suppose it's a workaround before finding a real fix?

There are no real fixes for bugs in the chip's design and silicon. I don't know whether a better workaround will be proposed.

If the kernel has the timer issue, our images will have it too. But the governor sluggishness should be gone.
Title: Re: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: mossroy on December 21, 2020, 02:49:51 PM
Quote from: LubOlimex on December 21, 2020, 08:25:45 AM
This should be the link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html

Your link is still not valid: the forum software adds "email" tags inside the URL. The following link looks OK (at least in the preview; I created a "url" tag myself instead of letting the forum do it automatically):
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1898487.html
Title: Re: Slowness and kworker eating CPU with version #190400 of kernel 5.8.18-olimex
Post by: mossroy on December 27, 2020, 10:16:23 PM
I confirm that the timer issue is still there with http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20201217-194545.img.7z

I just ran into it: one A64-OLinuXino-2Ge8G-IND board (installed yesterday) jumped to Thu 20 Feb 05:01:58 CET 2116 and (as usual) lost network access.

It's a blocker for me
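
Until there is a better workaround, a board could at least notice that its clock has jumped. The sketch below only detects the condition and does not repair anything. The function name clock_jumped and the CEILING value are assumptions chosen for illustration (any date comfortably in the future but well before 2116 would do).

```shell
#!/bin/sh
# clock_jumped NOW CEILING: succeed (exit 0) if NOW, in seconds since the
# epoch, is later than CEILING.
clock_jumped() {
    [ "$1" -gt "$2" ]
}

# CEILING is an arbitrary assumption: 2035-01-01 00:00:00 UTC.
CEILING=2051222400
```

Possible usage from cron: clock_jumped "$(date +%s)" "$CEILING" && logger "system clock is implausibly far in the future".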