Olimex Support Forum

OLinuXino Android / Linux boards and System On Modules => A64 => Topic started by: jch on April 08, 2022, 11:44:08 PM

Title: A64 stalls under network load with kernel .105
Post by: jch on April 08, 2022, 11:44:08 PM
Hi,

Ever since upgrading to .105, I'm seeing my A64 board fail under heavy network load (routing between Ethernet and Wifi).  The serial shell is still responsive, but the board is no longer routing until I reboot it.  The little console consoles itself with messages such as this:
[ 1841.000879] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... 1-... 2-... } 37157 jiffies s: 105 root: 0x7/.
[ 1841.012496] rcu: blocking rcu_node structures:
[ 1890.618734] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1890.624321] rcu:     1-....: (54120 ticks this GP) idle=27e/1/0x4000000000000002 softirq=6037/6037 fqs=26244
[ 1904.486155] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... 1-... 2-... } 53029 jiffies s: 105 root: 0x7/.
[ 1904.497762] rcu: blocking rcu_node structures:
[ 1953.628024] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1953.633612] rcu:     1-....: (69871 ticks this GP) idle=27e/1/0x4000000000000002 softirq=6037/6037 fqs=34118
Title: Re: A64 stalls under network load with kernel .105
Post by: LubOlimex on April 12, 2022, 08:05:16 AM
This looks like the problem I wrote about here, which was supposed to be fixed:

https://www.olimex.com/forum/index.php?topic=8643.msg33463#msg33463

1) What do these three commands return:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

2) Are you using the latest image from here: https://images.olimex.com/release/a64/
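
For anyone collecting the same information, here is a minimal sketch that prints all three values for every core in one go. It assumes a POSIX shell and the usual cpufreq sysfs layout; the function name show_cpufreq and its optional base-directory argument are made up for illustration.

```shell
# Hypothetical helper: print the three cpufreq values for every CPU
# that exposes a cpufreq directory. The optional first argument overrides
# the sysfs base directory (default: /sys/devices/system/cpu).
show_cpufreq() {
    base="${1:-/sys/devices/system/cpu}"
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue        # glob matched nothing, or no cpufreq
        cpu="${dir%/cpufreq}"            # strip the trailing /cpufreq
        cpu="${cpu##*/}"                 # keep only the cpuN component
        for f in scaling_cur_freq scaling_max_freq scaling_governor; do
            if [ -r "$dir/$f" ]; then
                printf '%s %s=%s\n' "$cpu" "$f" "$(cat "$dir/$f")"
            fi
        done
    done
}
```

Calling show_cpufreq with no arguments reads the live values; the frequencies are reported in kHz.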
Title: Re: A64 stalls under network load with kernel .105
Post by: jch on April 28, 2022, 11:06:01 PM
It's using the performance governor, with the stock max frequency.

Just to be clear: the board is *not* overheating, it never reaches 70°C.  The WiFi interface just randomly hangs under network load (not CPU load).
Title: Re: A64 stalls under network load with kernel .105
Post by: LubOlimex on April 29, 2022, 08:49:05 AM
If the whole board stalls, then it is not just the WIFI.

Performance governor basically makes the board ignore temperature settings.

Try the ondemand governor, or maybe even powersave, to test whether it improves reliability:

echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
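
The command above only touches cpu0. If the cores do not share one cpufreq policy (not verified for this board), a sketch like the following writes the governor to every core. The function name set_governor and the optional base-directory argument are hypothetical, and on a real board it has to run as root.

```shell
# Hypothetical helper: write the given governor to every CPU's
# scaling_governor file. The optional second argument overrides the
# sysfs base directory (default: /sys/devices/system/cpu).
set_governor() {
    gov="$1"
    base="${2:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue          # glob matched nothing
        echo "$gov" > "$f" || return 1   # stop on the first failed write
    done
}
```

For example, set_governor powersave. Values written to sysfs do not survive a reboot, so the setting has to be reapplied after each boot.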
Title: Re: A64 stalls under network load with kernel .105
Post by: jch on April 30, 2022, 04:07:03 PM
> If the whole board stalls, then it is not just the WIFI.

Please read my initial posting again.  "The serial shell is still responsive, but the board is no longer routing until I reboot it."

> Performance governor basically makes the board ignore temperature settings.

I'm pretty sure that's not correct.  I've just confirmed that the CPU frequency slows down when the board reaches 70°C even with the performance governor.
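
A rough sketch of how such a check could be done: sample the temperature and the current frequency side by side while the board is under load. This assumes that thermal_zone0 is the CPU sensor (not verified for the A64) and that the kernel reports the temperature in millidegrees Celsius, as the thermal sysfs interface normally does. The function name sample_temp_freq and its two optional file arguments are made up for illustration.

```shell
# Hypothetical helper: print the temperature (degrees C, one decimal)
# and the current CPU frequency (kHz) on one line.
# Assumes a non-negative temperature reading in millidegrees Celsius.
sample_temp_freq() {
    temp_file="${1:-/sys/class/thermal/thermal_zone0/temp}"
    freq_file="${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"
    milli=$(cat "$temp_file") || return 1
    freq=$(cat "$freq_file") || return 1
    printf '%d.%d C  %s kHz\n' "$((milli / 1000))" "$((milli % 1000 / 100))" "$freq"
}
```

Running it in a loop, for instance `while sample_temp_freq; do sleep 1; done`, shows whether the frequency drops as the temperature climbs.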
Title: Re: A64 stalls under network load with kernel .105
Post by: jch on June 02, 2022, 01:51:32 AM
It looks like it's this bug: https://bugzilla.kernel.org/show_bug.cgi?id=215542, which is fixed in Linux 5.10.106.  Olimex, could we please have an update with 5.10.106 or later?
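
To check whether a given board already runs a kernel at or past that version, a minimal sketch along these lines may help. It relies on `sort -V` (version sort, as provided by GNU coreutils); the function name version_at_least is made up for illustration.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Both versions are sorted with `sort -V`; if the smaller (or equal)
# one is $2, then $1 is at least $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use is `rel=$(uname -r); version_at_least "${rel%%-*}" 5.10.106 && echo "fix should be included"`, where stripping everything after the first dash assumes the release string looks like 5.10.105-something.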
Title: Re: A64 stalls under network load with kernel .105
Post by: LubOlimex on June 02, 2022, 01:26:34 PM
Nice find. Sure we will update it, if you check the branches you can see we update it regularly:

https://github.com/OLIMEX/linux-olimex
Title: Re: A64 stalls under network load with kernel .105
Post by: jch on May 18, 2023, 03:24:24 AM
> Nice find. Sure we will update it, if you check the branches you can see we update it regularly:

I've just reflashed my board with A64-OLinuXino-bullseye-minimal-20230515-130040, and the kernel is still 5.10.105.

@Olimex, I reported this issue almost a year ago... may I please ask that you provide an updated kernel?
Title: Re: A64 stalls under network load with kernel .105
Post by: jch on July 03, 2023, 06:05:49 PM
@LubOlimex: no further kernel updates?
Title: Re: A64 stalls under network load with kernel .105
Post by: LubOlimex on July 04, 2023, 09:10:21 AM
@jch I sent you a personal message more than a month ago with an experimental image with a newer kernel, but you didn't respond. Can you check your inbox?
Title: Re: A64 stalls under network load with kernel .105
Post by: jch on July 28, 2023, 06:18:55 PM
Indeed, I missed it.  I'll try to find time to test it this weekend, and report back.
Title: Re: A64 stalls under network load with kernel .105
Post by: jch on July 29, 2023, 11:57:07 PM
@LubOlimex, I've just reflashed a board with the experimental 5.10.180 image that you provided, and configured it as an AP, the exact same configuration that would freeze under .105.

As far as I can tell, it's rock solid: I've downloaded 200MB of data through it, and it's still up (I'm actually posting this message through it).
Title: Re: A64 stalls under network load with kernel .105
Post by: LubOlimex on July 31, 2023, 08:16:45 AM
Thanks for the feedback, I will forward the info to the developers. Let me know if you notice anything strange with that image.