I have three A64-OLinuXino-2Ge8G-IND boards, which I use as headless Debian servers.
Yesterday, one of them became unreachable from the local network.
After using the debug cable, I saw that its date had suddenly changed to 2114 October 7, 05:20:08.
Manually fixing the date resolved the issue. For some reason, the wrong date caused several services to fail (even after a reboot), at least NetworkManager and MariaDB.
The logs contain the stack trace below, at that precise moment.
I'm using ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/buster/images/Armbian_5.92.1_Olinuxino-a64_Debian_buster_next_5.2.5.7z , with all current updates. Kernel 5.2.5.
What happened? Is it a hardware or software issue?
It's very annoying because it cannot be fixed over the network.
[2115903.263561] rcu: INFO: rcu_sched self-detected stall on CPU
[2115903.263579] rcu: 2-...!: (3 ticks this GP) idle=302/0/0x1 softirq=20234276/20234277 fqs=0
[2115903.263583] (t=750639755227 jiffies g=72093289 q=1)
[2115903.263589] rcu: rcu_sched kthread starved for 750639755227 jiffies! g72093289 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[2115903.263591] rcu: RCU grace-period kthread stack dump:
[2115903.263595] rcu_sched I 0 10 2 0x00000028
[2115903.263602] Call trace:
[2115903.263619] __switch_to+0xb4/0x1b8
[2115903.263628] __schedule+0x1f4/0x4a0
[2115903.263632] schedule+0x28/0x98
[2115903.263638] schedule_timeout+0x90/0x3a0
[2115903.263647] rcu_gp_kthread+0x714/0x968
[2115903.263653] kthread+0x124/0x128
[2115903.263657] ret_from_fork+0x10/0x1c
[2115903.263662] Task dump for CPU 0:
[2115903.263665] swapper/0 R running task 0 0 0 0x0000002a
[2115903.263669] Call trace:
[2115903.263673] __switch_to+0xb4/0x1b8
[2115903.263678] 0xffff000010df8000
[2115903.263680] Task dump for CPU 1:
[2115903.263683] swapper/1 R running task 0 0 1 0x0000002a
[2115903.263688] Call trace:
[2115903.263692] __switch_to+0xb4/0x1b8
[2115903.263695] 0xffff000010df8000
[2115903.263697] Task dump for CPU 2:
[2115903.263699] swapper/2 R running task 0 0 1 0x0000002a
[2115903.263704] Call trace:
[2115903.263709] dump_backtrace+0x0/0x140
[2115903.263714] show_stack+0x14/0x20
[2115903.263720] sched_show_task+0xf4/0x128
[2115903.263726] dump_cpu_task+0x40/0x50
[2115903.263731] rcu_dump_cpu_stacks+0xc8/0x118
[2115903.263735] rcu_sched_clock_irq+0xf8/0x7e8
[2115903.263740] update_process_times+0x2c/0x58
[2115903.263747] tick_sched_handle.isra.5+0x30/0x48
[2115903.263751] tick_sched_timer+0x48/0x98
[2115903.263756] __hrtimer_run_queues+0xfc/0x218
[2115903.263760] hrtimer_interrupt+0xf8/0x2d0
[2115903.263766] arch_timer_handler_phys+0x28/0x40
[2115903.263771] handle_percpu_devid_irq+0x80/0x140
[2115903.263777] generic_handle_irq+0x24/0x38
[2115903.263782] __handle_domain_irq+0x5c/0xb0
[2115903.263786] gic_handle_irq+0x58/0xa8
[2115903.263789] el1_irq+0xb8/0x140
[2115903.263793] arch_cpu_idle+0x10/0x18
[2115903.263797] do_idle+0x1e0/0x2c0
[2115903.263801] cpu_startup_entry+0x20/0x28
[2115903.263807] secondary_start_kernel+0x190/0x1d0
[2115903.263810] Task dump for CPU 3:
[2115903.263812] swapper/3 R running task 0 0 1 0x0000002a
[2115903.263817] Call trace:
[2115903.263821] __switch_to+0xb4/0x1b8
[2115903.263824] 0xffff000010df8000
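For scale, the "t=750639755227 jiffies" value in the stall report above can be converted to years. This is only a minimal sketch: the tick rate HZ = 250 is an assumption (the post does not give the kernel's CONFIG_HZ), and the function name is made up for illustration.

```python
# Convert the stall duration reported by RCU (in jiffies) to years.
# HZ = 250 is an ASSUMPTION about this kernel's CONFIG_HZ; the log does not state it.
HZ = 250
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def jiffies_to_years(jiffies: int, hz: int = HZ) -> float:
    """Return the duration represented by `jiffies` ticks, in years."""
    return jiffies / hz / SECONDS_PER_YEAR


stall_jiffies = 750639755227  # the "t=" field from the report above
print(f"{jiffies_to_years(stall_jiffies):.1f} years")
```

Under that assumption the stall is roughly 95 years long, which is consistent with the clock landing in 2114.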
Hello,
It is a known bug. It is supposed to be fixed in 5.3 (which is yet to be released). You can read more about it here: https://forum.armbian.com/topic/7423-pine64-massive-datetime-clock-problem/ and here: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/
If you can't wait until 5.3, there is a workaround, described here: https://github.com/torvalds/linux/commit/c950ca8c35eeb32224a63adc47e12f9e226da241#diff-27e9d89068f8125140ea388216ae3140
There is a tool that would allow you to test if the bug still occurs: https://raw.githubusercontent.com/apritzel/pine64/d540a917dc1fdd988ebed3630afc91c4c1e7dd1d/tools/test_timer.c
Many thanks for this detailed answer.
To sum up:
- It comes from a hardware issue in the Allwinner A64 chip, and affects all boards using the A64 (not only those from Olimex)
- Allwinner has been informed but doesn't care much
- A software workaround is implemented in kernel 5.3, and might be backported to previous kernel versions if necessary. But it involves recompiling the kernel in any case.
Now that the kernel 5.3 is officially released, when will you release a new version of your images in ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/ ? And/or a kernel update in http://repository.olimex.com ?
Soon™
Well, we're now Soon™ + one month...
I saw that kernel 5.3 has been released for a13 : https://olimex.wordpress.com/2019/11/06/new-mainline-linux-images-with-kernel-5-3-8-for-a13-olinuxino-and-a13-som-are-uploaded/
Any update for a kernel >=5.3 for A64 that fixes the date issue?
Soon™ + 2 months...
It's still unclear to me what differs between the Armbian version distributed by Olimex and the upstream Armbian version from https://www.armbian.com/olimex-lime-a64/ . The upstream one already has kernel 5.3 and looks stable, but version Armbian_19.11.3_Lime-a64_buster_current_5.3.9.7z does not seem to be able to install to eMMC. I hope it's only a temporary issue.
Quote from: mossroy on December 28, 2019, 12:21:33 PM
But version Armbian_19.11.3_Lime-a64_buster_current_5.3.9.7z does not seem to be able to install to eMMC
My Olinuxino-A64 (1Ge4GW) has been running from eMMC for more than a year, and I've recently installed my own kernel build 5.5.0-rc2.
What does "dmesg | grep -i mmc" report on yours?
Quote from: martinayotte on December 28, 2019, 03:49:16 PM
My Olinuxino-A64 (1Ge4GW) is running from eMMC for more than a year, and I've recently installed my own kernel build 5.5.0-rc2.
What "dmesg | grep -i mmc" is reporting on yours ?
Using Armbian_5.92.1_Olinuxino-a64_Debian_buster_next_5.2.5 image (from Olimex), eMMC is available, and "dmesg | grep -i mmc" gives :
[ 3.603833] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[ 3.629312] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 3.655378] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[ 3.680544] mmc0: new high speed SDHC card at address 0007
[ 3.683008] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[ 3.684725] mmcblk0: p1
[ 3.729736] mmc1: new high speed MMC card at address 0001
[ 3.732172] mmcblk1: mmc1:0001 Q2J55L 7.09 GiB
[ 3.733974] mmcblk1boot0: mmc1:0001 Q2J55L partition 1 16.0 MiB
[ 3.735800] mmcblk1boot1: mmc1:0001 Q2J55L partition 2 16.0 MiB
[ 3.737210] mmcblk1: p1
[ 5.898007] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[ 8.469645] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
Using Armbian_19.11.3_Lime-a64_buster_current_5.3.9 (from Armbian), eMMC is not detected, and the same command gives :
[ 2.605615] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[ 2.630895] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.631937] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[ 2.655592] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.671555] mmc0: new high speed SDHC card at address 0007
[ 2.673093] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[ 2.674281] mmcblk0: p1
[ 2.681172] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[ 2.693501] sunxi-mmc 1c11000.mmc: no support for card's volts
[ 2.693507] mmc2: error -22 whilst initialising SDIO card
[ 4.949590] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[ 6.676378] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
I also tested the unsupported Armbian_19.11.4.356_Lime-a64_buster_dev_5.4.6_minimal (from Armbian). eMMC is also undetected, and "dmesg | grep -i mmc" gives :
[ 2.476549] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[ 2.501821] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.502875] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[ 2.526529] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.539784] mmc0: new high speed SDHC card at address 0007
[ 2.540613] mmcblk0: mmc0:0007 SD04G 3.71 GiB
[ 2.542829] mmcblk0: p1
[ 2.552931] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[ 2.565206] sunxi-mmc 1c11000.mmc: no support for card's volts
[ 2.565213] mmc2: error -22 whilst initialising SDIO card
[ 3.927959] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[ 6.864042] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
[ 21.662476] EXT4-fs (mmcblk0p1): resizing filesystem from 190464 to 919552 blocks
[ 26.195849] EXT4-fs (mmcblk0p1): resized filesystem to 919552
Strange ...
I've downloaded and booted the Armbian_19.11.3_Lime-a64_buster_current_5.3.9_minimal.img.
I was able to mount eMMC without issue :
root@lime:~# dmesg | grep mmc
[ 3.971826] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[ 4.001549] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 4.021533] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[ 4.049970] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 4.079874] mmc0: new high speed SDHC card at address 0007
[ 4.086860] mmcblk0: mmc0:0007 SD32G 29.9 GiB
[ 4.093169] mmcblk0: p1
[ 4.097128] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[ 4.239781] mmc1: new high speed SDIO card at address 0001
[ 4.534821] mmc2: new DDR MMC card at address 0001
[ 4.541531] mmcblk2: mmc2:0001 P1XXXX 3.60 GiB
[ 4.547368] mmcblk2boot0: mmc2:0001 P1XXXX partition 1 16.0 MiB
[ 4.554767] mmcblk2boot1: mmc2:0001 P1XXXX partition 2 16.0 MiB
[ 4.562367] mmcblk2: p1
[ 5.277142] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[ 7.464783] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
[ 116.548529] EXT4-fs (mmcblk2p1): mounted filesystem with ordered data mode. Opts: (null)
Maybe you should try to build your own Armbian image using their build script ...
I exchanged the microSD cards with another olinuxino A64 (same model), to check it's not a hardware issue. It's not : I have the same result on both devices.
I also tested the latest Armbian_20.02.0-rc0_Lime-a64_buster_current_5.4.12.img, same result too :
root@lime:~# dmesg | grep mmc
[ 2.535544] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[ 2.560804] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.561852] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[ 2.585507] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.598670] mmc0: new high speed SDHC card at address 0007
[ 2.599450] mmcblk0: mmc0:0007 SD04G 3.71 GiB
[ 2.601893] mmcblk0: p1
[ 2.612057] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[ 2.624333] sunxi-mmc 1c11000.mmc: no support for card's volts
[ 2.624357] mmc2: error -22 whilst initialising SDIO card
[ 3.952318] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[ 6.765327] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
[ 19.514557] EXT4-fs (mmcblk0p1): resizing filesystem from 378880 to 919552 blocks
[ 22.647542] EXT4-fs (mmcblk0p1): resized filesystem to 919552
With this latest image, I also tried to switch to kernel 5.5.0-rc6, which gives the same result :
dmesg | grep mmc
[ 2.583998] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[ 2.609265] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.610288] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[ 2.633956] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.649927] mmc0: new high speed SDHC card at address 0007
[ 2.650732] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[ 2.652700] mmcblk0: p1
[ 2.660149] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[ 2.672457] sunxi-mmc 1c11000.mmc: no support for card's volts
[ 2.672463] mmc2: error -22 whilst initialising SDIO card
[ 3.980770] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[ 5.603862] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
And same with kernel 4.19.63 :
[ 2.045784] sunxi-mmc 1c0f000.mmc: Linked as a consumer to regulator.2
[ 2.046171] sunxi-mmc 1c10000.mmc: Linked as a consumer to regulator.9
[ 2.046234] sunxi-mmc 1c10000.mmc: Linked as a consumer to regulator.14
[ 2.046246] sunxi-mmc 1c11000.mmc: Linked as a consumer to regulator.1
[ 2.064426] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[ 2.064924] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[ 2.089729] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.089733] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB
[ 2.089738] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[ 2.095163] sunxi-mmc 1c11000.mmc: no support for card's volts
[ 2.095169] mmc2: error -22 whilst initialising SDIO card
[ 2.129603] mmc0: new high speed SDHC card at address 0007
[ 2.131212] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[ 2.132370] mmcblk0: p1
[ 6.442146] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[ 8.030362] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
About the date bug and a new image released by Olimex, we're now on "Soon™ + 3 months"...
I am not convinced that the timer issue persists with our software images in the first place. I have not been able to replicate the problem in my tests so far: I ran both Olimex images (Ubuntu and Debian) on a couple of A64 boards and couldn't reproduce it. Can you do the following:
Use one of the official images:
ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/bionic/images/
or
ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/buster/images/
Run this test program: https://github.com/apritzel/pine64/blob/master/tools/test_timer.c
Download it to a directory on the board, compile it with:
gcc -o test_timer test_timer.c
and run it with
./test_timer
I got the software from here: https://patchwork.kernel.org/patch/10392891/#21914739
These are my results with Ubuntu (Debian is the same):
root@olinuxino:/home/test_timer# date
Wed Jan 29 12:59:03 UTC 2020
root@olinuxino:/home/test_timer# ./test_timer
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
ok 2 native counter reads are monotonic # 0 errors
# min: 6, avg: 6, max: 11406
ok 3 Linux counter reads are monotonic # 0 errors
# min: 458, avg: 490, max: 64625
# core 0: counter value: 181435460523 => 7559 sec
# core 0: offsets: back-to-back: 9, b-t-b synced: 6, b-t-b w/ delay: 8
# core 1: counter value: 181435462090 => 7559 sec
# core 1: offsets: back-to-back: 8, b-t-b synced: 7, b-t-b w/ delay: 8
# core 2: counter value: 181435463546 => 7559 sec
# core 2: offsets: back-to-back: 8, b-t-b synced: 10, b-t-b w/ delay: 8
# core 3: counter value: 181435464907 => 7559 sec
# core 3: offsets: back-to-back: 12, b-t-b synced: 6, b-t-b w/ delay: 9
1..3
It's true that I have not had this issue again since October 2019, even though my boards are running 24/7.
I compiled test_timer.c and ran it for a few hours (in a loop) on 2 A64-OLinuXino-2Ge8G-IND boards.
It gave results similar to yours, and the date did not change:
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
ok 2 native counter reads are monotonic # 0 errors
# min: 6, avg: 6, max: 887
ok 3 Linux counter reads are monotonic # 0 errors
# min: 458, avg: 483, max: 75667
# core 0: counter value: 20877018122385 => 869875 sec
# core 0: offsets: back-to-back: 15, b-t-b synced: 7, b-t-b w/ delay: 12
# core 1: counter value: 20877018123819 => 869875 sec
# core 1: offsets: back-to-back: 14, b-t-b synced: 7, b-t-b w/ delay: 9
# core 2: counter value: 20877018128599 => 869875 sec
# core 2: offsets: back-to-back: 12, b-t-b synced: 6, b-t-b w/ delay: 9
# core 3: counter value: 20877018129983 => 869875 sec
# core 3: offsets: back-to-back: 8, b-t-b synced: 7, b-t-b w/ delay: 9
1..3
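When running the tool in a loop for hours, the TAP output above can be checked automatically instead of by eye. Below is a minimal sketch, assuming the binary is at "./test_timer" and that it reports a failed check with a standard TAP "not ok" line (I have not verified how the tool prints failures); the helper names are hypothetical.

```python
import re
import subprocess

# A TAP test point starts with "ok" or "not ok"; comment ("# ...") and
# plan ("1..3") lines are ignored.
TAP_LINE = re.compile(r"^(not ok|ok)\b")


def summarize_tap(output: str):
    """Return (number of passing test points, list of failing lines)."""
    passed, failed = 0, []
    for line in output.splitlines():
        match = TAP_LINE.match(line.strip())
        if match is None:
            continue
        if match.group(1) == "ok":
            passed += 1
        else:
            failed.append(line.strip())
    return passed, failed


def run_once(binary: str = "./test_timer"):
    """Run the test binary once (path is an assumption) and summarize it."""
    proc = subprocess.run([binary], capture_output=True, text=True)
    return summarize_tap(proc.stdout)
```

Calling run_once() repeatedly and logging any non-empty failure list gives a record of when (if ever) a non-monotonic read was seen.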
So I'd say it's encouraging.
On the other hand, I was already using the latest image you mention (Armbian_5.92.1_Olinuxino-a64_Debian_buster_next_5.2.5.7z) when I had the issue. See my first post of this topic.
So I don't see what could have fixed the issue in between (except some software update?).
Reading https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/page/4/, some people seem to say that keeping systemd-timesyncd service disabled helps (this service is not running on my boards, because /usr/sbin/ntpd does not exist). But it's an old thread.
The thread https://forum.armbian.com/topic/7423-pine64-massive-datetime-clock-problem/page/3/ is more recent, and they have made a kernel patch that seems to work well (based on the comments).
I just had this same issue again on a very recent image from Olimex (A64-OLinuXino-buster-minimal-20201008-174232.img.7z).
Same symptom: the date jumps to 2115 (December 23) and the Ethernet network stops working.
This is a very big problem on headless servers, as they become unreachable.
Here is an excerpt of syslog :
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.326498] rcu: INFO: rcu_sched self-detected stall on CPU
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.332258] rcu: 3-...!: (1 ticks this GP) idle=b06/0/0x1 softirq=309435/309435 fqs=0
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.340422] (t=11812 jiffies g=1294529 q=30)
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.340430] rcu: rcu_sched kthread starved for 11812 jiffies! g1294529 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=3
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.351106] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.360221] rcu: RCU grace-period kthread stack dump:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365437] rcu_sched I 0 10 2 0x00000028
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365446] Call trace:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365462] __switch_to+0xf8/0x198
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365474] __schedule+0x238/0x670
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365480] schedule+0x58/0xe8
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365487] schedule_timeout+0x180/0x398
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365496] rcu_gp_kthread+0x410/0xb60
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365502] kthread+0x12c/0x130
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365508] ret_from_fork+0x10/0x30
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365518] Task dump for CPU 0:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365522] swapper/0 R running task 0 0 0 0x00000028
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365529] Call trace:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365536] __switch_to+0xf8/0x198
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365541] 0x0
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365545] Task dump for CPU 1:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365548] swapper/1 R running task 0 0 1 0x0000002a
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365555] Call trace:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365561] __switch_to+0xf8/0x198
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365565] 0x0
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365569] Task dump for CPU 3:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365572] swapper/3 R running task 0 0 1 0x0000002a
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365578] Call trace:
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365586] dump_backtrace+0x0/0x1b8
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365593] show_stack+0x20/0x30
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365600] sched_show_task+0x150/0x180
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365607] dump_cpu_task+0x4c/0x60
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365613] rcu_dump_cpu_stacks+0xc4/0x10c
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365618] rcu_sched_clock_irq+0x7c0/0x9b0
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365627] update_process_times+0x38/0x98
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365636] tick_sched_handle.isra.19+0x48/0x58
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365642] tick_sched_timer+0x54/0xb0
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365647] __hrtimer_run_queues+0x10c/0x360
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365652] hrtimer_interrupt+0x11c/0x2f0
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365662] arch_timer_handler_phys+0x38/0x48
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365670] handle_percpu_devid_irq+0x94/0x240
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365677] generic_handle_irq+0x38/0x50
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365682] __handle_domain_irq+0x6c/0xc8
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365688] gic_handle_irq+0x5c/0xb8
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365693] el1_irq+0xb8/0x140
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365699] arch_cpu_idle+0x40/0x1d8
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365704] default_idle_call+0x3c/0x60
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365711] do_idle+0x228/0x2a0
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365717] cpu_startup_entry+0x2c/0x98
Oct 29 23:34:05 a64-olinuxino kernel: [1324864.365726] secondary_start_kernel+0x15c/0x170
Oct 29 23:35:01 a64-olinuxino CRON[26942]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.590436] rcu: INFO: rcu_sched self-detected stall on CPU
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.596203] rcu: 1-...0: (1 GPs behind) idle=c9e/0/0x1 softirq=340248/340249 fqs=2
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604115] (t=750629252211 jiffies g=1294529 q=2003)
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604121] Task dump for CPU 1:
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604127] swapper/1 R running task 0 0 1 0x0000002a
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604136] Call trace:
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604152] dump_backtrace+0x0/0x1b8
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604162] show_stack+0x20/0x30
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604170] sched_show_task+0x150/0x180
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604178] dump_cpu_task+0x4c/0x60
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604185] rcu_dump_cpu_stacks+0xc4/0x10c
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604190] rcu_sched_clock_irq+0x7c0/0x9b0
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604198] update_process_times+0x38/0x98
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604207] tick_sched_handle.isra.19+0x48/0x58
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604213] tick_sched_timer+0x54/0xb0
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604219] __hrtimer_run_queues+0x10c/0x360
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604224] hrtimer_interrupt+0x11c/0x2f0
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604234] arch_timer_handler_phys+0x38/0x48
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604243] handle_percpu_devid_irq+0x94/0x240
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604249] generic_handle_irq+0x38/0x50
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604255] __handle_domain_irq+0x6c/0xc8
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604260] gic_handle_irq+0x5c/0xb8
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604265] el1_irq+0xb8/0x140
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604272] arch_cpu_idle+0x40/0x1d8
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604281] default_idle_call+0x3c/0x60
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604288] do_idle+0x228/0x2a0
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604294] cpu_startup_entry+0x30/0x98
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604302] secondary_start_kernel+0x15c/0x170
Dec 23 08:04:14 a64-olinuxino kernel: [1325159.604357] hrtimer: interrupt took 3002517008854765534 ns
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.150574] rcu: INFO: rcu_sched self-detected stall on CPU
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.156334] rcu: 1-...!: (3 ticks this GP) idle=5b6/1/0x4000000000000002 softirq=348968/348970 fqs=1
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.165800] (t=12887 jiffies g=1311021 q=624)
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.165808] rcu: rcu_sched kthread starved for 12885 jiffies! g1311021 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.176486] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.185603] rcu: RCU grace-period kthread stack dump:
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190823] rcu_sched I 0 10 2 0x00000028
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190834] Call trace:
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190852] __switch_to+0xf8/0x198
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190864] __schedule+0x238/0x670
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190871] schedule+0x58/0xe8
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190878] schedule_timeout+0x180/0x398
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190890] rcu_gp_kthread+0x410/0xb60
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190896] kthread+0x12c/0x130
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190903] ret_from_fork+0x10/0x30
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190932] Task dump for CPU 1:
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190937] systemd-journal R running task 0 16594 1 0x00000802
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190946] Call trace:
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190954] dump_backtrace+0x0/0x1b8
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190961] show_stack+0x20/0x30
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190968] sched_show_task+0x150/0x180
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190976] dump_cpu_task+0x4c/0x60
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190981] rcu_dump_cpu_stacks+0xc4/0x10c
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190987] rcu_sched_clock_irq+0x7c0/0x9b0
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.190997] update_process_times+0x38/0x98
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191007] tick_sched_handle.isra.19+0x48/0x58
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191013] tick_sched_timer+0x54/0xb0
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191018] __hrtimer_run_queues+0x10c/0x360
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191023] hrtimer_interrupt+0x11c/0x2f0
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191034] arch_timer_handler_phys+0x38/0x48
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191044] handle_percpu_devid_irq+0x94/0x240
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191050] generic_handle_irq+0x38/0x50
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191056] __handle_domain_irq+0x6c/0xc8
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191061] gic_handle_irq+0x5c/0xb8
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191066] el1_irq+0xb8/0x140
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191074] ___bpf_prog_run+0xad8/0x19e0
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191079] __bpf_prog_run32+0x44/0x68
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191090] __seccomp_filter+0x88/0x620
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191097] __secure_computing+0x44/0xd0
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191104] syscall_trace_enter+0x1a4/0x1d8
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191113] el0_svc_common.constprop.3+0x60/0x178
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191120] do_el0_svc+0x2c/0x98
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191126] el0_sync_handler+0x148/0x1a8
Dec 23 08:05:31 a64-olinuxino kernel: [1325237.191131] el0_sync+0x158/0x180
Dec 23 08:12:34 a64-olinuxino kernel: [1325659.690143] power_supply axp813-ac: driver failed to report `online' property: -110
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.545378] rcu: INFO: rcu_sched self-detected stall on CPU
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.551136] rcu: 0-...0: (7045096 ticks this GP) idle=22e/1/0x4000000000000002 softirq=631070/631074 fqs=2422
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561381] (t=5253 jiffies g=1391941 q=6452)
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561385] Task dump for CPU 0:
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561390] systemd-journal R running task 0 16594 1 0x00000802
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561400] Call trace:
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561416] dump_backtrace+0x0/0x1b8
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561425] show_stack+0x20/0x30
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561433] sched_show_task+0x150/0x180
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561440] dump_cpu_task+0x4c/0x60
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561447] rcu_dump_cpu_stacks+0xc4/0x10c
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561453] rcu_sched_clock_irq+0x7c0/0x9b0
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561461] update_process_times+0x38/0x98
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561470] tick_sched_handle.isra.19+0x48/0x58
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561476] tick_sched_timer+0x54/0xb0
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561481] __hrtimer_run_queues+0x10c/0x360
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561486] hrtimer_interrupt+0x11c/0x2f0
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561496] arch_timer_handler_phys+0x38/0x48
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561505] handle_percpu_devid_irq+0x94/0x240
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561511] generic_handle_irq+0x38/0x50
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561517] __handle_domain_irq+0x6c/0xc8
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561522] gic_handle_irq+0x5c/0xb8
Dec 23 08:12:53 a64-olinuxino kernel: [1325678.561528] el0_irq_naked+0x4c/0x54
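The size of the jump can be cross-checked from the excerpt itself, using the "hrtimer: interrupt took 3002517008854765534 ns" line. A small sketch, with two assumptions: the year 2020 (syslog lines carry no year, but the image is dated October 2020) and the last pre-jump timestamp taken as the starting point.

```python
from datetime import datetime, timedelta

# Duration reported by "hrtimer: interrupt took ... ns" in the excerpt above.
JUMP_NS = 3002517008854765534

# Last syslog timestamp before the jump. The year 2020 is an ASSUMPTION.
before_jump = datetime(2020, 10, 29, 23, 35, 1)


def clock_after_jump(start: datetime, jump_ns: int) -> datetime:
    """Add the reported duration (nanoseconds) to the starting time."""
    # Integer division keeps the whole seconds exact for such a large value.
    return start + timedelta(seconds=jump_ns // 10**9)


print(clock_after_jump(before_jump, JUMP_NS).strftime("%Y-%m-%d"))
```

Under these assumptions the result falls on December 23, 2115, which matches the first syslog line after the jump: about 95 years forward in one step.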
This sounds like the bug in the A64 chip which Linux tries to work around - I'm not sure if the workaround actually works.
You can read about it in posts on the linux-sunxi ML and probably elsewhere.
John
This issue is still there.
I recently had it on the latest kernel 5.8.18 (see https://www.olimex.com/forum/index.php?topic=7976.msg30112#msg30112).
And today I had it on kernel 5.10.4 (from "staging" repository of olimex) : my board jumped to Thu 27 Feb 23:48:54 CET 2116
There must be a software way to fix or work around this issue, because some of my boards do not seem to have it (at least I don't remember them having it).
Here are some details:
- board 1, installed with image A64-OLinuXino-buster-minimal-20200522-193443.img, with kernel 5.8.18 : stable
- board 2, installed with Armbian_5.92.1_Olinuxino-a64_Debian_buster_next_5.2.5.img, kernel 5.2.5 : stable
- board 3, installed with A64-OLinuXino-buster-minimal-20200601-131837.img, with kernel 5.8.18 : stable
- board 4, installed with A64-OLinuXino-buster-minimal-20201207-193928.img, with kernel 5.8.18 : timer issue
- board 5, installed with A64-OLinuXino-buster-minimal-20201207-193928.img, with kernel 5.10.4 (from staging olimex repository) : timer issue
Only the last two boards, freshly installed with a recent image, recently had the timer issue.
As you can see in some other posts I made here, I also had the timer issue with image A64-OLinuXino-buster-minimal-20201008-174232.img and image A64-OLinuXino-buster-minimal-20201217-194545.img
Could it be possible that recent images have a regression on the timer issue?
Of course, there might be other factors, I'm not sure at all.
Quote from: mossroy on January 06, 2021, 12:48:14 AM
Could it be possible that recent images have a regression on the timer issue?
Yes, because the workaround that fixed the time jump issue caused sluggishness that affected everything else. So no workaround is applied in the latest releases. We are waiting for a better workaround that doesn't cause bigger issues.
Doesn't enabling NTP work for you?
ntp status depends on the board :
- board 1 : systemd-timesyncd does not start because of a systemd condition : "ConditionFileIsExecutable=!/usr/sbin/ntpd was not met". ntp service is running but reports "kernel reports TIME_ERROR: 0x41: Clock Unsynchronized"
- board 2 : systemd-timesyncd does not start because of a systemd condition : "ConditionFileIsExecutable=!/usr/sbin/ntpd was not met". ntp service is running but reports "kernel reports TIME_ERROR: 0x41: Clock Unsynchronized"
- board 3 : systemd-timesyncd running
- board 4 : systemd-timesyncd running
- board 5 : systemd-timesyncd running
Please advise on what would be the right way to configure NTP, if it could be a workaround.
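For reference, the synchronization state listed above can also be checked from a script. This is only a minimal sketch, assuming a systemd version whose `timedatectl` supports the `show` subcommand (it prints a `NTPSynchronized=yes` or `NTPSynchronized=no` line). The helper name `ntp_synced` is made up for the example.

```shell
# ntp_synced: hypothetical helper, not an existing tool.
# Reads "timedatectl show" output on stdin and succeeds (exit status 0)
# only if it contains the line "NTPSynchronized=yes".
ntp_synced() {
    grep -qx 'NTPSynchronized=yes'
}

# Usage on a real board (skipped when timedatectl is not installed):
if command -v timedatectl >/dev/null 2>&1; then
    if timedatectl show 2>/dev/null | ntp_synced; then
        echo "clock is NTP-synchronized"
    else
        echo "clock is NOT NTP-synchronized"
    fi
fi
```

This only reports the state; it does not say whether a synchronized daemon would correct a jump of several decades.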
Regarding the sluggishness of recent images, I'm aware of that issue.
On the other hand, changing the governor to something different than "ondemand" seemed to be a workaround.
IMHO, leaving the timer issue in your images is much worse than forcing the governor in them.
As already explained here, the timer jump symptom has critical consequences when you use the board as a headless server (which is my case) :
- the board loses its network access : you cannot fix it remotely. You have to physically plug in a debug cable, and log in in console mode to change the date back
- some software does not support this time change gracefully. Sometimes "touching" the files to set their last modification date is enough. Sometimes it's not : in particular, I did not manage to fix prometheus data, which is considered corrupt
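The two symptoms above could be detected with something like the following. It is a rough sketch, not a tested fix: the helper names, the year-2100 bound and the example path are all assumptions chosen for illustration.

```shell
# Hypothetical helpers; names and thresholds are assumptions.

# clock_is_plausible NOW_EPOCH MAX_EPOCH
# Succeeds if NOW_EPOCH (seconds since 1970) is not later than MAX_EPOCH.
clock_is_plausible() {
    [ "$1" -le "$2" ]
}

# list_future_files DIR
# Prints regular files under DIR whose modification time is later than "now".
# A freshly created reference file is used so that plain "find -newer" suffices.
list_future_files() {
    ref=$(mktemp) || return 1
    find "$1" -type f -newer "$ref"
    rm -f "$ref"
}

# Example check: 4102444800 is 2100-01-01 00:00:00 UTC, an arbitrary upper bound.
if ! clock_is_plausible "$(date +%s)" 4102444800; then
    echo "system clock looks wrong: $(date)" >&2
fi

# After the date has been set back, future-dated files could be reset with e.g.:
#   list_future_files /some/data/dir | while IFS= read -r f; do touch "$f"; done
# (/some/data/dir is a placeholder; file names containing newlines are not handled)
```

As noted above, touching files is not always enough: it does not repair data that the application itself considers corrupt.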
Just out of interest, are you also having this discussion with the people responsible for the software (the linux-sunxi people)? They need to know about your actual experience, don't they?
Of course the real problem is the faulty silicon CPU from Allwinner...
John
I did not contact linux-sunxi. I'm not 100% sure it only comes from the kernel (see the detail of my boards in a post above : some have the same kernel and not the same behavior).
I would expect Olimex to do what is necessary to provide the right images.
I know the original issue is a hardware one in the Allwinner A64 CPU. But Olimex has IMHO a responsibility to sell usable boards (stable, secure, with reasonably up-to-date software, able to use all the hardware capabilities, upgradable, etc).
They also have enough boards to test (I don't), precise knowledge of their images and their roadmap.
I'm still willing to help if/when I can. I try to give precise feedback when I see problems, with possible solutions when I can.
On the other hand, I bought my boards to actually use them. I did not expect to be a beta-tester of Olimex images, 2 years after I bought my first one.
I'm running many things on them and need them to be stable servers.
I can't keep unstable boards to test workarounds. For example I'm currently reinstalling my boards 4 and 5 with older images to make them stable again : sorry if it's a loss of a test-case but I don't have a choice.
I'm also a bit disappointed by the recent problems in Olimex images (see latest A64 topics). Most of them could have been avoided with more checking/testing on Olimex side IMHO. I'm also disappointed by the many "soon" timelines given by Olimex, that are too vague to plan, and sometimes become a "never".
Anyway, I'm still trying to stay positive and I don't underestimate the difficulty on Olimex side. I'm very happy with my A20 Olinuxino boards, that are stable and 100% working on Debian for years now. I hope it will happen soon with A64 ones, and that I will be able to continue to be an Olinuxino supporter.
So far as I can tell, no one can do what you ask unless Allwinner issues working chips, but I guess you can get a better answer from the devs on linux-sunxi.
John
We will probably try to revert the clock changes and set "performance" governor, something that you reported as working.
Unfortunately, we can't test all the things all the time. Especially, since we are trying to move forward to newer kernels and some new issues get introduced.
We can provide something outdated and tested that would never move forward and would probably have everything working.
@mossroy we really appreciate your feedback and efforts.
Linux has never been our strong side and we rely mostly on the Linux-Sunxi community for software support. We have two developers who work on the Linux images but we can only do what we can do - all the patches we make are submitted upstream, but only for the issues we understand and can fix.
This clock bug for A64 is not Olimex board specific, it is also in all current Linux distributions and the same on all A64 boards on the market. It will be fixed when someone who knows how to fix it does it. Do not expect more from us than we can do.
What we can do is discuss with you and other customers which workaround works best for as many of you as possible.
In this case the frequency may be set to performance, which means the processor will waste too much energy and power, or to minimal. Neither is a universal solution. Our people wrongly assumed the date problem was minor compared to the flexibility to change processor performance over time.
Again we appreciate your feedback, please do not be disappointed by us, this issue is just beyond our capabilities to fix.
Thanks for your answers.
I'm OK to participate in the discussion you mentioned, to try to find an acceptable solution for most customers.
My usage (as headless servers) is certainly not representative of all your customers, so you'll need to have feedback from other ones, too.
We might not be very far from the goal, because I already have some A64 boards that are working, seem stable (for my usage), and use the default ondemand governor (boards 1 and 3).
I personally don't need a kernel newer than 5.6. I just need future patches on the kernel, without regression risks.
If some customers need a cutting-edge kernel, you might provide an easy way to upgrade it (by activating a new repository, like "staging"?), while keeping a stable one by default.
Olimex is responsible for providing suitable images for their boards (at least until the boards are fully supported by Debian installer, which unfortunately will not yet be the case with Debian 11 Bullseye).
Your community/customers can help, but the communication with them (us) needs IMHO to be improved. It's sometimes hard to have precise answers (or even answers), or to know what has changed between 2 versions, or to have information on your roadmap.
And changes have to be tested enough to be safe.
Quote from: mossroy on January 12, 2021, 11:10:30 PM
Your community/customers can help, but the communication with them (us) needs IMHO to be improved. It's sometimes hard to have precise answers (or even answers), or to know what has changed between 2 versions, or to have information on your roadmap.
I see that a new image http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20210127-100834.img.7z is now available
Could you tell us the changes that have been made?
Some of the changes can be found on https://github.com/OLIMEX/olimage/commits/master , but I suppose it's not all of them? (and sometimes the commit comments don't give the purpose of the change)
I would suggest providing a changelog for each version, and announcing these new versions somewhere (sorry if it's already done and I missed it?)
NB : I'm still available for the discussion you mentioned
Mainly updated u-boot 2021.01 and kernel to 5.10.x, the rest can be seen, as you mentioned, here:
https://github.com/OLIMEX/olimage/commits/master
More like here:
https://github.com/OLIMEX/linux-olimex/commits/release-20210127-v5.10.10
Sorry, but I can't help without a proper changelog, and some cooperation.
Thanks for the 2 URLs you provided but :
- the second URL https://github.com/OLIMEX/linux-olimex/commits/release-20210127-v5.10.10 does not work any more : why did you remove this branch?
- I did not find in these repositories some changes that I noticed (but maybe I missed them) : the missing "ondemand" governor, see below
- another release has been made in the meantime : http://images.olimex.com/release/a64/A64-OLinuXino-buster-minimal-20210203-221802.img.7z . The kernel seems to be upgraded to 5.10.12, but did you make any other changes?
I'm asking for some communication when a new release is out : give us the information, and a changelog. For now, I did not find any other solution than manually "polling" http://images.olimex.com/release/a64/ , and testing random things.
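The manual "polling" could at least be scripted. The following is only a sketch under assumptions: it supposes that the release URL returns an HTML listing in which the image file names appear literally, and the helper names and the `known.txt` file are invented for the example.

```shell
# Hypothetical helpers; names are assumptions.

# extract_images: reads an HTML directory listing on stdin and prints the
# unique A64 image file names it mentions, sorted.
extract_images() {
    grep -o 'A64-OLinuXino-[A-Za-z0-9_-]*\.img\.7z' | sort -u
}

# new_images KNOWN_FILE: reads image names on stdin (one per line) and prints
# only those that are not already listed in KNOWN_FILE.
new_images() {
    grep -vxF -f "$1"
}

# Possible use from cron (not executed here, needs network access and curl):
#   curl -fsS http://images.olimex.com/release/a64/ | extract_images | new_images known.txt
```

This only shows that an image appeared. It cannot say what changed in it or why a previous one disappeared, which is the information a changelog would give.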
The ping issue seems to be fixed in this release. That's good news, and would be worth mentioning in a changelog, don't you think?
It seems that the default governor is "performance" in this A64-OLinuXino-buster-minimal-20210127-100834 release (probably as a workaround for the time issue, like you mentioned earlier)
I wanted to test it with "ondemand" governor, to check if the sluggishness was gone with it or not, but this governor is not available any more :
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave performance schedutil
On older versions, it was there :
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave ondemand performance schedutil
Is it on purpose? Did you check that the symptoms of https://www.olimex.com/forum/index.php?topic=7976.0 are still there with this release if the governor is set to ondemand? We cannot test this ourselves any more.
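The same check as the two `cat` commands above, written as a small function so it can be used in a script. This is a sketch: the name `governor_available` is made up, and the default path is simply the sysfs file already shown above.

```shell
# governor_available GOVERNOR [FILE]
# Hypothetical helper: succeeds if GOVERNOR appears in the space-separated
# list stored in FILE (by default the cpu0 scaling_available_governors file).
# Returns 2 if FILE cannot be read, 1 if the governor is not listed.
governor_available() {
    file=${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors}
    [ -r "$file" ] || return 2
    for g in $(cat "$file"); do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

if governor_available ondemand; then
    echo "ondemand is available"
else
    echo "ondemand is not available (or the cpufreq sysfs file is not readable)"
fi

# When a governor is available, it can be selected as root by writing its name
# to the matching scaling_governor file, e.g.:
#   echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```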
NB : I'm still available for the discussion you mentioned, but you need to help us to help you. Else I'll give up
The 5.10.10 branch was removed because it had a critical bug, and a new 5.10.12 branch was made. Maybe just go here:
https://github.com/OLIMEX/linux-olimex/commits/
As far as I understood from the developers, there are two options:
- different scaling governors and date jump bug present
- only performance scaling governor and no date jump bug present
We went for #2 in this release.
Change log is not coming, it would be the same as commit log. It makes no sense.
Quote
I'm still available for the discussion you mentioned, but you need to help us to help you. Else I'll give up
What do you mean or expect? I thought this exchange is the discussion.
I'll try to rephrase.
The boards I bought are used in production.
After 2 years, some of them are still not stable, and the other ones use old versions (without security patches on the kernel)
Yesterday another board became unreachable because of a time jump (I had to reinstall it last month with A64-OLinuXino-buster-minimal-20200601-131837.img, it used kernel 5.6.14, which I thought had the time jump fix)
I have another board that I reinstalled recently with A64-OLinuXino-buster-minimal-20201207-193928.img (kernel 5.10.10), but you just said here that it has a "critical bug" (what is it? which impact? which workaround?), and that we should use A64-OLinuXino-buster-minimal-20210203-221802.img (kernel 5.10.12) instead.
Now I see that the 5.10.12 branch has been removed too, and the image has been replaced by A64-OLinuXino-buster-minimal-20210210-151806.img (kernel 5.10.14), with no explanation. Did it have a critical bug, too? When should I expect to have a good image?
Please try to put yourself in my shoes : images and branches appear and disappear, some have critical bugs but I don't have more information. Existing boards have some kernel upgrades through your repository, so they will probably get the critical bugs if I upgrade them?
What should I do? Reinstalling them with the latest image seems unsafe for the moment, plus I can't do that every 2 weeks! Upgrading the existing boards might be unsafe too (plus regressions happened in the past), but I can't keep old kernels because I also need security updates.
This situation puts me in the state of a beta-tester (which is not expected after 2 years, and not compatible with production), without even enough information to beta-test (see below). As I said earlier, I was an Olimex supporter, but this situation is not acceptable.
Having a change log (sometimes called "release notes") would make sense IMHO. It's good to have the commit log, but it does not seem to provide some other crucial information :
- the fact that some images have a critical bug, and what people using it should do (we can't reinstall all the time)
- the changes between 2 images : the commit log seems to be a rebase of your patches on each kernel version. So it's almost the same between 2 releases : it does not provide a summary of the changes between 2 releases (especially when the previous branches are removed)
- the changes that are not in the kernel, like removing a scaling governor and forcing another one
- the intentions behind these changes. A commit log on github is very technical and does not explain why these changes are done, what is the expected result etc
- the steps that are required to have a working board after installation, like disabling the display to have a stable network, as mentioned here : https://www.olimex.com/forum/index.php?topic=7921.msg29820#msg29820
Release notes might also avoid some questions asked in this forum, and time wasted on both sides.
If a stable image is finally released, the subsequent kernel upgrades should be safe too (i.e provide only the security patches, or be very carefully tested before putting them in your repo)
If you think all of this is out of reach within a short time, I'm asking for a refund and will stop bugging you here.
NB : Regarding the "performance" governor, I'm concerned about its impact on CPU temperature (in the summer) and lifespan, but I might be wrong.
Hi,
Your questions are reasonable, and I have asked for a changelog to be added to the images.
The time jump is hardware processor related. The random nature of appearance means that this will be hard to impossible to work around. All A64 boards on the market suffer from it.
Our developers do their best. Having your feedback helps us to improve things. They have been notified about this post.
Tsvetan
I saw that there is now a changelog here : http://images.olimex.com/changelog.txt
That's good news, thanks.
Unfortunately, you're still trying to rewrite history : the changelog of an image has been removed :
Yesterday, there was an image called A64-OLinuXino-buster-minimal-20210317-113356.img.7z in http://images.olimex.com/release/a64/, and it appeared in your new changelog. From memory, it mentioned the kernel upgrade to 5.10.23 and another change regarding stm32.
Today, there is another image called A64-OLinuXino-buster-minimal-20210318-122357.img.7z, and the changelog has been rewritten : there is no mention of 20210317 any more. Its changes have been apparently "merged" with the changes of 20210318.
And the corresponding branch has also disappeared from github :-(
Why do you do such a thing?
When your users have downloaded and installed 20210317, what should they do when they discover that their version is not in the changelog any more, not in github any more, as if it never existed?
I suppose there was a problem in this image, and you quickly replaced it with 20210318? In this case, leave version 20210317 in the changelog, add an explanation of what was wrong with it, and explain to us what we should do if we already installed it : is an "apt update && apt upgrade" enough, for example?
Again, put yourself in our shoes : your images are installed, they are used by real people that need to maintain the software running on it, and that can not reinstall all their boards every day.
I don't blame the developers here. I blame the poor communication, especially when you repeat the same mistake I explained in my previous post here (making a released image "disappear" with no information on what to do for people who used it, as if it were a temporary beta version)
By the way, the dates seem wrong in your http://images.olimex.com/changelog.txt : they are all in 2020. I suppose you meant 2021.
I would also advise making this changelog easier to find. For example you could put a link to it in the pages where you provide links to images (like https://www.olimex.com/Products/OLinuXino/A64/A64-OLinuXino/open-source-hardware)