Date suddenly changed to 2114

Started by mossroy, October 16, 2019, 12:42:30 PM

Previous topic - Next topic

mossroy

I have 3 A64-OLinuXino-2Ge8G-IND devices, using them as headless debian servers.

Yesterday, one of them became unreachable from the local network.
After using the debug cable, I saw that its date had suddenly changed to 2114 October 7, 05:20:08.
Manually fixing the date resolved the issue. For some reason, this wrong date made a few services to fail (even after a reboot) : at least NetworkManager and MariaDB.

I had the stack below in the logs, at this precise moment.
I'm using ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/buster/images/Armbian_5.92.1_Olinuxino-a64_Debian_buster_next_5.2.5.7z , with all current updates. Kernel 5.2.5.

What happened? Is it a hardware or software issue?
It's very annoying as it can not be fixed through the network.

[2115903.263561] rcu: INFO: rcu_sched self-detected stall on CPU
[2115903.263579] rcu:   2-...!: (3 ticks this GP) idle=302/0/0x1 softirq=20234276/20234277 fqs=0
[2115903.263583]        (t=750639755227 jiffies g=72093289 q=1)
[2115903.263589] rcu: rcu_sched kthread starved for 750639755227 jiffies! g72093289 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[2115903.263591] rcu: RCU grace-period kthread stack dump:
[2115903.263595] rcu_sched       I    0    10      2 0x00000028
[2115903.263602] Call trace:
[2115903.263619]  __switch_to+0xb4/0x1b8
[2115903.263628]  __schedule+0x1f4/0x4a0
[2115903.263632]  schedule+0x28/0x98
[2115903.263638]  schedule_timeout+0x90/0x3a0
[2115903.263647]  rcu_gp_kthread+0x714/0x968
[2115903.263653]  kthread+0x124/0x128
[2115903.263657]  ret_from_fork+0x10/0x1c
[2115903.263662] Task dump for CPU 0:
[2115903.263665] swapper/0       R  running task        0     0      0 0x0000002a
[2115903.263669] Call trace:
[2115903.263673]  __switch_to+0xb4/0x1b8
[2115903.263678]  0xffff000010df8000
[2115903.263680] Task dump for CPU 1:
[2115903.263683] swapper/1       R  running task        0     0      1 0x0000002a
[2115903.263688] Call trace:
[2115903.263692]  __switch_to+0xb4/0x1b8
[2115903.263695]  0xffff000010df8000
[2115903.263697] Task dump for CPU 2:
[2115903.263699] swapper/2       R  running task        0     0      1 0x0000002a
[2115903.263704] Call trace:
[2115903.263709]  dump_backtrace+0x0/0x140
[2115903.263714]  show_stack+0x14/0x20
[2115903.263720]  sched_show_task+0xf4/0x128
[2115903.263726]  dump_cpu_task+0x40/0x50
[2115903.263731]  rcu_dump_cpu_stacks+0xc8/0x118
[2115903.263735]  rcu_sched_clock_irq+0xf8/0x7e8
[2115903.263740]  update_process_times+0x2c/0x58
[2115903.263747]  tick_sched_handle.isra.5+0x30/0x48
[2115903.263751]  tick_sched_timer+0x48/0x98
[2115903.263756]  __hrtimer_run_queues+0xfc/0x218
[2115903.263760]  hrtimer_interrupt+0xf8/0x2d0
[2115903.263766]  arch_timer_handler_phys+0x28/0x40
[2115903.263771]  handle_percpu_devid_irq+0x80/0x140
[2115903.263777]  generic_handle_irq+0x24/0x38
[2115903.263782]  __handle_domain_irq+0x5c/0xb0
[2115903.263786]  gic_handle_irq+0x58/0xa8
[2115903.263789]  el1_irq+0xb8/0x140
[2115903.263793]  arch_cpu_idle+0x10/0x18
[2115903.263797]  do_idle+0x1e0/0x2c0
[2115903.263801]  cpu_startup_entry+0x20/0x28
[2115903.263807]  secondary_start_kernel+0x190/0x1d0
[2115903.263810] Task dump for CPU 3:
[2115903.263812] swapper/3       R  running task        0     0      1 0x0000002a
[2115903.263817] Call trace:
[2115903.263821]  __switch_to+0xb4/0x1b8
[2115903.263824]  0xffff000010df8000

LubOlimex

Technical support and documentation manager at Olimex

mossroy

#2
Many thanks for this detailed answer.

To sum up :
- It comes from a hardware issue in the Allwinner A64 chip, and affects all boards using the A64 (not only from Olimex)
- Allwinner is informed but doesn't care much
- A software workaround is implemented in kernel 5.3, and might be backported in previous kernel versions if necessary. But it involves recompiling the kernel in any case.

Now that the kernel 5.3 is officially released, when will you release a new version of your images in ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/ ? And/or a kernel update in http://repository.olimex.com ?

LubOlimex

Technical support and documentation manager at Olimex

mossroy

Well, we're now Soon™ + one month...

I saw that kernel 5.3 has been released for a13 : https://olimex.wordpress.com/2019/11/06/new-mainline-linux-images-with-kernel-5-3-8-for-a13-olinuxino-and-a13-som-are-uploaded/

Any update for a kernel >=5.3 for A64 that fixes the date issue?

mossroy

Soon™ + 2 months...

It's still unclear to me what differs between the armbian version distributed by Olimex and the upstream armbian version from https://www.armbian.com/olimex-lime-a64/ (which already has kernel 5.3, and looks stable. But version Armbian_19.11.3_Lime-a64_buster_current_5.3.9.7z does not seem to be able to install to eMMC : I hope it's only a temporary issue)

martinayotte

Quote from: mossroy on December 28, 2019, 12:21:33 PMBut version Armbian_19.11.3_Lime-a64_buster_current_5.3.9.7z does not seem to be able to install to eMMC
My Olinuxino-A64 (1Ge4GW) is running from eMMC for more than a year, and I've recently installed my own kernel build 5.5.0-rc2.
What "dmesg | grep -i mmc" is reporting on yours ?

mossroy

Quote from: martinayotte on December 28, 2019, 03:49:16 PMMy Olinuxino-A64 (1Ge4GW) is running from eMMC for more than a year, and I've recently installed my own kernel build 5.5.0-rc2.
What "dmesg | grep -i mmc" is reporting on yours ?
Using Armbian_5.92.1_Olinuxino-a64_Debian_buster_next_5.2.5 image (from Olimex), eMMC is available, and "dmesg | grep -i mmc" gives :
[    3.603833] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[    3.629312] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    3.655378] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[    3.680544] mmc0: new high speed SDHC card at address 0007
[    3.683008] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[    3.684725]  mmcblk0: p1
[    3.729736] mmc1: new high speed MMC card at address 0001
[    3.732172] mmcblk1: mmc1:0001 Q2J55L 7.09 GiB
[    3.733974] mmcblk1boot0: mmc1:0001 Q2J55L partition 1 16.0 MiB
[    3.735800] mmcblk1boot1: mmc1:0001 Q2J55L partition 2 16.0 MiB
[    3.737210]  mmcblk1: p1
[    5.898007] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[    8.469645] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro

Using Armbian_19.11.3_Lime-a64_buster_current_5.3.9 (from Armbian), eMMC is not detected, and the same command gives :
[    2.605615] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[    2.630895] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.631937] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[    2.655592] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.671555] mmc0: new high speed SDHC card at address 0007
[    2.673093] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[    2.674281]  mmcblk0: p1
[    2.681172] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[    2.693501] sunxi-mmc 1c11000.mmc: no support for card's volts
[    2.693507] mmc2: error -22 whilst initialising SDIO card
[    4.949590] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[    6.676378] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro

mossroy

I also tested the unsupported Armbian_19.11.4.356_Lime-a64_buster_dev_5.4.6_minimal (from Armbian). eMMC is also undetected, and "dmesg | grep -i mmc" gives :

[    2.476549] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[    2.501821] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.502875] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[    2.526529] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.539784] mmc0: new high speed SDHC card at address 0007
[    2.540613] mmcblk0: mmc0:0007 SD04G 3.71 GiB
[    2.542829]  mmcblk0: p1
[    2.552931] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[    2.565206] sunxi-mmc 1c11000.mmc: no support for card's volts
[    2.565213] mmc2: error -22 whilst initialising SDIO card
[    3.927959] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[    6.864042] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
[  21.662476] EXT4-fs (mmcblk0p1): resizing filesystem from 190464 to 919552 blocks
[  26.195849] EXT4-fs (mmcblk0p1): resized filesystem to 919552

martinayotte

Strange ...
I've downloaded and booted the Armbian_19.11.3_Lime-a64_buster_current_5.3.9_minimal.img.
I was able to mount eMMC without issue :
root@lime:~# dmesg | grep mmc
[    3.971826] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[    4.001549] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    4.021533] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[    4.049970] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    4.079874] mmc0: new high speed SDHC card at address 0007
[    4.086860] mmcblk0: mmc0:0007 SD32G 29.9 GiB
[    4.093169]  mmcblk0: p1
[    4.097128] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[    4.239781] mmc1: new high speed SDIO card at address 0001
[    4.534821] mmc2: new DDR MMC card at address 0001
[    4.541531] mmcblk2: mmc2:0001 P1XXXX 3.60 GiB
[    4.547368] mmcblk2boot0: mmc2:0001 P1XXXX partition 1 16.0 MiB
[    4.554767] mmcblk2boot1: mmc2:0001 P1XXXX partition 2 16.0 MiB
[    4.562367]  mmcblk2: p1
[    5.277142] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[    7.464783] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
[  116.548529] EXT4-fs (mmcblk2p1): mounted filesystem with ordered data mode. Opts: (null)

Maybe you should try to build your own Armbian image using their build script ...

mossroy

I exchanged the microSD cards with another olinuxino A64 (same model), to check it's not a hardware issue. It's not : I have the same result on both devices.

I also tested the latest Armbian_20.02.0-rc0_Lime-a64_buster_current_5.4.12.img, same result too :

root@lime:~# dmesg | grep mmc
[    2.535544] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[    2.560804] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.561852] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[    2.585507] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.598670] mmc0: new high speed SDHC card at address 0007
[    2.599450] mmcblk0: mmc0:0007 SD04G 3.71 GiB
[    2.601893]  mmcblk0: p1
[    2.612057] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[    2.624333] sunxi-mmc 1c11000.mmc: no support for card's volts
[    2.624357] mmc2: error -22 whilst initialising SDIO card
[    3.952318] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[    6.765327] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro
[  19.514557] EXT4-fs (mmcblk0p1): resizing filesystem from 378880 to 919552 blocks
[  22.647542] EXT4-fs (mmcblk0p1): resized filesystem to 919552

mossroy

With this latest image, I also tried to switch to kernel 5.5.0-rc6, which gives the same result :

dmesg | grep mmc
[    2.583998] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[    2.609265] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.610288] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[    2.633956] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.649927] mmc0: new high speed SDHC card at address 0007
[    2.650732] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[    2.652700]  mmcblk0: p1
[    2.660149] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB, uses new timings mode
[    2.672457] sunxi-mmc 1c11000.mmc: no support for card's volts
[    2.672463] mmc2: error -22 whilst initialising SDIO card
[    3.980770] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[    5.603862] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro

And same with kernel 4.19.63 :

[    2.045784] sunxi-mmc 1c0f000.mmc: Linked as a consumer to regulator.2
[    2.046171] sunxi-mmc 1c10000.mmc: Linked as a consumer to regulator.9
[    2.046234] sunxi-mmc 1c10000.mmc: Linked as a consumer to regulator.14
[    2.046246] sunxi-mmc 1c11000.mmc: Linked as a consumer to regulator.1
[    2.064426] sunxi-mmc 1c0f000.mmc: Got CD GPIO
[    2.064924] sunxi-mmc 1c10000.mmc: allocated mmc-pwrseq
[    2.089729] sunxi-mmc 1c10000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.089733] sunxi-mmc 1c11000.mmc: initialized, max. request size: 2048 KB
[    2.089738] sunxi-mmc 1c0f000.mmc: initialized, max. request size: 16384 KB, uses new timings mode
[    2.095163] sunxi-mmc 1c11000.mmc: no support for card's volts
[    2.095169] mmc2: error -22 whilst initialising SDIO card
[    2.129603] mmc0: new high speed SDHC card at address 0007
[    2.131212] mmcblk0: mmc0:0007 SD32G 29.2 GiB
[    2.132370]  mmcblk0: p1
[    6.442146] EXT4-fs (mmcblk0p1): mounted filesystem with writeback data mode. Opts: (null)
[    8.030362] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,errors=remount-ro

mossroy

About the date bug and a new image released by Olimex, we're now on "Soon™ + 3 months"...

LubOlimex

I am not convinced that the timer issue persists with our software images in first place. I could not replicate the problem in my tests so far. Ran both Olimex images Ubuntu and Debian and couldn't replicate it. Also ran my tests on couple of A64 boards. Can you do the following:

Use of the official images:

ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/bionic/images/

or

ftp://staging.olimex.com/Allwinner_Images/a64-olinuxino/linux/1.latest_images/buster/images/

Run this test program: https://github.com/apritzel/pine64/blob/master/tools/test_timer.c

Download it to a directory on the board, compile it with:

gcc -o test_timer test_timer.c

and run it with

./test_timer

I got the software from here: https://patchwork.kernel.org/patch/10392891/#21914739

These are my results with Ubuntu (Debian is the same):
root@olinuxino:/home/test_timer# date
Wed Jan 29 12:59:03 UTC 2020
root@olinuxino:/home/test_timer# ./test_timer
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
ok 2 native counter reads are monotonic # 0 errors
# min: 6, avg: 6, max: 11406
ok 3 Linux counter reads are monotonic # 0 errors
# min: 458, avg: 490, max: 64625
# core 0: counter value: 181435460523 => 7559 sec
# core 0: offsets: back-to-back: 9, b-t-b synced: 6, b-t-b w/ delay: 8
# core 1: counter value: 181435462090 => 7559 sec
# core 1: offsets: back-to-back: 8, b-t-b synced: 7, b-t-b w/ delay: 8
# core 2: counter value: 181435463546 => 7559 sec
# core 2: offsets: back-to-back: 8, b-t-b synced: 10, b-t-b w/ delay: 8
# core 3: counter value: 181435464907 => 7559 sec
# core 3: offsets: back-to-back: 12, b-t-b synced: 6, b-t-b w/ delay: 9
1..3
Technical support and documentation manager at Olimex

mossroy

It's true that I did not have this issue again since October 2019. Even if my boards are running 24/7.

I compiled and ran the test_timer.c for a few hours (in a loop) on 2 A64-OLinuXino-2Ge8G-IND boards.
It gave similar results as yours, and the date did not change :
TAP version 13
# number of cores: 4
ok 1 same timer frequency on all cores
# timer frequency is 24000000 Hz (24 MHz)
ok 2 native counter reads are monotonic # 0 errors
# min: 6, avg: 6, max: 887
ok 3 Linux counter reads are monotonic # 0 errors
# min: 458, avg: 483, max: 75667
# core 0: counter value: 20877018122385 => 869875 sec
# core 0: offsets: back-to-back: 15, b-t-b synced: 7, b-t-b w/ delay: 12
# core 1: counter value: 20877018123819 => 869875 sec
# core 1: offsets: back-to-back: 14, b-t-b synced: 7, b-t-b w/ delay: 9
# core 2: counter value: 20877018128599 => 869875 sec
# core 2: offsets: back-to-back: 12, b-t-b synced: 6, b-t-b w/ delay: 9
# core 3: counter value: 20877018129983 => 869875 sec
# core 3: offsets: back-to-back: 8, b-t-b synced: 7, b-t-b w/ delay: 9
1..3

So I'd say it's encouraging.

On the other hand, I was already using the latest image you mention (Armbian_5.92.1_Olinuxino-a64_Debian_buster_next_5.2.5.7z) when I had the issue. See my first post of this topic.
So I don't see what could have fixed the issue in between (except some software update?).

Reading https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/page/4/, some people seem to say that keeping systemd-timesyncd service disabled helps (this service is not running on my boards, because /usr/sbin/ntpd does not exist). But it's an old thread.

The thread https://forum.armbian.com/topic/7423-pine64-massive-datetime-clock-problem/page/3/ is more recent, and they have made a kernel patch that seems to work well (based on the comments).