Some temperature sensors are missing in recent images

mossroy · March 23, 2021, 03:43:25 PM

In older images provided by Olimex, several temperature sensors were exposed, in particular some coming from the CPU.

Example with a board installed with A64-OLinuXino-buster-minimal-20200601-131837.img (running kernel 5.8.18):

$ sensors
axp813_adc-isa-0000
Adapter: ISA adapter
temp1:        +35.7°C  

gpu1_thermal-virtual-0
Adapter: Virtual device
temp1:        +48.3°C  

cpu0_thermal-virtual-0
Adapter: Virtual device
temp1:        +48.0°C  (crit = +90.0°C)

axp20x_battery-isa-0000
Adapter: ISA adapter
in0:          +0.00 V  
curr1:        +0.00 A  

axp813_ac-isa-0000
Adapter: ISA adapter
in0:              N/A  (min =  +4.00 V)

gpu0_thermal-virtual-0
Adapter: Virtual device
temp1:        +48.7°C

But most of these sensors have disappeared in more recent images.

For example, on a board installed with A64-OLinuXino-buster-minimal-20201207-193928.img (upgraded to kernel 5.10.23):

$ sensors
axp813_adc-isa-0000
Adapter: ISA adapter
temp1:        +36.7°C

The CPU sensors are valuable for checking that the CPU is not overheating, in particular now that the CPU governor is set to "performance" by default.

mossroy · May 15, 2021, 09:31:18 PM

This is unfortunately not fixed with an upgrade to the latest kernel provided in the olimex repo (5.10.36).

Could you please fix this regression? (and/or tell us if there is a workaround)

jch · May 26, 2021, 05:53:17 PM

They're now managed by the thermal subsystem, which allows the kernel to throttle the CPU when it overheats. You can get the values from sysfs:

$ for t in /sys/devices/virtual/thermal/thermal_zone*; do cat $t/type $t/temp; done
cpu0-thermal
43972
gpu0-thermal
44323
gpu1-thermal
44206
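
The values above are in millidegrees Celsius, so 43972 means roughly 44.0°C. As a minimal sketch (not part of the original reply), the same loop can be wrapped in a function that prints one readable line per zone. The function name show_thermal_zones and its optional directory argument are made up for illustration; the default path is the one used above.

```shell
#!/bin/sh
# Print each thermal zone's name and temperature in degrees Celsius.
# show_thermal_zones and its optional directory argument are hypothetical
# helpers; the default is the sysfs directory used in the loop above.
show_thermal_zones() {
    root="${1:-/sys/devices/virtual/thermal}"
    for zone in "$root"/thermal_zone*; do
        # Skip when the glob matched nothing or the files are unreadable.
        [ -r "$zone/type" ] && [ -r "$zone/temp" ] || continue
        type=$(cat "$zone/type")
        milli=$(cat "$zone/temp")
        # sysfs reports millidegrees Celsius; divide by 1000 for degrees.
        awk -v t="$type" -v m="$milli" \
            'BEGIN { printf "%s: %.1f°C\n", t, m / 1000 }'
    done
}

show_thermal_zones "$@"
```

Run without arguments it reads the real sysfs tree; passing another directory is only useful for trying it against sample files.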

mossroy · May 30, 2021, 11:59:53 AM

Thanks for the info.

Unfortunately, this is not natively supported by the tools I have been happily using so far:

sensors command-line (even after running sensors-detect)
prometheus
glances
and probably more

All were working fine with previous versions of the kernel.

And all were working fine with the (unstable) mainline Debian bullseye I managed to install (see https://www.olimex.com/forum/index.php?msg=31474), which is also based on kernel 5.10.x.

It's good news that the CPU is throttled when it overheats, but there has been a regression that seems specific to recent Olimex images.
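
Until the sensors are exposed again, one possible workaround for the Prometheus case is to export the sysfs values yourself. The sketch below is an assumption-laden example, not something from this thread: it prints the thermal zones in the Prometheus text exposition format, which could be written to a .prom file for node_exporter's textfile collector. The metric name thermal_zone_temp_celsius and the function name thermal_zones_prom are hypothetical.

```shell
#!/bin/sh
# Emit thermal zone temperatures in the Prometheus text exposition format.
# The metric name thermal_zone_temp_celsius and the function name are
# hypothetical; the default path is the sysfs directory mentioned earlier.
thermal_zones_prom() {
    root="${1:-/sys/devices/virtual/thermal}"
    echo '# TYPE thermal_zone_temp_celsius gauge'
    for zone in "$root"/thermal_zone*; do
        # Skip when the glob matched nothing or the files are unreadable.
        [ -r "$zone/type" ] && [ -r "$zone/temp" ] || continue
        # Convert millidegrees to degrees and label the sample by zone type.
        awk -v t="$(cat "$zone/type")" -v m="$(cat "$zone/temp")" \
            'BEGIN { printf "thermal_zone_temp_celsius{zone=\"%s\"} %.3f\n", t, m / 1000 }'
    done
}

thermal_zones_prom "$@"
```

Redirecting the output to a file in the directory given to node_exporter's --collector.textfile.directory option, for example from a cron job, would be one way to use it. The directory and schedule are left to the reader.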

jch · May 30, 2021, 05:38:48 PM

Right. I've filed an issue at https://github.com/OLIMEX/olimage/issues/4

mossroy · June 04, 2021, 04:26:01 PM

About the CPU throttling when overheating, it does not always seem to work.

Today, I put a very heavy load on one of my A64 boards, and it shut down with the following syslog message:

> kernel:[ 5506.549851] thermal thermal_zone0: critical temperature reached (90 C), shutting down

This board had very recently been installed with the latest image, A64-OLinuXino-buster-minimal-20210513-112230.img, with kernel 5.10.36. It's in a room with average temperature.

I would have much preferred CPU throttling to an abrupt shutdown that forces a manual restart.

jch · June 04, 2021, 07:16:24 PM

> About the CPU throttling when overheating, it does not always seem to work.

Strange, it works for me. At four threads, the board will reliably throttle at 70°C, and temperature remains stable.

> kernel:[ 5506.549851] thermal thermal_zone0: critical temperature reached (90 C), shutting down

My understanding is that the kernel should start throttling the CPU at 70°C, throttle it further at 80°C, and shut it down at 90°C. If the CPU reached 90°C, then shutting down is the correct behaviour, but it shouldn't have happened in the first place. Could you please check the value of

cat /sys/devices/virtual/thermal/thermal_zone0/trip_point_*_temp

It should say 70000 80000 90000. If that's not the case, there's something wrong with your device tree.
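
To see which trip point is which, the matching trip_point_N_type files in the same sysfs directory can be printed next to the temperatures. This is a rough sketch under the assumption that both file sets are present on your kernel. The function name show_trip_points and its optional zone argument are invented for illustration.

```shell
#!/bin/sh
# List the trip points of one thermal zone as "<index> <type> <temp>".
# show_trip_points and its optional zone-directory argument are hypothetical;
# the default is the thermal_zone0 path used above.
show_trip_points() {
    zone="${1:-/sys/devices/virtual/thermal/thermal_zone0}"
    for f in "$zone"/trip_point_*_temp; do
        # Skip when the glob matched nothing.
        [ -r "$f" ] || continue
        # Extract N from .../trip_point_N_temp.
        n=${f##*/trip_point_}
        n=${n%_temp}
        # Fall back to "unknown" if the kernel exposes no type file.
        type=unknown
        type_file="$zone/trip_point_${n}_type"
        if [ -r "$type_file" ]; then
            type=$(cat "$type_file")
        fi
        printf '%s %s %s\n' "$n" "$type" "$(cat "$f")"
    done
}

show_trip_points "$@"
```

The temperatures are in millidegrees Celsius, so the three values expected above correspond to 70°C, 80°C and 90°C.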

Can you please confirm that throttling is happening? Run

watch cat /sys/devices/virtual/thermal/thermal_zone0/temp /sys/devices/virtual/thermal/cooling_device0/cur_state

You should see the second value switch to 1 when the temperature goes above 70000.
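
If watching the screen during a long build is impractical, the same two files can be sampled by a small function and logged instead. The following is only a sketch: the function name throttle_sample, its optional directory argument, and the "throttling" / "not-throttling" labels are assumptions made for this example.

```shell
#!/bin/sh
# Print one sample: temperature (millidegrees), cooling state, and a label.
# throttle_sample, its directory argument, and the labels are hypothetical;
# the default paths are the ones used in the watch command above.
throttle_sample() {
    dir="${1:-/sys/devices/virtual/thermal}"
    temp_file="$dir/thermal_zone0/temp"
    state_file="$dir/cooling_device0/cur_state"
    [ -r "$temp_file" ] && [ -r "$state_file" ] || {
        echo "thermal files not readable under $dir" >&2
        return 1
    }
    temp=$(cat "$temp_file")
    state=$(cat "$state_file")
    # A cur_state above 0 means the cooling device is active.
    if [ "$state" -gt 0 ]; then
        echo "$temp $state throttling"
    else
        echo "$temp $state not-throttling"
    fi
}

# Take a single sample; ignore failure on machines without these files.
throttle_sample "$@" || true
```

Calling the function in a loop with a sleep in between and appending the output to a file would leave a record that survives a shutdown, which may help when trying to reproduce the problem.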

mossroy · June 06, 2021, 10:57:50 AM

cat /sys/devices/virtual/thermal/thermal_zone0/trip_point_*_temp

gives your expected output.

Regarding the throttling, I'll have to generate a heavy load again to check.

mossroy · June 07, 2021, 10:42:09 AM

After generating some heavy load again, I confirm that your second value switches to 1 each time the temperature exceeds 70000, and switches back to 0 when it comes back below.

jch · June 09, 2021, 03:11:30 PM

Then it looks like everything is working as intended. I've got no explanation for what could have gone wrong before. If you find a reliable way to reproduce the issue, I'll be glad to have a look.

mossroy · June 10, 2021, 11:14:25 AM

The heavy load was produced by a compilation, probably multi-threaded on the 4 cores.

My board was inside the official metal box https://www.olimex.com/Products/OLinuXino/A64/BOX-A64-BLACK/ , with a heatsink https://www.olimex.com/Products/Components/Misc/ALUMINIUM-HEATSINK-20x20x6MM/ , in an apartment room with average spring temperature.