Olimex STMP1 SOM - Error starting up

Started by SteffenFuchs, January 29, 2024, 11:53:57 PM

SteffenFuchs

At the end of last year we purchased approx. 30 STM32MP1 SOMs to use in a pilot series of our measuring equipment.

We have encountered the following issue, which we would like to bring to your attention, and would like to ask:

  • If you experience the same behavior, and
  • If you can advise us how to avoid the issue
   
Our observation has been:

  • When powering up the SOM the system sometimes does not boot
  • Connecting to the provided UART connector and observing the output, it stops after only a few lines, the last one printed being Init AXP209 PMIC

This happens:
  • with at least one of the boards purchased all the time,
  • some more boards when they have been disconnected from power supply for less than a minute, and
  • nearly all boards when disconnected from power for only a few (e.g. less than 3) seconds, but not all the boards all the time
The behavior was the same with different setups connecting the SOM to:
  • Our application board
  • Your Olimex STMP1(A13)-EVB
  • A simple board just supplying GND and 5 V to the LCD-TR1 connector
The behavior has also been the same with out-of-the-box boards that have never been connected to our application board, which rules out any defect caused by us.
The software revisions used are your images STM32MP1-OLinuXino-SOM-bullseye-minimal both from
  • July 25th 2023
  • December 22nd 2023

In order to investigate further I added a few printf statements to the U-Boot code from December 22nd to see what is happening, and found this (all mentioned files are relative to the U-Boot root directory):
  • board/st/stm32mp1/spl.c, function board_early_init_f: calls axp_init
  • arch/arm/mach-stm32mp/axp209.c, function axp_init: calls axp_bus_init
  • arch/arm/mach-stm32mp/axp209.c, function axp_bus_init:
            first calls uclass_get_device_by_seq, which is OK,
            then calls dm_i2c_probe
  • drivers/i2c/i2c-uclass.c, function dm_i2c_probe: calls i2c_probe_chip
  • drivers/i2c/i2c-uclass.c, function i2c_probe_chip:
            eventually calls function pointer ops->xfer, with one message of zero length, to probe whether the chip answers to its address on the I²C-bus
            ops->xfer happens to point to stm32_i2c_xfer, as defined in drivers/i2c/stm32f7_i2c.c, struct dm_i2c_ops stm32_i2c_ops
  • drivers/i2c/stm32f7_i2c.c, function stm32_i2c_xfer:
            first calls stm32_i2c_check_device_busy, which is OK
            then calls stm32_i2c_message_xfer
  • drivers/i2c/stm32f7_i2c.c, function stm32_i2c_message_xfer:
            first calls stm32_i2c_message_start, which puts the address on the I²C-bus and returns OK
            since the message has zero length, the while loop with all calls to readb and writeb is skipped
            finally stm32_i2c_check_end_of_message is called
  • drivers/i2c/stm32f7_i2c.c, function stm32_i2c_check_end_of_message:
            calls stm32_i2c_wait_flags, which waits for one of these flags to be set: STM32_I2C_ISR_ERRORS | STM32_I2C_ISR_NACKF | STM32_I2C_ISR_STOPF
            then it checks for these flags and additionally for STM32_I2C_ISR_ARLO
            If the error occurs, it finds flag STM32_I2C_ISR_NACKF set
            the function then returns -EIO, which is -5
The returned value is propagated all the way back to board_early_init_f, where it is ignored.
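
The flag check at the end of this call chain can be sketched in isolation. The following is only a host-side model, not the driver code: the function name check_end_of_message_model is made up, the bit positions are assumed from the STM32 I2C_ISR register layout, and the real driver reads the register from hardware instead of taking it as an argument.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed bit positions, modelled on the STM32 I2C_ISR register layout. */
#define ISR_NACKF (1u << 4)  /* address or data byte was not acknowledged */
#define ISR_STOPF (1u << 5)  /* stop condition detected */
#define ISR_BERR  (1u << 8)  /* bus error */
#define ISR_ARLO  (1u << 9)  /* arbitration lost */

/*
 * Hypothetical stand-in for the end-of-message check: takes a snapshot of
 * the status register and maps it to 0 or a negative errno value.
 */
static int check_end_of_message_model(uint32_t isr)
{
    if (isr & ISR_NACKF)
        return -EIO;      /* nobody pulled SDA low after the address byte */
    if (isr & (ISR_BERR | ISR_ARLO))
        return -EIO;      /* bus-level failure */
    if (isr & ISR_STOPF)
        return 0;         /* transfer finished cleanly */
    return -ETIMEDOUT;    /* none of the awaited flags was set */
}
```

With a snapshot that has the NACK flag set, the model returns -EIO, which matches the value seen in the failing case.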

In board_early_init_f, in case of the error, one of the subsequent calls to axp_set_aldo3 and axp_set_aldo2 halts the board. I did not investigate this further, as the previous function already showed an issue that is not dealt with. As a result, board_early_init_f never terminates and startup never continues.
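
Since the error code is ignored, one option might be to check it and retry the probe a few times before touching the regulators. The sketch below only illustrates that control flow on a host machine. The names probe_fn, probe_with_retry and fake_probe are hypothetical and not part of the Olimex U-Boot tree, and whether a retry would actually recover the AXP209 is untested.

```c
#include <errno.h>

/* Hypothetical probe callback: returns 0 on ACK, negative errno on failure. */
typedef int (*probe_fn)(void *ctx);

/*
 * Calls probe() up to max_attempts times and returns 0 on the first success.
 * If every attempt fails, the last error code is returned so that the caller
 * can stop instead of continuing with an unresponsive PMIC.
 */
static int probe_with_retry(probe_fn probe, void *ctx, int max_attempts)
{
    int ret = -EINVAL;  /* returned if max_attempts is not positive */

    for (int i = 0; i < max_attempts; i++) {
        ret = probe(ctx);
        if (ret == 0)
            return 0;
        /* On real hardware a short delay or bus recovery would go here. */
    }
    return ret;
}

/* Fake device for demonstration: fails until *remaining_failures reaches 0. */
static int fake_probe(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -EIO;
    }
    return 0;
}
```

The important part is not the retry itself but that the caller sees a non-zero result and can print a message or reset, rather than running into the later regulator calls.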

Reading up on the I²C-bus, I found that a NACK directly after the address is sent can be explained as follows:

  • the I²C data line is driven open-collector by all participants, with a pull-up holding the line at 1 if no participant drives it low
  • normally after a chip has been addressed the master does not drive data low
  • but the slave that has been addressed should, indicating that it is present and ready to receive data
  • in case of the error the AXP209 however does not, which can be due to these issues:
    • The address sent is wrong (at least as far as software goes the value is the same whether the error occurs or not, but I did not check the actual bus with an oscilloscope as it is difficult to access)
    • The chip addressed is not present
    • The chip does not listen to the bus
    • The chip is in a different state expecting other data
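
The acknowledge mechanism described above can be modelled in a few lines. This is a simplified sketch of the wired-AND behavior of the data line during the acknowledge clock, with hypothetical names (sda_level, address_acked). It says nothing about why the AXP209 does not respond.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Open-collector data line: each participant either pulls the line low
 * (true) or releases it (false). The pull-up keeps the line at 1 unless at
 * least one participant pulls it low.
 */
static int sda_level(const bool pulls_low[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (pulls_low[i])
            return 0;
    }
    return 1;  /* nobody drives the line, the pull-up wins */
}

/*
 * During the acknowledge clock the master releases the line, so a low level
 * means the addressed slave answered (ACK); a high level is a NACK.
 */
static bool address_acked(bool slave_pulls_low)
{
    const bool drivers[] = { false /* master releases */, slave_pulls_low };

    return sda_level(drivers, sizeof(drivers) / sizeof(drivers[0])) == 0;
}
```

In this model every cause listed above looks the same from the master's side: the slave does not pull the line low, so the master reads a 1 and reports a NACK.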

This is how far we have investigated the issue so far. If you like, we can send you the board that exhibits the error during each start, as well as one of the many boards that show the error only when the power is interrupted very briefly.

Best regards,

Steffen

jper

Hello,

I have experienced a similar issue with the STMP157-OLinuXino-LIME2 SBC.
The board booted successfully a couple of times when it arrived, with different images being burned on the SD card. However, after two days of development it suddenly got stuck when trying to boot up. The only output that I get is:

U-Boot SPL 2021.04+olimex-1-20231222.140935 (Dec 22 2023 - 14:12:07 +0000)
Model: STM32MP1 OLinuXino-LIME
Init AXP209 PMIC
VDD Core set to: 1350 mv


This seems to be similar to what SteffenFuchs wrote above. Whatever I change (the SD card, the image; I tried the supplied image as well), nothing seems to fix the issue and the board remains unusable, stuck in a frozen boot. I would appreciate any form of help or advice here.

Kind Regards,
Jacob

LubOlimex

You need to use the base image from here:

https://images.olimex.com/release/stm32-lime/

You need a solid 10W 5V DC power supply attached to the power jack.
Technical support and documentation manager at Olimex

thom_nic

I'm using a rev C STMP1-SOM with a rev B EVB.  The 5V wall wart states 5VDC 2500mA.  I'm seeing the same behavior:

U-Boot SPL 2021.04+olimex-1-20231222.140935 (Dec 22 2023 - 14:12:07 +0000)
Model: STM32MP1XX OLinuXino-SOM
Init AXP209 PMIC

U-Boot SPL 2021.04+olimex-1-20231222.140935 (Dec 22 2023 - 14:12:07 +0000)
Model: STM32MP1XX OLinuXino-SOM
Init AXP209 PMIC

It usually prints twice after I hit the RESET button.  I'm using the stock image downloaded from this link.

SteffenFuchs

Thank you all for the replies! jper and thom_nic, that is exactly what I have encountered with several boards, though not all of them all the time, so it seems to be an issue that comes up more frequently.

In addition to all my tests above, I can confirm that I used exactly the images linked above by you, LubOlimex, but I have also tried the previous images from July with no difference.

Furthermore, the manual for the board found here states at the top of page 7 that a 5 V 1 A power supply is needed, which is only 5 W instead of the 10 W mentioned above.

However I have encountered the same behavior with a 5 V 3 A 15 W power supply as with a smaller one.
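
The supply ratings quoted in this thread can be compared with a quick calculation of P = U × I. The helper names below are made up for illustration. The point is only that a 5 V 3 A supply (15 W) and a 5 V 2.5 A supply (12.5 W) both exceed the 10 W mentioned, while the 5 V 1 A from the manual (5 W) does not.

```c
/* Power in milliwatts from a supply rating in millivolts and milliamps. */
static unsigned long supply_power_mw(unsigned long millivolts,
                                     unsigned long milliamps)
{
    return millivolts * milliamps / 1000UL;
}

/* Returns 1 if the supply rating covers the required power, else 0. */
static int supply_is_sufficient(unsigned long millivolts,
                                unsigned long milliamps,
                                unsigned long required_mw)
{
    return supply_power_mw(millivolts, milliamps) >= required_mw;
}
```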

So can you at Olimex please start investigating the issue? I think it comes down to the I²C communication between the STM32MP1 and the AXP209 you use on your board, which fails according to my research and testing as I have written above, but I have no access to any means to go further myself.

thom_nic

I've been pulling my hair out because we're randomly seeing products not boot. Sometimes it comes back after fiddling a bit. I've been thinking it was some sort of SD card issue (even though no writing occurs to the boot partitions!); now I'm learning that maybe it's board/power related.

For our part, we need to get to the bottom of this quickly: we have hundreds of devices out in the field and they are very costly to RMA (not to mention a poor customer experience) when one refuses to boot.

LubOlimex

The problem for us is that we can't replicate this issue. The many boards we have tested always start up. Maybe only some boards experience this problem, maybe only under specific conditions. Until we can replicate the issue here we can't solve anything. If you have boards that don't start, please contact us at support@olimex.com so we can arrange returns to experiment with them. Note that some reasonable fault rate for the start-up hang is to be expected. For example, if one board fails to start once when it is unpowered and repowered 1000 times in 10 days, this is not something that we can replicate here.

SteffenFuchs

Dear LubOlimex,
thank you for your reply.
I did send you that email on January 8th already, also offering to send back some boards we experienced problems with, but I have not received any reply from support@olimex.com since, so I had to go through the forum.

LubOlimex

Maybe it got lost somewhere around here; please re-send it. We are interested in the return of boards that experience the issue regularly so we can see the hang ourselves.


thom_nic

At the moment I have one that consistently hangs.  I'll get it returned soon.

LubOlimex

Make sure to include inside the package a full description of the behavior and the steps we should take to replicate it, so we can pay closer attention. Also request an RMA via support@olimex.com (maybe include a link to this thread in the e-mail so we can link the request with the behavior described in the forum).

SteffenFuchs

The two boards I returned should have arrived March 8th. Hope you can reproduce the behavior and find the cause.

LubOlimex

Yeah, thanks for the effort, they reached me. So far I can replicate the behavior you described 1:1 with the first board. We are not yet done analyzing what could cause it, but we are on it. Thanks for the good packing and the description of the behavior.

Titomax

Hi all,
I can confirm the problem described at the beginning by SteffenFuchs (Steffen, thank you for your in-depth analysis). We are using many STMP157-SOM-512-EXT on our mainboard, and sometimes after a normal shutdown the modules don't boot anymore. On the debug UART the boot procedure hangs after these messages:

U-Boot SPL 2021.04+olimex-1-20240312.134658 (Mar 12 2024 - 13:48:01 +0000)
Model: STM32MP1XX OLinuXino-SOM
Init AXP209 PMIC

We initially thought we had broken the modules in some way, but they started working again about 20-24 hours after we removed them from the motherboard and left them disconnected.

Now only 2 of the initial 11 modules refuse to boot; all the others boot again.

It's alarming that we have no guarantee that, inside our final product, they won't stop booting again after a system restart.

Please investigate the issue as soon as possible, as we need to stop production if a hardware failure is identified.

Thank you in advance