Olimex Support Forum

OLinuXino Android / Linux boards and System On Modules => STMP1 => Topic started by: SteffenFuchs on January 29, 2024, 11:53:57 PM

Title: Olimex STMP1 SOM - Error starting up
Post by: SteffenFuchs on January 29, 2024, 11:53:57 PM
At the end of last year we purchased approx. 30 STM32MP1 SOMs to use in a pilot series of our measuring equipment.

We encountered the following issue that we wanted to bring to your attention and ask,

   
Our observation has been:


This happens:
The behavior was the same with different setups connecting the SOM to:
The software revisions used are your images STM32MP1-OLinuXino-SOM-bullseye-minimal both from

In order to investigate further, I added a few printf statements to the U-Boot code from December 22nd to see what is happening, and found this (all file paths are relative to the U-Boot root directory):
The returned value is propagated all the way back to board_early_init_f, where it is ignored.

In board_early_init_f, in case of error, one of the subsequent calls to axp_set_aldo3 and axp_set_aldo2 halts the board. I did not investigate this further, as the previous function had already shown an error that was not dealt with. As a result, board_early_init_f never terminates and startup never continues.
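
The control flow described here (a PMIC probe error that the caller ignores, followed by a call into a device that did not answer) can be modelled outside of U-Boot. What follows is a minimal Python sketch, not U-Boot code: `FakePmic`, `probe_with_retry`, `early_init` and the retry count of 3 are hypothetical names and values chosen for illustration. Whether retrying helps on real hardware is untested. The sketch only shows the difference between ignoring the probe result and checking it with a bounded number of attempts.

```python
class FakePmic:
    """Stand-in for the AXP209 on the I2C bus (hypothetical model)."""

    def __init__(self, failures_before_answer):
        # Number of probe attempts that fail before the chip answers.
        self.failures_left = failures_before_answer

    def probe(self):
        """Return 0 if the chip answered, -1 otherwise (C-style status)."""
        if self.failures_left > 0:
            self.failures_left -= 1
            return -1
        return 0


def probe_with_retry(pmic, max_attempts=3):
    """Probe up to max_attempts times; return (status, attempts_used)."""
    status = -1
    attempts = 0
    while attempts < max_attempts and status != 0:
        attempts += 1
        status = pmic.probe()
    return status, attempts


def early_init(pmic, max_attempts=3):
    """Model of an init step that checks the probe result instead of ignoring it."""
    status, attempts = probe_with_retry(pmic, max_attempts)
    if status != 0:
        # Report the failure and skip regulator setup rather than
        # calling into a device that never answered.
        return f"PMIC not responding after {attempts} attempts, skipping regulator setup"
    return f"PMIC found after {attempts} attempt(s), configuring regulators"


if __name__ == "__main__":
    print(early_init(FakePmic(1)))  # answers on the second attempt
    print(early_init(FakePmic(5)))  # never answers within three attempts
```

The point of the bounded loop is that the init step always returns, so the caller can decide what to do next instead of hanging.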

Reading up on the I²C bus, I found that a NACK directly after the address is sent occurs under these circumstances:


This is as far as we have investigated the issue. If you like, we can send you the board that exhibits the error during each start, as well as one of the many boards that show the error only when the power is interrupted very briefly.

Best regards,

Steffen
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: jper on February 22, 2024, 04:25:48 PM
Hello,

I have experienced a similar issue with the STMP157-OLinuXino-LIME2 SBC.
The board booted successfully a couple of times when it arrived, with different images burned on the SD card. However, after two days of development it suddenly got stuck when trying to boot up. The only output that I get is:

U-Boot SPL 2021.04+olimex-1-20231222.140935 (Dec 22 2023 - 14:12:07 +0000)
Model: STM32MP1 OLinuXino-LIME
Init AXP209 PMIC
VDD Core set to: 1350 mv


This seems to be similar to what SteffenFuchs wrote above. Whatever I change, whether the SD card or the image (I tried the supplied image as well), nothing seems to fix the issue and the board remains unusable, stuck in a frozen boot. I would appreciate any form of help or advice here.

Kind Regards,
Jacob
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on February 23, 2024, 09:14:41 AM
You need to use the base image from here:

https://images.olimex.com/release/stm32-lime/

You need a solid 10W 5V DC power supply attached to the power jack.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on February 26, 2024, 11:22:58 PM
I'm using a rev C STMP1-SOM with a rev B EVB.  The 5V wall wart states 5VDC 2500mA.  I'm seeing the same behavior:

U-Boot SPL 2021.04+olimex-1-20231222.140935 (Dec 22 2023 - 14:12:07 +0000)
Model: STM32MP1XX OLinuXino-SOM
Init AXP209 PMIC

U-Boot SPL 2021.04+olimex-1-20231222.140935 (Dec 22 2023 - 14:12:07 +0000)
Model: STM32MP1XX OLinuXino-SOM
Init AXP209 PMIC

It usually prints twice after I hit the RESET button.  I'm using the stock image downloaded from this link (http://images.olimex.com/release/stm32-som/).
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: SteffenFuchs on February 27, 2024, 12:33:50 AM
Thank you all for the replies! jper and thom_nic, that is exactly what I have encountered with several boards, though not all of them all the time, so it seems to be an issue that comes up more frequently.

In addition to all my tests above, I can confirm that I used exactly the images linked above by you, LubOlimex - but I have also tried the previous images from July with no difference.

Further, the manual for the board found here (https://www.olimex.com/Products/SOM/STMP1/STMP157-SOM-512/resources/STMP15x-SOM.pdf) states at the top of page 7 that a 5 V 1 A power supply is needed - which is only 5 W instead of the 10 W mentioned above.

However, I have encountered the same behavior with a 5 V 3 A (15 W) power supply as with a smaller one.

So can you at Olimex please start investigating the issue? I think it comes down to the I²C communication between the STM32MP1 and the AXP209 you use on your board, which fails according to my research and testing as I have written above, but I have no access to any means to go further myself.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on February 27, 2024, 07:23:45 PM
I've been pulling my hair out because we're randomly seeing products not boot. Sometimes it comes back after fiddling a bit.  I've been thinking it was some sort of SD card issue (even though no writing occurs to the boot partitions!). Now I'm learning maybe it's board/power related.

For our part, we need to get to the bottom of this quickly: we have hundreds of devices out in the field and they are very costly to RMA (not to mention a poor customer experience) when one refuses to boot.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on February 28, 2024, 11:50:52 AM
The problem for us is that we can't replicate this issue. The many boards we have tested always start up. Maybe only some boards experience this problem, maybe in specific conditions. Until we can replicate the issue here we can't solve anything. If you have boards that don't start, please contact us at support@olimex.com so we can arrange returns to experiment with them. Notice that some reasonable fault rate on the start up hang is expected. For example, if one board fails to start once when it is unpowered and repowered 1000 times in 10 days, this is not something that we can replicate here.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: SteffenFuchs on February 28, 2024, 12:29:36 PM
Dear LubOlimex,
thank you for your reply.
I did send you that email on January 8th already, also offering to send back some boards we experienced problems with, but have not received any reply from support@olimex.com since, so I had to go through the forum.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on February 28, 2024, 12:47:34 PM
Maybe it got lost somewhere around here, re-send it - we are interested in the returns of boards that experience the issue regularly so we can see the hang ourselves.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: SteffenFuchs on February 28, 2024, 01:12:48 PM
Just did.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on March 07, 2024, 06:34:19 AM
At the moment I have one that consistently hangs.  I'll get it returned soon.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on March 07, 2024, 08:13:07 AM
Make sure to include inside the package a full description of the behavior and the steps we should take to replicate it, so we can pay closer attention. Also request an RMA over support@olimex.com (maybe include a link to this thread in the e-mail so we can link the request with the behavior described in the forum).
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: SteffenFuchs on March 11, 2024, 02:15:47 PM
The two boards I returned should have arrived March 8th. Hope you can reproduce the behavior and find the cause.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on March 11, 2024, 02:20:12 PM
Yeah thanks for the effort, they reached me. So far I can replicate the behavior you described 1:1 with the first board - we are not yet done with analyzing what could cause it but we are on it. Thanks for good packing and description of the behavior.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: Titomax on March 27, 2024, 06:17:44 PM
Hi all,
I can confirm the problem described at the beginning by SteffenFuchs (Steffen, thank you for your in-depth analysis). We are using many STMP157-SOM-512-EXT on our mainboard and sometimes after a normal shutdown the modules don't boot anymore. On the debug UART the boot procedure hangs after these messages:

U-Boot SPL 2021.04+olimex-1-20240312.134658 (Mar 12 2024 - 13:48:01 +0000)
Model: STM32MP1XX OLinuXino-SOM
Init AXP209 PMIC

We initially thought we had broken the modules in some way, but they started working again about 20-24 hours after we removed them from the motherboard and left them disconnected.

Now only 2 modules of the initial 11 refuse to boot, all the others boot again.

It's alarming that we don't have any guarantee that, inside our final product, they won't stop booting again after a system restart.

Please investigate the issue as soon as possible, as we need to stop production if a hardware failure is identified.

Thank you in advance
 


Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on March 28, 2024, 09:16:09 AM
We are investigating thoroughly thanks to the four boards with this issue that we received back from customers. In such boards it seems that the AXP209 sometimes hangs and u-boot can't find it or communicate with it, so the boot process stops. Our tests and hardware changes so far haven't provided the definitive answers we are seeking. Currently, we are prototyping a number of test boards around the AXP209 to exhaust all hardware scenarios. I will update this thread when I have some more news, but I welcome any feedback, experience, or description of the boot behavior of STMP boards - any specific details are helpful.

One workaround is to exclude any AXP209 initialization from u-boot and kernel, the AXP209 still hangs but the board would boot all the time (you'd lose access to any features of the PMU tho, like power management, battery status, etc). It is not clear if it is safe to exclude the AXP209 entirely since it controls a lot of things - like voltage on chip, voltage on HDMI, etc. It might be unsafe to disable the AXP209. Disabling the AXP209 from u-boot or kernel might lead to bigger issues and problems in the long run. More empirical tests are needed to confirm if it is safe.

We will post here some more details on how to exclude the AXP209 if you wish to test that.

Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on March 28, 2024, 08:44:08 PM
Quote from: Titomax on March 27, 2024, 06:17:44 PM
> We initially thought we broke the modules in some way but they started working again after about 20-24 hours after removing them from the motherboard and let them disconnected.

This is exactly the same behavior we've observed as well.  After attempting many times to boot, sometimes leaving the device powered off overnight - I pull off the SOM and replace with a different one (same SD card) and it boots. Must be a bad SOM, right? 

Then a few days later, I take another look at the "bad" SOM on the EVB or I throw it back in the product and now it boots!
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on March 29, 2024, 01:41:42 PM
Here, this image has AXP209 and PMU removed from u-boot and kernel, give it a try:

https://ftp.olimex.com/TEMP/SOM-NO-AXP209/STM32MP1-OLinuXino-SOM-bullseye-minimal-20240328-133932.img.7z

Again:

Quote:
> One workaround is to exclude any AXP209 initialization from u-boot and kernel, the AXP209 still hangs but the board would boot all the time (you'd lose access to any features of the PMU tho, like power management, battery status, etc). It is not clear if it is safe to exclude the AXP209 entirely since it controls a lot of things - like voltage on chip, voltage on HDMI, etc. It might be unsafe to disable the AXP209. Disabling the AXP209 from u-boot or kernel might lead to bigger issues and problems in the long run. More empirical tests are needed to confirm if it is safe.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on March 29, 2024, 08:09:33 PM
Quote from: LubOlimex on March 29, 2024, 01:41:42 PM
> Here, this image has AXP209 and PMU removed from u-boot and kernel, give it a try:

Thank you Lub. Could you push your branches to https://github.com/OLIMEX/u-boot-olinuxino and https://github.com/OLIMEX/linux-olimex so we can see the changeset?
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on April 01, 2024, 10:51:56 AM
I can give you the diff files with the changes, find them here:

https://ftp.olimex.com/TEMP/SOM-NO-AXP209/DIFF-files/

Not sure if we will make a fork or branch in the public repo without AXP209 since it is a workaround that might suit people with startup issues but would hinder people that need AXP209 tools.

Removing the AXP209 for the boards with HDMI connector is especially risky (like the STMP157-OLinuXino-LIME2 and STMP157-BASE-SOM-EVB).
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on April 02, 2024, 07:30:00 PM
Thank you.  Is there a plan, or ongoing investigation to solve the root cause and get the AXP209 reliably working again?
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on April 03, 2024, 08:42:46 AM
Yes, we are working on it. As I wrote previously: "Currently, we are prototyping a number of test boards around the AXP209 to exhaust all hardware scenarios." These PCBs were manufactured and tests are currently being carried out. This can't be done very quickly since there is a lot of hardware tinkering, soldering and empirical testing needed to get to the bottom of the issue.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on April 17, 2024, 10:33:59 AM
An update on this issue so far.

Summary:

This appears to be a somewhat rare issue, since we couldn't find any such boards in our warehouse. This means not all boards are affected. We guess that different AXP209 chips may have different tolerances, and that those, combined with the tolerances of all other related components, lead to only a small percentage of boards being affected by start up issues. Furthermore, these start up issues differ in intensity: some boards might fail to start in 10% of power ups, others in 50% of power ups, and some can rarely start up at all.

We got 4 boards back from two customers posting in this thread (each sent us two boards). This helped a lot, thanks again. All four boards had start up issues upon power up, to a greater or lesser degree. Two of the boards were incredibly bad cases that only very rarely started up. We managed to get both of these boards to boot via hardware changes: one board had the AXP209 changed and started working fine, the other started booting after changing R64 from 10k to 100k.

The nature of the issue is that the I2C communication between the AXP209 and the main chip dies. Since the official Olimage images have some AXP209 support, when the software can't find the AXP209 over the I2C, the Linux image won't start. It is not clear what causes that hang in the I2C communication and why only some boards are affected. We are still working on some hardware solutions for future hardware revisions.

The workarounds we found so far:

1. Remove all drivers for AXP209 from Linux, this will make the board boot but all AXP209 control over I2C would be gone. Some functions would be missing (like reading battery charge, setting the board in low power modes, etc). We already created such a Linux image without any initialization of the AXP209, it is available here:

https://ftp.olimex.com/TEMP/SOM-NO-AXP209/STM32MP1-OLinuXino-SOM-bullseye-minimal-20240328-133932.img.7z

Differences can be seen here:

https://ftp.olimex.com/TEMP/SOM-NO-AXP209/DIFF-files/

2. Hardware workaround - increasing the value of resistor R62 from 10k to 100k adds some hardware delay, and with it the boards don't hang on power up (as long as there were at least two seconds of power down between power ups). Keep in mind that R62 is a bit hard to replace (size 0402, located under a plastic connector); we bend the plastic connector before replacing the resistor, then bend it back. So far this seems like an acceptable workaround, and all boards manufactured since the start of April will leave the factory with R62 = 100k.

If the software workaround doesn't work for your goals, or if you are unable to perform the hardware workaround, please contact us over support@olimex.com - we will arrange return and replacement of R64 with 100k.

We are still running hardware tests on different solutions and hope to fix this issue completely in the next hardware revision. I will update you when that happens.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on April 18, 2024, 04:54:33 PM
Thanks very much for the update Lub.  Couple questions:

- Did you rev the hardware with the resistor change? We will not make hardware mods to a third-party component including the Olimex. It must come with the fix.  Can you please confirm the YYWW date code and lot code when this change was made?  We will likely inform our CMs to stop using older SOMs and replace them ASAP.

- Can you please provide a full list of lost capability due to disabling the AXP209 in software?  You said battery charging, low power states, and I think you said elsewhere that some video output capability would be lost.  Is that everything?  I just want to ensure I fully understand the possible effects of disabling the PMIC.

That aside, I am still struggling to understand the root cause of the issue.  Virtually every STMP1 SOM we have on hand in engineering, we can cause this in about 10 minutes or less of power cycling (1 minute off between each power up.)  After it gets to this state, the unit may be left off for hours (overnight) and still will not boot.  Swapping the SOM always resolves it.  It seems like infrequently a "bad" SOM will boot again after it has been pulled off the mainboard for a while.  But usually it will hang again after the next  power cycle or two.

Have you already or do you plan to also add this improvement (https://www.olimex.com/forum/index.php?msg=37863) to the next board revision?
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on April 18, 2024, 05:32:47 PM
Lub I'm looking at the diff files you posted above (https://ftp.olimex.com/TEMP/SOM-NO-AXP209/DIFF-files/).  Both diffs appear to be from the same file in the linux kernel source.  Should one of them be for the u-boot source?  I am guessing modules/stmp1-boot/u-boot-olimex/board/olimex/stm32mp1_olinuxino/spl.c
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on April 19, 2024, 08:43:37 AM
Good catch, my bad! Proper diff files are now uploaded at the same location:

https://ftp.olimex.com/TEMP/SOM-NO-AXP209/DIFF-files/

> - Can you please provide a full list of lost capability due to disabling the AXP209 in software?  You said battery charging, low power states, and I think you said elsewhere that some video output capability would be lost.  Is that everything?  I just want to ensure I fully understand the possible effects of disabling the PMIC.

It is mainly the battery-related software functions. The HDMI should be fine for the SOM board (I mainly mentioned it if somebody performs similar removal of AXP support for the LIME or BASE SOM images). Not much should be lost from lack of it. We are not fully aware ourselves of all implications of removing AXP209 driver from the Linux, but on first and second look and tests so far it seems fine.

Drop me an e-mail at support@olimex.com to discuss the details about the affected boards and how to ensure future purchases have the hardware fix applied (and also to discuss returning the two boards that you sent back).
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on April 19, 2024, 09:40:50 AM
The boards with the 100k resistor will be revision D1, indicated only on the box (the PCB marking will still say D). All boards purchased directly from our web-shop will be revision D1. If you purchase boards from our web-shop they will have the 100k resistor.

I will update the schematic later today.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on April 22, 2024, 09:21:46 PM
Edit: disregard, I was looking at the wrong SOM product page.

Hi Lub - I do not see the updated PDF schematic on the product page (https://www.olimex.com/Products/SOM/STMP1/STMP157-BASE-SOM-EXT/) and I also cannot find any repo on https://github.com/OLIMEX/ for the STMP1 SOM hardware.  In addition to just schematic it would be helpful to know where to find the hardware changelog for this SOM.  For example many of the SOMs are here (https://github.com/OLIMEX/SOM/tree/master) but not the STMP1.  Thanks.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: thom_nic on April 27, 2024, 04:44:20 PM
Lub at the bottom of your above message you wrote:

Quote:
> all boards manufactured since start of April will leave the factory with R64 = 100k.

Did you mean R62?
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on April 29, 2024, 08:24:19 AM
Yes, R62. I have edited previous post.
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: Gaël on May 02, 2024, 10:51:04 AM
Hi!

I don't know if it's the same problem or if I have to create a new topic.

I have random boot errors at start-up with the STMP1 LIME.

After power cuts caused by thunderstorms, some boards don't restart.
Of 20 STMP157-OLinuXino-LIME2H-EXT (Rev_B) boards in production, approximately 5 remain frozen (they are distributed across several locations, so they do not suffer exactly the same power cuts).

I never have this problem when I reboot a board with a "shutdown -r now" (or "shutdown -r 00:00").

I did some tests and after some power cuts, the board stays frozen at:

U-Boot SPL 2021.04+olimex-1-20240312.134658 (Mar 12 2024 - 13:48:01 +0000)
Model: STM32MP1 OLinuXino-LIME
Init AXP209 PMIC
VDD Core set to: 1350 mv

When the board is frozen, the reset button doesn't help; the error returns.
A quick power cut doesn't help either.
For the error to disappear, the board needs to be unplugged for several seconds.
Quick power cuts seem to cause the problem every time.

NB: I tried with 1 A and 2 A power supplies.
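
For anyone tallying how often a board hangs, the signature reported in this thread (serial output that stops at "Init AXP209 PMIC" or at "VDD Core set to:") can be checked automatically. Below is a rough Python sketch, assuming each boot attempt has been captured from the debug UART into a text string. The function names and the idea of matching on the last non-empty line are illustrative assumptions, not an Olimex tool.

```python
# Last lines seen on frozen boards according to the logs posted in this thread.
HANG_MARKERS = ("Init AXP209 PMIC", "VDD Core set to:")


def last_nonempty_line(log_text):
    """Return the last non-blank line of a captured boot log, stripped."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def looks_like_pmic_hang(log_text):
    """True if the capture ends on one of the hang markers."""
    return last_nonempty_line(log_text).startswith(HANG_MARKERS)


def hang_rate(captures):
    """Fraction of captures that end on a hang marker (0.0 for no captures)."""
    if not captures:
        return 0.0
    return sum(looks_like_pmic_hang(c) for c in captures) / len(captures)


# Capture copied from the frozen LIME board output above.
FROZEN = """U-Boot SPL 2021.04+olimex-1-20240312.134658 (Mar 12 2024 - 13:48:01 +0000)
Model: STM32MP1 OLinuXino-LIME
Init AXP209 PMIC
VDD Core set to: 1350 mv
"""

# Placeholder for a boot that continued; the extra line is hypothetical.
CONTINUED = FROZEN + "(any later boot output)\n"

if __name__ == "__main__":
    print(looks_like_pmic_hang(FROZEN))
    print(hang_rate([FROZEN, CONTINUED, CONTINUED, FROZEN]))
```

A capture only counts as a hang if it was recorded long enough for a healthy board to print further output, so the capture window has to be chosen with that in mind.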
Title: Re: Olimex STMP1 SOM - Error starting up
Post by: LubOlimex on May 07, 2024, 09:05:15 AM
It sounds like the same issue. You can easily test if it is - try the image without AXP initialization just to confirm. If these five boards that never boot, boot with it, then it is the same issue. Image is here:

https://ftp.olimex.com/TEMP/SOM-NO-AXP209/STM32MP1-OLinuXino-SOM-bullseye-minimal-20240328-133932.img.7z

Contact us at support@olimex.com about the issue with the boards that won't start when power supply is applied.

> Quick power cuts seems to cause the problem every time.

This is expected by design. All boards should fail to start if there was not enough time for the capacitors to discharge. There should be at least 2 seconds between power ups.
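
When running power-cycle tests, it can help to separate hangs that follow a too-short power-down from hangs that occur after a proper one. The sketch below is a small Python helper, assuming the test rig records power events as `(timestamp_seconds, "off" or "on")` pairs; that event format and the function name are hypothetical, and only the 2 second minimum comes from the statement above.

```python
MIN_OFF_SECONDS = 2.0  # minimum power-down time stated above


def short_off_periods(events, min_off=MIN_OFF_SECONDS):
    """Return (off_time, on_time) pairs where power was off for less than min_off.

    events: chronological list of (timestamp_seconds, kind), kind is "off" or "on".
    """
    violations = []
    last_off = None
    for timestamp, kind in events:
        if kind == "off":
            last_off = timestamp
        elif kind == "on" and last_off is not None:
            if timestamp - last_off < min_off:
                violations.append((last_off, timestamp))
            last_off = None  # this off period has been accounted for
    return violations


if __name__ == "__main__":
    rig_log = [
        (0.0, "on"),
        (10.0, "off"),
        (10.5, "on"),   # only 0.5 s off: a hang here is expected by design
        (20.0, "off"),
        (25.0, "on"),   # 5 s off: a hang here would point to the actual issue
    ]
    print(short_off_periods(rig_log))
```

Power-ups flagged by this check can be excluded from the failure count, since a hang after such a short power-down does not indicate the AXP209 issue.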