ESP32-POE will not reliably boot and unable to flash

Started by justlikeef, November 21, 2022, 02:39:51 AM


justlikeef

I have two boards that I successfully flashed ESPHome to, but after running for several minutes they will no longer boot and cannot be flashed.

One of them crashes immediately after boot:
ets Jul 29 2019 12:21:46

rst:0x1 (POWERON_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8
[I][logger:258]: Log initialized
[C][ota:469]: There have


and the other does this over and over:
rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
ets Jul 29 2019 12:21:46

rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
ets Jul 29 2019 12:21:46

rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff
ets Jul 29 2019 12:21:46

rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
invalid header: 0xffffffff
invalid header: 0xffffffff
invalid header: 0xffffffff



Attempting to flash either gives one of several errors:

A fatal error occurred: Failed to connect to ESP32: No serial data received.
Invalid head of packet (0x0D): Possible serial noise or corruption.




I have tried various baud rates and various combinations of a 10 µF capacitor and a 2K resistor between EN and 3.3V and between EN and ground. I am not able to replace the surface-mount resistor, but I can work with someone to troubleshoot.

I have replaced the USB cable and tried a powered hub. On a scope, the 5V pin looks pretty stable.
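
For anyone comparing captures from several boards, the two logs above can be summarized with a small script. This is only a sketch: the function name `summarize_boot_log` is made up for illustration, and it only counts what the ROM prints. A long run of `invalid header: 0xffffffff` usually means the ROM read all ones where it expected a bootloader image, which would fit erased flash or a flash chip that is not responding, but the script cannot tell those cases apart.

```python
import re
from collections import Counter

# Matches ROM lines such as
# "rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)"
RESET_LINE = re.compile(
    r"rst:(0x[0-9a-fA-F]+) \((\w+)\),boot:(0x[0-9a-fA-F]+) \((\w+)"
)


def summarize_boot_log(text):
    """Count reset reasons, boot modes and 'invalid header' lines in a serial capture."""
    resets = Counter()
    boot_modes = Counter()
    invalid_headers = 0
    for line in text.splitlines():
        match = RESET_LINE.search(line)
        if match:
            resets[match.group(2)] += 1
            boot_modes[match.group(4)] += 1
        elif line.startswith("invalid header:"):
            invalid_headers += 1
    return {
        "resets": dict(resets),
        "boot_modes": dict(boot_modes),
        "invalid_headers": invalid_headers,
    }


# Shortened excerpt of the second board's output from this post.
sample = """rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
invalid header: 0xffffffff
invalid header: 0xffffffff
ets Jul 29 2019 12:21:46

rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
invalid header: 0xffffffff
"""

summary = summarize_boot_log(sample)
print(summary)
```

A capture where every reset is `RTCWDT_RTC_RESET` and the header count keeps growing is the watchdog loop shown above; a single `POWERON_RESET` followed by application log lines is the first board's pattern.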

LubOlimex

The firmware you uploaded usually doesn't affect the ability to program the board via USB. Try another USB port and a different programming tool. You could also re-install the CH340T drivers; suitable ones are available on the product page.

You don't need to solder anything around EN. If you want a boot button, there is a way to turn the user button BUT into a boot button: unsolder resistor R47 and solder it onto the pads of resistor R49. Refer to the "Buttons" area of the schematic for details.
Technical support and documentation manager at Olimex

justlikeef

I have tried on multiple machines and they behave the same.

I don't have the capability of soldering the SMD components.  I am using the headers.

LubOlimex

But did you try a different software tool?

Maybe try Arduino for ESP32, since its installation is well documented. Just try to upload anything to see if the upload works.

justlikeef

I have tried the esphome web tool and the command line esptool.  I get a similar response from both.

LubOlimex

Now that I look at your earlier replies, here is how to force bootloader mode without soldering:

- you can force bootloader mode by connecting GPIO0 to GND (and if that doesn't work, also connect GPIO2 to GND); this should act like pressing a BOOT button

- leave EN as it is; in this design, entering the bootloader depends on the state of GPIO0, not EN.

Both GPIO0 and GPIO2 can be found on EXT1: GPIO0 is pin #5, GPIO2 is pin #7, and GND is pin #3.

Remember that to leave forced bootloader mode you need to remove these connections afterwards.

Try forcing this mode and see if that allows programming.
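
To check whether the jumper actually took effect, the `boot:` value in the ROM banner can be decoded. The sketch below assumes the strapping-bit layout Espressif documents for the original ESP32 (0x10 = GPIO0, 0x08 = GPIO2, and so on); please verify it against the current esptool boot-mode documentation before relying on it. The function names are made up for illustration, and the download-mode line in the example is an illustrative one, not taken from this thread.

```python
import re

# Assumed meaning of each bit in the value printed after "boot:" on the
# original ESP32 (check Espressif's boot-mode documentation).
STRAP_BITS = {
    0x01: "GPIO5",
    0x02: "GPIO15 (MTDO)",
    0x04: "GPIO4",
    0x08: "GPIO2",
    0x10: "GPIO0",
    0x20: "GPIO12 (MTDI)",
}

BOOT_VALUE = re.compile(r"boot:(0x[0-9a-fA-F]+)")


def strapping_pins_high(boot_value):
    """Return the names of the strapping pins that were sampled high."""
    return [name for bit, name in STRAP_BITS.items() if boot_value & bit]


def gpio0_was_low(log_line):
    """True if GPIO0 was sampled low at reset, None if the line has no boot value."""
    match = BOOT_VALUE.search(log_line)
    if match is None:
        return None
    return not (int(match.group(1), 16) & 0x10)


# Banner from the first post: GPIO0 high, so the ROM tries to boot from flash.
normal = "rst:0x10 (RTCWDT_RTC_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)"
# Illustrative banner (not from this thread) of the kind printed in download mode.
download = "rst:0x1 (POWERON_RESET),boot:0x3 (DOWNLOAD_BOOT(UART0/UART1/SDIO_REI_REO_V2))"

print(gpio0_was_low(normal), strapping_pins_high(0x1F))
print(gpio0_was_low(download), strapping_pins_high(0x03))
```

If the banner still shows GPIO0 high with the jumper in place, the jumper is probably not making contact or is on the wrong pin.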

justlikeef

I didn't have time to look at them today.  Will do so tomorrow EST.

justlikeef

Finally figured out what is going on.  I have 5 boards that are now dead under the same circumstances.

They run on POE (I ran one untouched for more than a day) until I solder headers on them. After I solder the headers on and plug the Ethernet cable back in, they run for a while and then stop responding. The charge and power lights stay on, but the link and activity lights go off.

To troubleshoot the issue, I turned POE off on the switch and plugged the network and USB cables in. If I let the board sit for a while (I am assuming something is cooling off), it will boot, and the link and activity lights will come on for somewhat less than a second, then go back off, repeatedly. After a while, the serial console starts printing garbage, then eventually locks up. If I let the board sit for a while, the process repeats. If I don't let it sit (unplug the USB cable and immediately plug it back in), nothing is printed to the serial console.

Device logs

LubOlimex

Interesting, so the problem appears after soldering the headers? Maybe you used too much solder and it flowed underneath and caused a short circuit? Or maybe you applied heat for too long and accidentally damaged a component?

Was nothing attached to the headers between the moment they were soldered and the moment the problem appeared?

What happens if you desolder and remove the header?

Also you said you connect USB after failure happens on PoE, but were you careful not to have both USB powering and PoE powering attached to the ESP32-PoE at the same time?

justlikeef

Quote from: LubOlimex on December 05, 2022, 01:18:35 PM
Interesting, so the problem appears after soldering the headers? Maybe you used too much solder and it flowed underneath and caused a short circuit? Or maybe you applied heat for too long and accidentally damaged a component?

Not out of the question, but I've soldered boards for 35 years and not had a consistent problem like this.

There are some things VERY close to the copper pads. My guess is a bridge between GPIO0 and D9, or between GPIO3 or GPIO4 and the via in the middle of the GPIO4 label. I'll find out for sure once I remove the headers. Does a solder bridge to either of those make sense?

Quote from: LubOlimex on December 05, 2022, 01:18:35 PM
Was nothing attached to the headers between the moment they were soldered and the moment the problem appeared?

Correct

Quote from: LubOlimex on December 05, 2022, 01:18:35 PM
What happens if you desolder and remove the header?

Next on my list. 

Quote from: LubOlimex on December 05, 2022, 01:18:35 PM
Also you said you connect USB after failure happens on PoE, but were you careful not to have both USB powering and PoE powering attached to the ESP32-PoE at the same time?

Correct.
Quote from: justlikeef on December 05, 2022, 03:20:53 AM
To troubleshoot the issue, I turned POE off on the switch and plugged the network and USB cables in.

LubOlimex

Note that these headers were meant to be soldered on the other side of the board (the bottom), so that the board can be placed on top of another board. But we have also installed headers on the top side, like you did, for a customer who requested it, and that worked fine. So placing them on top shouldn't be an issue.

justlikeef

I've added pictures of the board after removing the headers. I see no evidence of solder bridging or damaged components.  I have plugged it back into POE to see how it behaves now.

justlikeef

The problem is on the EXT1 side. Removing the EXT2 header made no difference. Once I removed the EXT1 header, the problem went away.

LubOlimex

It looks like there might have been loose solder forming a connection between one of the pads of the EXT1 connector and nearby components, under the plastic of the connector. Around the GPIO0, GPIO1 and GPIO2 pads things are not so clear; it looks like there are traces of leftover solder. The connection was probably between GPIO0 and D9, or between GPIO0/GPIO1 and the big C25, but D2 and R16 are also possible.

I tested it today. Got the same revision I board and tested it with the Ethernet code from the Arduino IDE package for ESP32 (File -> Examples -> Ethernet -> ETH_LAN8720). It connects to Google every ten seconds and reports something like:

13:15:58.483 -> connecting to google.com
13:15:58.671 -> HTTP/1.1 301 Moved Permanently
13:15:58.671 -> Location: http://www.google.com/
13:15:58.671 -> Content-Type: text/html; charset=UTF-8
13:15:58.671 -> Cross-Origin-Opener-Policy-Report-Only: same-origin-allow-popups; report-to="gws"
13:15:58.671 -> Report-To: {"group":"gws","max_age":2592000,"endpoints":[{"url":"https://csp.withgoogle.com/csp/report-to/gws/other"}]}
13:15:58.671 -> Date: Wed, 07 Dec 2022 11:15:57 GMT
13:15:58.671 -> Expires: Fri, 06 Jan 2023 11:15:57 GMT
13:15:58.718 -> Cache-Control: public, max-age=2592000
13:15:58.718 -> Server: gws
13:15:58.718 -> Content-Length: 219
13:15:58.718 -> X-XSS-Protection: 0
13:15:58.718 -> X-Frame-Options: SAMEORIGIN
13:15:58.718 ->
13:15:58.718 -> <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
13:15:58.718 -> <TITLE>301 Moved</TITLE></HEAD><BODY>
13:15:58.718 -> <H1>301 Moved</H1>
13:15:58.718 -> The document has moved
13:15:58.718 -> <A HREF="http://www.google.com/">here</A>.
13:15:58.718 -> </BODY></HTML>
13:15:58.718 -> closing connection

So I left it working for a while. Then I soldered the EXT1 header and left it working for an hour, and it is still working fine. Then I tried to reprogram it, and it still programs fine. Here are the pictures of how I soldered it:

https://imgur.com/a/XU5qBs9

I believe you used too much solder and it flowed under the connector and caused an invisible connection under the plastic.
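
If you want to repeat the same soak test on a resoldered board, a saved serial-monitor log can be checked automatically instead of watching it for an hour. The sketch below assumes the Arduino serial-monitor timestamp format shown above (`HH:MM:SS.mmm -> message`) and the "connecting to" / "closing connection" messages of the example sketch. The function names and the 15-second gap threshold are assumptions chosen for illustration, and the timestamps in the sample after the first cycle are invented.

```python
from datetime import datetime


def parse_line(line):
    """Split 'HH:MM:SS.mmm -> message' into (datetime, message); None if it does not match."""
    stamp, sep, message = line.partition(" ->")
    if not sep:
        return None
    try:
        when = datetime.strptime(stamp.strip(), "%H:%M:%S.%f")
    except ValueError:
        return None
    return when, message.strip()


def check_soak_log(text, max_gap_seconds=15.0):
    """Count connection attempts, completed cycles, and gaps longer than max_gap_seconds.

    Logs that cross midnight are not handled, because the timestamps carry no date.
    """
    starts = []
    completed = 0
    in_cycle = False
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, message = parsed
        if message.startswith("connecting to"):
            starts.append(when)
            in_cycle = True
        elif message == "closing connection" and in_cycle:
            completed += 1
            in_cycle = False
    gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
    stalled = [gap for gap in gaps if gap > max_gap_seconds]
    return {"attempts": len(starts), "completed": completed, "stalled_gaps": stalled}


# First cycle taken from the log above; the later timestamps are hypothetical.
sample_log = """13:15:58.483 -> connecting to google.com
13:15:58.671 -> HTTP/1.1 301 Moved Permanently
13:15:58.718 -> closing connection
13:16:08.490 -> connecting to google.com
13:16:08.700 -> closing connection
13:16:48.500 -> connecting to google.com
"""

report = check_soak_log(sample_log)
print(report)
```

A healthy board should show as many completed cycles as attempts and no stalled gaps; an attempt without a matching "closing connection", or a long gap, marks the point where the board stopped responding.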