Olimex Support Forum

Microcontrollers => ESP32 => Topic started by: Symonn on August 06, 2020, 03:29:52 PM

Title: ESP32-POE/ISO unstable after 10 days of power up
Post by: Symonn on August 06, 2020, 03:29:52 PM
Hi, I have one ESP32-POE and one ESP32-POE-ISO, and I'm using the Olimex RS485 module to read a power meter that is placed near the ESP32 (5 meters of CAT5 cable, 9600 8N1).

My device has this workflow (it's powered via PoE using a Cisco 2960x switch):


This is the main logic that "blocks" the loop and waits for some minutes:


void loop() {
  wdt_reset();
  MQTT_connect();

  client_subscriber_loop(); // loop subscriber mqtt

  unsigned long currentMillis = millis();
  if (currentMillis - previousMillisTaskDataPublisher > MQTT_PUBLISH_EVERY) {
    if (DEBUG == 1)
      Serial.printf("Free internal heap on Loop Enter %u\n", ESP.getFreeHeap());
    //  OTA Check
    trueverit_ota_check_update();
    wdt_reset();

    publish_data();
    if (DEBUG == 1)
      Serial.printf("Free internal heap on Loop Exit %u\n", ESP.getFreeHeap());

    previousMillisTaskDataPublisher = currentMillis;
  }
}



As you can see, I also have the watchdog timer enabled, and it is fed in the loop and in every function that has to feed it (Modbus read, etc.).

Now here comes the issue: this works for exactly 10 days, and after that the MCU goes into a loop. The green LAN LED flashes indefinitely and nothing works, leaving the ESP32 in an undefined state.

This is the debug message that I see on the Arduino console:

(https://i.imgur.com/K3IV2fe.png)

There is no way to make it work again apart from powering the device off and on.

This is extremely inconvenient... In my case I have 25 of these devices placed far away from my office (600 km), and I can't travel there to reboot them every ten days.

Any thoughts?
Title: Re: ESP32-POE/ISO unstable after 10 days of power up
Post by: JohnS on August 06, 2020, 07:57:22 PM
If it is repeatable and consistent as you say, it's 99.99% likely to be software so go hunting around your code & the Arduino code.

John
Title: Re: ESP32-POE/ISO unstable after 10 days of power up
Post by: kyrk.5 on August 07, 2020, 11:21:33 AM
First of all I am not an ESP expert :) However I usually get called when something does not work with an embedded device :)

To me it looks from the screenshot as if the ESP is trying to update itself. Since the update ends with an error, it resets and tries again, endlessly. If you disabled this update feature, the endless loop would be broken. Otherwise you have to make sure that the update always succeeds.

The second question is why the ESP enters this loop at all. I guess the watchdog does not get triggered (that is, fed) in time. It could also be another kind of reset cause, like a short power failure. To check whether this is a watchdog reset, deactivate the watchdog, wait 10 days and check. If there is no reset, then yes, it is a watchdog reset.

Let us assume this was a watchdog reset. The third question is why the watchdog does not get triggered. Here we have a problem. Either you need Chuck Norris, who stares down the code until it confesses every bug, or you have to find the bugs yourself. Since the ESP is flashed over a bootloader and people seem to forget what a debugger is, this is not so easy. If you had a debugger, I would suggest deactivating the watchdog, waiting 10 days, then pressing stop in the debugger and checking the state of your software and registers. But I guess you do not have a debugger. I think flashing is not possible over JTAG, but debugging is possible. I am not familiar with Arduino, so I guess it is not so easy to debug.

So now the question is how to find the root cause of the watchdog not getting triggered. Try searching the forums. Maybe this is a known limitation of the ESP firmware, so that it hangs every 10 days and people just accept it and live with it. Check your software for a timer or counter that can count up to at most 10 days before it overflows. Is it exactly 10 days? Maybe 12.8 days? Or some power of 2? That would also give a hint where to look.
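
The overflow hunt above can be made concrete with a small host-side sketch. It assumes a 32-bit millisecond counter (as unsigned long is on the ESP32) and all function names are illustrative. It shows that the subtraction form used in the posted loop() survives a counter wrap, that the "previous + interval" form does not, and how to estimate when a counter of a given width wraps.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Wrap-safe check: unsigned subtraction is modulo 2^32, so the elapsed
// time stays correct even if the counter wrapped once between readings.
// This is the same shape as the comparison in the posted loop().
bool interval_elapsed(uint32_t now, uint32_t previous, uint32_t interval) {
    return static_cast<uint32_t>(now - previous) > interval;
}

// Fragile form: the deadline itself can wrap to a small number, after
// which the check fires on every pass until the counter wraps too.
bool interval_elapsed_fragile(uint32_t now, uint32_t previous, uint32_t interval) {
    return now > static_cast<uint32_t>(previous + interval);
}

// Days until a counter of the given bit width wraps at tick_hz ticks/second.
double days_until_wrap(unsigned bits, double tick_hz) {
    assert(tick_hz > 0.0);
    const double ticks = std::ldexp(1.0, static_cast<int>(bits));  // 2^bits
    return ticks / tick_hz / 86400.0;  // 86400 seconds per day
}
```

For example, days_until_wrap(32, 1000.0) is about 49.7, so a 32-bit millisecond counter on its own does not explain a 10-day period. A counter in another unit or of another width somewhere in the libraries might, which is why the exact time to failure is worth measuring.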

My opinion is: I think the ESP is good for playing, learning and experimenting. But building a product on it is quite risky. Since the debug options are limited, it can become a disaster when you already have units in the field and a bug shows up. Then you have nothing in your hands to find the problem and fix it. The firmware might also become a nightmare. I guess it is only a binary blob, so if there is a problem, there is no way to look inside and analyse it. You can maybe fix it at your own risk, or ask the vendor to fix the problem.


Title: Re: ESP32-POE/ISO unstable after 10 days of power up
Post by: Symonn on August 08, 2020, 05:29:15 PM
I agree with kyrk.5, the ESP32-POE/POE-ISO definitely does not fit very well with production projects.

I'm investigating this. Anyway, I think I'll switch to another board (wESP32) that costs twice as much but supports a JTAG debug interface and fully compliant IEEE 802.3at Type 1 Class 0 PoE, with 12 W of available power at 12 V.

Title: Re: ESP32-POE/ISO unstable after 10 days of power up
Post by: olimex on August 11, 2020, 09:29:18 PM
Looks like a software memory leak issue.
I would not trust an Arduino IDE project for anything that has to be reliable, and the Espressif SDK also has lots of hidden mines. You have to go through your code very carefully.

I know software guys always blame the hardware first, and I'm interested to see what the result with the other board will be.
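
One way to test the memory-leak suspicion is to record the free-heap reading at the same point in every publish cycle and look at the trend instead of single values. The class below is a minimal sketch with invented names (HeapTrend, add_sample and so on). It assumes the readings come from ESP.getFreeHeap(), which the posted loop() already prints.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: accumulates free-heap samples taken once per cycle.
class HeapTrend {
public:
    void add_sample(uint32_t free_bytes) {
        if (count_ == 0) {
            first_ = free_bytes;
            lowest_ = free_bytes;
        }
        if (free_bytes < lowest_) lowest_ = free_bytes;
        last_ = free_bytes;
        ++count_;
    }

    // Lowest free-heap value seen so far.
    uint32_t lowest() const { return lowest_; }

    // Average bytes lost per cycle since the first sample
    // (negative if the heap grew, 0 with fewer than two samples).
    double bytes_lost_per_cycle() const {
        if (count_ < 2) return 0.0;
        return (static_cast<double>(first_) - static_cast<double>(last_)) /
               static_cast<double>(count_ - 1);
    }

    // Cycles left before free heap reaches zero at the measured rate,
    // or -1.0 if no decline has been measured.
    double cycles_until_exhausted() const {
        const double rate = bytes_lost_per_cycle();
        if (rate <= 0.0) return -1.0;
        return static_cast<double>(last_) / rate;
    }

private:
    uint32_t first_ = 0;
    uint32_t last_ = 0;
    uint32_t lowest_ = 0;
    std::size_t count_ = 0;
};
```

In the sketch, a single trend.add_sample(ESP.getFreeHeap()); after publish_data() would be enough. If cycles_until_exhausted() multiplied by the publish interval comes out near 10 days, a slow leak becomes the prime suspect. A steady value points elsewhere.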
Title: Re: ESP32-POE/ISO unstable after 10 days of power up
Post by: kyrk.5 on August 16, 2020, 01:42:55 PM
Maybe we should take a look at the complete source code to find the problem.