August 27, 2017
ESP8266 Watchdogs in Arduino A Better ESP8266 Loop Watchdog and Better Recovery

The ESP8266 contains two watchdog timers: a hardware timer that times out after 7 to 8 seconds and a software timer with a shorter timeout of slightly more than 3 seconds. These watchdogs are the subject of a previous post. Like others, I have decided that a third watchdog is a good idea for some of my ESP8266/Arduino based projects. That's because it is very easy to write code that feeds the watchdogs but nevertheless goes off the deep end.

A variant of this third watchdog will be added to two projects based on the ESP8266. The first is a Sonoff with custom firmware that, interestingly enough, will be used as an external watchdog for our ISP-provided modem, which has not been reliable lately. I have had to trek down to the basement on numerous occasions to power cycle that device. While we were away for a couple of weeks, the modem was connected to a mechanical timer which turned it off for about a quarter of an hour at 4 am every day. Having an assurance that the home automation system, which runs on the modem's WiFi network, would not be out of commission for more than 24 hours was comforting. However, the timing was thrown off when power to the house was lost for a while. The Sonoff will replace the mechanical timer and toggle the power to the modem only when necessary and for much shorter intervals.

The second project is based on the Wemos D1 mini, which will act as a garage door monitor. I have mentioned this project before and it is coming close to completion.

Table of Contents

  1. Need for a Third Watchdog Timer?
  2. Adding a Loop Watchdog (lwdt)
  3. Setting an Appropriate lwdt timeout interval
  4. Improved lwdt Watchdog

Need for a Third Watchdog Timer?

    All that is needed to freeze the microcontroller is an endless loop that keeps feeding the ESP watchdogs. This is what happens in case '3' of the following sketch (available for download: esp_3rd_watchdog_01.ino).

    /*
     * esp_3rd_watchdog_01.ino
     */

    extern "C" {
    #include "user_interface.h"
    }

    // Reads the boot device from the boot mode register.
    // A value of 1 means the sketch was just uploaded over the UART.
    int getBootDevice(void) {
      int bootmode;
      asm (
        "movi %0, 0x60000200\n\t"
        "l32i %0, %0, 0x118\n\t"
        : "+r" (bootmode) /* Output */
        : /* Inputs (none) */
        : "memory" /* Clobbered */
      );
      return ((bootmode >> 0x10) & 0x7);
    }

    void setup() {
      Serial.begin(115200);
      Serial.printf("\n\nReason for reboot: %s\n", ESP.getResetReason().c_str());
      Serial.println("----------------------------------------------");

      if ( getBootDevice() == 1 ) {
        Serial.println("\nThis sketch has just been uploaded over the UART.");
        Serial.println("The ESP8266 will freeze on the first restart.");
        Serial.println("Press the reset button or power cycle the ESP now");
        Serial.println("and operation will be resumed thereafter.");
        while (1) { yield(); }
      }
    }

    // Shows the menu and blocks until the user enters '1', '2' or '3'
    char user_input() {
      while (1) {
        Serial.println();
        Serial.println("This is a test of the ESP watchdogs. In the top text box enter a number (1, 2, or 3)");
        Serial.println("  1 - to go endless loop that does not feed both ESP watchdogs");
        Serial.println("  2 - to go endless loop that does not feed the software ESP watchdog");
        Serial.println("  3 - to go endless loop that does feed ESP watchdog");
        Serial.println("and then press the Enter key or click on the [Send] button");
        Serial.println();

        while (!Serial.available()) {
          delay(1);
        }
        char ch = Serial.read();
        if (ch >= '1' && ch <= '3') {
          return ch;
        } else if (ch != '\n') {
          Serial.println("\nInvalid entry");
        }
      }
    }

    void loop() {
      switch (user_input()) {
        case '1':
          Serial.println("\n\nStarting endless loop without feeding both ESP watchdogs");
          Serial.println("The software watchdog should timeout after 3 seconds and reset the ESP...");
          while (1) {} // software watchdog timeout
          break;
        case '2':
          Serial.println("\n\nStarting endless loop without feeding the software ESP watchdogs");
          Serial.println("The hardware watchdog should timeout after 7 seconds and reset the ESP...");
          ESP.wdtDisable();
          while (1) {} // hardware watchdog timeout
          break;
        case '3':
          Serial.println("\n\nStarting endless loop feeding ESP watchdogs");
          Serial.println("Wait at least 10 seconds to verify that the watchdogs will not catch");
          Serial.println("the problem and then press the reset button to break out of this loop!");
          while (1) { delay(2); } // delay() feeds the ESP watchdogs, so nothing times out
          break;
        default:
          delay(1000);
          break;
      }
    }

    The only way to break the ESP out of its endless loop is to press its reset button or to power it off. For the garage door monitor that would mean climbing about 12 feet up a ladder to reach the device or toggling off the correct circuit breaker at the electrical panel. Neither of these two methods is particularly appealing.

    There is a better way.

Adding a Loop Watchdog

    Markus (Links2004) added a sketch-managed watchdog timer alongside the built-in ESP8266 watchdogs, which prevents the type of endless loop created in the previous example. Its operation is illustrated in a modified version of the previous sketch.

    /*
     * esp_3rd_watchdog_02.ino
     */

    extern "C" {
    #include "user_interface.h"
    }

    #include <Ticker.h>

    Ticker lwdTicker;

    #define LWD_TIMEOUT  15*1000  // Reboot if loop watchdog timer reaches this time out value

    unsigned long lwdTime = 0;

    /*
     * Returns the number of milliseconds elapsed since start_time_ms.
     */
    unsigned long elapsed_time(unsigned long start_time_ms) {
      return millis() - start_time_ms;
    }

    /*
     * lwdTicker interrupt service routine (ISR)
     */
    void ICACHE_RAM_ATTR lwdtISR(void) {
      if (elapsed_time(lwdTime) > LWD_TIMEOUT) {
        // could perform other actions
        ESP.restart();
      }
    }

    int getBootDevice(void) {
      int bootmode;
      asm (
        "movi %0, 0x60000200\n\t"
        "l32i %0, %0, 0x118\n\t"
        : "+r" (bootmode) /* Output */
        : /* Inputs (none) */
        : "memory" /* Clobbered */
      );
      return ((bootmode >> 0x10) & 0x7);
    }

    void setup() {
      Serial.begin(115200);
      Serial.printf("\n\nReason for reboot: %s\n", ESP.getResetReason().c_str());
      Serial.println("----------------------------------------------");

      if ( getBootDevice() == 1 ) {
        Serial.println("\nThis sketch has just been uploaded over the UART.");
        Serial.println("The ESP8266 will freeze on the first restart.");
        Serial.println("Press the reset button or power cycle the ESP now");
        Serial.println("and operation will be resumed thereafter.");
        while (1) { yield(); }
      }

      lwdTime = millis();
      lwdTicker.attach_ms(LWD_TIMEOUT, lwdtISR); // attach lwdt interrupt service routine to ticker
    }

    char user_input() {
      while (1) {
        Serial.println();
        Serial.println("This is a test of the ESP watchdogs and the lwdt watchdog.");
        Serial.println("In the top text box enter a number (1, 2, 3, or 4)");
        Serial.println("  1 - to go endless loop that does not feed any watchdog");
        Serial.println("  2 - to go endless loop that does not feed the software ESP watchdog");
        Serial.println("  3 - to go endless loop that feeds the ESP watchdogs but not the lwdt");
        Serial.println("  4 - to go endless loop that does feed ESP and lwdt watchdogs");
        Serial.println("and then press the Enter key or click on the [Send] button");
        Serial.println();

        while (!Serial.available()) {
          lwdTime = millis(); // waiting for input can exceed LWD_TIMEOUT, so feed the lwdt
          delay(10);
        }
        char ch = Serial.read();
        if (ch >= '1' && ch <= '4') {
          return ch;
        } else if (ch != '\n') {
          Serial.println("\nInvalid entry");
        }
      }
    }

    void loop() {
      lwdTime = millis(); // feed loop watchdog (restart timeout timer)

      switch (user_input()) {
        case '1':
          Serial.println("\n\nStarting endless loop without feeding any watchdog");
          Serial.println("The ESP software watchdog should timeout in 3 seconds and reset the ESP...");
          while (1) {} // software watchdog timeout
          break;
        case '2':
          Serial.println("\n\nStarting endless loop without feeding the software ESP watchdog");
          Serial.println("The ESP hardware watchdog should timeout in 7 seconds and reset the ESP...");
          ESP.wdtDisable();
          while (1) {} // hardware watchdog timeout
          break;
        case '3':
          Serial.println("\n\nStarting endless loop feeding the ESP watchdogs but not the lwdt");
          Serial.println("The lwdt watchdog should timeout in 15 seconds and reset the ESP...");
          while (1) { delay(2); } // lwdt watchdog timeout
          break;
        case '4':
          Serial.println("\n\nStarting endless loop feeding ESP and lwdt watchdogs");
          Serial.println("Will need to press the reset button to break out of this loop!");
          while (1) { delay(2); lwdTime = millis(); } // will not timeout
          break;
        default:
          delay(1000);
          break;
      }
    }

    Available for download: esp_3rd_watchdog_02.ino.

    The loop watchdog timer (lwdt) is implemented using a Ticker object, lwdTicker, which invokes the callback function attached to it every 15 seconds. The callback routine, lwdtISR() in the sketch above, checks the time elapsed since the watchdog was last fed and, if it is greater than the timeout period, restarts the ESP8266.
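
    Because the Ticker period and the timeout are both LWD_TIMEOUT, the callback only looks at the clock once every 15 seconds. A stall that starts shortly after a feed is therefore caught at the second callback after that feed, not exactly 15 seconds later. The following is a small host-side model in plain C++, not an Arduino sketch; the function name restartTimeMs and its parameters are made up for illustration, and it assumes periodMs is greater than zero and ignores millis() rollover.

```cpp
#include <cstdint>

// Hypothetical host-side model of the lwdt in esp_3rd_watchdog_02.ino.
// The callback runs at periodMs, 2*periodMs, ... and triggers a restart when
// more than timeoutMs has elapsed since the last feed.
// Returns the simulated time (ms since boot) of the restart, assuming the
// main loop never feeds the watchdog again after lastFeedMs.
uint32_t restartTimeMs(uint32_t lastFeedMs, uint32_t periodMs, uint32_t timeoutMs) {
  for (uint32_t tick = periodMs; ; tick += periodMs) {
    bool afterFeed = tick >= lastFeedMs;   // callbacks before the feed cannot fire
    if (afterFeed && (tick - lastFeedMs) > timeoutMs) {
      return tick;
    }
  }
}
```

    With the values used in the sketch, restartTimeMs(1000, 15000, 15000) gives 30000: the restart comes 29 seconds after the last feed. Running the check every second instead, restartTimeMs(1000, 1000, 15000), gives 17000. That shorter period is only a variation to consider, not what the sketch above does.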

Setting an Appropriate lwdt timeout interval

    In principle, the lwdt should be fed at one point only: at the start of the program loop code. Thus the lwdt is a fail-safe mechanism that ensures that the ESP8266 is continuously executing the main program loop. Clearly, the timeout period of the lwdt should be greater than that of the built-in ESP8266 watchdogs and greater than the worst-case time needed to execute the program loop. It is good practice to add some extra time above that. In our case 15 seconds seemed a reasonable timeout period.
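
    One way to put a number on the "worst case" is to record the longest pass through the loop during testing and then add a margin. The sketch below is a host-side illustration in plain C++; LoopTimer and suggestedTimeoutMs are hypothetical names, the 8000 ms figure in the usage note is the upper end of the hardware watchdog timeout quoted at the top of this post, and the margin is an arbitrary choice.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: call mark() with the current time at the top of every
// pass through loop(); worstMs then holds the longest pass seen so far.
struct LoopTimer {
  uint32_t lastMs = 0;
  uint32_t worstMs = 0;
  bool started = false;

  void mark(uint32_t nowMs) {
    if (started) {
      worstMs = std::max(worstMs, nowMs - lastMs);  // unsigned, so rollover safe
    }
    lastMs = nowMs;
    started = true;
  }
};

// Timeout = the larger of the worst loop time and the built-in watchdog
// timeout, plus some extra time on top.
uint32_t suggestedTimeoutMs(uint32_t worstLoopMs, uint32_t builtinWdtMs, uint32_t marginMs) {
  return std::max(worstLoopMs, builtinWdtMs) + marginMs;
}
```

    For example, a loop that never takes longer than 250 ms, an 8000 ms hardware watchdog and a 7000 ms margin yield suggestedTimeoutMs(250, 8000, 7000), which is 15000. In an Arduino sketch the argument to mark() would be millis().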

    Even this simple example shows that some care must be exercised when feeding the lwdt. The program waits for user input in the function cleverly named user_input(). This could take longer than 15 seconds. Accordingly, all watchdogs (the built-in software and hardware watchdogs and the loop watchdog) are fed in the loop waiting for user input:

        while (!Serial.available()) {
          lwdTime = millis();
          delay(10);
        }

    But the last case ('4') shows that feeding the lwdt elsewhere than at the top of the main program loop can be a problem. That loop feeding all watchdogs is effectively disabling them. Hardly a fail-safe mechanism!

    It's best to stick to the recommendations:

    1. Set the lwdt timeout period long enough to accomplish all the operations in the main program loop. Be careful when estimating the time required for blocking operations (network, serial, file), which are unpredictable in length.
    2. Don't feed the lwdt except for the required one-time feeding in each main program loop (usually at the top of the loop).
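
    When a blocking wait cannot be avoided, one option that respects both recommendations is to give the wait its own deadline instead of feeding the lwdt inside it. What follows is only a sketch of that idea in plain host-side C++, not code from the downloadable sketches; waitBounded and its parameters are hypothetical names, and the clock is passed in as a function so that the logic can be tried off the device.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: polls ready() until it returns true or until maxWaitMs
// has elapsed on the clock supplied by nowMs(). Returns true if ready() became
// true, false if the wait was abandoned. The lwdt is NOT fed in here, so a
// wait that hangs still lets the loop watchdog bite.
bool waitBounded(const std::function<bool()>& ready,
                 const std::function<uint32_t()>& nowMs,
                 uint32_t maxWaitMs) {
  const uint32_t start = nowMs();
  while (!ready()) {
    if (nowMs() - start >= maxWaitMs) {
      return false;  // give up and let loop() run again
    }
    // On the ESP8266, a delay(1) or yield() here would keep the built-in
    // watchdogs fed while waiting.
  }
  return true;
}
```

    The value chosen for maxWaitMs then becomes part of the worst-case loop time from the first recommendation, so it should be well below the lwdt timeout.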

Improved lwdt Watchdog

    What happens if the newly flashed firmware goes rogue and starts clobbering memory? As pointed out by , it could keep filling the lwdTime variable with some value. This is one instance where an incrementing timer is nominally better than a decrementing one. If a decrementing watchdog timer is constantly set to the value 2345688799 by a runaway program, it will never bite since it will never reach 0. With the incrementing timer used here, millis() will presumably at some point be greater than 2345688799 + LWD_TIMEOUT, so the loop watchdog timer will bite. But that can be cold comfort since the elapsed time rollover period is more than 49 days.

    A simple improvement would be to use two global variables for the timer: lwdTime, which as before holds the time the watchdog was last fed, and lwdTimeout, which contains the value in lwdTime plus a constant, which I have chosen to be LWD_TIMEOUT. The Ticker callback routine then checks that the watchdog has been fed in the last LWD_TIMEOUT period as before, and also checks that the difference between the two global variables is still LWD_TIMEOUT. What are the chances that a rogue program changing the value of lwdTime would change the value of lwdTimeout correctly at the same time?

    Here are the changes to make.

    unsigned long lwdTime = 0;
    unsigned long lwdTimeout = LWD_TIMEOUT;

    void ICACHE_RAM_ATTR lwdtcb(void) {
      if ((millis() - lwdTime > LWD_TIMEOUT) || (lwdTimeout - lwdTime != LWD_TIMEOUT)) {
        ESP.restart();
      }
    }

    void lwdtFeed(void) {
      lwdTime = millis();
      lwdTimeout = lwdTime + LWD_TIMEOUT;
    }

    void loop() {
      lwdtFeed();
      ...
    }

    The complete sketch is available for download: esp_3rd_watchdog_03.ino.
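
    To see what the second condition buys, here is a host-side model of the two-variable check in plain C++. It is an illustration only: the Lwdt struct and the shouldRestart name are made up, uint32_t stands in for the 32-bit unsigned long of the ESP8266, and LWD_TIMEOUT is a constexpr here rather than the #define used in the sketch.

```cpp
#include <cstdint>

constexpr uint32_t LWD_TIMEOUT = 15 * 1000;

// Hypothetical model of the improved lwdt state and its two checks.
struct Lwdt {
  uint32_t lwdTime = 0;
  uint32_t lwdTimeout = LWD_TIMEOUT;

  // Same bookkeeping as lwdtFeed(), with the clock value passed in.
  void feed(uint32_t nowMs) {
    lwdTime = nowMs;
    lwdTimeout = lwdTime + LWD_TIMEOUT;
  }

  // Same test as the lwdtcb() callback: true means the ESP would be restarted.
  bool shouldRestart(uint32_t nowMs) const {
    return (nowMs - lwdTime > LWD_TIMEOUT) || (lwdTimeout - lwdTime != LWD_TIMEOUT);
  }
};
```

    Feed the model at 1000 ms, then overwrite lwdTime alone with 4999 to mimic a runaway write. At 5000 ms the elapsed time looks like 1 ms, so the first condition sees nothing wrong, but lwdTimeout - lwdTime is now 11001 instead of 15000 and shouldRestart(5000) returns true.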

    Now that updating the lwdt watchdog timer is more complex, a lwdtFeed() function has been added. When adding this to a sketch it may be tempting to replace the test in the lwdt callback function with the similar

      if ((millis() > lwdTimeout) || (lwdTimeout - lwdTime != LWD_TIMEOUT))

    but that would be a bad idea. The test millis() > lwdTimeout will not give the expected result around rollover, when lwdTime > 0xFFFFFFFF - LWD_TIMEOUT. In that case the sum lwdTime + LWD_TIMEOUT exceeds 0xFFFFFFFF, so lwdTimeout wraps around to a small value less than LWD_TIMEOUT. The inequality will be true and the watchdog will bite even though the timeout period has not been reached.
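
    The difference between the two tests near rollover can be checked on a PC with 32-bit unsigned arithmetic. This is a minimal sketch in plain C++; the two function names are invented for the comparison, and uint32_t is used because unsigned long is wider than 32 bits on many desktop systems.

```cpp
#include <cstdint>

constexpr uint32_t LWD_TIMEOUT = 15 * 1000;

// Test used in the callback: unsigned subtraction gives the true elapsed
// time modulo 2^32, so it keeps working across the millis() rollover.
bool expiredByElapsed(uint32_t nowMs, uint32_t lwdTime) {
  return nowMs - lwdTime > LWD_TIMEOUT;
}

// The tempting variant: compares the clock against a precomputed deadline,
// which misfires when that deadline has wrapped around.
bool expiredByDeadline(uint32_t nowMs, uint32_t lwdTimeout) {
  return nowMs > lwdTimeout;
}
```

    Take a feed at lwdTime = 0xFFFFFFFF - 5000, so that lwdTimeout wraps to 9999. One second later, expiredByElapsed() returns false as it should, while expiredByDeadline() already returns true because the clock is still a very large number and the deadline is a small one.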

    This "improvement" may be over the top. Markus did suggest that his simpler version was useful. On the other hand, this change is based on only one measure proposed by Niall Murphy and Jack Ganssle for increasing the reliability of software watchdogs. I may come back to this subject, if I find there is a need to follow their advice.

    References:

    Markus (Links2004) (2016), Watchdog like functionality for the Arduino loop.
    Murphy, Niall (2000), Watchdog Timers.
    Ganssle, Jack (2016), Great Watchdog Timers for Embedded Systems.
