A Third ESP8266 Watchdog, Final Version

June 24, 2018
Update: June 27, 2018

A Better ESP8266 Loop Watchdog with Better Recovery

I have just finished rewriting the third and final post in French on watchdogs for the ESP8266. Those three posts were not a translation of the three posts on the same subject published in English back in August of 2017 but a complete rewrite. I am overdosing on watchdogs and will not be translating the last three posts anytime soon. Instead, here is a quick introduction to the new version of the loop watchdog. It is now available as a library, which in itself is an improvement, I hope.

Here is the content of mdEspRestart.h, the header file for the mdEspRestart library.

    typedef uint8_t restartReason_t;
    typedef uint16_t restartCount_t;
    typedef uint32_t restartData_t;

    #define DEFAULT_RESTART_ADDRESS 0xFF
    #define LWD_TIMEOUT 12000        // loop watchdog timeout in ms (12 seconds)
    #define LOOP_END 0xFFFFFFFF      // reserved stamp marking the end of loop()

    // restart reasons added to those defined in the Espressif SDK
    #define REASON_USER_RESET 7
    #define REASON_USER_RESTART 8
    #define REASON_LWD_RST 9
    #define REASON_LWD_LOOP_RST 10
    #define REASON_LWD_OVW_RST 11

    void lwdtInit(unsigned long timeout = LWD_TIMEOUT);
    void lwdtStamp(restartData_t data = LOOP_END);
    void lwdtFeed(void);

    restartReason_t getRestartReason(restartCount_t &count, restartData_t &data);

    void userReset(restartData_t data = 0);
    void userRestart(restartData_t data = 0);

    bool setRestartRtcAddress(uint8_t addr = DEFAULT_RESTART_ADDRESS);

The loop watchdog must be initialised with the lwdtInit() function. Its optional parameter is the time, in milliseconds, that the watchdog waits before it bites if it is not fed. The default value, LWD_TIMEOUT, is 12 seconds, which is double the wait time of the hardware watchdog of the ESP8266. The watchdog should be initialised near the end of the setup() function of an Arduino sketch. This is because initialisation starts the timeout counter, and in all likelihood it would not do for the loop watchdog to restart the ESP during long one-time setup procedures.

The loop watchdog must be fed at the beginning of the loop() function with the lwdtFeed() function. This is the only place where this function should appear. The function lwdtStamp() (or lwdtStamp(LOOP_END)) must be the last statement of the loop() function. In the previous version, LOOP_END was called LOOP_START which might not have been the best choice although, in my defence, the start and the end of a loop are obviously the same thing.
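The placement rules above can be sketched as a skeleton. This is only an illustration under stated assumptions: the three lwdt functions are replaced by host-side stand-ins that merely log their calls, so that the snippet compiles without the ESP8266 core. The `callLog` vector is hypothetical and not part of the library. On a real device, delete the stand-ins and include mdEspRestart.h instead.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// --- Host-side stand-ins (assumption: NOT the library's implementation). ---
// On an ESP8266, remove this section and use: #include <mdEspRestart.h>
typedef uint32_t restartData_t;
#define LWD_TIMEOUT 12000
#define LOOP_END 0xFFFFFFFF

std::vector<std::string> callLog;  // hypothetical: records the order of calls

void lwdtInit(unsigned long timeout = LWD_TIMEOUT) {
  callLog.push_back("init " + std::to_string(timeout));
}
void lwdtFeed(void) { callLog.push_back("feed"); }
void lwdtStamp(restartData_t data = LOOP_END) {
  callLog.push_back(data == LOOP_END ? std::string("stamp LOOP_END")
                                     : "stamp " + std::to_string(data));
}
// --- End of stand-ins. ---

void setup() {
  // ... long one-time initialisation (WiFi, sensors, ...) goes here ...
  lwdtInit();  // near the end: this starts the timeout counter (default 12000 ms)
}

void loop() {
  lwdtFeed();   // first statement, and the only call to lwdtFeed() in the sketch
  // ... the work done on each pass ...
  lwdtStamp();  // last statement, same as lwdtStamp(LOOP_END)
}
```

Calling lwdtInit() last in setup() keeps slow start-up work outside the watched period, which is the reason given above for that placement.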

At the start of each important step in the sketch, progress is marked with the lwdtStamp(id) function where id is a unique numeric value between 0 and 2^32-2 = 0xFFFFFFFF-1 = 4,294,967,294 inclusive. The largest 32-bit value, 0xFFFFFFFF, is reserved for LOOP_END. I called the id value the module identifier because, in my sketches, each task completed in loop() is done by a function called a module. As before, these id values must be defined in the sketch. And, as before, it is not mandatory to divide the main sketch loop into separate modules or subroutines. However, it remains a good idea to place loop watchdog stamps with unique identifiers at strategic places in a long sequence of instructions in the loop() function, in order to know where the loop watchdog bit, should that ever happen.
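As a rough illustration of module identifiers, the sketch below stamps each step of loop() before running it. The module names, their values and the `publishHangs` flag are hypothetical, and lwdtFeed() and lwdtStamp() are host-side stand-ins that only remember the latest stamp. An early return stands in for a module that never finishes, since a real hang cannot be shown in a few lines.

```cpp
#include <cstdint>

// Host-side stand-ins (assumption); on an ESP8266 use #include <mdEspRestart.h>
typedef uint32_t restartData_t;
#define LOOP_END 0xFFFFFFFF
restartData_t lastStamp = LOOP_END;  // hypothetical: remembers the latest stamp
void lwdtFeed(void) {}
void lwdtStamp(restartData_t data = LOOP_END) { lastStamp = data; }

// Module identifiers are defined by the sketch: any unique values except LOOP_END.
const restartData_t MODULE_READ_SENSOR = 1;  // hypothetical name and value
const restartData_t MODULE_PUBLISH = 2;      // hypothetical name and value

bool publishHangs = false;  // hypothetical flag standing in for a blocked module

void readSensor() {}
bool publish() { return !publishHangs; }  // false means the module never finished

void loop() {
  lwdtFeed();
  lwdtStamp(MODULE_READ_SENSOR);  // mark progress before each module
  readSensor();
  lwdtStamp(MODULE_PUBLISH);
  if (!publish()) return;         // simulated hang: the end of loop() is not reached
  lwdtStamp();                    // LOOP_END
}
```

If the watchdog bites while publish() is stuck, the last recorded stamp is MODULE_PUBLISH, which is what points to the guilty step after the restart.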

The getRestartReason() function is used near the beginning of the setup() function to handle the startup cycle. It returns the cause of the current restart and, in the variable count, the number of consecutive times that the ESP was started for the same reason. In addition to the seven restart codes already defined in the Espressif SDK, the library adds five. Three are associated with the loop watchdog while two are user-initiated ESP resets or restarts. It is possible to distinguish between a reset and a restart when the functions userReset() and userRestart() are used to reboot the device instead of calling ESP.reset() and ESP.restart() directly. Those functions also provide a simple mechanism to save an int or other 32-bit value in RTC memory before rebooting the ESP. After the reboot, getRestartReason() returns the saved value in the data variable.
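The round trip of a value through a user restart might look like the following sketch. Everything outside the function signatures is an assumption: a plain struct named `FakeRtc` plays the role of RTC memory, the stand-in bodies of userRestart() and getRestartReason() are not the library's code, and `RESTART_FOR_CONFIG_CHANGE`, `requestConfigMode()` and `configModeRequested` are hypothetical names.

```cpp
#include <cstdint>

// Host-side stand-ins (assumption): a struct imitates RTC memory, which
// survives a reboot. On an ESP8266 use #include <mdEspRestart.h> instead.
typedef uint8_t restartReason_t;
typedef uint16_t restartCount_t;
typedef uint32_t restartData_t;
#define REASON_DEFAULT_RST 0
#define REASON_USER_RESTART 8

struct FakeRtc { restartReason_t reason; restartCount_t count; restartData_t data; };
FakeRtc fakeRtc = {REASON_DEFAULT_RST, 0, 0};

void userRestart(restartData_t data = 0) {
  // On the device this call would reboot; here it only records the request.
  bool repeat = (fakeRtc.reason == REASON_USER_RESTART);
  fakeRtc.count = repeat ? static_cast<restartCount_t>(fakeRtc.count + 1) : 1;
  fakeRtc.reason = REASON_USER_RESTART;
  fakeRtc.data = data;
}

restartReason_t getRestartReason(restartCount_t &count, restartData_t &data) {
  count = fakeRtc.count;
  data = fakeRtc.data;
  return fakeRtc.reason;
}

// Sketch side: a hypothetical code telling the next boot why it was restarted.
const restartData_t RESTART_FOR_CONFIG_CHANGE = 0x0000C0DE;
bool configModeRequested = false;

void requestConfigMode() {
  userRestart(RESTART_FOR_CONFIG_CHANGE);  // instead of ESP.restart()
}

void setup() {
  restartCount_t count;
  restartData_t data;
  restartReason_t reason = getRestartReason(count, data);  // near the start of setup()
  configModeRequested =
      (reason == REASON_USER_RESTART && data == RESTART_FOR_CONFIG_CHANGE);
}
```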

The value returned in the data variable by getRestartReason() contains useful information when an exception or the loop watchdog is the cause of the restart.

Reason	Value	Description
REASON_DEFAULT_RST	0	normal startup by power on
REASON_WDT_RST	1	hardware watch dog reset
REASON_EXCEPTION_RST	2	exception reset (¹)
REASON_SOFT_WDT_RST	3	software watch dog reset
REASON_SOFT_RESTART	4	software restart or reset
REASON_DEEP_SLEEP_AWAKE	5	wake from deep sleep
REASON_EXT_SYS_RST	6	external system reset
REASON_USER_RESET	7	reboot with `userReset()` (³)
REASON_USER_RESTART	8	reboot with `userRestart()` (³)
REASON_LWD_RST	9	loop watch dog reset (²)
REASON_LWD_LOOP_RST	10	loop watch dog reset outside of loop()
REASON_LWD_OVW_RST	11	loop watch dog reset because it was overwritten

Notes

	(¹)	The `data` variable contains the exception number.
	(²)	The `data` variable contains the last module identifier entered with `lwdtStamp()`.
	(³)	The `data` variable contains the value given as parameter for `userReset()` or `userRestart()`.
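One way to put the table and notes to work is a small helper that turns the three values into a log message. The function `describeRestart()` is hypothetical and not part of the library, and the sketch assumes that note (²) applies to REASON_LWD_RST. The reason codes are copied from the table so that the snippet compiles on its own. On an ESP8266, codes 0 to 6 come from the Espressif SDK and codes 7 to 11 from mdEspRestart.h, so the enum would be dropped.

```cpp
#include <cstdint>
#include <string>

typedef uint8_t restartReason_t;
typedef uint16_t restartCount_t;
typedef uint32_t restartData_t;

// Values copied from the table; defined here only to keep the sketch stand-alone.
enum : restartReason_t {
  REASON_DEFAULT_RST = 0, REASON_WDT_RST = 1, REASON_EXCEPTION_RST = 2,
  REASON_SOFT_WDT_RST = 3, REASON_SOFT_RESTART = 4, REASON_DEEP_SLEEP_AWAKE = 5,
  REASON_EXT_SYS_RST = 6, REASON_USER_RESET = 7, REASON_USER_RESTART = 8,
  REASON_LWD_RST = 9, REASON_LWD_LOOP_RST = 10, REASON_LWD_OVW_RST = 11
};

// Hypothetical helper: interprets data according to notes (1), (2) and (3).
std::string describeRestart(restartReason_t reason, restartCount_t count,
                            restartData_t data) {
  std::string text;
  switch (reason) {
    case REASON_EXCEPTION_RST:  // note (1): data is the exception number
      text = "exception " + std::to_string(data);
      break;
    case REASON_LWD_RST:        // note (2): data is the last module identifier
      text = "loop watchdog bit in module " + std::to_string(data);
      break;
    case REASON_USER_RESET:     // note (3): data is the value given by the sketch
    case REASON_USER_RESTART:
      text = "user reboot with data " + std::to_string(data);
      break;
    default:                    // data carries no documented meaning here
      text = "restart reason " + std::to_string(reason);
      break;
  }
  return text + " (" + std::to_string(count) + " in a row)";
}
```

In a sketch, the values passed to such a helper would come from the call to getRestartReason(count, data) made near the beginning of setup().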

This table reflects changes made on June 27, 2018. There is also a new function, getResetReasonEX(), that returns a String with more information about the system restart than the equivalent ESP.getResetReason(). Since I am not sure of its value, I have not included it in the library by default, but it is shown in the stand-alone examples esp_boot_lwdt and esp_boot_lwdt_sf. Inclusion of the function is controlled with the INCLUDE_GETRESETREASONEX directive in the mdEspRestart.h and mdEspRestartSF.h header files.

You can download the mdEspRestart.zip library that can be loaded into the Arduino IDE with the library manager (menu: Sketch/Include Library/Add .ZIP Library...).

If you prefer to test it before installing it in the IDE, download esp_boot_lwdt.zip.

For situations where the RTC memory is almost entirely used for other purposes, there is a light version of the library, mdEspRestartSF.zip (SF = small footprint), which only takes 4 bytes. The esp_boot_lwdt_sf.zip example allows testing before installing in the IDE. To reduce the size of the record, the restartCount_t type is one byte and the restartData_t type occupies only two bytes. Finally, only three bits of restart.reason are used to confirm that restart is valid; the fourth is used as an additional startup reason flag. As a consequence, the risk of mistaking garbage RTC memory for a valid restart struct goes from less than 4 hundredths of 1% (0.04%) to around 9.4%.
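To make the size trade-off concrete, here is a hypothetical 4-byte record built from the type sizes quoted above. The struct name, the field order and the helper `fitsInSmallData()` are assumptions, and the way mdEspRestartSF actually arranges the validity bits inside restart.reason is not shown here.

```cpp
#include <cstdint>

// Type sizes as described for the small-footprint version.
typedef uint8_t restartCount_t;   // one byte
typedef uint16_t restartData_t;   // two bytes

// Hypothetical layout (assumption): reason + count + data = 4 bytes.
struct SmallRestartRecord {
  uint8_t reason;        // restart reason plus validity bits
  restartCount_t count;  // consecutive restarts for the same reason
  restartData_t data;    // module identifier or user value
};

static_assert(sizeof(SmallRestartRecord) == 4, "record should occupy 4 bytes");

// Hypothetical helper: a value must fit in two bytes to be stored without loss.
bool fitsInSmallData(uint32_t value) {
  return value <= UINT16_MAX;
}
```

A practical consequence of the two-byte data type is that module identifiers and user values above 65535 cannot be stored unchanged in the small-footprint version.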
