In a previous post (Arduino Sketch Managed ESP8266 Watchdog) in this series, I talked about using a Sonoff WiFi switch as a router watchdog. It was working after a fashion, but I decided that I had to improve the firmware and hardware.
Some way of suspending monitoring had to be added. I found that the Sonoff would power cycle the router while I was trying to change some of the latter's settings. Each time that happened, I had to trudge up and down two flights of stairs to physically remove the Sonoff. After a few rounds of this dance, I removed the switch completely which defeated the whole purpose, of course.
Secondly, the Sonoff/watchdog restarted the router only when the local Wi-Fi network went down. There was no check for loss of connection to the Internet. While the former is a necessary element of the home automation system, which runs mostly on the local wireless network, the latter is also important for personal and professional reasons.
Table of Contents
- The Router Monitor
- What Others Have Done
- Is the Network Down?
- Conflicting Goals
- Network Monitoring State Machine
- Support Capabilities
- Hardware Interface
- Commands
- Installing Router Monitor
The Router Monitor
I will not go into the details about the hardware. The case of a Sonoff Basic was shortened by sawing off the ends meant to clamp the input and output wires. It was then hot glued to the bottom plate of a plastic wall mounted device box. Although not visible on the photographs, holes were drilled in the bottom plate to give access to the Sonoff tactile switch and the LED. A push button switch was installed along the top of the box. A standard North American duplex receptacle was installed inside the device box. One receptacle is controlled by the Sonoff Basic, and that is where the router's AC adapter is plugged in. Power is always available at the other receptacle. The following photographs should make all this clear.
What Others Have Done
Not long after my experiment with a Sonoff Basic, Charlie Romer (It Kinda Works) produced a couple of videos about rebooting a router when the Internet connection is lost. He used the same technique of turning the router off and on as I had, but it was Internet access which was monitored instead of just the Wi-Fi connection. I would suggest watching the Charlie Romer videos before reading on: Router Booter - Never reboot your router again! and the update Router Booter - What went wrong? He responded to critics who suggested that he should use a Sonoff to do this by saying that he preferred building the hardware using a Wemos D1 mini, a relay board and a power supply because more is learned that way. Fair enough. I happen to think the software is the more interesting aspect of this project, so I concentrated my efforts on creating more sophisticated firmware for the Sonoff. If you want, there is nothing to prevent you from using my firmware with Charlie Romer type hardware. As a matter of fact, I developed the firmware on a Wemos D1 mini emulating a Sonoff.
A crowdfunding project back in 2016, WiReboot has a slightly different take. It is an ESP8266 based gadget that sits between the router power supply and the router and interrupts the power when the connection is lost.
My original router monitor and these two devices are rather dumb brutes that only know one way of dealing with a perceived network problem. They turn the power off to the router, wait a short while and then turn the power back on.
Some router makers provide a software watchdog based on ICMP requests. See ICMP Watchdog in the Ubiquiti Networks devices and Manual:System/Watchdog on the MikroTik Wiki. There is a page about Hardware Watchdog on the OpenWrt site. It goes on to discuss using USB watchdogs, which could be of interest if you can tinker with both the software and hardware of the router. If more information is needed about these cheap devices, look at How to create your own usb watchdog script by David Gouveia. Such devices would have the advantage of performing system restarts before going to the drastic step of pulling the plug.
Is the Network Down?
Let's start by discussing how to test if the router is performing its job correctly. Here I propose to test the typical box supplied by an Internet service provider (ISP). These are typically multifunction devices acting as a bridge to the ISP servers and thence on to the rest of the Internet, as Ethernet switches, as Wi-Fi hotspots, as DHCP servers, as firewalls, and so on. Since I have decided to use an ESP8266 based device to act as the router watchdog, lots of things done by the router will not be tested. Actually only two things are tested:
- The 2.4GHz Wi-Fi network.
- Internet access.
Testing the 2.4 GHz Wi-Fi network is particularly important because it is the backbone of the home automation system. I am keen on having this system up and running dependably especially when I am away from the house. There is no need to explain why Internet access is important anymore. I will add that I have a vested interest in the proper functioning of the network because others in the household need the Internet for personal and professional reasons.
It may seem incorrect, but I will argue that the Wi-Fi network must be running to get access to the Internet. It is true that when the radio is down, the Ethernet switch part of the router could be functioning, or the 5 GHz radio could still be working. However, there are only a couple of older computers that are wired into the router; all others, plus all tablets, smart phones, etc. use Wi-Fi. And they also use the 2.4 GHz band because the range of the 5 GHz band does not quite cover all areas of the house.
That means that the two tested services can be in one of three states:
- The Wi-Fi network is up and the Internet is reachable,
- The Wi-Fi network is up and the Internet is unreachable,
- The Wi-Fi network is down (and consequently the Internet is unreachable).
How can it be determined that the Wi-Fi network is up or down? On an ESP device, this is easily verified.
The test for determining if the Internet is unreachable is a ping (ICMP request) to a major site known to be (almost) always up. While I have no solid data to support this assertion, I believe the router monitor will often power cycle the router for no good reason when the service is in state 2 (Wi-Fi up, Internet unreachable) with such a cursory test. A host on the Internet could be unreachable because of all sorts of problems that have nothing to do with the router or the immediate connection between it and the ISP. Perhaps the site chosen as a target for an ICMP request is down. Perhaps it is under a denial of service attack. Perhaps there is a problem with the domain name system and the IP address of the target site cannot be obtained. Perhaps there is a major power outage and the only backbone that can be used to get to my isolated location is not available. Turning the router off and on will not do anything to fix these problems.
Given the amount of time needed for the router to power up and reboot and for all wireless devices to reconnect, it is clearly desirable to ensure that the router monitor acts only when it is certain that it must. Accordingly, Wi-Fi must have been disconnected for at least 30 seconds before state 3 (Wi-Fi down) is declared. In the same vein, I have chosen to ping three different Internet hosts instead of only one. Contact with all of them must be lost for a certain amount of time before the Internet is deemed unreachable. During that grace period, the Internet hosts are regularly pinged.
In other words, two watchdogs are set up. Each time the loop() function in the sketch is executed, the Wi-Fi watchdog will be fed if the Wi-Fi network is connected. Similarly, the Internet watchdog is fed if a ping to one of the targets was successful. The testNetwork() function tests if the wireless network is functioning and if the Internet can be reached.
The two variables lastTimeWifiUp and lastTimeInternetReached record the clock tick count when the Wi-Fi and Internet watchdogs were last fed by the testNetwork() function. If the time elapsed since the last feeding is greater than config.wifiDownInterval or config.internetLostInterval, then the routine returns a WIFI_DOWN or INTERNET_UNREACHABLE value respectively; otherwise a NET_OK value is returned. The only complications are the use of multiple ping targets, as explained before, and the minimum delay of config.intervalBetweenPings milliseconds between successive pings. That delay is implemented with the lastPingTime timer. This is to ensure that the router monitor does not overburden the network with ping requests.
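As an illustration, the two-watchdog logic described above could be sketched as follows. This is a minimal sketch in plain C++, not the code from the routermon_1 sketch. The tick count is passed in as a parameter (it would be millis() on the ESP8266), and the wifiConnected and pingTarget hooks are hypothetical stand-ins for calls such as WiFi.status() == WL_CONNECTED and Ping.ping() from the ESP8266Ping library. The 30 second Wi-Fi interval comes from the text; the other two interval values and the round-robin choice of ping target are assumptions.

```cpp
#include <stdint.h>
#include <functional>

// Result values named in the text.
enum NetStatus { NET_OK, INTERNET_UNREACHABLE, WIFI_DOWN };

// Subset of the settings named in the text.
struct Config {
  uint32_t wifiDownInterval = 30000;       // 30 s, as stated in the text
  uint32_t internetLostInterval = 120000;  // assumed value
  uint32_t intervalBetweenPings = 5000;    // assumed value
};

struct NetworkTester {
  Config config;
  uint32_t lastTimeWifiUp = 0;           // last feeding of the Wi-Fi watchdog
  uint32_t lastTimeInternetReached = 0;  // last feeding of the Internet watchdog
  uint32_t lastPingTime = 0;             // rate limiter for pings
  int nextTarget = 0;                    // index of the next ping target (0..2)

  // Hypothetical hooks standing in for the ESP8266 library calls.
  std::function<bool()> wifiConnected;
  std::function<bool(int)> pingTarget;

  // now is the current tick count in milliseconds.
  NetStatus testNetwork(uint32_t now) {
    if (wifiConnected()) {
      lastTimeWifiUp = now;  // feed the Wi-Fi watchdog
      if (now - lastPingTime >= config.intervalBetweenPings) {
        lastPingTime = now;
        if (pingTarget(nextTarget)) {
          lastTimeInternetReached = now;  // feed the Internet watchdog
        }
        nextTarget = (nextTarget + 1) % 3;  // assumed: rotate through 3 targets
      }
    }
    // Unsigned subtraction keeps the comparisons valid across tick wrap-around.
    if (now - lastTimeWifiUp > config.wifiDownInterval) return WIFI_DOWN;
    if (now - lastTimeInternetReached > config.internetLostInterval) {
      return INTERNET_UNREACHABLE;
    }
    return NET_OK;
  }
};
```

Passing the time and the two tests in from outside is only a convenience for trying the logic on a desktop compiler; on the device the function would presumably read the clock and the Wi-Fi status directly.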
Conflicting Goals
For the sake of the home automation system, the 2.4 GHz Wi-Fi network needs to be running without interruption. Essential parts of this system do not need access to the Internet. In fact, the only Internet-based services that do not have a local backup are weather and tide updates. Clearly loss of Internet access is just an inconvenience for the home automation system. But for humans about the house having a functional Wi-Fi network without access to the Web is not that useful. They would prefer that the router be restarted as often as possible when the Internet is unreachable, which interferes with the home automation system. Such are the conflicting goals associated with having a single Wi-Fi network handle both the Internet of Things and normal Internet usage.
A compromise is implemented. The spinning or cool-down period after turning the router off and then on is shorter when trying to reestablish the Wi-Fi connection and longer when trying to reach the Internet. That way the home automation system will be able to operate more or less unimpeded for longer periods of time when only access to the Internet is lost.
The better solution would be to operate two distinct Wi-Fi networks. That has been in the plans for quite a long while but I am not sure I can justify the expense of setting up an edge router, intelligent switches and separate Wi-Fi networks.
Network Monitoring State Machine
It seems a bit pompous to talk about a state machine in this case as the router monitor can be in one of only four states. In my defence, the testNetwork function described earlier was actually implemented in the state machine in the first version of this sketch.
Hopefully, the state machine will spend most of its time in the monitoring state in which all it does is check if the network is down. If it should happen to be down, the time-out period will be set according to the reason for the loss of network access and then the machine will move on into the cycling state. This is a short period when the power to the router is turned off. Once power is restored the state machine will be idle for quite a while, giving the router ample time to boot and some respite before potential follow-up power cycles.
The state machine will never reach the DISABLED state on its own. That state can only be entered and left as a result of a command from the user.
This is a very simple routine. Amazingly, given the size of the sketch, this is all there is to the core function of the router monitor. The rest of the code provides support routines.
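The transitions described in this section could be sketched as follows. Again this is a minimal sketch in plain C++ under stated assumptions, not the routine from the sketch itself. The state names and the 10 second power-off period come from the text. The member names are hypothetical, the two cool-down durations are assumed values, and routerPowered merely stands in for the Sonoff relay.

```cpp
#include <stdint.h>

enum NetStatus { NET_OK, INTERNET_UNREACHABLE, WIFI_DOWN };
enum MonitorState { MONITORING, CYCLING, SPINNING, DISABLED };

struct RouterMonitor {
  uint32_t cyclingTime = 10000;  // power-off period, 10 s as in the text
  uint32_t shortWait = 180000;   // cool-down after a Wi-Fi loss (assumed value)
  uint32_t longWait = 900000;    // cool-down after an Internet loss (assumed value)

  MonitorState state = MONITORING;
  bool routerPowered = true;  // stands in for the relay
  uint32_t stateEntered = 0;  // tick count when the current state was entered
  uint32_t spinTime = 0;      // cool-down chosen for the current power cycle

  // Called on every pass through loop() with the current tick count in ms
  // and the latest network test result.
  void step(uint32_t now, NetStatus net) {
    switch (state) {
      case MONITORING:
        if (net == NET_OK) break;
        // The cool-down depends on the reason for the power cycle.
        spinTime = (net == WIFI_DOWN) ? shortWait : longWait;
        routerPowered = false;
        state = CYCLING;
        stateEntered = now;
        break;
      case CYCLING:
        if (now - stateEntered >= cyclingTime) {
          routerPowered = true;
          state = SPINNING;
          stateEntered = now;
        }
        break;
      case SPINNING:
        // A real implementation would also re-arm the two watchdogs here so
        // that a stale timer does not trigger an immediate second cycle.
        if (now - stateEntered >= spinTime) state = MONITORING;
        break;
      case DISABLED:
        break;  // entered and left only at the user's request
    }
  }
};
```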
Support Capabilities
There are a number of auxiliary functions in the sketch some of which will be briefly described below.
Hardware Facilities
The Sonoff Basic LED is used to report the current state of the monitor. The Sonoff tactile button and another push button can be used to control the relay, enable or disable the monitoring function, initiate an over-the-air update of the firmware and restart the device. Details are provided in the next section.
MQTT Functionality
The primary method to interact with the router monitor is through an MQTT broker. The ESP subscribes to the routermon-1/command topic and responds by publishing messages to the routermon-1/response topic.
Here is an example of how this works. First open a terminal and subscribe to all topics related to the device.
Then open a second terminal and publish a message, in this case the help command.
The command and the response will be displayed in the first terminal.
Some details about all the commands are given in a section further down.
Logging Facilities
The source code contains numerous logging messages. Some of these are used for debugging purposes but most are informational messages, as will be explained. Logging messages are sent to three destinations:
- the serial port of the ESP8266 (displayed on any terminal, such as the Arduino IDE serial monitor, connected to the UART port),
- a Syslog server through the monitored Wi-Fi network,
- an MQTT broker through the monitored Wi-Fi network.
The files logging.h and logging.ino contain the code that performs the logging functions. All logging messages are accompanied by a logging level parameter which determines if the message is actually sent on to each of the logging destinations. Please note that this is only a partial implementation of the usual Syslog protocol. It is not possible to pick and choose so that, say, only "alert" and "warning" messages are displayed. Each destination has a threshold level and all messages with that priority or higher will be sent to the destination.
It is important to realize that all messages sent to the MQTT broker are in fact sent through the logging function. In order to use MQTT to control the router monitor as explained above, the threshold log level for MQTT logging should thus be set at "info" or "debug".
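The threshold rule can be sketched in a few lines. This is only an illustration in plain C++, not the code in logging.ino: the type and function names are hypothetical and the default thresholds are assumed. The numeric levels follow the usual Syslog severities, where a lower number means a higher priority.

```cpp
#include <stdint.h>

// Syslog severities: 0 is the highest priority, 7 the lowest.
enum class Level : uint8_t {
  Emerg = 0, Alert, Crit, Err, Warning, Notice, Info, Debug
};

// The three logging destinations named in the text.
enum class Destination : uint8_t { Uart = 0, Mqtt, Syslog };

struct LogThresholds {
  // One threshold per destination, indexed by Destination (assumed defaults).
  Level threshold[3] = { Level::Debug, Level::Info, Level::Err };

  // A message goes to a destination when its priority is at least as high
  // as the threshold, that is, when its numeric level is not greater.
  bool shouldSend(Destination dest, Level msgLevel) const {
    Level limit = threshold[static_cast<uint8_t>(dest)];
    return static_cast<uint8_t>(msgLevel) <= static_cast<uint8_t>(limit);
  }
};
```

With these assumed defaults, an "info" message would reach the serial port and the MQTT broker but not the Syslog server, which is why lowering the MQTT threshold below "info" would hide command responses.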
Command Processing
The commands.h and commands.ino files contain the command interpreter. My apologies for the quality of the code. It more or less grew in size as commands were added without much regard for an overall design. It is clearly in need of refactoring. Indeed, I suspect there are better ways of implementing the interpreter altogether. Nevertheless it does work and it does incorporate a minimal error reporting mechanism which hopefully will help the user understand why a command was not executed.
OTA Updates
This firmware is a work in progress, so it was important to include a mechanism to update the firmware without needing to take the device apart. An update command exists which will download a new firmware file from a web server. The network on which the web server can be found can be specified with the ota command, and the URL of the firmware file can be set using the url command with the ota option.
I have included my own ESP8266 watchdog routines to avoid infinite restart loops that could be introduced by a wayward firmware update. A known "good" version of the firmware will be reloaded over the air if a restart loop is detected. It uses the same web server as that used for the update command. The URL of the good version of the firmware can be specified with the url command using the auto option this time.
Persistent Settings
If the name or password of the monitored Wi-Fi network needs to be changed, then in all likelihood, this should be permanent. This can be accomplished by saving all the important settings in persistent memory on the ESP8266. This is done with the command config save.
All the settings are in a structure called config which is defined in the config.h file. The default values for all the settings are also defined in the same file. The code implementing the functions that save the settings to persistent memory, load them from it, erase it, and reload the default values is in the file named config.ino.
Hardware Interface
The Sonoff Basic LED can be observed through a hole in the bottom of the device. It displays various flashing patterns depending on the state of the monitor.
- Heartbeat: two 2/10th second flashes with a short off time between, repeated every two seconds.
- The router monitor is in MONITORING state, which means that the 2.4 GHz Wi-Fi network is up and the Internet can be reached, or any loss of these functions has been too brief to be confirmed.
- 50% duty cycle: 1/2 second on, 1/2 second off.
- The router monitor is in CYCLING state: it is turning the power off to the router for 10 seconds.
- Double heartbeat: four 1/10th second flashes with a short off time between, repeated every three seconds.
- The router monitor is in SPINNING state, which gives the router and wireless devices time to recover from the restart of the router.
- Almost always on: very short 2/100th second interruptions every two seconds.
- The router is powered up, and the router monitor is disabled.
- Almost always off: very short 2/100th second flashes every two seconds.
- The router is not powered, and the router monitor is disabled.
- Always off.
- The firmware is being updated.
Either the Sonoff push button or another push button wired across the ESP8266 GPIO14 pin and ground can be used to physically control the device to some extent.
- Single button click.
- This toggles the state of the Sonoff Basic relay (i.e. power to the router). When power to the router is controlled manually in this fashion, the router monitor is disabled.
- Two button clicks.
- This toggles the state of the router monitor. If it was disabled, the router monitor is put in MONITORING mode. If the state machine was enabled (no matter if it was in MONITORING, CYCLING or SPINNING state), it is disabled. This does not affect the Sonoff relay.
- Four or more button clicks.
- This launches an over-the-air update of the router monitor firmware.
- Long button press.
- Restarts the device. This is more or less the equivalent of removing power from the Sonoff device and then powering it up again.
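The mapping from button gestures to actions listed above can be summed up in a small function. This is a hypothetical sketch in plain C++ (the names are not from the firmware), and it assumes that some other routine has already counted the clicks and detected a long press. Three clicks are not assigned in the list above, so the sketch treats them as doing nothing.

```cpp
// Actions that can be triggered with the push button.
enum class Action { None, ToggleRelay, ToggleMonitor, OtaUpdate, Restart };

// clicks: number of short clicks counted; longPress: true for a long press.
Action actionFor(int clicks, bool longPress) {
  if (longPress) return Action::Restart;           // restarts the device
  if (clicks == 1) return Action::ToggleRelay;     // also disables monitoring
  if (clicks == 2) return Action::ToggleMonitor;   // relay is left untouched
  if (clicks >= 4) return Action::OtaUpdate;       // over-the-air update
  return Action::None;  // 0 or 3 clicks: nothing assigned in the text
}
```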
Commands
Much finer control can be achieved with commands that can be transmitted to the ESP8266 by a serial connection or through an MQTT broker. Of course the serial connection with the ESP8266 UART is not very practical and its main purpose is to facilitate software development. Here is the typical output displayed in the serial monitor of the Arduino IDE as the ESP is powered up.
Everything up to the last four lines is output by the firmware. The first two of the last four lines are in response to a help command sent via the serial connection. First the firmware echoes the command, preceding it with the source, which could be uart, as in this case, or mqtt if the command had been sent as an MQTT message. The help command is then executed, which in this case amounts to listing all the commands known to the monitor. Details about each command can be obtained by entering the command after help. The last two lines of the output correspond to such a command.
I used a simplified Backus-Naur form akin to the Wirth syntax notation to describe the options of each command. The principal parts of the notation are:
- Mandatory element: (a | b ) exactly one of a or b must be provided
- Optional element: [a | b ] at most one of a or b can be provided
- User specified value: <string>
There can be more than two elements in the ( ) or [ ] list if necessary.
- clientip [-clear | <ip>, <gateway>, <mask>]
- Reports (no options) or sets the client IP address, gateway and subnet mask. These must be valid IPv4 addresses such as 192.168.0.99. -clear will let the monitored network assign the IP address, gateway and subnet mask (DHCP).
- config (save|load|default|erase)
- Manages the configuration.
  - save saves the current configuration to persistent memory.
  - load replaces the current configuration with the saved configuration in persistent memory.
  - default replaces the current configuration with default values (does not change any saved configuration in persistent memory).
  - erase removes any saved configuration in persistent memory. Default values will be used on the next restart.
  Settings changed by load or default will not take effect immediately. It may be necessary to save the configuration to persistent memory and then restart the ESP8266. (This is not the best approach and needs at a minimum better documentation.)
- cycle [<ms>]
- Turns the router off and then back on after ms milliseconds. If ms is not specified or set to 0, the wait will be the same as when Wi-Fi is down (10 seconds by default).
- help [<command>]
- Displays succinct help messages. help with no command displays a list of commands. help with a command displays the command options.
- log (uart|mqtt|syslog) [<level>]
- Reports (no level option) or sets the level of the specified log output. The level can be specified numerically or by name:
  - 0 - emerg
  - 1 - alert
  - 2 - crit
  - 3 - err
  - 4 - warning
  - 5 - notice
  - 6 - info
  - 7 - debug
- monitor [on|off]
- Reports (no options) or sets Wi-Fi/Internet monitoring.
- mqtt [<ip>|<port>]
- Reports (no options) or sets the MQTT broker IP address and port number.
- name
- Not yet implemented.
- net [<ssid> [<password>]]
- Reports (no options) or sets the credentials of the monitored Wi-Fi network. If the password is not specified, the Wi-Fi network must be open to all. If the password is specified, it must contain at least 8 characters and no spaces. The password is never reported.
- ota [-clear|<ssid> [<password>]]
- Reports (no option) or sets the credentials of the Wi-Fi network used for over-the-air update of this device's firmware. If the password is not specified, the OTA Wi-Fi network must be open to all. If the password is specified, it must contain at least 8 characters and no spaces. The password is never reported.
- ping <host>
- Pings a specified host.
- reach [(1|2|3) (<host>)]
- Reports (no option) or sets the hosts that are pinged to verify if the Internet can be reached.
- restart
- Restarts this device.
- router [(on|off|toggle)]
- Reports (no option) or sets the router power outlet on or off. Toggling the power outlet on or off can also be done with a single button press. If the state is changed, monitoring is turned off. It can be turned back on with two button presses or the monitor command. Be careful if giving this command through an MQTT broker as the Wi-Fi connection will be lost. The cycle command might be more appropriate in that case.
- syslog [<ip>|<port>]
- Reports (no options) or sets the Syslog server IP address and port number.
- time (wifi|internet|ping|longwait|shortwait|cycling|connect|mqtt [<ms>])
- Reports or sets time intervals in milliseconds.
- topic (in|out) [<topic>]
- Reports or sets MQTT topics. in is the topic to which the device is subscribed, out is the topic used to publish to the broker. Send commands to the in topic and subscribe to the out topic to see the result.
- update [<url>]
- Flashes the device's firmware. If the url is not given, uses the ota url.
- url (ota|auto) [<url>]
- Reports (no url given) or sets the firmware url. The ota url is the default url for the update command; the auto url is used when the device boots if it is trapped in a boot cycle.
- version
- Reports the current firmware version.
Installing Router Monitor
The firmware routermon_1 is an Arduino sketch for ESP8266 based devices with at least 1 MB of flash memory. ESP8285 devices should also work.
Before flashing the firmware on a device, some defines in config.h should be modified.
- HOST_NAME
- The host name is used as the prefix for MQTT in and out topics. The same name is assigned to the ESP8266 Wi-Fi module. Valid host names are comprised of letters (upper and lower case) and digits. The hyphen "-" may also be used but must not be at the start or end of the name. It is best to limit the length of the name to 31 characters.
- NET_SSID
- The name of the 2.4 GHz Wi-Fi network to be monitored.
- NET_PSK
- The password of the monitored Wi-Fi network.
- MQTT_HOST
- The IP address or domain name of the MQTT broker. The default MQTT port can be changed if necessary. For the time being a secure connection to the MQTT broker is not implemented.
- OTA_URL
- The URL of the binary file to be downloaded and flashed on the ESP8266 when an update command is given. It is possible to bypass this URL by specifying an optional URL in the update command.
- AUTO_URL
- The URL of the binary file to be downloaded and flashed on the ESP8266 when a restart loop is detected by the ESP8266 loop watchdog.
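The HOST_NAME rules given above are easy to get wrong when editing config.h, so here is a small check that could be run on a desktop compiler before flashing. It is a hypothetical helper in plain C++, not part of the firmware, and it treats the recommended maximum of 31 characters as a hard limit.

```cpp
#include <cctype>
#include <string>

// Returns true if name follows the host name rules described above:
// letters, digits and hyphens only, no hyphen at either end, and
// between 1 and 31 characters (the recommended limit, enforced here).
bool isValidHostName(const std::string& name) {
  if (name.empty() || name.size() > 31) return false;
  if (name.front() == '-' || name.back() == '-') return false;
  for (unsigned char c : name) {
    if (!std::isalnum(c) && c != '-') return false;
  }
  return true;
}
```

For example, routermon-1 passes the check while router_mon does not, because of the underscore.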
You may also want to change other default values. For example, instead of using google.com and other well-known web sites as targets to check if the Internet is reachable, I prefer using major DNS servers with fixed IP addresses. Pinging these is not noticeably faster, but it does bypass the domain name system, which could be at fault even if technically the Internet is reachable. A web search will quickly yield good targets.
It is also possible to set a static IP address to be used instead of relying on a dynamically assigned IP address by the DHCP server on the monitored network. This is not that significant in this current version of the router monitor, but there should be a Web server in a future version. In that case, it would be nice to have a static address to reach the router monitor web page. And even in this version it could be helpful to be able to ping the device at a known address.
The Arduino sketch can be downloaded by clicking on the following link: routermon_1 (v 0.2.4).
Of course ESP8266 libraries, such as the Wi-Fi, UDP and HTTP Update libraries are used. These will have been installed in the Arduino IDE when the ESP8266 Core was added with the Boards manager. I am using three libraries of my own and these will have to be downloaded from here and installed in the Arduino IDE. There is a tutorial on how to install additional Arduino libraries.
Finally, third party libraries are used.
- PubSubClient by Nick O'Leary
- ESP8266Ping by Daniele Colanardi
PubSubClient can be installed with the Arduino IDE Library Manager. ESP8266Ping will have to be downloaded and installed manually in the same way as my own libraries have to be added to the Arduino IDE.