2024-10-23
Monitoring Temperatures in an openmediavault NAS

temp widget

There was a little problem monitoring the CPU temperature of a network attached storage (NAS) device based on an up-to-date version of openmediavault (OMV). The temperature widget was stuck at 27.8 °C as shown on the right, which was obviously wrong. The fix was easy enough, as will be shown, but that is only part of the monitoring done. It is more useful to transmit the CPU temperature to a home automation server at regular intervals. While doing that, an email notification is sent if the temperature is over a specified threshold.

Table of Contents

  1. Installing the CPU Temperature Widget
  2. Reading the Correct Thermal Zone
  3. Monitoring More Than One Thermal Zone
  4. AMD Ryzen CPU
  5. Remote Monitoring
    1. Python Scripts
    2. Cron Job

Installing the CPU Temperature Widget

First things first. How is that temperature widget added to the dashboard? That requires an omv-extras plugin to be installed.

The installed plugin must now be enabled.

The CPU temperature widget will then be displayed in the dashboard. Some may be lucky and the widget will display the correct temperature. As already explained, this was not the case with our Intel N5105 based NAS.

Reading the Correct Thermal Zone

As far as I know, it is necessary to open an SSH session on the NAS to change the omv-cputemp settings. This means that the SSH service has to be enabled in OMV.

ssh service in OMV

Running a little bash script that I wrote with the help of the usual suspects on the Web, the problem became obvious.

    michel@vault:~$ cputemp/get_thermal_zones
    /sys/devices/virtual/thermal/thermal_zone0: 27°C acpitz
    /sys/devices/virtual/thermal/thermal_zone1: 42°C x86_pkg_temp
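
The bash script itself is not reproduced in this post. To give an idea of what such a listing involves, here is a minimal Python sketch that walks the thermal zone directories and reads the `temp` and `type` files of each. The function name `list_thermal_zones` and the output layout are my own assumptions, not the content of the original script.

```python
#!/usr/bin/python3
# Sketch only: list thermal zones with their temperature and type.
# The function name and the output layout are assumptions for illustration.
import glob
import os

def list_thermal_zones(base="/sys/devices/virtual/thermal"):
    """Return (zone_path, temp_celsius, zone_type) tuples, sorted by path."""
    zones = []
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            # the kernel reports the temperature in thousandths of a degree
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
            with open(os.path.join(zone, "type")) as f:
                ztype = f.read().strip()
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
        zones.append((zone, millideg / 1000.0, ztype))
    return zones

if __name__ == "__main__":
    for path, temp, ztype in list_thermal_zones():
        print("{}: {:.0f}°C {}".format(path, temp, ztype))
```

The `type` file is what makes it possible to tell the ACPI zone (`acpitz`) from the CPU package zone (`x86_pkg_temp`).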

The plugin is reading thermal zone 0, while the needed CPU temperature is available in thermal zone 1. The zone can be changed with an environment variable.

    michel@vault:~$ sudo omv-env set "OMV_CPU_TEMP_COMMAND" "cat /sys/devices/virtual/thermal/thermal_zone1/temp"
    michel@vault:~$ sudo omv-salt stage run prepare
    debian:
    ...
    Summary for debian
    ------------
    Succeeded: 6 (changed=5)
    Failed:    0
    ------------
    Total states run:     6
    Total run time:  17.358 s
    michel@vault:~$ sudo omv-mkworkbench all
    michel@vault:~$ sudo monit restart omv-engined

Be patient with the Salt prepare command: it took 17 seconds without any feedback. Once all the steps are followed, the CPU temperature widget displays a more accurate value.

temp widget fixed

Monitoring More Than One Thermal Zone

In April 2024 additional temperature widgets were added in version 7.0.1 of the omv-cputemp plugin. Since there is no other thermal zone on our NAS, I will illustrate this functionality by installing thermal zone 0 again.

    michel@vault:~$ sudo omv-env set "OMV_CPU_TEMP_COMMAND2" "cat /sys/devices/virtual/thermal/thermal_zone0/temp"
    michel@vault:~$ sudo omv-salt stage run prepare
    michel@vault:~$ sudo omv-mkworkbench all
    michel@vault:~$ sudo monit restart omv-engined

In truth, thermal_zone0 is the default value and it would not be necessary to set it as above. Now, it's a matter of enabling the second widget in the dashboard settings.

Here is the result.

I had to play around to get the two temperature widgets to be next to each other. Perhaps that would happen automatically when opening a new web client after closing all web clients that were open beforehand. I found that enabling the CPU widget and then removing it did the trick.

When removing a temperature widget, the corresponding environment variable should probably be removed also.

    michel@vault:~$ sudo omv-env unset "OMV_CPU_TEMP_COMMAND2"
    michel@vault:~$ sudo omv-salt stage run prepare
    michel@vault:~$ sudo omv-mkworkbench all
    michel@vault:~$ sudo monit restart omv-engined

By the way, just running these commands can easily push the Intel N5105 CPU temperature to 58 °C.

AMD Ryzen CPU

If the CPU is an AMD Ryzen, then thermal zones will probably not be updated correctly. The lm-sensors package may provide the needed information. And it will be necessary to write some scripts to get the data from the sensors utility that comes with the package. It's explained in Guide - Custom cpu temp script for openmediavault-cputemp plugin by Aaron Murray (ryecoaaron).
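
I have no Ryzen machine to test on, so what follows is only a rough Python sketch of the idea and not the script from that guide. It assumes that the installed version of `sensors` accepts the `-j` option for JSON output and that the CPU sensor is reported by a chip whose name starts with `k10temp`. The function name and the sample data are made up for illustration.

```python
#!/usr/bin/python3
# Sketch only: extract a CPU temperature from the JSON output of "sensors -j".
# Assumptions: sensors supports -j, and the CPU chip name starts with "k10temp".
import json

def k10temp_celsius(sensors_json):
    """Return the first temperature input found for a k10temp chip, or None."""
    data = json.loads(sensors_json)
    for chip, features in data.items():
        if not chip.startswith("k10temp"):
            continue
        for feature in features.values():
            if not isinstance(feature, dict):
                continue  # skip plain strings such as the adapter description
            for key, value in feature.items():
                if key.startswith("temp") and key.endswith("_input"):
                    return float(value)
    return None

# Illustrative sample only, not captured from a real machine
SAMPLE = '{"k10temp-pci-00c3": {"Adapter": "PCI adapter", "Tctl": {"temp1_input": 45.125}}}'

if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["sensors", "-j"], capture_output=True, text=True).stdout
    print(k10temp_celsius(out))
```

A script along these lines could presumably be named in `OMV_CPU_TEMP_COMMAND` in place of the `cat` command, but the guide should be consulted for the output format that the plugin expects.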

Remote Monitoring

The fact is, the OMV Web management page is very rarely consulted. Most of the time, the NAS does its work quietly in the background and I never really think about checking the CPU temperature and so on. And when I do, I use the Domoticz home automation system, which shows the NAS CPU temperature and power usage.

Domoticz NAS sensors

Furthermore, Domoticz logs the values written to those sensors and produces graphs that are entertaining and that may yet prove useful.

Domoticz Log

The NAS is powered from an Itead Sonoff Pow smart Wi-Fi switch. The open source Tasmota firmware on the switch transmits data about power consumption to Domoticz through an MQTT broker. This is all easily put together and relatively low cost. It or something similar has been described in previous posts.

What follows are the bits of "glue" needed to update the NAS_Temperature virtual sensor in Domoticz. This is quite easy to do with Domoticz API/JSON URL's (sic, I have doubts about the apostrophe), but it could also be done with MQTT messages.
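
For the record, here is a minimal sketch of what the MQTT alternative might look like. It only builds the message and mirrors the `idx`, `nvalue` and `svalue` parameters of the JSON URL. The topic `domoticz/in` is, as far as I know, the default on which Domoticz listens, but treat it as an assumption, and the function name is hypothetical. Publishing the payload would require an MQTT client, which is not shown.

```python
#!/usr/bin/python3
# Sketch only: build the MQTT message that would update a Domoticz
# temperature sensor. Publishing it requires an MQTT client (not shown).
import json

# Assumed default topic on which Domoticz accepts incoming messages
DOMOTICZ_TOPIC = "domoticz/in"

def domoticz_mqtt_payload(idx, temperature):
    """Return the JSON payload for a temperature update of device idx."""
    return json.dumps({
        "idx": idx,                               # Domoticz device index
        "nvalue": 0,                              # not used by temperature sensors
        "svalue": "{:.1f}".format(temperature),   # temperature as a string
    })

if __name__ == "__main__":
    print(DOMOTICZ_TOPIC, domoticz_mqtt_payload(211, 42.0))
```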

Python Scripts

Transmitting the CPU temperature is done with a couple of Python scripts that are executed at regular intervals by a cron task.

The first Python script is called cputemp.

    #!/usr/bin/python3
    # coding: utf-8

    # Python script to read the CPU temperature and send the results to
    # a Domoticz server. Meant to be executed by cron at regular intervals.
    #
    # Requires: Python 3
    #
    # Version: 2.1
    # Date: 2024-08-20
    # License: 0BSD@https://spdx.org/licenses/0BSD.html

    # Needed module here because LOG_XXX constants are used in parameters
    from syslog import *

    # User defined parameters -------------------------------------------------------

    alertTemp = 60              # Hot cpu core temperature threshold
    cpu_zone = "/sys/class/thermal/thermal_zone1/temp"
    cpu_temp_idx = 211          # Domoticz device index for cpu temperature sensor on vault.local
    domoticzJson = "http://192.168.1.22:8080/json.htm?type=command&param=udevice&idx={}&nvalue=0&svalue={}"
    domoticzTimeout = 15        # HTTP request timeout in seconds

    alertTitle = 'NAS Temperature Alert'
    alertMsg = "The temperature of the NAS CPU, {}°C, is above the alert threshold {}°C"

    verbose = 0                 # 0 quiet, 1 to echo log messages with priority
    consoleloglevel = LOG_ERR   # log level for messages printed to the console
    sysloglevel = LOG_ERR       # log level for messages sent to syslog
    logAlertLevel = LOG_ALERT   # priority level for e-mail temperature alerts

    # -------------------------------------------------------------------------------

    # Needed modules in addition to syslog
    import sys
    import os
    from pymail import send_email
    from urllib.request import urlopen # with python 3.x, use "from urllib2 import urlopen" with python 2.7
    if verbose > 0:
        from datetime import datetime

    # Routine to send messages to syslog and echo it to the console
    def log(level, msg):
        syslog(level, msg)
        if (verbose) and (level <= consoleloglevel):
            print(datetime.now().strftime('%Y-%m-%d %H:%M:%S ') + msg)

    # Setup syslog
    openlog(ident='NAS')
    setlogmask(LOG_UPTO(sysloglevel))

    # HTTP Get request
    def httpRequest(url):
        try:
            log(LOG_DEBUG, 'CPU: Domoticz GET request: {}'.format(url))
            # use context manager which takes care of closing hf
            with urlopen(url, timeout=domoticzTimeout) as hf:
                # get Domoticz' response and broadcast it
                response = hf.read().decode('utf-8')
                if ('"ERR"') in response:
                    llevel = LOG_ERR
                else:
                    llevel = LOG_DEBUG
                log(llevel, 'Domoticz response: {}'.format(response))
        except Exception as e:
            log(LOG_ERR, 'Exception: {}'.format(e))

    # Read and report cpu temperature
    with open(cpu_zone) as f:
        cpuTemp = "{0:0.1f}".format(int(f.read())/1000.0)
    log(LOG_DEBUG, 'CPU: temp = {}°C'.format(cpuTemp))
    httpRequest(domoticzJson.format(cpu_temp_idx, cpuTemp))

    # Raise alert if needed
    if float(cpuTemp) > alertTemp:
        message = alertMsg.format(cpuTemp, alertTemp)
        log(LOG_DEBUG, "Alert, title: '{}', message: '{}'".format(alertTitle, message))
        log(logAlertLevel, message)
        log(LOG_DEBUG, 'Sending email notification')
        send_email(alertTitle, 'temperature.alert@gmail.com', message)
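
The two steps at the end of the script, the conversion of the raw reading and the threshold test, can be tried in isolation without a Domoticz server or a mail account. This is a small sketch that uses the same arithmetic as `cputemp`. The two function names are mine and do not appear in the script.

```python
#!/usr/bin/python3
# Sketch only: the conversion and alert test of cputemp as stand-alone functions.
# The function names are hypothetical; the arithmetic is that of the script.

def millidegrees_to_celsius(raw):
    """Convert the content of a thermal zone temp file to a string such as '42.0'."""
    # int() ignores the trailing newline found in the sysfs file
    return "{0:0.1f}".format(int(raw) / 1000.0)

def is_too_hot(cpu_temp, alert_temp=60):
    """True if the temperature (string or number) is strictly above the threshold."""
    return float(cpu_temp) > alert_temp

if __name__ == "__main__":
    temp = millidegrees_to_celsius("42000\n")
    print(temp, is_too_hot(temp))   # prints: 42.0 False
```

Note that the test is a strict inequality, so a reading of exactly 60.0 does not raise an alert.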

Hopefully, that script is straightforward. It reads the value in /sys/class/thermal/thermal_zone1/temp and then divides it by 1000 to get the CPU temperature in degrees Celsius. This value is passed on to the Domoticz server in an HTTP request. Furthermore, if the temperature is above the threshold alertTemp, then an e-mail alert is sent. Here is my rather simple module for sending emails, pymail.py.

    #Import smtplib for the actual sending function
    import smtplib, ssl

    # Import the email modules we'll need
    from email.mime.text import MIMEText

    SRC = '***me@**host_url****'
    SMPT = '***mail@**host_url****'
    PORT = 465
    PWD = '****************'

    def send_email(subject, dest, message):
        print('send_email({}, {}, {})'.format(subject, dest, message))
        # Create the message
        msg = MIMEText(message)
        msg['Subject'] = subject
        msg['From'] = SRC
        msg['To'] = dest
        context = ssl.create_default_context()
        with smtplib.SMTP_SSL(SMPT, PORT, context=context) as server:
            server.login(SRC, PWD)
            server.sendmail(SRC, dest, msg.as_string())
        print('msg:', msg['Subject'], msg['From'], msg['To'])

    # ref: https://realpython.com/python-send-email/
    #

Of course, the values of SRC, SMPT, PORT and PWD will have to be adjusted. It would have been possible to have Domoticz take care of the notification when the temperature is above a threshold, but I thought it better to send the email alert directly from the NAS. While writing this, it seems that it might be a good idea to have warnings coming from Domoticz also, but only if the problem persists over a longer period to avoid flooding my email account.
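
The "only if the problem persists" idea could be expressed as a test on the most recent readings. Here is a minimal sketch in Python of that logic. It is not something Domoticz provides in this form, and the function name and parameters are assumptions made for illustration.

```python
#!/usr/bin/python3
# Sketch only: warn when the last few readings are all above the threshold.
# The function name and parameters are hypothetical.

def persistent_overheat(readings, threshold, min_count):
    """True if the last min_count readings are all strictly above threshold."""
    if min_count <= 0 or len(readings) < min_count:
        return False   # not enough history to decide
    return all(t > threshold for t in readings[-min_count:])

if __name__ == "__main__":
    history = [55.0, 61.0, 62.5, 63.0]           # one reading every 5 minutes
    print(persistent_overheat(history, 60, 3))   # prints: True
    print(persistent_overheat(history, 60, 4))   # prints: False
```

With a reading every five minutes, `min_count = 3` would mean that the temperature stayed above the threshold over three consecutive readings before a warning is sent.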

These two scripts and the `get_thermal_zones` script are available in a GitHub Gist: sigmdel/cputemp.py.

Cron Job

Initially, a cron job was set up to run every five minutes.

    # Edit this file to introduce tasks to be run by cron.
    ...
    # For more information see the manual pages of crontab(5) and cron(8)
    #
    # m h  dom mon dow   command
    */5 * * * * /home/michel/cputemp/cputemp

That turned out to be an unwise choice. There would be no problem 11 times out of 12 each hour. However, there was an error, a timeout, when cputemp was run at the start of each hour. Here is one of the email messages sent every hour by cron.

    Subject: [vault.local] Cron /home/michel/cputemp/cputemp
    Message: 2024-10-21 01:00:11 Exception: timed out

As can be seen from the power usage graph, these timeouts corresponded to increased energy usage and presumably higher temperatures. While trying to make sense of these obviously non-random errors, I remembered that automatic backups of the Domoticz database were enabled. Listing the hourly backups showed that they occurred at the start of the hour.

    nestor@domo:~$ ls -l domoticz/backups/hourly
    total 73024
    ...
    -rwxrwxrwx 1 root root 3088384 Oct 21 08:00 backup-hour-08-Domoticz.db
    -rwxrwxrwx 1 root root 3096576 Oct 21 09:00 backup-hour-09-Domoticz.db
    -rwxrwxrwx 1 root root 3096576 Oct 21 10:00 backup-hour-10-Domoticz.db
    ...

Looking at the approximately 40 log messages between 8:59 and 9:01 clinched it. Here are the three pertinent messages.

    domo@domoserver:~$ journalctl -S "2024-10-21 08:59:55" -U "2024-10-21 09:01:00"
    ...
    Oct 21 09:00:01 domoserver domoticz[2221929]: 2024-10-22 09:00:01.091 Status: Starting automatic database backup procedure...
    ...
    Oct 21 09:00:12 domoserver domoserver[2314143]: Exception: timed out
    ...
    Oct 21 09:00:39 domoserver domoticz[2221929]: 2024-10-22 09:00:39.591 Status: Ending automatic database backup procedure...

The HTTP request sent to Domoticz by the cputemp script at the turn of the hour was timing out because Domoticz was busy backing up its database. The spikes in power usage show that the NAS was getting hot under the collar waiting for a reply from the otherwise busy Domoticz server. The solution was to change the timing of the cputemp cron job.

    # Edit this file to introduce tasks to be run by cron.
    ...
    # For more information see the manual pages of crontab(5) and cron(8)
    # m h  dom mon dow   command
    3-59/5 * * * * /home/michel/cputemp/cputemp
    #
    # Reference: Run Cron job every N minutes plus offset
    #   @https://stackoverflow.com/questions/12786410/run-cron-job-every-n-minutes-plus-offset
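
To see why the offset helps, the minute field can be expanded by hand or with a few lines of Python. The sketch below handles only the simple forms used in this post (`*`, a single number or a range, with an optional `/step`) and not comma-separated lists. It is not how cron itself is implemented, and the function name is hypothetical.

```python
#!/usr/bin/python3
# Sketch only: expand a simple cron minute field into the list of minutes
# at which the job runs. Comma-separated lists are not handled.

def expand_minutes(field):
    """Expand fields such as '*/5' or '3-59/5' into a list of minutes."""
    rng, _, step = field.partition("/")
    step = int(step) if step else 1
    if rng == "*":
        first, last = 0, 59
    elif "-" in rng:
        first, last = (int(x) for x in rng.split("-"))
    else:
        first = last = int(rng)
    return list(range(first, last + 1, step))

if __name__ == "__main__":
    print(expand_minutes("*/5"))     # [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
    print(expand_minutes("3-59/5"))  # [3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58]
```

Both schedules run the script 12 times an hour, but the second never fires at minute 0, when Domoticz starts its hourly backup.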