Updated: April 11, 2018
The installation of two Python libraries for voice recognition is taken up below. First I will go back to the hotword recognition engine snowboy from KITT.AI. Rather than relying on the work of a third party as in a previous post, it is installed from its repository on GitHub.
Secondly, the SpeechRecognition library by Anthony Zhang (Uberi) will be installed. This is a well-made library that provides a uniform Python interface to many speech recognition engines. I used it to access two online services, Google Speech Recognition and Microsoft Bing Voice Recognition, as well as the offline engine PocketSphinx from Carnegie Mellon University. Mr Zhang has also considerably simplified adding three languages (French, Italian and Mandarin) to that engine.
Table of Contents
- Starting Point
- Installing Python 3
- Installing Development Prerequisites
- Installing Audio Prerequisites
- Installing snowboy
- Installing SpeechRecognition
dietpi@domopiz:~$ mkdir speechrecognition
dietpi@domopiz:~$ cd speechrecognition
dietpi@domopiz:~/speechrecognition$ mkvenv pvenv
creating virtual environment /home/dietpi/speechrecognition/pvenv
updating virtual environment /home/dietpi/speechrecognition/pvenv
dietpi@domopiz:~/speechrecognition$ ve pvenv
(pvenv) dietpi@domopiz:~/speechrecognition$ pip install -U PyAudio
...
Successfully installed PyAudio
Cleaning up...
(pvenv) dietpi@domopiz:~/speechrecognition$ pip install -U SpeechRecognition
Successfully installed SpeechRecognition
Cleaning up...
(pvenv) dietpi@domopiz:~/speechrecognition$
Installing the library with pip does not bring in the example scripts, so I fetched them from the GitHub repository.
(pvenv) dietpi@domopiz:~/speechrecognition$ wget https://github.com/Uberi/speech_recognition/archive/master.zip
(pvenv) dietpi@domopiz:~/speechrecognition$ unzip -j master.zip "speech_recognition-master/examples/*" -d examples
...
(pvenv) dietpi@domopiz:~/speechrecognition$ cd examples
(pvenv) dietpi@domopiz:~/speechrecognition/examples$ python calibrate_energy_threshold.py
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
...
Say something!
I said: "Hello, how's it going"
Google Speech Recognition thinks you said hello how's it going
(pvenv) dietpi@domopiz:~/speechrecognition/examples$

However, microphone_recognition.py did not respond:
(pvenv) dietpi@domopiz:~/speechrecognition/examples$ python microphone_recognition.py
...
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
Say something!
I said: "Hello, how's it going" but the program did not respond
^CTraceback (most recent call last):
  File "microphone_recognition.py", line 11, in <module>
    audio = r.listen(source)
  File "/home/dietpi/speechrecognition/pvenv/lib/python3.4/site-packages/speech_recognition/__init__.py", line 652, in listen
    buffer = source.stream.read(source.CHUNK)
  File "/home/dietpi/speechrecognition/pvenv/lib/python3.4/site-packages/speech_recognition/__init__.py", line 161, in read
    return self.pyaudio_stream.read(size, exception_on_overflow=False)
  File "/home/dietpi/speechrecognition/pvenv/lib/python3.4/site-packages/pyaudio.py", line 608, in read
    return pa.read_stream(self._stream, num_frames, exception_on_overflow)
KeyboardInterrupt

Looking at the two scripts, it became obvious that the difference was a single line found in calibrate_energy_threshold.py just before recording sounds, which was missing from microphone_recognition.py.

#!/usr/bin/env python3
# NOTE: this example requires PyAudio because it uses the Microphone class
import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)  # listen for 1 second to calibrate the microphone
    print("Say something!")
    audio = r.listen(source)

With that simple addition, microphone_recognition.py worked.

(pvenv) dietpi@domopiz:~/speechrecognition/examples$ nano microphone_recognition.py
(pvenv) dietpi@domopiz:~/speechrecognition/examples$ python microphone_recognition.py
...
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
Say something!
Sphinx error; missing PocketSphinx module: ensure that PocketSphinx is set up correctly.
Google Speech Recognition thinks you said hi what day are we
Traceback (most recent call last):
  File "/home/dietpi/speechrecognition/pvenv/lib/python3.4/site-packages/speech_recognition/__init__.py", line 885, in recognize_google_cloud
    try: json.loads(credentials_json)
  File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)

The error is caused by the missing keys for the online voice recognition services. Currently, only Google Speech Recognition is freely available without prior registration. I added my key for Microsoft Bing, commented out the other engines and renamed the script. I also changed the script to get a better idea of how long each engine takes to analyze the sound. The ellipsis is the last thing printed before asking the engine to convert speech to text; "you said ..." is printed when the results come back.
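The timing change described above can be sketched roughly as follows. This is not the modified script itself: the helper name `timed_recognition`, the broad exception handling and the printed elapsed time are assumptions made for illustration. Any callable that turns audio into text, such as `r.recognize_google`, can be passed in.

```python
import time


def timed_recognition(label, recognize, audio):
    """Print a label and an ellipsis, call the engine, then print the result.

    `recognize` is any callable taking the audio and returning text, for
    example r.recognize_google from SpeechRecognition. This helper and its
    name are hypothetical; they are not part of the library.
    """
    # The ellipsis is shown before the (possibly slow) call to the engine.
    print(label + " thinks ...", end="", flush=True)
    start = time.perf_counter()
    try:
        text = recognize(audio)
    except Exception as error:  # the example scripts catch narrower errors
        print(" error: {}".format(error))
        return None, time.perf_counter() - start
    elapsed = time.perf_counter() - start
    # This part only appears once the engine has answered.
    print(" you said: {} ({:.1f} s)".format(text, elapsed))
    return text, elapsed
```

With the library installed, a call might look like `timed_recognition("Google Speech Recognition", r.recognize_google, audio)`, where `r` and `audio` come from the example script.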
(pvenv) dietpi@domopiz:~/speechrecognition/examples$ python microphone_recognition_en.py
...
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
Say something!
a prize if you guess what I said
Sphinx thinks ...Sphinx error; missing PocketSphinx module: ensure that PocketSphinx is set up correctly.
Google Speech Recognition thinks ... you said good evening it's Saturday night
Microsoft Bing Voice Recognition thinks ... you said Good evening it's Saturday night.

There remains the task of installing PocketSphinx. This is easily done, although it takes a while. The screen will be filled many times over as both sphinxbase and pocketsphinx are compiled.
(pvenv) dietpi@domopiz:~/speechrecognition/examples$ pip install -U pocketsphinx
Downloading/unpacking pocketsphinx
...
Successfully installed pocketsphinx
Cleaning up...
(pvenv) dietpi@domopiz:~/speechrecognition/examples$

Now the script will run without complaining.
(pvenv) dietpi@domopiz:~/speechrecognition/examples$ python microphone_recognition_en.py
...
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
Say something!
Sphinx thinks ... you said: good evening if the night
Google Speech Recognition thinks ... you said: good evening it's Saturday night
Microsoft Bing Voice Recognition thinks ... you said: Good evening it's Saturday night.

It is painfully obvious that Pocket Sphinx runs too slowly on the Orange Pi Zero to be used in real-time applications. And the combination of the quality of the microphone, my mumbling, and the power of the software gives less than optimal results. However, Google Speech Recognition and Microsoft Bing Voice Recognition work quite well with regard to both speed and accuracy.
- More Languages in SpeechRecognition
Doing speech recognition in languages other than (American) English is rather easy with Google Speech Recognition and Microsoft Bing Voice Recognition. All that needs to be done is to add a language parameter to the r.recognize_xxxx call, where xxxx is the engine. I copied microphone_recognition.py to microphone_recognition_en.py and to microphone_recognition_fr.py. Then I edited the latter.

# recognize speech using Sphinx
try:
    print("Sphinx thinks ...", end="", flush=True)
    print(" you said: " + r.recognize_sphinx(audio))

became

# recognize speech using Sphinx
try:
    print("Selon Sphinx ...", end="", flush=True)
    print(" vous avez dit : " + r.recognize_sphinx(audio, language="fr-FR"))

And I did the same for the two online engines.
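As a rough illustration of the same idea applied to all three engines, the sketch below passes one language code to each recognizer method. The function name `recognize_all`, the returned dictionary and the broad error handling are assumptions made for this example; only the `recognize_sphinx`, `recognize_google` and `recognize_bing` methods and their `language` parameter come from the library as used in this post.

```python
def recognize_all(recognizer, audio, language="fr-FR", bing_key=None):
    """Run the engines used in this post with the same language code.

    `recognizer` is expected to be a speech_recognition.Recognizer and
    `audio` the result of recognizer.listen(). The function name and the
    returned dictionary are illustrative assumptions.
    """
    engines = [
        ("Sphinx", lambda: recognizer.recognize_sphinx(audio, language=language)),
        ("Google", lambda: recognizer.recognize_google(audio, language=language)),
    ]
    # Bing needs an API key, so it is only queried when one is supplied.
    if bing_key is not None:
        engines.append(
            ("Bing", lambda: recognizer.recognize_bing(audio, key=bing_key, language=language))
        )

    results = {}
    for name, call in engines:
        try:
            results[name] = call()
        except Exception as error:  # e.g. missing language data or no network
            results[name] = "error: {}".format(error)
    return results
```

Collecting errors as strings keeps one failing engine, such as Sphinx without its French data, from hiding the answers of the others.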
(pvenv) dietpi@domopiz:~/speechrecognition/examples$ python microphone_recognition_fr.py
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
...
Parlez!
Selon Sphinx ...Sphinx error; missing PocketSphinx language data directory: "/home/dietpi/speechrecognition/pvenv/lib/python3.4/site-packages/speech_recognition/pocketsphinx-data/fr-FR"
Selon Google Speech Recognition ... vous avez dit : bonjour C'est Dimanche matin
Selon Microsoft Bing Voice Recognition ... vous avez dit : Bonjour c'est dimanche matin.

As you would expect, that did not work for CMU Sphinx. But the error message contained some useful information: the expected location of the French language data.
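The directory named in the error message can be checked before running the script. The following is a minimal sketch under the assumption that language data always lives in a `pocketsphinx-data/<language>` directory inside the installed package, as the message suggests. Both function names are hypothetical.

```python
import os


def sphinx_language_dir(package_dir, language="fr-FR"):
    """Return the directory where the language pack is expected.

    The layout (pocketsphinx-data/<language> inside the speech_recognition
    package) is taken from the error message shown above.
    """
    return os.path.join(package_dir, "pocketsphinx-data", language)


def has_language_pack(package_dir, language="fr-FR"):
    """Tell whether the language directory exists and is not empty."""
    path = sphinx_language_dir(package_dir, language)
    return os.path.isdir(path) and bool(os.listdir(path))


# Possible use once the library is installed:
#   import speech_recognition as sr
#   print(has_language_pack(os.path.dirname(sr.__file__), "fr-FR"))
```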
The instructions for installing other languages are not up to date, but the links to the French, Italian and Mandarin Chinese language packs are valid. So get the archive from the Google Drive repository and extract the language pack to the directory kindly identified by the error message.

(pvenv) dietpi@domopiz:~/speechrecognition/examples$ python microphone_recognition_fr.py
...
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
Parlez!
Selon Sphinx ... vous avez dit : cette pour aujourd'hui
Selon Google Speech Recognition ... vous avez dit : c'est tout pour aujourd'hui
Selon Microsoft Bing Voice Recognition ... vous avez dit : Météo pour aujourd'hui.

And the prize goes to Google. A long time ago, I had a boss known as "the Maltese Mumbler" among his staff. He probably came from Malta, but there was no doubt that he was very difficult to understand over a telephone. Perhaps I picked up his diction.
I started from a clean installation of DietPi, available at dietpi.com/. Two sections of a previous post contain the details: 2. Installing DietPi and 4. Testing the OPiZ Audio.
A word of caution. If you followed along with my previous posts and Google Assistant is running as a service on the OPiZ, it must be disabled since hot word detection will now be performed by snowboy.

dietpi@domopiz:~$ sudo systemctl stop google-assistant-demo.service
dietpi@domopiz:~$ sudo systemctl disable google-assistant-demo.service
Removed symlink /etc/systemd/system/google-assistant.service.
Removed symlink /etc/systemd/system/multi-user.target.wants/google-assistant-demo.service.
The Armbian image from DietPi does not contain Python. So version 3 will be installed along with the virtual environment module, venv. I used the default Debian package manager to install everything.
There is a newer version of Python 3 (version 3.6), but hopefully version 3.4 will be recent enough for my purposes.
I also installed the tools for creating and updating virtual environments described in Python 3 virtual environments. All Python code will be installed in virtual environments.
A number of packages are needed, including the GNU compilers and the make utility.
The Simplified Wrapper and Interface Generator (SWIG) is needed to create some Python wrappers of C/C++ libraries. It turns out that snowboy needs version 3.0.10 which is not contained in the Jessie repository.
Luckily, a newer version is available in the backports repository. Details about adding the repository can be found here. Once the package signing keys and the repository listing have been added, the list of packages has to be updated. Then a check can be made that the available version of SWIG is recent enough before finally installing it.
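The version check deserves a little care because version numbers do not compare correctly as plain strings. The sketch below compares them numerically. The helper names are hypothetical, and the sample strings in the comments (such as "SWIG Version 3.0.10" and an older 3.0.2) are assumptions used only to illustrate the comparison.

```python
import re


def parse_version(text):
    """Extract the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(text))
    return tuple(int(part) for part in match.group(1).split("."))


def is_recent_enough(found, required="3.0.10"):
    """Compare versions numerically, so that 3.0.10 counts as newer than 3.0.2."""
    return parse_version(found) >= parse_version(required)


# As strings, "3.0.2" sorts after "3.0.10", which would be the wrong answer;
# as tuples, (3, 0, 2) < (3, 0, 10).
# is_recent_enough("SWIG Version 3.0.10") is True
# is_recent_enough("3.0.2") is False
```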
I think the symbolic link swig to swig3.0 is not created if apt-get is asked to install swig3.0. I had to create that link in a previous run at this.
While it is possible to get the latest version of a project on GitHub with wget or by other means, I find it useful to install the version control system git.
If Pocket Sphinx is to be installed, a recent version of the PulseAudio development library is needed. Again, this package must be obtained from the backports repository.
As can be seen, that pulled in many packages, including Python 2.7.9, which I did not want... oh well.
If the snowboy Python wrapper is compiled from source, then ATLAS (Automatically Tuned Linear Algebra Software) must be installed. It automatically generates an optimized Basic Linear Algebra Subroutines (BLAS) library.
The module PyAudio, used by SpeechRecognition and snowboy, is a Python wrapper for the cross-platform audio I/O library PortAudio, which must be installed.
SpeechRecognition also requires FLAC (Free Lossless Audio Codec). It is an open source lossless alternative to MP3.
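A quick way to confirm that such prerequisites are in place is to look for their commands on the PATH. This is a minimal sketch, assuming the codec is provided by a command named `flac`. The helper name `missing_commands` is hypothetical.

```python
import shutil


def missing_commands(commands):
    """Return the commands from the list that cannot be found on the PATH."""
    return [name for name in commands if shutil.which(name) is None]


# Possible use: an empty list means everything was found.
#   print(missing_commands(["flac", "gcc", "make", "swig"]))
```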
At last, snowboy can be installed. First we get the source from the GitHub repository. Then a virtual environment will be created in that project. It is a good idea to exclude that environment from the version control system.
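One way to keep the virtual environment out of version control without touching the project's own files is to list it in the repository's local `.git/info/exclude` file. The sketch below does that from Python. The function name and the directory name `pvenv/` used in the comment are assumptions, not what was actually typed.

```python
import os


def exclude_from_git(repo_dir, entry):
    """Add entry to .git/info/exclude unless it is already listed.

    Returns True when the file was changed, False when nothing was needed.
    """
    git_dir = os.path.join(repo_dir, ".git")
    if not os.path.isdir(git_dir):
        raise ValueError("{} is not a git working directory".format(repo_dir))
    info_dir = os.path.join(git_dir, "info")
    os.makedirs(info_dir, exist_ok=True)
    exclude_path = os.path.join(info_dir, "exclude")

    lines = []
    if os.path.exists(exclude_path):
        with open(exclude_path) as handle:
            lines = handle.read().splitlines()
    if entry in lines:
        return False

    lines.append(entry)
    with open(exclude_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return True


# Possible use, with a hypothetical environment directory name:
#   exclude_from_git(".", "pvenv/")
```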
Activate the virtual environment and install the Python PyAudio module.
A look at the Python 3 examples shows that the three demo programs import snowboydecoder.py, which in turn imports snowboydetect.py. The latter is the Python wrapper for the shared library _snowboydetect.so. These two files, which should be in the swig/Python3 directory, are missing.
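The check for the two generated files can be sketched as below. The helper name is hypothetical; the file names and the swig/Python3 directory are the ones mentioned above.

```python
import os

# The wrapper module and the shared library it loads.
WRAPPER_FILES = ("snowboydetect.py", "_snowboydetect.so")


def missing_wrapper_files(directory, names=WRAPPER_FILES):
    """Return the wrapper files that are not present in the directory."""
    return [
        name for name in names
        if not os.path.isfile(os.path.join(directory, name))
    ]


# Possible use from the top of the snowboy source tree:
#   print(missing_wrapper_files(os.path.join("swig", "Python3")))
```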
Time to create the missing shared library and Python wrapper.
When make invokes the C++ compiler, g++, the command line displayed on the console makes it clear that it is the snowboy C/C++ library for Raspbian that is converted to a Python library by SWIG. No problem, it works quite well with Armbian on the Orange Pi Zero.
Aside from a warning, everything seemed to go well, so I backed out of the swig directory and changed to the examples/Python3 directory to try out the demonstration scripts.
That was a bit of an anti-climax. However, it was relatively easy to fix the problem: just remove the relative path to snowboydetect.py in the import command of snowboydecoder.py.
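For illustration only, the edit can be expressed as a small text transformation. This sketch assumes that the offending line reads `from . import snowboydetect` and that the fix is to turn it into a plain `import snowboydetect`; the exact wording of the line in snowboydecoder.py should be checked before relying on it. The function name is hypothetical.

```python
import re


def drop_relative_import(source, module="snowboydetect"):
    """Rewrite 'from . import <module>' as 'import <module>', keeping indentation.

    The form of the original import line is an assumption, see the text above.
    """
    pattern = r"^([ \t]*)from[ \t]+\.[ \t]+import[ \t]+{}[ \t]*$".format(
        re.escape(module)
    )
    return re.sub(pattern, r"\1import {}".format(module), source, flags=re.MULTILINE)


# Possible use on the file contents:
#   with open("snowboydecoder.py") as handle:
#       fixed = drop_relative_import(handle.read())
```

Doing the same change by hand in an editor is just as good; the function only makes explicit which line is affected and that every other line is left untouched.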
Now all three demonstration scripts work: