Offline audio to text (Speech Recognition)

Python Forum (https://python-forum.io), Python Coding › Data Science, thread /thread-6710.html

Offline audio to text (Speech Recognition) - jehoshua - Dec-04-2017

I have hundreds of MP3 audio files from a teaching course, and because of copyright, etc., we are not permitted to upload them. Therefore, I need to convert the speech to text offline.

I recently installed the "Uberi" SpeechRecognition package. I initially ran into a number of problems, but they came down to making sure the correct packages were installed, plus some confusion about python2 vs. python3 and pip vs. pip3.

The system details are:
Kubuntu 16.04.3 LTS (xenial), kernel 4.4.0-101-generic
Python 2.7.12 and 3.5.2 (both installed)
pip 9.0.1 from /home/********/.local/lib/python2.7/site-packages (python 2.7)
Speech_Recognition 3.7.1
PyAudio 0.2.11

When I run

python -m speech_recognition

and speak a few words or many, the text displayed is either perfect or _almost_ perfect. By examining the code it runs, I later realised that it uses Google's services, which is why the output is so accurate.

As the requirement is to do this offline, I tested the sample Python script audio_transcribe.py in the /examples directory. The input file is english.wav, but the output is just 'garbage'.

Being new to Python (but not to programming), I can't yet work out what to change to get the SpeechRecognition package to do this offline (not Google, IBM, Bing, etc.).

I notice that pip, python2 and python3 are located in ~/.local/bin and ~/.local/lib, a result of the installation problems and of not knowing where packages should be installed. I would also prefer to run only python3, but I see that a number of Kubuntu packages rely on python2.

If I run

python3 audio_transcribe.py

there are no errors, other than the garbage output. If I run

python audio_transcribe.py

I get this error message:

Quote:Sphinx error; missing PocketSphinx module: ensure that PocketSphinx is set up correctly.
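That error suggests the interpreter behind `python` (Python 2.7 here) cannot import the `pocketsphinx` module, while `python3` apparently can, since it runs without this error. A small diagnostic sketch, to be run with python3 (3.5 or later for `subprocess.run`); `has_module` is a hypothetical helper, not part of SpeechRecognition. It asks each interpreter on the PATH, in a separate process, whether it can import a given module:

```python
import shutil
import subprocess
import sys


def has_module(interpreter, module):
    """Return True if `interpreter` can import `module`.

    The import is attempted in a child process, so this works for any
    interpreter on the system, not only the one running this script.
    """
    result = subprocess.run(
        [interpreter, "-c", "import {}".format(module)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Check both interpreter names used in this thread.
    for name in ("python2", "python3"):
        path = shutil.which(name)
        if path is None:
            print("{}: not found on PATH".format(name))
        else:
            print("{} ({}): pocketsphinx importable = {}".format(
                name, path, has_module(path, "pocketsphinx")))
```

If only python3 reports True, running the example with python3 avoids the "missing PocketSphinx module" error.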

RE: Offline audio to text (Speech Recognition) - hbknjr - Dec-04-2017

Check Out: CMUSphinx

Pocketsphinx works offline but may not be as good as google STT.

RE: Offline audio to text (Speech Recognition) - jehoshua - Dec-04-2017

(Dec-04-2017, 11:07 AM)hbknjr Wrote: Check Out: CMUSphinx

Pocketsphinx works offline but may not be as good as google STT.

Thanks. My current understanding is that CMUSphinx == Pocketsphinx; however, I will do some more research.

RE: Offline audio to text (Speech Recognition) - snippsat - Dec-04-2017

(Dec-04-2017, 10:34 AM)jehoshua Wrote: Would also prefer to only run python3, but see there are a number of Kubuntu packages that rely on python2.

You can look at Linux Python 3 environment.

RE: Offline audio to text (Speech Recognition) - jehoshua - Dec-05-2017

(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

Great, thanks. It looks like an easy-to-follow guide, and I like the idea of setting up separate environments.

RE: Offline audio to text (Speech Recognition) - jehoshua - Dec-06-2017

(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

Does pip reflect the dependencies of a particular Python version? I ran the following:

pip list

Quote:DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the "list" section) to disable this warning.
netifaces (0.10.4)
pip (9.0.1)
PyAudio (0.2.11)
pygobject (3.20.0)
setuptools (20.7.0)
SpeechRecognition (3.7.1)
vboxapi (1.0)
wheel (0.29.0)
youtube-dl (2017.11.15)

pip3 list

Quote:DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the "list" section) to disable this warning.
apt-xapian-index (0.47)
apturl (0.5.2)
chardet (2.3.0)
command-not-found (0.3)
defer (1.0.6)
language-selector (0.1)
pexpect (4.0.1)
Pillow (3.1.2)
pip (9.0.1)
pocketsphinx (0.1.3)
ptyprocess (0.5)
PyAudio (0.2.8)
pycups (1.9.73)
pycurl (7.43.0)
pygobject (3.20.0)
python-apt (1.1.0b1)
python-debian (0.1.27)
python-systemd (231)
reportlab (3.3.0)
requests (2.9.1)
setuptools (20.7.0)
six (1.10.0)
SpeechRecognition (3.7.1)
ssh-import-id (5.5)
ubuntu-drivers-common (0.0.0)
ufw (0.35)
unattended-upgrades (0.1)
urllib3 (1.13.1)
wheel (0.29.0)
xkit (0.0.0)

pip and pip3 are both showing as version 9.0.1

The small script that I'm using is based on the example script from https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py

#!/usr/bin/env python3

import speech_recognition as sr

# obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

Most of the output is inaccurate, though some words are okay. I was advised to break the audio up into sentences, which can be done by repeatedly calling r.listen instead of r.record. In the code above, r.record reads the entire file before the Sphinx processing runs, so I need to change the script to use r.listen.

I have done a bit of searching and realise it is done within a 'while' loop, but I can't find the exact code. The pseudocode would be something like:

Quote:Set the audio file path
use the audio file as the source
Read a part of the file
if EOF end
else
Call 'r.listen'
Call sphinx functions
End
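A minimal sketch of that loop, assuming SpeechRecognition 3.x with PocketSphinx installed for the same Python 3 interpreter. Note that `r.listen` does the reading itself, so there is no separate "read a part of the file" step. The sketch relies on `Recognizer.listen()` returning an `AudioData` whose `frame_data` is empty once an `AudioFile` source is exhausted; that is how the 3.x source behaves, but it is worth verifying on your version. `iter_phrases` and `transcribe_by_phrase` are hypothetical helper names:

```python
def iter_phrases(recognizer, source):
    """Yield one AudioData per detected phrase until the source is exhausted."""
    while True:
        audio = recognizer.listen(source)
        # On a file source, listen() returns empty frame_data at end of file.
        if not audio.frame_data:
            return
        yield audio


def transcribe_by_phrase(path):
    """Return a list of per-phrase Sphinx transcriptions for an audio file."""
    # Imported lazily so iter_phrases() has no hard dependency on the package.
    import speech_recognition as sr

    r = sr.Recognizer()
    results = []
    with sr.AudioFile(path) as source:
        for audio in iter_phrases(r, source):
            try:
                results.append(r.recognize_sphinx(audio))
            except sr.UnknownValueError:
                # Sphinx could not make sense of this phrase; keep a marker.
                results.append("[unintelligible]")
    return results


if __name__ == "__main__":
    import sys
    for number, text in enumerate(transcribe_by_phrase(sys.argv[1]), start=1):
        print("{}: {}".format(number, text))
```

Usage would be something like `python3 transcribe_phrases.py english.wav` (script name hypothetical). Phrase boundaries depend on the recognizer's `energy_threshold` and `pause_threshold` attributes, so those are the settings to tune if phrases come out too long or too short. `AudioFile` reads WAV, AIFF and FLAC, so the MP3 files would need converting first (for example with ffmpeg).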

RE: Offline audio to text (Speech Recognition) - jehoshua - Jan-16-2018

(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

I have recently set up a 'bare bones' laptop as a test web server. There are two "deepspeech-server" packages that I wish to set up, test and evaluate, so the Python 3 environment seems ideal for that. The packages are https://github.com/MainRo/deepspeech-server and https://github.com/ashwan1/django-deepspeech-server

The test laptop has Kubuntu 17.10.1, Python 2.7.14 and Python 3.6.3.

Will these packages use Python 3.6.3 by default, or is there something I need to specify to force the use of Python 3.6.3 only?

RE: Offline audio to text (Speech Recognition) - Larz60+ - Jan-16-2018

pip list

shows all of the packages installed for a (your) specific version of Python.
To find out which one, type:

pip -V
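`pip -V` answers "which Python does this pip belong to?". The same question can be asked from inside Python. A minimal sketch (the helper name `interpreter_info` is illustrative) that reports the running interpreter and whether it is inside a virtual environment created with the `venv` module:

```python
import sys


def interpreter_info():
    """Describe the interpreter that is running this code."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Inside a venv, sys.prefix points at the environment while
        # sys.base_prefix still points at the base Python installation.
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print("{}: {}".format(key, value))
```

Running `python3 -m pip list` instead of plain `pip list` ties the listing to that exact interpreter, which sidesteps the pip/pip3 ambiguity.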

RE: Offline audio to text (Speech Recognition) - wavic - Jan-16-2018

Reading part of the file is easy, but what happens if the chunk ends in the middle of a word? You have to work out somehow where to cut.

A quick search turned up these:
https://stackoverflow.com/questions/36458214/split-speech-audio-file-on-words-in-python
https://github.com/antiboredom/audiogrep

I hope it will give you some guidance.
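One common way to avoid cutting in the middle of a word is to cut only where the signal stays quiet for a while. A minimal pure-Python 3 sketch of that idea, assuming a 16-bit mono WAV input (MP3s would need converting first); the function names and the default `window_ms`, `min_silence_ms` and `threshold` values are illustrative assumptions that will need tuning for each recording:

```python
import array
import math
import sys
import wave


def read_mono16(path):
    """Read a 16-bit mono WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return samples, rate


def rms(chunk):
    """Root-mean-square amplitude of a sequence of integer samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def silent_cut_points(samples, rate, window_ms=20, min_silence_ms=300, threshold=500):
    """Return cut times (seconds) in the middle of each long-enough silent stretch.

    The signal is scanned in windows of `window_ms`; a window counts as silent
    when its RMS is below `threshold`. A run of silent windows lasting at least
    `min_silence_ms` yields one cut point, so cuts land in pauses, not words.
    """
    window = max(1, rate * window_ms // 1000)
    min_windows = max(1, min_silence_ms // window_ms)
    silent = [rms(samples[i:i + window]) < threshold
              for i in range(0, len(samples), window)]
    cuts, run_start = [], None
    for idx, is_silent in enumerate(silent + [False]):  # sentinel closes a final run
        if is_silent and run_start is None:
            run_start = idx
        elif not is_silent and run_start is not None:
            if idx - run_start >= min_windows:
                cuts.append((run_start + idx) / 2 * window / rate)
            run_start = None
    return cuts


if __name__ == "__main__":
    samples, rate = read_mono16(sys.argv[1])
    for t in silent_cut_points(samples, rate):
        print("{:.2f} s".format(t))
```

This only finds pauses, so words spoken without a gap stay together; splitting on individual words typically needs an alignment tool. For MP3 input, the pydub library offers a ready-made `split_on_silence` in `pydub.silence`.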

I found this on PyPI: https://pypi.python.org/pypi/SpeechRecognition/
It can do it offline too. Give it a try.

RE: Offline audio to text (Speech Recognition) - jehoshua - Jan-16-2018

After reading the docs here and elsewhere, I installed the following packages:

python3-venv
python3-pip

Then, as these test servers will be accessed from another computer on the LAN, I assumed $HOME/Public was the place to set up the two virtual environments, so I ran this:

python3 -m venv /home/********/Public/Servers/django-deepspeech-server

I looked in that path and it all looks okay. But really, I would have no idea... lol
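One quick check that a directory really is a virtual environment: `python3 -m venv` writes a `pyvenv.cfg` file at the top of the environment, which on Python 3.6 typically records `home` (the directory of the base interpreter) and `version`. A small sketch that reads it; `parse_pyvenv_cfg` and `read_pyvenv_cfg` are illustrative helper names, and the `bin/python` layout assumes Linux:

```python
import os


def parse_pyvenv_cfg(text):
    """Parse the 'key = value' lines of a pyvenv.cfg file into a dict."""
    cfg = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '='
            cfg[key.strip()] = value.strip()
    return cfg


def read_pyvenv_cfg(env_dir):
    """Read <env_dir>/pyvenv.cfg; raises an OSError if env_dir is not a venv."""
    with open(os.path.join(env_dir, "pyvenv.cfg")) as f:
        return parse_pyvenv_cfg(f.read())


if __name__ == "__main__":
    import sys
    env_dir = sys.argv[1]
    cfg = read_pyvenv_cfg(env_dir)
    print("base interpreter directory:", cfg.get("home"))
    print("Python version:", cfg.get("version"))
    print("env python present:", os.path.exists(os.path.join(env_dir, "bin", "python")))
```

Once the check passes, `source /home/********/Public/Servers/django-deepspeech-server/bin/activate` makes `python` and `pip` in that shell point at the environment.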

(Jan-16-2018, 03:16 AM)Larz60+ Wrote:
pip list
shows all of the packages loaded for a (your) specific version of python.
To find out which one, type:
pip -V

pip3 list

DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the "list" section) to disable this warning.
apt-xapian-index (0.47)
asn1crypto (0.22.0)
certifi (2017.4.17)
chardet (3.0.4)
command-not-found (0.3)
cryptography (1.9)
cupshelpers (1.0)
distro-info (0.17)
httplib2 (0.9.2)
idna (2.5)
keyring (10.4.0)
keyrings.alt (2.2)
language-selector (0.1)
olefile (0.44)
pexpect (4.2.1)
Pillow (4.1.1)
pip (9.0.1)
pycrypto (2.6.1)
pycups (1.9.73)
pygobject (3.24.1)
python-apt (1.4.0b3)
python-debian (0.1.30)
pyxdg (0.25)
PyYAML (3.12)
reportlab (3.4.0)
requests (2.18.1)
SecretStorage (2.3.1)
setuptools (36.2.7)
six (1.10.0)
systemd-python (234)
ubuntu-drivers-common (0.0.0)
ufw (0.35)
unattended-upgrades (0.1)
urllib3 (1.21.1)
wheel (0.29.0)
xkit (0.0.0)
```
pip3 -V
```
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

(Jan-16-2018, 05:29 AM)wavic Wrote: Reading the part of the file is easy but what happens if the chunk ends in the middle of a word? You have to determine somehow where to cut.

Fast googling gave me this:
https://stackoverflow.com/questions/36458214/split-speech-audio-file-on-words-in-python
https://github.com/antiboredom/audiogrep

I hope it will give you some guidance.

It sure will, thanks. I needed to work out how to split some audio files into single words to make the 'learning' a lot easier. The packages I'm going to test are based on Mozilla DeepSpeech and are meant to be rated at an error rate below 10%, but it would be nice to add some extra learning based on single words.

(Jan-16-2018, 05:29 AM)wavic Wrote: I found this in pypi: https://pypi.python.org/pypi/SpeechRecognition/
It can do it offline too. Give it a try.

Thanks, yes, that was the one I initially tried, and it came with a sample Python script to test audio to text. It was set up to use the default, which is Sphinx/PocketSphinx, or you can use Google, IBM Watson or the http://wt.ai site. The transcription error rate was higher for the non-Google/YouTube services, and Google only allowed very small files. With the online services like IBM, Google, etc., you have to pay and the file has to sit on their servers. The offline ones I tested had much higher transcription error rates, so they are not really suitable for me.

Following the tutorial at https://python-forum.io/Thread-Basic-Part-1-Linux-Python-3-environment?pid=18261#pid18261, I'm not sure how to adapt the following

Quote:# Make virtualenv that use Python 3.5
mint@mint ~/Desktop/my_env $ virtualenv -p /usr/bin/python3.5 my_env
mint@mint ~ $ cd my_env

for Python 3.6?
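With the tutorial's virtualenv tool, swapping the interpreter path (`virtualenv -p /usr/bin/python3.6 my_env`, assuming Python 3.6 lives at that path) should be the only change. The standard-library `venv` module works too: an environment always uses the interpreter that runs `-m venv`, so `python3.6 -m venv my_env` pins it to 3.6, and on this laptop plain `python3` is already 3.6.3. A sketch of the same thing driven from Python; `venv_command` and `make_venv` are hypothetical helpers, and `~/Desktop/my_env` simply mirrors the tutorial's path:

```python
import os
import shutil
import subprocess
import sys


def venv_command(interpreter, env_dir, with_pip=True):
    """Build the command that creates env_dir using `interpreter`'s venv module."""
    cmd = [interpreter, "-m", "venv"]
    if not with_pip:
        cmd.append("--without-pip")
    cmd.append(env_dir)
    return cmd


def make_venv(interpreter, env_dir, with_pip=True):
    """Create the environment; return its python executable path (Linux layout)."""
    subprocess.check_call(venv_command(interpreter, env_dir, with_pip))
    return os.path.join(env_dir, "bin", "python")


if __name__ == "__main__":
    python36 = shutil.which("python3.6")
    if python36 is None:
        sys.exit("python3.6 not found on PATH")
    print(make_venv(python36, os.path.expanduser("~/Desktop/my_env")))
```

On Ubuntu-family systems the `venv` module needs the python3-venv package, which was installed earlier in this thread.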