 Offline audio to text (Speech Recognition)
#1
I have hundreds of audio files (mp3) of a teaching course and, because of copyright, etc., we are not permitted to upload the files. Therefore, I need to be able to convert the audio/speech to text offline.

I have recently installed the "Uberi" Speech Recognition package. I initially encountered a number of problems, but those came down to making sure the correct packages had been installed, plus some issues to do with using python2 or python3, pip or pip3, etc.

The system details are:
Kubuntu 16.04.3 LTS (xenial), kernel 4.4.0-101-generic
Python 2.7.12 and 3.5.2 (both installed)
pip 9.0.1 from /home/********/.local/lib/python2.7/site-packages (python 2.7)
Speech_Recognition 3.7.1
PyAudio 0.2.11

When I run
python -m speech_recognition
and speak a few words or many words, the text displayed is either perfect or _almost_ perfect. I later realised, by examining the code that is used there, that the Google services are used. Hence the output is very accurate.

As the requirement is to do this offline, I have tested the sample Python script in the /examples path, audio_transcribe.py. The input file is english.wav, but the output is just 'garbage'.

Being new to Python (but not to programming), I'm currently unable to follow what to change to get the SpeechRecognition package to do this offline (not Google, IBM, Bing, etc.).

I notice that pip, python2 and python3 are located in ~/.local/bin and ~/.local/lib, which is a result of the installation problems and of my not knowing where packages should be installed. I would also prefer to only run python3, but I see there are a number of Kubuntu packages that rely on python2.

If I run
python3 audio_transcribe.py
there are no errors, other than the garbage output. If I run
python audio_transcribe.py
there is an error message:
Quote:Sphinx error; missing PocketSphinx module: ensure that PocketSphinx is set up correctly.
#2
Check Out: CMUSphinx

Pocketsphinx works offline but may not be as good as google STT.
#3
(Dec-04-2017, 11:07 AM)hbknjr Wrote: Check Out: CMUSphinx

Pocketsphinx works offline but may not be as good as google STT.

Thanks, my understanding at present is that CMUSphinx == Pocketsphinx, however I will do some more research.
#4
(Dec-04-2017, 10:34 AM)jehoshua Wrote: Would also prefer to only run python3, but see there are a number of Kubuntu packages that rely on python2.
You can look at Linux Python 3 environment.
#5
(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

Great, thanks. Looks like an easy-to-follow guide, and I like the idea of setting up separate environments.
#6
(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

Is pip a reflection of Python dependencies? I ran the following:

pip list
Quote:DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the "list" section) to disable this warning.
netifaces (0.10.4)
pip (9.0.1)
PyAudio (0.2.11)
pygobject (3.20.0)
setuptools (20.7.0)
SpeechRecognition (3.7.1)
vboxapi (1.0)
wheel (0.29.0)
youtube-dl (2017.11.15)

pip3 list
Quote:DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the "list" section) to disable this warning.
apt-xapian-index (0.47)
apturl (0.5.2)
chardet (2.3.0)
command-not-found (0.3)
defer (1.0.6)
language-selector (0.1)
pexpect (4.0.1)
Pillow (3.1.2)
pip (9.0.1)
pocketsphinx (0.1.3)
ptyprocess (0.5)
PyAudio (0.2.8)
pycups (1.9.73)
pycurl (7.43.0)
pygobject (3.20.0)
python-apt (1.1.0b1)
python-debian (0.1.27)
python-systemd (231)
reportlab (3.3.0)
requests (2.9.1)
setuptools (20.7.0)
six (1.10.0)
SpeechRecognition (3.7.1)
ssh-import-id (5.5)
ubuntu-drivers-common (0.0.0)
ufw (0.35)
unattended-upgrades (0.1)
urllib3 (1.13.1)
wheel (0.29.0)
xkit (0.0.0)

pip and pip3 are both showing as version 9.0.1

The small script that I'm using is based on the example script from https://github.com/Uberi/speech_recognit...nscribe.py

#!/usr/bin/env python3

import speech_recognition as sr

# obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
The script produces output, but most of it is inaccurate, although some words are okay. I was advised to try breaking things up into sentences, which can be done by repeatedly calling r.listen instead of r.record. As it stands, 'r.record' in the above code reads the entire file and only then does the Sphinx processing, so I need to change the script to use r.listen.

I have done a bit of searching and realise it is done within a 'while' loop, but I just can't find the exact code. Pseudocode would be something like:

Quote:Set the audio file path
use the audio file as the source
Read a part of the file
if EOF end
else
Call 'r.listen'
Call sphinx functions
End
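
The pseudocode above could be sketched roughly as follows. This is only a minimal sketch, not the package's own example: it swaps r.listen for fixed-length chunks read with the standard-library wave module, and it assumes a mono, uncompressed PCM WAV file. The names iter_wav_chunks, transcribe_wav and chunk_seconds are made up for illustration; sr.AudioData and r.recognize_sphinx are the only SpeechRecognition calls used. A fixed-length cut can still land in the middle of a word.

```python
import wave


def iter_wav_chunks(wav_path, chunk_seconds=10.0):
    """Yield (frame_bytes, sample_rate, sample_width) for consecutive chunks.

    Hypothetical helper; assumes a mono, uncompressed PCM WAV file.
    """
    with wave.open(wav_path, "rb") as wav:
        sample_rate = wav.getframerate()
        sample_width = wav.getsampwidth()
        frames_per_chunk = max(1, int(sample_rate * chunk_seconds))
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:  # EOF: nothing left to read
                break
            yield frames, sample_rate, sample_width


def transcribe_wav(wav_path, chunk_seconds=10.0):
    """Yield one Sphinx transcription (or "") per chunk, fully offline."""
    # Imported here so the chunking helper works without the package installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    for frames, sample_rate, sample_width in iter_wav_chunks(wav_path, chunk_seconds):
        # Wrap the raw frames so the recognizer can consume them.
        audio = sr.AudioData(frames, sample_rate, sample_width)
        try:
            yield r.recognize_sphinx(audio)
        except sr.UnknownValueError:
            yield ""  # Sphinx could not understand this chunk
```

Looping over transcribe_wav("english.wav", chunk_seconds=5) and printing each result would give one line of text per 5-second chunk; shorter chunks mean more cuts, so the chunk length is something to experiment with.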
#7
(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

I have recently set up a 'bare bones' laptop and use it as a test web server. There are 2 "deepspeech-server" packages that I wish to set up, test and evaluate, so the Python 3 environment seems ideal for that. The packages are https://github.com/MainRo/deepspeech-server and https://github.com/ashwan1/django-deepspeech-server

The test laptop has Kubuntu 17.10.1, Python 2.7.14 and Python 3.6.3.

I assume that these packages will use Python 3.6.3 by default, or is there something I need to specify to force the use of Python 3.6.3 only?
#8
pip list
shows all of the packages loaded for a (your) specific version of python.
To find out which one, type:
pip -V
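
The same check can be made from inside Python. The snippet below is a small sketch (describe_interpreter is a made-up name) that reports which interpreter is running and where it keeps its site-packages; running it once with python and once with python3 shows what each command actually starts.

```python
import sys
import sysconfig


def describe_interpreter():
    """Return the running interpreter's path, version and site-packages dir."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "site_packages": sysconfig.get_paths()["purelib"],
    }


info = describe_interpreter()
for key in ("executable", "version", "site_packages"):
    print("{}: {}".format(key, info[key]))
```

Running pip as python3 -m pip list ties the listing to that particular interpreter, which avoids guessing which Python a bare pip command belongs to.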
#9
Reading the part of the file is easy but what happens if the chunk ends in the middle of a word? You have to determine somehow where to cut.

Fast googling gave me this:
https://stackoverflow.com/questions/3645...-in-python
https://github.com/antiboredom/audiogrep

I hope it will give you some guidance.

I found this in pypi: https://pypi.python.org/pypi/SpeechRecognition/
It can do it offline too. Give it a try.
#10
Have installed the following packages after reading docs on here and elsewhere.

python3-venv
python3-pip

Then, as these test servers will be accessed from another computer on the LAN, I assumed the $HOME/Public path was going to be the place to set up the 2 virtual environments, so I ran this:

python3 -m venv /home/********/Public/Servers/django-deepspeech-server
I looked in that path and it all looks okay. But really, I would have no idea...lol

(Jan-16-2018, 03:16 AM)Larz60+ Wrote:
pip list
shows all of the packages loaded for a (your) specific version of python.
To find out which one, type:
pip -V

pip3 list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
    apt-xapian-index (0.47)
    asn1crypto (0.22.0)
    certifi (2017.4.17)
    chardet (3.0.4)
    command-not-found (0.3)
    cryptography (1.9)
    cupshelpers (1.0)
    distro-info (0.17)
    httplib2 (0.9.2)
    idna (2.5)
    keyring (10.4.0)
    keyrings.alt (2.2)
    language-selector (0.1)
    olefile (0.44)
    pexpect (4.2.1)
    Pillow (4.1.1)
    pip (9.0.1)
    pycrypto (2.6.1)
    pycups (1.9.73)
    pygobject (3.24.1)
    python-apt (1.4.0b3)
    python-debian (0.1.30)
    pyxdg (0.25)
    PyYAML (3.12)
    reportlab (3.4.0)
    requests (2.18.1)
    SecretStorage (2.3.1)
    setuptools (36.2.7)
    six (1.10.0)
    systemd-python (234)
    ubuntu-drivers-common (0.0.0)
    ufw (0.35)
    unattended-upgrades (0.1)
    urllib3 (1.21.1)
    wheel (0.29.0)
    xkit (0.0.0)

pip3 -V
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

(Jan-16-2018, 05:29 AM)wavic Wrote: Reading the part of the file is easy but what happens if the chunk ends in the middle of a word? You have to determine somehow where to cut.

Fast googling gave me this:
https://stackoverflow.com/questions/3645...-in-python
https://github.com/antiboredom/audiogrep

I hope it will give you some guidance.

It sure will, thanks. I needed to work out how to split some audio files into single words, to make the 'learning' a lot easier. The packages that I'm going to test are meant to have an error rate of less than 10%, as they are based on Mozilla DeepSpeech, but it would be nice to add some extra learning based on single words.

(Jan-16-2018, 05:29 AM)wavic Wrote: I found this in pypi: https://pypi.python.org/pypi/SpeechRecognition/
It can do it offline too. Give it a try

Thanks, yes, that was the one I initially tried, and there was a sample Python script to test audio to text. It was set up to use the default, which is Sphinx/Pocketsphinx, or you can use Google, IBM/Watson or the http://wt.ai site. The transcription error rate was higher for the non-Google/YouTube ones, and Google only allowed files of very small size. With the online ones like IBM, Google, etc., you have to pay and the file has to sit on their servers. The offline ones I tested had much higher transcription error rates, so they are not really suitable for me.

Following the tutorial at https://python-forum.io/Thread-Basic-Par...1#pid18261 , I'm not sure how to equate the following

Quote:# Make virtualenv that use Python 3.5
mint@mint ~/Desktop/my_env $ virtualenv -p /usr/bin/python3.5 my_env
mint@mint ~ $ cd my_env

for Python 3.6?
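
One way to look at it: the -p option just names the interpreter, so pointing it at a Python 3.6 binary (wherever that lives on the machine, which is an assumption to check) should be the equivalent. The standard-library venv module works along the same lines, basing the environment on whichever interpreter runs it. A minimal sketch, where create_env and the directory name are made up for illustration:

```python
import venv


def create_env(env_dir, with_pip=True):
    """Create a virtual environment based on the interpreter running this code."""
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    return env_dir
```

Calling create_env("my_env") from a script started with python3.6 would give a Python 3.6 environment; from the shell, python3.6 -m venv my_env does the same thing. On Ubuntu-based systems with_pip=True needs the python3-venv package mentioned earlier.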