Offline audio to text (Speech Recognition) - Python Forum (https://python-forum.io), Forum: Data Science, Thread: /thread-6710.html
Offline audio to text (Speech Recognition) - jehoshua - Dec-04-2017

I have hundreds of audio files (mp3) of a teaching course and, because of copyright etc., we are not permitted to upload the files. Therefore, I need to be able to convert the audio/speech to text offline.

I have recently installed the "Uberi" Speech Recognition package. I initially encountered a number of problems, but those came down to making sure the correct packages had been installed. There were also some issues to do with using python2 or python3, pip or pip3, etc.

The system details are:

Kubuntu 16.04.3 LTS (xenial), kernel 4.4.0-101-generic
Python 2.7.12 and 3.5.2 (both installed)
pip 9.0.1 from /home/********/.local/lib/python2.7/site-packages (python 2.7)
Speech_Recognition 3.7.1
PyAudio 0.2.11

When I run

python -m speech_recognition

and speak a few words or many words, the text displayed is either perfect or _almost_ perfect. I later realised, by examining the code that is used there, that the Google services are used. Hence the output is very good/accurate.

As the requirement is to do this offline, I have tested the sample python script in the /examples path, audio_transcribe.py. The input file is english.wav, but the output is just 'garbage'. Being new to python (but not to programming), I'm currently unable to follow what to change to get the SpeechRecognition package to do this offline (not Google, IBM, Bing, etc.).

I notice that pip, python2 and python3 are located in ~/.local/bin and ~/.local/lib, as per the problems with installing and not knowing where packages should be installed. I would also prefer to only run python3, but see there are a number of Kubuntu packages that rely on python2.

If I run

python3 audio_transcribe.py

there are no errors, other than the garbage output. If I run

python audio_transcribe.py

there is an error message:

Quote: Sphinx error; missing PocketSphinx module: ensure that PocketSphinx is set up correctly.
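The different results under python and python3 above suggest that PocketSphinx may be installed for one interpreter but not the other. A minimal sketch for checking this, assuming the import names are speech_recognition and pocketsphinx (the helper names module_available and report are hypothetical, not part of any library). Running it once with python and once with python3 shows what each interpreter can see.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this script."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def report(names=("speech_recognition", "pocketsphinx")):
    """Map each module name to True/False for the current interpreter."""
    return {name: module_available(name) for name in names}


if __name__ == "__main__":
    # sys.executable shows which python binary is actually running.
    print("Interpreter: {}".format(sys.executable))
    for name, found in sorted(report().items()):
        print("{:<20} {}".format(name, "found" if found else "MISSING"))
```

If pocketsphinx shows as MISSING under one interpreter only, installing it with that interpreter's own pip (for example python3 -m pip rather than a bare pip) is the usual way to keep the two installations apart.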
RE: Offline audio to text (Speech Recognition) - hbknjr - Dec-04-2017

Check Out: CMUSphinx
Pocketsphinx works offline but may not be as good as Google STT.

RE: Offline audio to text (Speech Recognition) - jehoshua - Dec-04-2017

(Dec-04-2017, 11:07 AM)hbknjr Wrote: Check Out: CMUSphinx

Thanks, my understanding at present is that CMUSphinx == Pocketsphinx, however I will do some more research.

RE: Offline audio to text (Speech Recognition) - snippsat - Dec-04-2017

(Dec-04-2017, 10:34 AM)jehoshua Wrote: Would also prefer to only run python3, but see there are a number of Kubuntu packages that rely on python2.

You can look at Linux Python 3 environment.

RE: Offline audio to text (Speech Recognition) - jehoshua - Dec-05-2017

(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

Great, thanks. Looks like an easy to follow guide, and I like the idea of setting up separate environments.

RE: Offline audio to text (Speech Recognition) - jehoshua - Dec-06-2017

(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

Is pip a reflection of python dependencies? I ran the following:

pip list
Quote: DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the "list" section) to disable this warning.

pip3 list
Quote: DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the "list" section) to disable this warning.
pip and pip3 are both showing as version 9.0.1.

The small script that I'm using is based on the example script from https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py

    #!/usr/bin/env python3

    import speech_recognition as sr

    # obtain path to "english.wav" in the same folder as this script
    from os import path
    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
    # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
    # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

    # use the audio file as the audio source
    r = sr.Recognizer()
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)  # read the entire audio file

    # recognize speech using Sphinx
    try:
        print("Sphinx thinks you said " + r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))

and most of the output is inaccurate, although some words are okay.

I was advised to try and break things up into sentences. This can be done by repeatedly calling r.listen instead of r.record. So, 'r.record' in the above code reads the entire file and then does the Sphinx processing. I need to change the script to use r.listen. I have done a bit of searching and realise it is done within a 'while' loop, but just can't find the exact code. Pseudo code would be something like ..

Quote: Set the audio file path

RE: Offline audio to text (Speech Recognition) - jehoshua - Jan-16-2018

(Dec-04-2017, 11:04 PM)snippsat Wrote: You can look at Linux Python 3 environment.

I have recently set up a 'bare bones' laptop and use it as a test web server. There are 2 "deepspeech-server" packages that I wish to set up, test and evaluate, so the Python 3 environment seems ideal for that.
The packages are https://github.com/MainRo/deepspeech-server and https://github.com/ashwan1/django-deepspeech-server

The test laptop has Kubuntu 17.10.1, python 2.7.14 and python 3.6.3. I assume that these packages will use python 3.6.3 as a default, or is there something I need to specify to force the use of python 3.6.3 only?

RE: Offline audio to text (Speech Recognition) - Larz60+ - Jan-16-2018

pip list
shows all of the packages loaded for a (your) specific version of python. To find out which one, type:

pip -V

RE: Offline audio to text (Speech Recognition) - wavic - Jan-16-2018

Reading part of the file is easy, but what happens if the chunk ends in the middle of a word? You have to determine somehow where to cut. A quick search gave me this:
https://stackoverflow.com/questions/36458214/split-speech-audio-file-on-words-in-python
https://github.com/antiboredom/audiogrep
I hope it will give you some guidance.

I found this on PyPI: https://pypi.python.org/pypi/SpeechRecognition/
It can do it offline too. Give it a try.

RE: Offline audio to text (Speech Recognition) - jehoshua - Jan-16-2018

I have installed the following packages after reading docs on here and elsewhere:

python3-venv
python3-pip

Then, as these test servers will be accessed from another computer on the LAN, I assumed the $HOME/Public path was going to be the place to set up the 2 virtual environments. So I ran this

python3 -m venv /home/********/Public/Servers/django-deepspeech-server

and looked in that path, and it all looks okay. But really, I would have no idea...lol

(Jan-16-2018, 03:16 AM)Larz60+ Wrote: pip list shows all of the packages loaded for a (your) specific version of python.

pip3 list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the
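One way to confirm that a virtual environment such as the one created with python3 -m venv above is really in use, and which Python version it runs, is to ask the interpreter itself. A minimal sketch using only the standard library (the function names in_virtualenv and interpreter_summary are hypothetical). It assumes the script is run with the python3 binary found in the environment's bin directory, or after the environment has been activated.

```python
import sys


def in_virtualenv():
    """Return True when running inside an environment created by `python3 -m venv`.

    Inside a venv, sys.prefix points at the environment while sys.base_prefix
    points at the system installation. Interpreters without base_prefix
    (for example Python 2) fall back to sys.prefix and so report False.
    """
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


def interpreter_summary():
    """Collect the facts that identify the running interpreter."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "in_virtualenv": in_virtualenv(),
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_summary().items()):
        print("{:<14} {}".format(key, value))
```

Anything installed with the pip that lives in the same bin directory as this interpreter goes into that environment only, which answers the question of which Python a package will use.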
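Returning to the earlier question in this thread about transcribing the file in pieces rather than with a single r.record call: a minimal sketch, with some assumptions stated up front. It uses fixed-length windows via the duration argument of r.record, not the r.listen loop mentioned in the post, so it does not solve the problem raised above of a chunk ending in the middle of a word. The names chunk_windows and transcribe_in_chunks and the 15 second default are hypothetical. AudioFile reads WAV, AIFF and FLAC, so the mp3 files would need converting first.

```python
def chunk_windows(total_seconds, chunk_seconds):
    """Return (offset, duration) pairs that cover total_seconds without overlap."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it stops at the end of the audio.
        windows.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return windows


def transcribe_in_chunks(audio_path, chunk_seconds=15):
    """Transcribe a file piece by piece with PocketSphinx (offline).

    Returns a list of (offset_in_seconds, text) pairs.
    """
    # Third-party package, imported here so chunk_windows works without it.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    pieces = []
    with sr.AudioFile(audio_path) as source:
        for offset, duration in chunk_windows(source.DURATION, chunk_seconds):
            # Each record() call continues from where the previous one stopped.
            audio = recognizer.record(source, duration=duration)
            try:
                text = recognizer.recognize_sphinx(audio)
            except sr.UnknownValueError:
                text = ""  # nothing recognisable in this chunk
            pieces.append((offset, text))
    return pieces
```

A call such as transcribe_in_chunks("english.wav") would then give one result per window. Shorter windows cut more words in half, while longer ones move back towards the single r.record behaviour, so the window length is something to experiment with.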