Real Time Audio Processing with Python Sound-Device not working (Python Forum, General Coding Help: https://python-forum.io/thread-32885.html)
Real Time Audio Processing with Python Sound-Device not working - Slartybartfast - Mar-13-2021

Hello, Python community!

I would like to record audio from my microphone and transcribe it in (almost) real time via a speech-to-text API. The STT API available to me is VoxSigma by Vocapia. It lets me send a .wav file with the speech recording and receive an XML file with the transcript within seconds:

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def vocapia(wavfile):
    # send the recording to the VoxSigma REST API and save the XML reply
    cmd = "curl -ksS -u password REST-URL -F method=vrbs_trans -F " \
          "model=eng -F audiofile=@" + wavfile + " > ../resources/static/XMLs/dynamic_recording.xml"
    os.system(cmd)
    try:
        # parse xml document to retrieve transcript
        mydoc = minidom.parse("../resources/static/XMLs/dynamic_recording.xml")
        words = mydoc.getElementsByTagName('Word')
        sentence = ""
        for elem in words:
            sentence = sentence + elem.firstChild.data[1:]
        return sentence
    # catch "xml.parsers.expat.ExpatError: no element found: line 1, column 0"
    # -> empty xml means nothing was transcribed
    except ExpatError:
        print("Nothing was transcribed yet. Resuming...")
        return ""
```

The problem is that my approach with Python's sounddevice constantly writes to a .wav file, and the file only seems to be saved after the recording has finished. Because of this, I do not seem to have access to the recording in real time: if I send the .wav file to Vocapia while recording, nothing is transcribed (the .wav file is empty).
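A side note on `vocapia()`: it builds a shell command string and round-trips the reply through a fixed XML file on disk before parsing it. A minimal sketch, assuming `curl` is on the PATH, passes the arguments as a list (so a file name with spaces needs no quoting) and parses curl's stdout in memory, treating an empty reply as "nothing transcribed yet" just like the `ExpatError` branch. The names `parse_transcript`, `vocapia_transcribe`, `credentials` and `rest_url` are hypothetical; the last two stand in for the values the original command elides, and the `[1:]` slice only mirrors the original code, not any documented VoxSigma format.

```python
import subprocess
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_transcript(xml_text):
    """Concatenate the text of all <Word> elements; "" if nothing was transcribed."""
    if not xml_text.strip():
        return ""  # empty reply: nothing transcribed yet
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError:
        return ""  # malformed or truncated reply
    words = []
    for elem in doc.getElementsByTagName("Word"):
        if elem.firstChild is not None:
            # [1:] mirrors the original code, which drops each word's first character
            words.append(elem.firstChild.data[1:])
    return "".join(words)


def vocapia_transcribe(wavfile, credentials, rest_url):
    """Upload one WAV file with curl and parse the XML reply from stdout.

    `credentials` and `rest_url` are hypothetical parameters standing in for
    the values elided in the original command.
    """
    cmd = ["curl", "-ksS", "-u", credentials, rest_url,
           "-F", "method=vrbs_trans", "-F", "model=eng",
           "-F", "audiofile=@" + wavfile]
    # An argument list needs no shell quoting; stdout is captured in memory.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_transcript(result.stdout)
```

With this split, the parsing logic can be checked offline, for example `parse_transcript("<Doc><Word> hello </Word></Doc>")` returns `"hello "`.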
Here is the code that handles the recording:

```python
import os
import queue
import sys
import threading
import time

import sounddevice as sd
import soundfile as sf


# Thread function for parallel STT
def thread_func(text):
    while True:
        time.sleep(5)  # give time for some dialogue to happen
        new_text = text + vocapia("recording.wav")
        print(new_text)


# sounddevice file and queue
wav_file = "recording.wav"
session_text = ""  # transcript collected so far
q = queue.Queue()


def callback(indata, frames, time, status):
    if status:
        print(status, file=sys.stderr)
    q.put(indata.copy())


try:
    # delete any prior recording (mode='x' below fails if the file exists)
    if os.path.exists(wav_file):
        os.remove(wav_file)
    device_info = sd.query_devices(0, 'input')
    # soundfile expects an int, sounddevice provides a float:
    samplerate = int(device_info['default_samplerate'])
    # start thread that handles the STT by sending wav_file to vocapia;
    # daemon=True lets the program exit after Ctrl+C
    th = threading.Thread(target=thread_func, args=(session_text,), daemon=True)
    th.start()
    with sf.SoundFile(wav_file, mode='x', samplerate=samplerate, channels=1) as file:
        with sd.InputStream(samplerate=samplerate, callback=callback, channels=1):
            print('Started recording. Press Ctrl+C to stop the recording.')
            while True:
                file.write(q.get())
except KeyboardInterrupt:
    print('\nRecording finished.')
```

Running this code, I simply receive and catch the "xml.parsers.expat.ExpatError: no element found: line 1, column 0" error every five seconds. As soon as I stop the recording loop, though, I can run vocapia(wav_file) and get the full transcript.

Any ideas on what I could do to make it work properly? Thanks in advance for any suggestions!

RE: Real Time Audio Processing with Python Sound-Device not working - Larz60+ - Mar-13-2021

Sending to a .wav file is very inefficient. There are a bunch of packages available. I've only used a few, so I won't recommend one over the other, except to say that SpeechRecognition (https://pypi.org/project/SpeechRecognition/) seems popular.

RE: Real Time Audio Processing with Python Sound-Device not working - Slartybartfast - Mar-14-2021

> (Mar-13-2021, 04:53 PM) Larz60+ wrote: Sending to a .wav file is very inefficient.

Okay, thank you.
I looked into the module and it seems promising. Unfortunately, though, there does not seem to be a way to do real-time STT with it. I need a service that can provide me with a live or almost-live transcript (for example, an updated transcript every 5 seconds). Do you have any ideas on that?
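The empty file is most likely explained by how WAV files are written: the RIFF header stores the size of the audio data, and a writer such as libsndfile (which `soundfile` wraps) normally fills in the final sizes only when the file is closed, so a file that is still open can look empty to another reader. One way to get an updated transcript every few seconds under that assumption is to rotate files: collect 16-bit PCM samples, close a separate, complete WAV file every `chunk_seconds`, and hand each finished file to the STT call from a worker thread. `ChunkedWavWriter`, `transcribe_worker` and the `chunk_NNNN.wav` naming are hypothetical; only the standard-library `wave` module is used for writing.

```python
import os
import wave


class ChunkedWavWriter:
    """Collect 16-bit PCM bytes and write one complete WAV file per chunk.

    Each chunk file is closed (so its header holds the final sizes) before
    its path is returned, so another thread can safely send it to an STT API.
    """

    def __init__(self, out_dir, samplerate, chunk_seconds=5.0, channels=1):
        self.out_dir = out_dir
        self.samplerate = samplerate
        self.channels = channels
        self.sampwidth = 2  # assumption: 16-bit signed PCM input
        frame_bytes = self.sampwidth * channels
        self.chunk_bytes = max(1, int(samplerate * chunk_seconds)) * frame_bytes
        self._buffer = bytearray()
        self._counter = 0

    def add(self, pcm_bytes):
        """Append raw PCM; return paths of any chunk files completed by this call."""
        self._buffer.extend(pcm_bytes)
        finished = []
        while len(self._buffer) >= self.chunk_bytes:
            data = bytes(self._buffer[:self.chunk_bytes])
            del self._buffer[:self.chunk_bytes]
            finished.append(self._write(data))
        return finished

    def flush(self):
        """Write any leftover audio (e.g. after Ctrl+C); return its path or None."""
        if not self._buffer:
            return None
        data = bytes(self._buffer)
        self._buffer.clear()
        return self._write(data)

    def _write(self, data):
        path = os.path.join(self.out_dir, f"chunk_{self._counter:04d}.wav")
        self._counter += 1
        # Leaving the with-block closes the file and finalizes the header.
        with wave.open(path, "wb") as wf:
            wf.setnchannels(self.channels)
            wf.setsampwidth(self.sampwidth)
            wf.setframerate(self.samplerate)
            wf.writeframes(data)
        return path


def transcribe_worker(chunk_paths, transcribe, on_update):
    """Transcribe finished chunk files from a queue until a None sentinel arrives.

    `transcribe` maps a WAV path to text (e.g. the vocapia() function);
    `on_update` receives the growing transcript after every chunk.
    """
    transcript = ""
    while True:
        path = chunk_paths.get()
        if path is None:
            break
        transcript += transcribe(path)
        on_update(transcript)
    return transcript
```

In the recording script, one could open the stream with `dtype='int16'` so that `indata.tobytes()` yields 16-bit PCM, call `writer.add(...)` in the main loop instead of `file.write(...)`, put each returned path on a `queue.Queue`, and run `transcribe_worker` in a thread with `vocapia` as the `transcribe` argument. Shorter chunks give faster updates but cut words at chunk boundaries, which may lower recognition accuracy.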