Python Forum
Real Time Audio Processing with Python Sound-Device not working
#1
Hello, Python community!

I would like to record audio from my microphone and transcribe it in (almost) real time via a speech-to-text (STT) API. The STT API available to me is VoxSigma by Vocapia. It lets me send a .wav file with the speech recording and receive an XML file with the transcript within seconds:

import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def vocapia(wavfile):
    # upload the .wav file to the VoxSigma REST API and save the XML response
    # (password and REST-URL stand in for the real credentials and endpoint)
    cmd = "curl -ksS -u password REST-URL -F method=vrbs_trans -F " \
          "model=eng -F audiofile=@" + wavfile + " > ../resources/static/XMLs/dynamic_recording.xml"
    os.system(cmd)
    try:
        # parse xml document to retrieve transcript
        mydoc = minidom.parse("../resources/static/XMLs/dynamic_recording.xml")
        words = mydoc.getElementsByTagName('Word')
        sentence = ""
        for elem in words:
            # [1:] drops the first character of each word's text (presumably a space)
            sentence = sentence + elem.firstChild.data[1:]
        return sentence

    # catch "xml.parsers.expat.ExpatError: no element found: line 1, column 0"
    # -> empty xml means nothing was transcribed
    except ExpatError:
        print("Nothing was transcribed yet. Resuming...")
        return ""
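A sketch of the same request without the intermediate XML file, assuming curl prints the XML response to standard output (which the redirect above relies on): `subprocess.run` captures that output directly, and an empty response is treated as "nothing transcribed yet" instead of relying on `ExpatError`. It swaps `minidom` for `xml.etree.ElementTree` and joins words with single spaces, which differs slightly from the slicing above. `REST-URL` and `password` are the same placeholders as in the post; `transcript_from_xml` is a hypothetical helper name.

```python
import subprocess
import xml.etree.ElementTree as ET


def transcript_from_xml(xml_bytes):
    """Join the text of every <Word> element; empty or invalid XML yields ""."""
    if not xml_bytes.strip():
        return ""  # no response body, e.g. nothing has been recognized yet
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return ""  # truncated or malformed XML is treated like an empty result
    words = (w.text.strip() for w in root.iter("Word") if w.text)
    return " ".join(word for word in words if word)


def vocapia(wavfile, url="REST-URL", credentials="password"):
    """Upload a WAV file with curl and return the transcript (placeholder URL/credentials)."""
    cmd = [
        "curl", "-ksS", "-u", credentials, url,
        "-F", "method=vrbs_trans",
        "-F", "model=eng",
        "-F", f"audiofile=@{wavfile}",
    ]
    # passing a list avoids shell quoting problems with file names
    result = subprocess.run(cmd, capture_output=True, check=False)
    return transcript_from_xml(result.stdout)
```

Only the `<Word>` element name is taken from the original parser; the rest of Vocapia's XML layout is not assumed.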
The problem is that my approach with Python's sounddevice module writes to the .wav file continuously, but the file only seems to be complete once the recording has finished. Because of this, I cannot access the recording in real time: if I send the .wav file to Vocapia while recording, nothing is transcribed (the .wav file appears to be empty). Here is the code that handles the recording:

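import os
import queue
import sys
import threading
import time

import sounddevice as sd
import soundfile as sf

# Thread function for parallel STT
def thread_func(text):
    while True:
        time.sleep(5)  # give time for some dialogue to happen
        text = text + vocapia("recording.wav")  # keep accumulating the transcript
        print(text)  # print the variable, not the literal string "new_text"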

# sound-device file and queue
wav_file = "recording.wav"
q = queue.Queue()

def callback(indata, frames, time, status):
    if status:
        print(status, file=sys.stderr)
    q.put(indata.copy())

try:
    # delete any prior recordings (mode='x' below refuses to overwrite a file)
    if os.path.exists(wav_file):
        os.remove(wav_file)

    device_info = sd.query_devices(0, 'input')
    # soundfile expects an int, sounddevice provides a float:
    samplerate = int(device_info['default_samplerate'])

    # start thread that handles the stt by sending wav_file to vocapia;
    # daemon=True so its endless loop does not keep Python running after Ctrl+C
    session_text = ""  # transcript collected so far (assumed to start empty)
    th = threading.Thread(target=thread_func, args=(session_text,), daemon=True)
    th.start()

    with sf.SoundFile(wav_file, mode='x', samplerate=samplerate, channels=1) as file:
        with sd.InputStream(samplerate=samplerate, callback=callback, channels=1):
            print('Started recording. Press Ctrl+C to stop the recording.')
            while True:
                file.write(q.get())

except KeyboardInterrupt:
    print('\nRecording finished.')
Running this code, I simply receive (and catch) the "xml.parsers.expat.ExpatError: no element found: line 1, column 0" error every five seconds. As soon as I stop the recording loop, though, I can run vocapia(wav_file) and get the full transcript.
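A likely cause: a WAV header stores the length of the audio data, and soundfile (libsndfile) generally fills it in when the file is closed, so a reader that opens the file mid-recording may see a file that claims to hold no samples. One way around this, sketched below with only the standard library, is to close a complete WAV file for every few seconds of audio and send each finished file. It assumes the stream is opened as `sd.RawInputStream(samplerate=..., channels=1, dtype='int16', callback=...)` and that the callback does `q.put(bytes(indata))`; `write_wav`, `wav_chunks` and the `chunk_{}.wav` naming are hypothetical.

```python
import wave


def write_wav(path, pcm, samplerate, channels=1):
    """Write 16-bit PCM bytes to a WAV file; the header is finalized on close."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # int16 samples are 2 bytes wide
        wf.setframerate(samplerate)
        wf.writeframes(pcm)


def wav_chunks(blocks, samplerate, seconds, path_pattern, channels=1):
    """Consume byte blocks and yield paths of closed WAV files of `seconds` each.

    Leftover audio is written to a final, shorter file when `blocks` ends.
    """
    chunk_size = int(samplerate * seconds) * 2 * channels  # bytes per chunk
    buffer = bytearray()
    index = 0
    for block in blocks:
        buffer.extend(block)
        # a large block may complete more than one chunk at once
        while len(buffer) >= chunk_size:
            path = path_pattern.format(index)
            write_wav(path, bytes(buffer[:chunk_size]), samplerate, channels)
            del buffer[:chunk_size]
            index += 1
            yield path
    if buffer:
        path = path_pattern.format(index)
        write_wav(path, bytes(buffer), samplerate, channels)
        yield path
```

In the STT thread, a loop such as `for path in wav_chunks(iter(q.get, None), samplerate, 5, "chunk_{}.wav"): print(vocapia(path))` would transcribe each five-second file as soon as it is closed, and putting `None` on the queue after Ctrl+C ends the loop. Fixed cuts can split a word across two files, so transcripts may be slightly off at chunk boundaries.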

Any ideas on what I could do to make it work properly?

Thanks in advance for any suggestions!
#2
Sending the audio via a .wav file is very inefficient.
There are a bunch of packages available. I've used only a few, so I won't recommend one over another, except to say that SpeechRecognition (https://pypi.org/project/SpeechRecognition/) seems popular.
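If the inefficiency meant here is the round trip through a file on disk, the audio can instead be wrapped in a WAV container in memory with the standard `wave` module and `io.BytesIO`. A minimal sketch, assuming 16-bit PCM input; `pcm_to_wav_bytes` is a hypothetical helper name:

```python
import io
import wave


def pcm_to_wav_bytes(pcm, samplerate, channels=1):
    """Wrap raw 16-bit PCM bytes in a WAV container held entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # int16 samples
        wf.setframerate(samplerate)
        wf.writeframes(pcm)
    # closing the wave writer does not close a file object it was given
    return buffer.getvalue()
```

curl can read a form upload from standard input with `-F audiofile=@-`, so these bytes could be passed via `subprocess.run(..., input=wav_bytes)`; whether VoxSigma treats such an upload like a named file has not been checked here.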
#3
(Mar-13-2021, 04:53 PM) Larz60+ wrote: Sending the audio via a .wav file is very inefficient.
There are a bunch of packages available. I've used only a few, so I won't recommend one over another, except to say that SpeechRecognition (https://pypi.org/project/SpeechRecognition/) seems popular.

Okay, thank you. I looked into the module and it seems promising. Unfortunately, though, it does not seem to offer real-time STT. I need a service that can provide a live or almost-live transcript (for example, an updated transcript every 5 seconds). Do you have any ideas on that?
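Whatever service ends up doing the recognition, the "updated transcript every 5 seconds" part can be kept separate from it: a worker thread takes finished audio chunks from a queue, transcribes each one, and appends the result to a shared transcript. A minimal sketch; `transcribe` stands for any function that turns a chunk into text (for instance the `vocapia()` function from the first post), and `TranscriptWorker` is a hypothetical name.

```python
import queue
import threading


class TranscriptWorker(threading.Thread):
    """Transcribe audio chunks from a queue in the background, keeping a running transcript."""

    def __init__(self, transcribe):
        super().__init__(daemon=True)  # do not keep the program alive after Ctrl+C
        self.transcribe = transcribe
        self.chunks = queue.Queue()
        self._transcript_lock = threading.Lock()
        self._transcript_parts = []

    def run(self):
        # iter(get, None) blocks for each chunk and stops at the None sentinel
        for chunk in iter(self.chunks.get, None):
            text = self.transcribe(chunk)
            if text:  # skip chunks in which nothing was recognized
                with self._transcript_lock:
                    self._transcript_parts.append(text)

    def stop(self):
        """Finish after the chunks already queued have been transcribed."""
        self.chunks.put(None)

    @property
    def transcript(self):
        with self._transcript_lock:
            return " ".join(self._transcript_parts)
```

The recording loop would call `worker.chunks.put(path)` each time a chunk file is closed, and `worker.transcript` can be printed whenever an update is wanted. Because the queue is FIFO and there is one worker, the transcript stays in recording order.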

