Microphone stream manipulation

Microphone stream manipulation - Talking2442 - Jan-07-2021

Hello and happy new year!

I'm working with the Vosk speech-to-text engine. It already works, but I want to improve the microphone input.

My thoughts are around:
- denoising the stream
- applying gain to the stream

I've read many examples, but most of them are for WAV files and not for microphone streaming.

My code looks like this:

#!/usr/bin/env python3

import json
import os
import pyaudio
from vosk import Model, KaldiRecognizer


lang = "en-US"  # model directory name under models/vosk
DEBUG = True


class Vosk:
    def __init__(self, language):
        if not os.path.exists("models/vosk/" + language):
            print("Please download the model from https://alphacephei.com/vosk/models and unpack as "
                  "'" + language + "' in 'models/vosk'.")
            exit(1)

        model = Model("models/vosk/" + language)
        self.rec = KaldiRecognizer(model, 16000)

        p = pyaudio.PyAudio()
        self.stream = p.open(format=pyaudio.paInt16,
                             channels=1,
                             rate=16000,
                             input=True,
                             frames_per_buffer=8000)
        self.stream.start_stream()

    def run(self):
        print("Listening...")
        while True:
            data = self.stream.read(4000)
            if len(data) == 0:
                break
            if self.rec.AcceptWaveform(data):
                # Result() returns the utterance that was just completed;
                # FinalResult() is kept for the end of the stream below.
                res = json.loads(self.rec.Result())
                if DEBUG:
                    print("Text:", res['text'])

        res = json.loads(self.rec.FinalResult())
        if DEBUG:
            print("Listened: " + res['text'])
        return res['text']


if __name__ == '__main__':
    stt = Vosk(lang)
    print(stt.run())

I think the right modules are audioop or Pydub, but most examples are for WAV files...
Do you have better ideas for suitable modules, and if so, can you explain how I should work with the stream?
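
For the gain part, one possible approach is to process each chunk returned by stream.read() before passing it to AcceptWaveform. The sketch below assumes the stream is 16-bit mono PCM in native byte order (as with paInt16 above) and uses only the standard library, since audioop was removed in Python 3.13. The function name apply_gain and the factor value are illustrative, not part of Vosk or PyAudio.

```python
from array import array


def apply_gain(chunk: bytes, factor: float) -> bytes:
    """Scale 16-bit PCM samples by `factor`, clipping to the int16 range.

    `chunk` is assumed to be native-endian signed 16-bit audio,
    e.g. the bytes returned by a PyAudio stream opened with paInt16.
    """
    samples = array("h")
    samples.frombytes(chunk)
    for i, sample in enumerate(samples):
        scaled = int(sample * factor)
        # Clip instead of wrapping around, which would cause loud distortion.
        samples[i] = max(-32768, min(32767, scaled))
    return samples.tobytes()
```

In the run loop this could be used as data = apply_gain(self.stream.read(4000), 2.0). A fixed factor is the simplest choice; too large a value will clip loud speech, so it is worth testing with your own microphone.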

RE: Microphone stream manipulation - palumanic - Nov-19-2023

Improving microphone input for speech-to-text is a cool project. You're on the right track with denoising and adjusting gain! Streaming might need some real-time processing. Have you checked out https://asmrmicrophones.com/? They've got insights on enhancing microphone quality that could help fine-tune your stream for better speech recognition.
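
As a starting point for real-time processing, here is a minimal sketch of a noise gate that works chunk by chunk. It is a crude stand-in for real denoising, not a spectral noise-reduction algorithm: it replaces any chunk whose RMS level is below a threshold with silence. It assumes 16-bit mono PCM in native byte order; the names rms and noise_gate and the default threshold of 300 are made up for illustration and would need tuning for a given microphone.

```python
import math
from array import array


def rms(chunk: bytes) -> float:
    """Root-mean-square level of a chunk of 16-bit PCM samples."""
    samples = array("h")
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gate(chunk: bytes, threshold: float = 300.0) -> bytes:
    """Return silence of equal length if the chunk is quieter than `threshold`."""
    if rms(chunk) < threshold:
        return bytes(len(chunk))  # all-zero bytes are silent int16 samples
    return chunk
```

Each chunk read from the microphone could be passed through noise_gate before it is handed to the recognizer. Whether this actually improves recognition is something to measure, since gating can also cut off quiet word beginnings.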