Jan-07-2021, 07:36 PM
Hello and happy new year!
I'm working with the Vosk speech-to-text engine. It already works, but I want to improve the microphone input.
My thoughts so far:
- denoise the stream
- apply gain to the stream
I've read many examples, but most of them are only for WAV files and not for microphone streaming.
My code looks like this:
#!/usr/bin/env python3
import json
import os

import pyaudio
from vosk import Model, KaldiRecognizer

lang = "en-US"
DEBUG = True


class Vosk:
    def __init__(self, language):
        # The model must be unpacked under models/vosk/<language>
        if not os.path.exists("models/vosk/" + language):
            print("Please download the model from https://alphacephei.com/vosk/models and unpack as "
                  "'" + language + "' in 'models/vosk'.")
            exit(1)

        model = Model("models/vosk/" + language)
        self.rec = KaldiRecognizer(model, 16000)

        # 16 kHz, 16-bit, mono microphone input
        p = pyaudio.PyAudio()
        self.stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                             input=True, frames_per_buffer=8000)
        self.stream.start_stream()

    def run(self):
        print("Listening...")
        while True:
            data = self.stream.read(4000)
            if len(data) == 0:
                break
            if self.rec.AcceptWaveform(data):
                res = json.loads(self.rec.FinalResult())
                if DEBUG:
                    print("Text:", res['text'])
                res = json.loads(self.rec.Result())
                if DEBUG:
                    print(res['text'])
        res = json.loads(self.rec.FinalResult())
        if DEBUG:
            print("Listened: " + res['text'])
        return res['text']


if __name__ == '__main__':
    stt = Vosk(lang)
    print(stt.run())

I think the right modules are audioop or Pydub, but most examples are for WAV files...
Do you have better ideas for the right modules? If so, can you explain how I should work with the stream?
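
To make the question concrete, here is a rough sketch of what per-chunk processing on the raw microphone bytes could look like. It is only an illustration, not something tested against Vosk. It assumes each chunk is 16-bit signed mono PCM in native byte order, as `paInt16` with `channels=1` should deliver. The helper names `apply_gain`, `rms` and `noise_gate` are hypothetical and not part of PyAudio, Vosk, audioop or Pydub. The gate is a crude volume threshold, not real noise suppression.

```python
import array
import math


def _to_samples(chunk: bytes) -> array.array:
    """Interpret raw bytes as signed 16-bit samples (native byte order)."""
    samples = array.array("h")
    samples.frombytes(chunk)
    return samples


def apply_gain(chunk: bytes, gain: float) -> bytes:
    """Multiply every sample by `gain`, clipping to the int16 range."""
    samples = _to_samples(chunk)
    scaled = (max(-32768, min(32767, int(s * gain))) for s in samples)
    return array.array("h", scaled).tobytes()


def rms(chunk: bytes) -> float:
    """Root-mean-square level of the chunk (0.0 for an empty chunk)."""
    samples = _to_samples(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gate(chunk: bytes, threshold: float) -> bytes:
    """Crude gate: replace chunks quieter than `threshold` with silence."""
    return chunk if rms(chunk) >= threshold else bytes(len(chunk))
```

In the loop above this would sit between reading and recognizing, for example `data = apply_gain(noise_gate(self.stream.read(4000), 300), 2.0)` before `self.rec.AcceptWaveform(data)`. The threshold `300` and gain `2.0` are arbitrary placeholder values that would need tuning for a given microphone.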