Apr-03-2024, 12:42 PM
Hello,
I'm quite new to Python and am trying to build a working transcriber with Streamlit and HuggingFace models.
I have a hardware constraint: my script must work on a 4GB Nvidia GPU...
Here is the code so far:
    ## Imports ##
    import torch
    import io
    import streamlit as st
    from pathlib import Path
    from tempfile import NamedTemporaryFile
    from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM
    import nemo.collections.asr as nemo_asr
    import torchaudio

    ## Initialization ##
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model_name = "bofenghuang/stt_fr_fastconformer_hybrid_large"
    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)

    ## Display ##
    st.title("Transcribe audio to text")
    col1, col2 = st.columns(2)
    audio_source=st.sidebar.file_uploader(label="Choose file", type=["wav","m4a","mp3","wma"])

    ## Variables ##
    suffix = ""
    predicted_text = ""

    ## Processing ##
    if audio_source is not None:
        suffix = Path(audio_source.name).suffix
        col1.write("Starting process")
        with NamedTemporaryFile(suffix=suffix) as temp_file:
            temp_file.write(audio_source.getvalue())
            temp_file.seek(0)
            predicted_text = asr_model.transcribe(temp_file)
        col2.write("Transcribed text :")
        col2.write(predicted_text)
        st.sidebar.download_button(label="Download text", data=predicted_text, file_name="transcript.txt",mime="text/plain")

Passing the temporary file to transcribe gives me this error:

    TypeError: object of type '_TemporaryFileWrapper' has no len()

I guess I should convert my audio content to another type, but I don't know which type to convert it to, how to convert it, or how to access the contents in the new type.
Does anyone have an idea what I'm missing?
Thank you
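
A possible direction, offered as a sketch rather than a confirmed fix: the "has no len()" message suggests that transcribe wants a sequence, most likely a list of audio file paths, and not an open file object. Under that assumption, the upload can be written to a temporary file and the file's path passed in a list. The helper name transcribe_bytes and its transcribe_fn parameter are hypothetical, introduced only for illustration; transcribe_fn stands in for asr_model.transcribe.

```python
import os
from tempfile import NamedTemporaryFile
from typing import Callable, List, Sequence


def transcribe_bytes(
    audio_bytes: bytes,
    suffix: str,
    transcribe_fn: Callable[[List[str]], Sequence],
) -> Sequence:
    """Write audio bytes to a temp file and pass its path (in a list) to transcribe_fn.

    transcribe_fn is assumed to accept a list of file paths, which is what
    the TypeError in the question hints at; this is not verified here.
    """
    # delete=False lets the file be closed first and then reopened by path,
    # which also works on platforms that lock open temporary files.
    with NamedTemporaryFile(suffix=suffix, delete=False) as temp_file:
        temp_file.write(audio_bytes)
        temp_path = temp_file.name
    # Leaving the with-block closes the file, so all bytes are on disk.
    try:
        return transcribe_fn([temp_path])
    finally:
        # Remove the temporary file whether or not transcription succeeded.
        os.remove(temp_path)
```

In the script above this would be called as transcribe_bytes(audio_source.getvalue(), suffix, asr_model.transcribe). The result is probably a list and not a plain string (the exact shape may depend on the NeMo version), so it may need converting to text before being handed to st.sidebar.download_button.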