
Hello,
What I want to do:
The user uploads an audio recording (made with the Windows 10/11 sound recorder, recorded during a Teams meeting, recorded on a smartphone, or anything else) => the script converts it to mp3 if needed (no need for great audio quality, just enough to recognize words) => it transcribes the recording to text => finally it displays the transcribed text and lets the user download it as a .txt file.
What I currently have: a working script that transcribes a wav or mp3 file (not m4a) and then offers a download button with Streamlit.
My working script:
```python
## Imports ##
import streamlit as st  # needed for the st.* calls below
import torch
from tempfile import NamedTemporaryFile
from datasets import load_dataset
from transformers import pipeline

## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

## Display ##
st.title("Télécharger un enregistrement de réunion pour obtenir sa transcription en texte")
col1, col2 = st.columns(2)
audio_source = st.sidebar.file_uploader(label="Choisir votre fichier", type=["wav", "mp3"])

## Variables ##
suffix = ""
predicted_sentence = ""

## Processing ##
if audio_source is not None:
    # The uploaded file's raw bytes are passed straight to the pipeline
    waveform = audio_source.getvalue()
    predicted_sentence = pipe(waveform, max_new_tokens=225)
    col1.write("Transcription :point_right:")
    col2.write(predicted_sentence)
    result = str(predicted_sentence)
    col1.download_button(label="Télécharger la transcription", data=result, file_name="transcript.txt", mime="text/plain")
```

My non-working code:
```python
## Imports ##
from pathlib import Path
import streamlit as st
import torch
from tempfile import NamedTemporaryFile
from datasets import load_dataset
from transformers import pipeline
from pydub import AudioSegment

## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

## Functions ##
def convtomp3(file):
    """Convert an audio file from m3a, wav or wma to mp3"""
    match Path(file.name).suffix:
        case "m4a":
            wav_audio = AudioSegment.from_file(file, format="m4a")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wav":
            wav_audio = AudioSegment.from_file(file, format="wav")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wma":
            wav_audio = AudioSegment.from_file(file, format="wma")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result

def transcriptor(file):
    """Transcript from an audio file to a string"""
    if(Path(file.name).suffix != "mp3"):
        file=convtomp3(file)
    waveform = file.getvalue()
    predicted_sentence = pipe(waveform, max_new_tokens=225)
    return str(predicted_sentence)

## Display ##
st.title("Convertisseur / Transcripteur")
audio_source = st.sidebar.file_uploader(label = "Fichiers audio uniquement", type=["mp3","m4a","wav","wma"])
if audio_source is not None:
    st.toast("Lancement de la transcription")
    with NamedTemporaryFile(suffix=Path(audio_source.name).suffix) as temp_file:
        temp_file.write(audio_source.getvalue())
        temp_file.seek(0)
        cpte_rendu = transcriptor(temp_file)
        st.write("Transcription : ")
        st.write(cpte_rendu)
        st.sidebar.download_button(label = "Télécharger le compte-rendu", data = cpte_rendu, file_name = "cr.txt", mime = "text/plain")
```

My error message:
```
Error: AttributeError: 'NoneType' object has no attribute 'getvalue'
Traceback:
  File "/home/ild/.local/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
  File "/home/ild/conv-cripteur.py", line 57, in <module>
    cpte_rendu = transcriptor(temp_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ild/conv-cripteur.py", line 44, in transcriptor
    waveform = file.getvalue()
               ^^^^^^^^^^^^^
```
I get the same error with both an m4a and an mp3 file.
My understanding is that I am not making the right call on this line:
```python
cpte_rendu = transcriptor(temp_file)
```
But I don't know enough Python to see what I missed.
Do I need to import something else?
Or do I "just" have to handle one more step before calling my transcriptor function?
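To make the "one more step" idea concrete, here is a minimal sketch of a conversion step that works on bytes from start to finish, so that the result can be handed to `pipe(...)` the same way the working script does. It rests on several assumptions: pydub and ffmpeg are installed, ffmpeg can detect the input format on its own, and the helper names `audio_format` and `to_mp3_bytes` are made up for this illustration. It also sidesteps `getvalue()` on file handles: as far as I can tell, neither a `NamedTemporaryFile` nor the file object returned by `export("audio1.mp3", ...)` provides that method, whereas `io.BytesIO` does.

```python
import io
from pathlib import Path

# Extensions accepted by the file_uploader in the non-working script.
SUPPORTED = {"mp3", "m4a", "wav", "wma"}


def audio_format(filename):
    """Return the lowercase extension without its leading dot."""
    return Path(filename).suffix.lower().lstrip(".")


def to_mp3_bytes(data, filename):
    """Return MP3 bytes for an uploaded file, converting only when needed."""
    fmt = audio_format(filename)
    if fmt not in SUPPORTED:
        raise ValueError(f"Unsupported audio format: {fmt!r}")
    if fmt == "mp3":
        return data  # already MP3, nothing to convert

    # Imported lazily so the helpers above still work without pydub installed.
    from pydub import AudioSegment

    # Assumption: ffmpeg detects the input format from the stream itself.
    segment = AudioSegment.from_file(io.BytesIO(data))
    buffer = io.BytesIO()
    segment.export(buffer, format="mp3")  # export accepts a file-like object
    return buffer.getvalue()
```

In the Streamlit part, this could presumably replace the temporary file entirely, for example `pipe(to_mp3_bytes(audio_source.getvalue(), audio_source.name), max_new_tokens=225)`.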
Thank you for your help :)