[Solved] I'm not getting the right type

slain · (This post was last modified: Apr-11-2024, 01:36 PM by slain.)

(Apr-11-2024, 01:06 PM)Gribouillis Wrote: Don't manipulate the garbage collector, it is completely useless for what you are doing.

OK, I didn't see any difference with or without the garbage collector calls, so I removed them.

(Apr-11-2024, 01:06 PM)Gribouillis Wrote: On the other hand yes, calling Path('audio.mp3').unlink(missing_ok=True) after the call to transcriptor() should work.

OK, it works with Path("audio1.mp3").unlink(missing_ok=True) on line 61.
This one removes audio1.mp3, and on line 62 Path(temp_file.name).unlink(missing_ok=True) removes (or seems to remove) the uploaded audio file: there's no trace of my m4a file left.
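For reference, here is a minimal stdlib-only sketch of the two cleanup behaviours described above. It assumes Python 3.8+, and the helper names are hypothetical. NamedTemporaryFile, with its default delete=True, already removes its file when the with block closes it. Path.unlink(missing_ok=True) does nothing, instead of raising, when the file is already gone.

```python
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import Tuple


def temp_file_lifecycle(suffix: str = ".m4a") -> Tuple[bool, bool]:
    """Hypothetical helper: does the temp file exist inside, then after, the with block?"""
    with NamedTemporaryFile(suffix=suffix) as temp_file:
        temp_file.write(b"not really audio")  # stand-in for audio_source.getvalue()
        temp_file.flush()
        exists_inside = Path(temp_file.name).exists()
    # delete=True is the default, so closing the file (end of the with block) removed it
    exists_after = Path(temp_file.name).exists()
    return exists_inside, exists_after


def remove_if_present(path: Path) -> bool:
    """Hypothetical helper: delete path if it exists and report whether it existed."""
    existed = path.exists()
    # missing_ok=True (Python 3.8+): no FileNotFoundError when the file is absent
    path.unlink(missing_ok=True)
    return existed


if __name__ == "__main__":
    print(temp_file_lifecycle())  # (True, False)
```

This suggests that the manual unlink of the temporary file is harmless but optional. The unlink of "audio1.mp3" is still needed, because pydub writes that file outside the temporary-file mechanism.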

Here is my full code. In my opinion it works; maybe it could be optimized, but I don't know enough about Python to say:

###############################################################################
# Upload an audio file; if needed, it is converted to mp3.
# It is then transcribed using bofenghuang/whisper-small-cv11-french.
# Finally, you can download the transcription
#    in .txt format.
###############################################################################

## Imports ##
# Standard library
from pathlib import Path
from tempfile import NamedTemporaryFile
# Third-party packages
import streamlit as st
import torch
from pydub import AudioSegment
from transformers import pipeline

## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

## Functions ##
def convtomp3(file):
    """Convert an m4a, wav or wma audio file to mp3.

    The mp3 is written to "audio1.mp3" in the working directory. Returns the
    file object from pydub's export(), rewound to the start, so the caller
    can read() the mp3 bytes directly.
    """
    # Path.suffix keeps the leading dot and the original case, e.g. ".M4A"
    suffix = Path(file.name).suffix.lower()
    match suffix:
        case ".m4a" | ".wav" | ".wma":
            # pydub (via ffmpeg) takes the extension without the dot as its format
            audio = AudioSegment.from_file(file, format=suffix.lstrip("."))
            result = audio.export("audio1.mp3", format="mp3")
            return result
        case _:
            # Any other extension is rejected, so nothing can fall through the match
            raise ValueError(f"Fichier invalide : {file.name!r}")  # "Invalid file"

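def transcriptor(file):
    """Transcribe an audio file and return the recognised text as a string"""
    if Path(file.name).suffix != ".mp3":
        file = convtomp3(file)  # now the file object of "audio1.mp3"
    waveform = file.read()  # raw mp3 bytes; the pipeline decodes them with ffmpeg
    predicted_sentence = pipe(waveform, max_new_tokens=225)  # a dict like {"text": "..."}
    return predicted_sentence["text"]  # str(predicted_sentence) would return "{'text': '...'}"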

## Display ##
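st.title("Convertisseur / Transcripteur")  # "Converter / Transcriber"
audio_source = st.sidebar.file_uploader(label="Fichiers audio uniquement", type=["mp3", "m4a", "wav", "wma"])  # "Audio files only"
if audio_source is not None:
    st.toast("Lancement de la transcription")  # "Starting the transcription"

    with NamedTemporaryFile(suffix=Path(audio_source.name).suffix) as temp_file:
        temp_file.write(audio_source.getvalue())  # copy the uploaded bytes to a real file on disk
        temp_file.seek(0)  # rewind so the next read() starts at the beginning
        cpte_rendu = transcriptor(temp_file)  # "compte-rendu" = report / minutes
        Path("audio1.mp3").unlink(missing_ok=True)  # remove the mp3 written by convtomp3()
        Path(temp_file.name).unlink(missing_ok=True)  # NamedTemporaryFile (delete=True by default) also removes it on exit
    st.write("Transcription : ")
    st.write(cpte_rendu)
    st.sidebar.download_button(label="Télécharger le compte-rendu", data=cpte_rendu, file_name="cr.txt", mime="text/plain")  # "Download the report"
    torch.cuda.empty_cache()  # free GPU memory cached by PyTorch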
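The extension check in convtomp3() can also be expressed as a lookup table. Below is a minimal stdlib-only sketch. SUPPORTED_FORMATS and pydub_format_for are hypothetical names, and it assumes the format strings are the same ones the script already passes to AudioSegment.from_file(). It also lowercases the extension, so "MEETING.M4A" is handled like "meeting.m4a".

```python
from pathlib import PurePath
from typing import Dict, Optional

# Hypothetical table: upload extension -> format string for AudioSegment.from_file().
# None means the file is already mp3 and can be transcribed as-is.
SUPPORTED_FORMATS: Dict[str, Optional[str]] = {
    ".mp3": None,
    ".m4a": "m4a",
    ".wav": "wav",
    ".wma": "wma",
}


def pydub_format_for(filename: str) -> Optional[str]:
    """Hypothetical helper: return the pydub input format, or None for mp3."""
    suffix = PurePath(filename).suffix.lower()  # ".M4A" -> ".m4a"
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file: {filename!r}")
    return SUPPORTED_FORMATS[suffix]


if __name__ == "__main__":
    for name in ("meeting.M4A", "memo.mp3", "call.wav"):
        print(name, "->", pydub_format_for(name))
```

With a helper like this, the caller only converts when the result is not None, for example with AudioSegment.from_file(file, format=fmt).export("audio1.mp3", format="mp3"). Adding a format then means adding one table entry instead of another case branch.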

Not sure about the future: maybe I'll try, on a copy of my script, calling cpte_rendu = transcriptor(audio_source.getvalue()) directly.
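On that idea: as written, transcriptor() reads file.name, which a bytes object does not have, so passing audio_source.getvalue() would need a variant that accepts bytes. The sketch below assumes that the transformers ASR pipeline accepts the raw bytes of an audio file and decodes them with ffmpeg, which is how I understand its documentation. transcribe_bytes, AsrCallable and fake_asr are hypothetical names, and asr stands in for pipe so the sketch runs offline.

```python
from typing import Any, Callable, Dict

# Stand-in for the transformers pipeline object called `pipe` in the script:
# anything that takes the audio bytes plus keyword arguments and returns a dict.
AsrCallable = Callable[..., Dict[str, Any]]


def transcribe_bytes(audio_bytes: bytes, asr: AsrCallable, max_new_tokens: int = 225) -> str:
    """Hypothetical helper: transcribe raw upload bytes without a temporary file."""
    if not audio_bytes:
        raise ValueError("empty upload")
    # Assumption: like the ASR pipeline, `asr` decodes the bytes itself (via ffmpeg)
    result = asr(audio_bytes, max_new_tokens=max_new_tokens)
    # The pipeline returns a dict such as {"text": "..."}; keep only the text
    return result["text"]


if __name__ == "__main__":
    def fake_asr(data: bytes, max_new_tokens: int) -> Dict[str, Any]:
        # Offline stand-in so the sketch runs without transformers installed
        return {"text": f"{len(data)} bytes received"}

    print(transcribe_bytes(b"ID3...", fake_asr))  # 6 bytes received
```

In the Streamlit script, the call would then look like cpte_rendu = transcribe_bytes(audio_source.getvalue(), pipe). This is untested against the real model. If ffmpeg handles the m4a/wav/wma input directly, the mp3 conversion step may not be needed.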
For my use case, this question is solved.
Thank you for your patience and your help :)
