Python Forum - [Solved] I'm not getting the right type


Hello,

What I want to do:
The user uploads an audio recording (made with the Windows 10/11 audio recorder, recorded during a Teams meeting, recorded with a smartphone, or whatever) => the script converts it to mp3 if needed (no need for incredible audio quality, just enough to recognize the words) => then it transcribes the recording to text => finally it displays the transcribed text and lets the user download it as a .txt file.

What I currently have: a working script that can transcribe a wav or mp3 file (not m4a) and then offers a download button with Streamlit.

My working script:

import streamlit as st  # required for all the st.* calls below
import torch

from transformers import pipeline

## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

## Display ##
# Title: "Upload a meeting recording to get its transcription as text"
st.title("Télécharger un enregistrement de réunion pour obtenir sa transcription en texte")
col1, col2 = st.columns(2)
# Label: "Choose your file"
audio_source = st.sidebar.file_uploader(label="Choisir votre fichier", type=["wav", "mp3"])

## Processing ##
if audio_source is not None:
    # The uploaded file is an in-memory buffer, so getvalue() returns its raw bytes
    waveform = audio_source.getvalue()
    predicted_sentence = pipe(waveform, max_new_tokens=225)
    # The pipeline returns a dict like {"text": "..."}; keep only the text
    result = predicted_sentence["text"]
    col1.write("Transcription :point_right:")
    col2.write(result)
    # Button label: "Download the transcription"
    col1.download_button(label="Télécharger la transcription", data=result, file_name="transcript.txt", mime="text/plain")

My non-working code:

## Imports ##
from pathlib import Path
import streamlit as st
import torch

from tempfile import NamedTemporaryFile
from datasets import load_dataset
from transformers import pipeline
from pydub import AudioSegment

## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

## Functions ##
def convtomp3(file):
    """Convert an audio file from m4a, wav or wma to mp3"""
    match Path(file.name).suffix:
        case "m4a":
            wav_audio = AudioSegment.from_file(file, format="m4a")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wav":
            wav_audio = AudioSegment.from_file(file, format="wav")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wma":
            wav_audio = AudioSegment.from_file(file, format="wma")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result

def transcriptor(file):
    """Transcribe an audio file to a string"""
    if(Path(file.name).suffix != "mp3"):
        file=convtomp3(file)
    waveform = file.getvalue()
    predicted_sentence  = pipe(waveform, max_new_tokens=225)
    return str(predicted_sentence)

## Display ##
st.title("Convertisseur / Transcripteur")
audio_source = st.sidebar.file_uploader(label = "Fichiers audio uniquement", type=["mp3","m4a","wav","wma"])
if audio_source is not None:
    st.toast("Lancement de la transcription")
    
    with NamedTemporaryFile(suffix=Path(audio_source.name).suffix) as temp_file:
        temp_file.write(audio_source.getvalue())
        temp_file.seek(0)    
        cpte_rendu = transcriptor(temp_file)
        st.write("Transcription : ")
        st.write(cpte_rendu)
        st.sidebar.download_button(label = "Télécharger le compte-rendu", data = cpte_rendu, file_name = "cr.txt", mime = "text/plain")

My error message:

Error:AttributeError: 'NoneType' object has no attribute 'getvalue'
Traceback:
File "/home/ild/.local/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
File "/home/ild/conv-cripteur.py", line 57, in <module>
    cpte_rendu = transcriptor(temp_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ild/conv-cripteur.py", line 44, in transcriptor
    waveform = file.getvalue()
               ^^^^^^^^^^^^^

I'm getting the same error with both an m4a and an mp3 file.
My understanding is that I'm not making the right call on this line:

        cpte_rendu = transcriptor(temp_file)

But I don't know enough Python to figure out what I missed.
Do I need to import something else?
Or do I "just" have to handle one more step before calling my transcriptor function?

Thank you for your help :)

If the filename's suffix is not one of m4a, wav, wma, the function convtomp3() will return None, this is probably what happens in your case.

(Apr-10-2024, 08:48 AM) Gribouillis Wrote: If the filename's suffix is not one of m4a, wav, wma, the function convtomp3() will return None, this is probably what happens in your case.

Hello and thank you for your answer.

I tried adding return file after the match statement in the function.
I didn't expect it to change anything, because the suffix is already restricted by the file_uploader widget (only mp3, m4a, wav and wma are accepted) and because I only call convtomp3() when the suffix is not mp3.

So here is the modified code (this time I left the docstrings in, otherwise the line numbers wouldn't match):

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
# Upload an audio file, if needed it will be converted to mp3                #
# Then it will be transcribed using bofenghuang/whisper-small-cv11-french    #
# Finally it will give you the possibility to download the transcription     #
#    in .txt format                                                          #
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

## Imports ##
from pathlib import Path
import streamlit as st
import torch

from tempfile import NamedTemporaryFile
from datasets import load_dataset
from transformers import pipeline
from pydub import AudioSegment

## Initialize environment ##
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = pipeline("automatic-speech-recognition", model="bofenghuang/whisper-small-cv11-french", device=device)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

## Functions ##
def convtomp3(file):
    """Convert an audio file from m4a, wav or wma to mp3"""
    match Path(file.name).suffix:
        case "m4a":
            wav_audio = AudioSegment.from_file(file, format="m4a")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wav":
            wav_audio = AudioSegment.from_file(file, format="wav")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
        case "wma":
            wav_audio = AudioSegment.from_file(file, format="wma")
            result = wav_audio.export("audio1.mp3", format="mp3")
            return result
    return file  # do nothing if no case matched the suffix

def transcriptor(file):
    """Transcribe an audio file to a string"""
    if(Path(file.name).suffix != "mp3"):
        file=convtomp3(file)
    waveform = file.getvalue()
    predicted_sentence  = pipe(waveform, max_new_tokens=225)
    return str(predicted_sentence)

## Display ##
st.title("Convertisseur / Transcripteur")
audio_source = st.sidebar.file_uploader(label = "Fichiers audio uniquement", type=["mp3","m4a","wav","wma"])
if audio_source is not None:
    st.toast("Lancement de la transcription")
    
    with NamedTemporaryFile(suffix=Path(audio_source.name).suffix) as temp_file:
        temp_file.write(audio_source.getvalue())
        temp_file.seek(0)    
        cpte_rendu = transcriptor(temp_file)
        st.write("Transcription : ")
        st.write(cpte_rendu)
        st.sidebar.download_button(label = "Télécharger le compte-rendu", data = cpte_rendu, file_name = "cr.txt", mime = "text/plain")

And... here is the new error:

AttributeError: '_io.BufferedRandom' object has no attribute 'getvalue'
Traceback:
File "/home/ild/.local/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
File "/home/ild/conv-cripteur.py", line 58, in <module>
    cpte_rendu = transcriptor(temp_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ild/conv-cripteur.py", line 45, in transcriptor
    waveform = file.getvalue()
               ^^^^^^^^^^^^^
File "/home/ild/miniconda3/lib/python3.12/tempfile.py", line 494, in __getattr__
    a = getattr(file, name)
        ^^^^^^^^^^^^^^^^^^^

As I see it, the main problem in my code is in the transcriptor function (more precisely line 45): waveform = file.getvalue()
But I can't see where I went wrong :(
Could it be because I should write if(Path(file.name).suffix != ".mp3"): (i.e. the dot matters), or because I have to convert audio_source to something else (I don't know what)?

Thank you

(Apr-10-2024, 11:49 AM) slain Wrote: Could it be because I should write if(Path(file.name).suffix != ".mp3"): (i.e. the dot matters)

That's probably the reason:

>>> from pathlib import Path
>>> Path('foo.mp3').suffix
'.mp3'
>>>

The case statements probably need modification as well.
Instead of return file you could write raise ValueError(f"Invalid file name {file.name!r}")

I added the dot at the beginning of each suffix; it's cleaner that way.

It still throws this error:

AttributeError: '_io.BufferedRandom' object has no attribute 'getvalue'
Traceback:
File "/home/ild/.local/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
File "/home/ild/conv-cripteur.py", line 58, in <module>
    cpte_rendu = transcriptor(temp_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ild/conv-cripteur.py", line 45, in transcriptor
    waveform = file.getvalue()
               ^^^^^^^^^^^^^
File "/home/ild/miniconda3/lib/python3.12/tempfile.py", line 494, in __getattr__
    a = getattr(file, name)
        ^^^^^^^^^^^^^^^^^^^

Quote: The case statements probably need modification as well.
Instead of return file you could write raise ValueError(f"Invalid file name {file.name!r}")

Rather than copy/pasting blindly, I would like to understand why this instruction is needed.
If I understand correctly what I googled (English is not my mother tongue, so translation errors are possible), it would raise an error, but wouldn't that stop the script?
And, as the error also occurs with an mp3 (so out of the case statements), it wouldn't solve the AttributeError, would it?

(Apr-10-2024, 01:49 PM) slain Wrote: And, as the error also occurs with an mp3 (so out of the case statements), it wouldn't solve the AttributeError, would it?

The problem we have is understanding why you are getting the error. Normally the raise statement should never be reached: if your file has a .mp3 suffix, your code does not call the function, and if it has a .wav, .m4a or .wma suffix, it should be handled by one of the case statements. So the question is: why does it occur? Among other advantages, the raise statement tells you which filename it occurs for; that's a first point. It is better to stop the script than to have an abnormal None value returned without understanding why.

Can you post the code as it is now?

I did some googling on my side and managed to solve part of my problem.
I found a similar case on Stack Overflow.
Based on that example, I changed my line 45 to waveform = file.read() (instead of getvalue()).

Now I don't get the AttributeError anymore, which is progress.

I tested the script on an mp3 file and it was able to transcribe it.

Then I tested it with an m4a file and got a big ugly "CUDA out of memory" error.

I tried to free the GPU's memory by killing the Streamlit thread (probably not recommended, I know), ran Streamlit again, and it was able to transcribe the m4a file!

I would like to improve this further.
First question: how can I remove all the temporary audio files from the filesystem?

slain@mltest01:~$ ls | find *.mp3
audio1.mp3
tmp6w2pdcgq.mp3
tmpyvgiykmt.mp3
(transcript) slain@mltest01:~$ ls | find *.m4a
tmp9imkhqkg.m4a
tmp9j135bpk.m4a
tmpcj2q64d2.m4a
tmpo1petwo7.m4a
tmptfjlj7et.m4a
tmpuqnee0mf.m4a
tmpzrjoav24.m4a
(transcript) slain@mltest01:~$

Second question: is there a way to "deallocate" graphics memory?
I think it would be really useful right after transcribing the audio to text, or right after displaying the download_button.

That would let me process another audio file without having to kill and restart Streamlit (which is a dirty way to do it).

Do you also see parts of my code that I could make more memory-efficient (my GPU only has 4 GB, and NVIDIA GPUs are awfully expensive here at the moment)?
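A hedged sketch for the second question, assuming PyTorch as in the script. torch.cuda.empty_cache() only hands back cached blocks that no live tensor is using, so large intermediate results have to be dropped first (del, or letting them go out of scope, then gc.collect()), and the memory taken by the model itself stays allocated for as long as pipe exists. release_gpu_memory() is a hypothetical helper; on a machine without CUDA, or without torch installed, it simply returns False.

```python
import gc


def release_gpu_memory():
    """Free unreferenced Python objects, then return PyTorch's cached GPU blocks.

    Returns True if a CUDA cache was emptied, False otherwise.
    """
    gc.collect()  # collect objects that are no longer referenced anywhere
    try:
        import torch
    except ImportError:
        return False  # torch not installed: nothing to release
    if not torch.cuda.is_available():
        return False  # CPU-only machine
    torch.cuda.empty_cache()  # release unused memory held by the caching allocator
    return True


# Assumed usage, e.g. right after st.sidebar.download_button(...):
#     release_gpu_memory()
print(release_gpu_memory())
```

Also note that Streamlit reruns the whole script on every interaction, so the top-level pipeline(...) call runs again each time. Wrapping that call in a function decorated with @st.cache_resource is the usual way to keep a single copy of the model; whether this is what caused the out-of-memory error here is an assumption, not something confirmed in the thread.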

Thanks again :)

Have you tried replacing waveform = file.getvalue() with waveform = file.read()? This reads the data from the file object, which has a similar effect to getvalue() on an io.BytesIO object.

def transcriptor(file):
    """Transcribe an audio file to a string"""
    if Path(file.name).suffix != ".mp3":  # Path.suffix includes the leading dot
        file = convtomp3(file)
    waveform = file.read()  # Read the raw bytes from the file object
    predicted_sentence = pipe(waveform, max_new_tokens=225)
    return predicted_sentence["text"]  # the pipeline returns a dict like {"text": "..."}
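Why read() works where getvalue() fails: getvalue() belongs to in-memory buffers such as io.BytesIO (Streamlit's uploaded file behaves like one, which is why audio_source.getvalue() works in the first script), whereas NamedTemporaryFile wraps a real file on disk, the _io.BufferedRandom from the traceback, which only offers read(). read() starts from the current position, so after writing you have to seek(0) first, as the script already does. A minimal, self-contained comparison (the payload is placeholder data, not a real recording):

```python
import io
from tempfile import NamedTemporaryFile

payload = b"fake audio bytes"  # placeholder data, not a real recording

# In-memory buffer: getvalue() returns the whole content, wherever the position is
buffer = io.BytesIO(payload)
print(buffer.getvalue() == payload)  # True

# Real temporary file: no getvalue(), so rewind and read() instead
with NamedTemporaryFile(suffix=".m4a") as temp_file:
    temp_file.write(payload)
    print(hasattr(temp_file, "getvalue"))  # False
    print(temp_file.read())                # b'' because the position is at the end
    temp_file.seek(0)                      # rewind to the start
    data = temp_file.read()

print(data == payload)  # True
```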

(Apr-10-2024, 02:48 PM) slain Wrote: I did some googling on my side and managed to solve part of my problem.
I found a similar case on Stack Overflow.
Based on that example, I changed my line 45 to waveform = file.read() (instead of getvalue()).

I disagree with the solution you found on Stack Overflow, because instead of solving the problem it hides it. You still don't know why the file name was not intercepted by the case statements. There is an error in the logic of your program and you chose to conceal it instead of fixing it.

For the temporary files, a good solution is to put them all in a temporary directory, which is automatically deleted when the with block ends:

>>> from tempfile import TemporaryDirectory, NamedTemporaryFile
>>>
>>> with TemporaryDirectory() as tdir:  # create temporary directory
...     print(tdir)
...     with NamedTemporaryFile(dir=tdir) as f:  # create temporary file inside temporary directory
...         print(f.name)
...     with NamedTemporaryFile(dir=tdir) as f:  # same thing
...         print(f.name)
... 
/tmp/tmpdhu6psra
/tmp/tmpdhu6psra/tmp0b_i1wfp
/tmp/tmpdhu6psra/tmpuvc0lsbc
>>> 
>>> # all the files are gone now
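Applied to the script, the idea could look like this minimal sketch: the uploaded bytes and any converted file are written inside one TemporaryDirectory, so nothing is left behind once the with block ends. process_upload() and its handler argument are hypothetical names; in the real script the handler would run the conversion (exporting to a path inside the same directory instead of "audio1.mp3" in the working directory) and the transcription. Cleanup happens when the block exits normally or through an exception; a hard kill of the process can still leave files behind.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def process_upload(data, filename, handler):
    """Write uploaded bytes to a throw-away directory and run handler on the file.

    handler(path, workdir) receives the saved file and the temporary directory,
    so any converted file it creates can live there too.
    """
    with TemporaryDirectory() as tdir:  # deleted with its contents on exit
        workdir = Path(tdir)
        source = workdir / ("upload" + Path(filename).suffix.lower())
        source.write_bytes(data)
        return handler(source, workdir)


# Example handler standing in for "convert, then transcribe"
def fake_handler(path, workdir):
    converted = workdir / "audio1.mp3"  # conversion output stays inside workdir
    converted.write_bytes(path.read_bytes())
    return sorted(p.name for p in workdir.iterdir())


print(process_upload(b"...", "meeting.m4a", fake_handler))  # ['audio1.mp3', 'upload.m4a']
```

In the Streamlit script, the call would then be something like process_upload(audio_source.getvalue(), audio_source.name, handler).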

(Apr-10-2024, 02:59 PM) SandraYokum Wrote: Have you tried replacing waveform = file.getvalue() with waveform = file.read()? This reads the data from the file object, which has a similar effect to getvalue() on an io.BytesIO object.

Yes, file.read() is the solution instead of file.getvalue() :)
Now my script is able to take the audio file, convert it if its format is m4a, wav or wma, and then transcribe it.
