Python Forum
poplib - parsing message body, could somebody please help explain this code

poplib - parsing message body, could somebody please help explain this code - t4keheart - Oct-12-2020

Hi all,
We have a Python program that works with the IBM Watson transcriber to transcribe voicemail and send it to the appropriate staff.
It stopped working on 9/22, for seemingly no reason.

I've tracked down the problem by printing/logging lines and objects until I found the issue.
The error I'm getting is "list index out of range", so presumably a list is causing the problem. Specifically, it happens on the line where the function below tries to obtain the message UID.

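So basically there's a for loop that takes the number of new messages in the mailbox and loops through them to process each email, like this:

for messageNum in range(numberofMessages):
    try:
        processMessage(messageNum + 1)  # POP3 message numbers start at 1
    except Exception as err:  # the original handler isn't shown; this one just logs
        print(f"message {messageNum + 1} failed: {err}")

And this is the function processMessage():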
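import email
import re

# popServer is a poplib.POP3 / POP3_SSL connection created elsewhere in the program
def processMessage(messageNum):
    # assemble message contents
    # retr() returns (response, list_of_byte_lines, octets); [1] is the list of lines
    raw_message = popServer.retr(messageNum)[1]
    str_message = email.message_from_bytes(b'\n'.join(raw_message))
    # for a multipart message get_payload() returns a list of parts; [0] is the first part
    body = str(str_message.get_payload()[0])

    # uidl(n) returns bytes like b'+OK 1 UID38-1726373849'; str() turns that into "b'+OK ...'"
    messageUID = str(popServer.uidl(messageNum))
    # findall() returns a list of every match; [0] raises IndexError when nothing matched
    messageUID = re.findall(r'UID\d+-\d+', messageUID)[0]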
 


The last line is what I believe is causing the error, but I don't completely understand what's going on in the regular expression line, or why the function calls have [0] after them. Forgive me, I am still learning. I understand they are lists, but I can't figure out exactly what's going wrong here.

Any input is, of course, appreciated.
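To see what each [0] picks out without a live mail server, here is a small sketch that fakes the data. The raw message lines and the UIDL reply are made-up sample values (the reply copies the old server's format shown later in this thread), not real output from popServer.

```python
import email
import re

# Hypothetical raw message, shaped like the list of byte lines poplib's retr() returns.
raw_message = [
    b"From: voicemail@example.com",
    b"Subject: New voicemail",
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="XYZ"',
    b"",
    b"--XYZ",
    b"Content-Type: text/plain",
    b"",
    b"Transcript goes here",
    b"--XYZ--",
]
msg = email.message_from_bytes(b"\n".join(raw_message))

# For a multipart message, get_payload() returns a list of parts; [0] is the first part.
parts = msg.get_payload()
first_part = parts[0]
print(len(parts))                # 1
print(first_part.get_payload())  # Transcript goes here

# uidl(n) returns one bytes reply; str() converts it to its repr, "b'+OK ...'".
response = b"+OK 1 UID38-1726373849"
messageUID = str(response)
matches = re.findall(r"UID\d+-\d+", messageUID)  # a list of every match
print(matches)     # ['UID38-1726373849']
print(matches[0])  # UID38-1726373849
```

Note that if a message is not multipart, get_payload() returns a plain string instead of a list, and [0] would just give its first character.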


RE: poplib - parsing message body, could somebody please help explain this code - bowlofred - Oct-12-2020

If you look at the docs for re.findall, you'll see that it returns a list of all the matches. [0] gives element zero, i.e. the first element of the list. If there are no elements, you get an IndexError.

>>> re.findall("b", "bobby") # returns all matches. There are three.
['b', 'b', 'b']
>>> re.findall("b", "bobby")[0] # picks the first of all the matches
'b'
>>> re.findall("x", "bobby")[0]  # tries to return the first of all matches, but with no matches, this is an error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
The messageUID value probably doesn't have the UID string in the expected format.
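A minimal sketch of a guard, so that a reply in an unexpected format is reported instead of crashing the loop. extract_uid is a hypothetical helper name; it uses re.search, which returns None when there is no match rather than an empty list you then index into.

```python
import re


def extract_uid(uidl_response):
    """Return the 'UIDnn-nnnn' token from a UIDL reply, or None if it is absent."""
    # Decode bytes instead of calling str(), so the b'...' wrapper isn't part of the text.
    if isinstance(uidl_response, bytes):
        text = uidl_response.decode("ascii", errors="replace")
    else:
        text = str(uidl_response)
    match = re.search(r"UID\d+-\d+", text)
    return match.group(0) if match else None


print(extract_uid(b"+OK 1 UID38-1726373849"))  # UID38-1726373849
print(extract_uid(b"+OK 1 3"))                 # None
```

The caller can then check for None and log or skip that message.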


RE: poplib - parsing message body, could somebody please help explain this code - t4keheart - Oct-12-2020

Thank you for the response... 
This all happened after we migrated the mailbox from one service provider to another, so I'm guessing the format of the UID string changed.

When I print the contents of 'messageUID' now, I'm getting:
b' +OK 1 3'
and when I call popServer.uidl() to get a list of all UIDs in the mailbox, I get:
(b' +OK', [b'1 3', b'2 5', b'3 11', b'4 19', ..... b'74 267'], 556)
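For reference, uidl() with no argument returns a 3-tuple (response, listing, octets), where each listing entry is b'<message number> <unique id>'. Here is a minimal sketch that turns such a listing into a dict; uid_map is a hypothetical helper name, and the sample tuple uses only the first four entries from the output above, since the rest were elided.

```python
def uid_map(listing):
    """Map message number -> unique id from the listing part of popServer.uidl()."""
    result = {}
    for line in listing:
        number, uid = line.split()  # each entry is b"<message number> <unique id>"
        result[int(number)] = uid.decode()
    return result


# Sample tuple shaped like the new server's uidl() output (truncated to four entries).
response = (b'+OK', [b'1 3', b'2 5', b'3 11', b'4 19'], 556)
print(uid_map(response[1]))  # {1: '3', 2: '5', 3: '11', 4: '19'}
```

Laid out this way, the new server's unique IDs are plain numbers with no "UID" prefix, so a pattern that requires 'UID\d+-\d+' can never match them.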
So I'm confused about the point of 'UID\d+-\d+' in the last line of my code. I suspect this is the problem area.
I created a test script to issue the same commands on the old mail server, and there the output of popServer.uidl() looks like this:

(b' +OK', [b'1 UID38-1726373849', b'2 UID39-28288393', b'3 UID40-2883839'], 15746)

So I just need to figure out how to modify the pattern (the first argument to re.findall) in the last line to accommodate the change in format.

Thanks
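One possible replacement for the last line that accepts both formats, sketched under the assumption that the unique ID is always the last field of the "+OK <message number> <unique id>" reply (the layout RFC 1939 specifies for UIDL with an argument). parse_uid is a hypothetical helper name. It decodes the bytes instead of calling str(), so the b'...' wrapper never ends up in the captured text.

```python
import re

# Matches "+OK <message number> <unique id>" and captures the unique id.
UIDL_REPLY = re.compile(r"\+OK\s+\d+\s+(\S+)")


def parse_uid(response: bytes) -> str:
    """Return the unique id from a single-message UIDL reply, old or new format."""
    match = UIDL_REPLY.search(response.decode("ascii", errors="replace"))
    if match is None:
        raise ValueError(f"unexpected UIDL reply: {response!r}")
    return match.group(1)


print(parse_uid(b"+OK 1 UID38-1726373849"))  # UID38-1726373849 (old server)
print(parse_uid(b"+OK 1 3"))                 # 3 (new server)
```

In processMessage, the two messageUID lines would then collapse to messageUID = parse_uid(popServer.uidl(messageNum)).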