Python Forum
Python Library for Reading POP Emails?
#1
I'm trying to determine whether poplib is the library most Python developers use for reading emails sitting on a POP3 server.

I've done lots of Python coding, but until now it has mainly been for scraping websites, so reading and parsing emails with Python is new to me.

If it helps, I won't need to *send* any emails from my Python script... just *read* them and parse through them.

The main difficulty I'm having is that the *body* content I get back includes all of the HTML and other "non-text" markup, whereas I just want the actual text in each email's body.

Here's my very quickly assembled test Python script. It DOES work, but each email body it returns contains the line break characters and surrounding HTML rather than only the plain text I want to capture.

Thanks in advance for any suggestions/help!

# import python poplib module
import poplib

# import time library
import time

# import the email parsing helpers from the standard library
from email.parser import Parser
from email.header import decode_header
from email.utils import parseaddr

# The Subject header and the display name in an address may be MIME-encoded,
# so they must be decoded to display properly; this function does that.
def decode_str(s):
    value, charset = decode_header(s)[0]
    # decode_header() returns bytes for encoded words; fall back to UTF-8 if no charset is given.
    if isinstance(value, bytes):
       value = value.decode(charset or 'utf-8', errors='replace')
    return value

# determine the charset of the message content.
def guess_charset(msg):
    # get_content_charset() reads the charset parameter of the Content-Type header
    # (lowercased, without quotes or trailing parameters); it returns None if none is declared.
    return msg.get_content_charset()

# indent_number sets the indentation for each nesting level of a multipart message body.
def print_info(msg, indent_number=0):
    if indent_number == 0:
       # loop to retrieve from, to, subject from email header.
       for header in ['From', 'To', 'Subject']:
           # get header value
           value = msg.get(header, '')
           if value:
              # for subject header.
              if header=='Subject':
                 # decode the subject value
                 value = decode_str(value)
              # for from and to header. 
              else:
                 # parse email address
                 hdr, addr = parseaddr(value)
                 # decode the name value.
                 name = decode_str(hdr)
                 value = u'%s <%s>' % (name, addr)
           print('%s%s: %s' % (' ' * indent_number, header, value))
    # if message has multiple part. 
    if (msg.is_multipart()):
       # get multiple parts from message body.
       parts = msg.get_payload()
       # loop for each part
       for n, part in enumerate(parts):
           print('%spart %s' % (' ' * indent_number, n))
           print('%s--------------------' % (' ' * indent_number))
           # print multiple part information by invoke print_info function recursively.
           print_info(part, indent_number + 1)
    # if not multiple part. 
    else:
        # get message content mime type
        content_type = msg.get_content_type() 
        # if plain text or html content type.
        if content_type=='text/plain' or content_type=='text/html':
           # get email content
           content = msg.get_payload(decode=True)
           # get content string charset
           charset = guess_charset(msg)
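           # decode the content with the declared charset, falling back to UTF-8,
           # so that content is always a str before it is concatenated below.
           content = content.decode(charset or 'utf-8', errors='replace')
           print('%sText: %s' % (' ' * indent_number, content + '...'))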
        else:
           print('%sAttachment: %s' % (' ' * indent_number, content_type))

# input email address, password and pop3 server domain or ip address
email = input('Email: ')
username = input('Username: ')
password = input('Password: ')
pop3_server = input('POP3 server: ')

# connect to pop3 server:
server = poplib.POP3(pop3_server)
# open debug switch to print debug information between client and pop3 server.
server.set_debuglevel(1)
# get and print the pop3 server welcome message.
pop3_server_welcome_msg = server.getwelcome().decode('utf-8')
print(pop3_server_welcome_msg)

# user account authentication
server.user(username)
server.pass_(password)

# stat() returns the message count and the total mailbox size
print('Messages: %s. Size: %s' % server.stat())
# list() returns the list of all messages
resp, mails, octets = server.list()
print(mails)

# index number of the newest email
index = len(mails)
# server.retr() fetches the full contents of the email with that index number.
resp, lines, octets = server.retr(index)

# lines holds each line of the raw message as bytes;
# join them to rebuild the original text of the entire message.
# errors='replace' avoids a crash if the message is not valid UTF-8.
msg_content = b'\r\n'.join(lines).decode('utf-8', errors='replace')
# now parse the text into a message object.
msg = Parser().parsestr(msg_content)

# get the From, To and Subject header values (empty string if a header is missing).
email_from = msg.get('From', '')
email_to = msg.get('To', '')
email_subject = msg.get('Subject', '')
print('From ' + email_from)
print('To ' + email_to)
print('Subject ' + email_subject)

# New by Brad that might get just what we want from the first email in terms of body text w/o HTML
# print_info() already recurses through every part, so call it once on the whole message.
print_info(msg)
print ("Waiting...")
time.sleep(30)

# Another test section by Brad
print (msg.get_payload())
time.sleep(30)



# Test section by Brad: print the raw lines of every email on the server
numMessages = len(server.list()[1])
for i in range(numMessages):
    for j in server.retr(i+1)[1]:
        print(j)

# delete the email from pop3 server directly by email index.
# server.dele(index)
# close pop3 server connection.
server.quit()
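
For the body-text problem described above, one possible approach is to parse the raw bytes with the standard library's newer email API and ask for the text/plain alternative directly. This is only a minimal sketch: extract_plain_text and the sample message are hypothetical names made up for illustration, and it assumes the sender actually included a plain-text part (HTML-only emails return None here).

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_plain_text(raw_bytes):
    """Return the text/plain body of a raw message, or None if there is none."""
    # policy.default makes the parser build EmailMessage objects,
    # which provide get_body() and get_content().
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=('plain',))
    if body is None:
        return None
    # get_content() undoes the transfer encoding and charset for text parts.
    return body.get_content()


# Hypothetical sample standing in for b'\r\n'.join(lines) from server.retr().
sample = EmailMessage()
sample['Subject'] = 'Test'
sample.set_content('Hello from the plain part.')
sample.add_alternative(
    '<html><body><p>Hello from the <b>HTML</b> part.</p></body></html>',
    subtype='html',
)

print(extract_plain_text(sample.as_bytes()))
```

With poplib, the bytes to pass in would be the joined lines returned by server.retr(index), without decoding them to a str first.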
#2
I'm unclear on what exactly your question is. It would help if you were a little more organized: for example, if you're just having trouble parsing the HTML, please provide a sample string and the result you want to get from it, along with your code attempt and a plain-English explanation of how what you have differs from what you want.

If you can ask your question with 5-10 lines of code, excluding the "input" definition (which in your case I can see being a string or a class), that would help a lot too. I don't tend to read more than 30 lines of code or so on this site unless the question clearly can't be asked in fewer. It seems like your question isn't really about POP, so unless I'm mistaken, you'll want to remove all that code from what you post here.
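
As an illustration of the kind of short, self-contained example meant here, this sketch takes a sample HTML string and reduces it to text using only the standard library's html.parser. TextExtractor, html_to_text and the sample string are hypothetical, and real email HTML may need more careful handling than this (tables, nested layouts, and so on).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping <script>/<style> and breaking on block tags."""

    SKIP = {'script', 'style'}
    BREAK = {'br', 'p', 'div'}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BREAK:
            self.chunks.append('\n')

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when not inside a skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Trim each line and drop the empty ones.
    lines = (line.strip() for line in ''.join(parser.chunks).splitlines())
    return '\n'.join(line for line in lines if line)


sample_html = (
    '<html><head><style>p {color: red}</style></head><body>'
    '<p>Hello,</p><p>Your order &amp; receipt<br>are attached.</p>'
    '</body></html>'
)
print(html_to_text(sample_html))
```

The input is sample_html and the wanted result is three lines of text with no tags, which is the "have versus want" pair the question needs.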