Python Forum
Extract PDF Attachment from Gmail


Extract PDF Attachment from Gmail - jstaffon - Sep-10-2023

I would like a Python script to open my Gmail inbox, read each unread email, and save any PDF attachments it finds. The code below was published on a couple of forums and websites. It partly works: it logs in to my email account, processes only new emails, and then changes their status to READ. However, it never finds the attachment. part.get_content_type() returns “text/plain” instead of “application/pdf”. If I uncomment the “part.get_content_type() == ‘text/plain’” line, execution enters that “if” block as expected, but the “application/pdf” comparison never matches when it is the active line. The test PDF is a simple one-line file that was converted to PDF using the EXPORT feature in Google Drive. I get the same results with PDF invoices sent out by a company. I have not installed the PyPDF2 library because I’m told the standard Python libraries should be enough. I suspect the problem is related to encryption. Any help is much appreciated. Thanks in advance!

import imaplib
import email
import regex
import re

user = 'user'
password = 'kfkgxpzir'

server = imaplib.IMAP4_SSL('imap.gmail.com')
server.login(user, password)
server.select('inbox')

msg_ids = []
resp, messages = server.search(None, 'UNSEEN')
for message in messages[0].split():
    typ, data = server.fetch(message, '(RFC822)')
    msg = email.message_from_string(str(data[0][1]))
    # looking for 'Content-Type: application/pdf'
    for part in msg.walk():
        # if part.get_content_type() == 'text/plain':
        if part.get_content_type() == 'application/pdf':
            print("Found pdf")
            payload = part.get_payload(decode=True)

            filename = part.get_filename()
            print(filename)

            # Save the file.
            if payload and filename:
                with open(filename, 'wb') as f:
                    f.write(payload)
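
One possible explanation for the “text/plain” result, offered as a sketch rather than a confirmed diagnosis: under Python 3, server.fetch() returns the message as bytes, and calling str() on bytes produces the literal text "b'...'" instead of the message itself. The email.message_from_string(str(data[0][1])) call then has no real headers to parse, so the only part it reports has the default type text/plain. The example below reproduces this without Gmail, using an in-memory test message. The helper name pdf_attachments, the addresses, and the fake PDF bytes are all made up for illustration. If this is the cause, neither PyPDF2 nor anything related to encryption should be needed.

```python
import email
from email.message import EmailMessage


def pdf_attachments(raw_bytes):
    """Return (filename, payload) pairs for every application/pdf part.

    Hypothetical helper: raw_bytes is assumed to be the bytes object that
    imaplib returns as data[0][1] for an '(RFC822)' fetch.
    """
    msg = email.message_from_bytes(raw_bytes)  # parse the bytes directly
    found = []
    for part in msg.walk():
        if part.get_content_type() == "application/pdf":
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


# Build a small test message in memory (made-up content, no Gmail needed).
sample = EmailMessage()
sample["Subject"] = "Invoice"
sample["From"] = "sender@example.com"
sample["To"] = "me@example.com"
sample.set_content("See attached.")
sample.add_attachment(b"%PDF-1.4 fake body", maintype="application",
                      subtype="pdf", filename="invoice.pdf")
raw = sample.as_bytes()  # bytes, like data[0][1] from server.fetch()

# What the posted code does: str() on bytes yields "b'...'", not the message.
broken = email.message_from_string(str(raw))
broken_types = [p.get_content_type() for p in broken.walk()]

fixed = pdf_attachments(raw)

print(broken_types)  # only the default type, no application/pdf part
print(fixed)         # the attachment's filename and decoded bytes
```

In the script above, the equivalent change would be to replace email.message_from_string(str(data[0][1])) with email.message_from_bytes(data[0][1]) and leave the rest of the loop as it is.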