Python Forum
URL regex misses a URL
Hi Guys,

I cannot figure this issue out. I'm downloading emails via POP3:

code:

import email
import poplib
import re
import traceback

def pop3_downloader(username, password, pop3server, port, use_ssl):
    try:
        # Open a plain or SSL connection depending on the "yes"/"no" flag.
        if use_ssl == "no":
            server = poplib.POP3(pop3server, port)
        elif use_ssl == "yes":
            server = poplib.POP3_SSL(pop3server, port)
        else:
            raise ValueError('use_ssl must be "yes" or "no"')

        server.user(username)
        server.pass_(password)
        numMessages = len(server.list()[1])

        print("--> # Of Messages: " + str(numMessages))

        email_container = []
        for i in range(numMessages):
            # retr() returns (response, list of raw byte lines, size in octets).
            (server_msg, body, octets) = server.retr(i+1)
            # Note: each raw line is parsed as a separate message here.
            for j in body:
                try:
                    msg = email.message_from_string(j.decode("utf-8"))
                    email_body = msg.get_payload()
                    email_extract_urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', email_body)
                    if len(email_extract_urls) > 0:
                        activation_links = "/activate/|registration.activate&token="
                        #if any(s in email_extract_urls for s in activation_links.split('|')):
                        email_container.append(email_extract_urls)
                except Exception:
                    # Skip lines that cannot be decoded or parsed.
                    pass
            #server.dele(i+1)
        server.quit()
        return email_container

    except Exception:
        # Print the full traceback for connection or login failures.
        traceback.print_exc()
This works. A few emails contain:

https://www.site1.com/
https://www.site1.com/wp-login.php?wfls-email-verification=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyIjp7ImRhdGEiOiJCdlZZNHVWXC9Id2tQYStjR2RGOXRYUT09IiwiaXYiOiJmeDVlY1wvbXBib0I0M1VkMlcrb09EUT09In0sIl9leHAiOjE1NjQyNTM5MTV9.ZOpt4jXq5NHdecYygh0EnX5G5v8EMkSMuM2zhuPExmg
in that order, and these are extracted fine. Some emails contain:

http://site2.com/
http://site2.com/index.php?option=com_users&task=registration.activate&token=xxxxxxxxxxxxxxxxxx <-- not being extracted
In this case only the first URL is extracted. The second one (marked above) is always missed, and I cannot see why.
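
One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: server.retr() returns the message as a list of raw lines, and links in quoted-printable mail are often stored with "=" written as "=3D" and with long lines wrapped. The example below assumes that is what happens with the second URL. It joins the lines, parses the whole message once, decodes each text part, and only then applies the same regex. The names extract_urls and ACTIVATION_MARKERS and the sample message are made up for illustration.

```python
import email
import re
from email import policy

# Same pattern as in the post, compiled once.
URL_RE = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

# Hypothetical markers, taken from the activation_links string in the post.
ACTIVATION_MARKERS = ("/activate/", "registration.activate&token=")


def extract_urls(raw_lines):
    """Return URLs found in the decoded text parts of one message.

    raw_lines is the list of byte lines that poplib's retr() returns
    as the second element of its result tuple.
    """
    raw = b"\r\n".join(raw_lines)
    msg = email.message_from_bytes(raw, policy=policy.default)
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        # get_content() undoes quoted-printable/base64 and the charset.
        urls.extend(URL_RE.findall(part.get_content()))
    return urls


# Made-up message in the shape retr() would return it (assumption:
# the real mail uses quoted-printable with a soft line break).
sample = [
    b"Subject: Activate your account",
    b"MIME-Version: 1.0",
    b"Content-Type: text/plain; charset=utf-8",
    b"Content-Transfer-Encoding: quoted-printable",
    b"",
    b"http://site2.com/",
    b"http://site2.com/index.php?option=3Dcom_users&task=3Dregistration.activate&=",
    b"token=3Dxxxxxxxxxxxxxxxxxx",
]

urls = extract_urls(sample)
# Substring test against each URL, not membership in the list.
activation = [u for u in urls if any(m in u for m in ACTIVATION_MARKERS)]
print(urls)
print(activation)
```

In pop3_downloader this would replace the inner "for j in body" loop with a single call such as extract_urls(body). A part that declares an unknown charset could still raise in get_content(), so a try/except around that call may be needed for real mailboxes.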

Any help would be appreciated!

Regards,

Graham