Python Forum
Help with regular expressions - SOLVED
#1
I already posted this on Stack Overflow

I have emails with the following in the email body:

1st email:

Event demon log entry:

***

2nd email:

Event demon log entry:

***

3rd email:

Event demon log entry:

***

These emails have attachments that also contain job names. I want to get the job name exactly once for every email.

import email
import re

for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1].decode('utf-8')
        mail = email.message_from_string(email_body)
        # get all emails with "PA1" or "PA2" in the subject
        # ("in" also catches a match at index 0, which find(...) > 0 missed)
        subject = mail["Subject"] or ""
        if "PA1" in subject or "PA2" in subject:
            # search the email body for the machine name (the string after "MACHINE:")
            regex1 = r'(?<!^)MACHINE:\s*(\S+)'
            a = re.findall(regex1, email_body)
            print(a)

Example of the message body from the 1st email, MACHINE section:

Quote:MACHINE: =^M
ld***

Email body for the 2nd email:

Quote:MACHINE: eggs***^M
MACHINE: eggs***<br>^M

The difference is the line break in the 1st email body.

Current output

Quote:['eggs***', 'eggs***<br>']
['eggs***', 'eggs2***<br>']
['=', '=']

As you can see, I'm getting duplicate job names, and the job name from the 1st email is missing.

Desired output

Quote:['eggs***']
['eggs***']
['ld***']

I tried
regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)'
but got:

['eggs***\r', 'eggs***']
['eggs***\r', 'eggs***']
['\r']
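
A note on the samples above: the `=` followed by `^M` and a line break in the 1st body looks like a quoted-printable soft line break, and the `<br>` variant in the 2nd body looks like an HTML copy of the same text. Assuming that is what is happening (a guess from the samples, not something confirmed in this thread), one option is to let the `email` package decode the text/plain part and search that instead of the raw message. A minimal sketch; `machine_from_raw`, the simplified pattern and the sample message with job name `ld01` are hypothetical:

```python
import email
import re

# Simplified pattern for decoded text (an assumption, not the thread's regex).
MACHINE_RE = re.compile(r'MACHINE:\s*(\S+)')

def machine_from_raw(raw_bytes):
    """Return the first MACHINE value found in a text/plain part, or None."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        # Skip HTML alternatives, attachments and multipart containers.
        if part.get_content_type() != 'text/plain':
            continue
        # decode=True undoes quoted-printable/base64 transfer encoding.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or 'utf-8'
        text = payload.decode(charset, errors='replace')
        match = MACHINE_RE.search(text)
        if match:
            return match.group(1)
    return None

# Hypothetical message shaped like the 1st email: the name is wrapped
# after a quoted-printable soft line break ("=" at the end of the line).
raw = (
    b"Subject: job PA1 failed\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Event demon log entry:\r\n"
    b"MACHINE: =\r\n"
    b"ld01\r\n"
)

print(machine_from_raw(raw))
```

Because only text/plain parts are searched and the function returns on the first hit, each email would yield at most one name under these assumptions.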
#2
cross-posted on SO

Please, when you post on multiple sites (cross-posting), let us know. You don't want to waste the time of someone who tries to help you when you have already got an answer elsewhere.
#3
Try something simpler: (?<=MACHINE: )\s*[\w\d]*

With your examples, that regex works. It will drop the line breaks too.
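
To see what this pattern returns, here is a small sketch run on sample strings modeled on the bodies in the first post. The job names `eggs01` and `ld01` are hypothetical (the originals are masked), and `machine_names` is a helper added here to drop empty and duplicate matches:

```python
import re

# Pattern suggested in this reply; the sample bodies below are hypothetical.
PATTERN = re.compile(r'(?<=MACHINE: )\s*[\w\d]*')

def machine_names(body):
    """Return every non-empty match, de-duplicated in first-seen order."""
    seen = []
    for name in PATTERN.findall(body):
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return seen

second_email = "MACHINE: eggs01\r\nMACHINE: eggs01<br>\r\n"
first_email = "MACHINE: =\r\nld01\r\n"

print(machine_names(second_email))  # ['eggs01']
print(machine_names(first_email))   # []
```

On the wrapped form from the 1st email the pattern matches an empty string right after `MACHINE: `, because `=` is not a word character, so that case may still need separate handling.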
#4
People from SO helped a lot:


regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)|$'
machine = re.findall(regex2, email_body)[0]
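
A short sketch of how this behaves, using hypothetical sample bodies (the real job names are masked in the thread). The `|$` alternative matches the empty string at the end of the text, so `re.findall` always returns at least one item and `[0]` cannot raise `IndexError`. The `first_machine` helper and the `.strip()` call are additions for illustration:

```python
import re

# Pattern from the post above.
REGEX2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)|$'

def first_machine(body):
    """Return the first captured machine name, or '' when none is present."""
    # .strip() is an addition: with CRLF line endings the capture group
    # can end in "\r", since only "<", "^", "\n" and space are excluded.
    return re.findall(REGEX2, body)[0].strip()

# Hypothetical bodies shaped like the samples in the first post.
second_email = "MACHINE: eggs01\r\nMACHINE: eggs01<br>\r\n"
first_email = "MACHINE: =\r\nld01\r\n"

print(first_machine(second_email))             # eggs01
print(first_machine(first_email))              # ld01
print(repr(first_machine("no machine line")))  # ''
```

Taking only the first match is what removes the duplicate from the 2nd email; the optional `(?:=.*)?` skips the `=` before the line break in the 1st.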