Help with regular expressions - SOLVED
Python Forum (https://python-forum.io), General Coding Help (https://python-forum.io/forum-8.html), thread /thread-15268.html
Help with regular expressions - SOLVED - dragan979 - Jan-10-2019

I already posted this on Stack Overflow.

I have emails which have the following in the email body:

1 email: Event demon log entry: ***
2 email: Event demon log entry: ***
3 email: Event demon log entry: ***

These emails have attachments which also contain job names. I want to get the job name for every email only once.

    import email
    import re

    # conn is an already opened IMAP connection and items the list of message UIDs (not shown)
    for emailid in items:
        resp, data = conn.uid("fetch", emailid, "(RFC822)")
        if resp == 'OK':
            email_body = data[0][1].decode('utf-8')
            mail = email.message_from_string(email_body)
            # get all emails with words "PA1" or "PA2" in subject
            if "PA1" in mail["Subject"] or "PA2" in mail["Subject"]:
                # search email body for machine name (string after word "MACHINE")
                regex1 = r'(?<!^)MACHINE:\s*(\S+)'
                a = re.findall(regex1, email_body)
                print(a)

Example of the message body from the 1st email, MACHINE section:

Quote:
    MACHINE: =^M

Email body for the 2nd email:

Quote:
    MACHINE: eggs***^M

The difference is the line break in the 1st email body.

Current output:

Quote:
    ['eggs***', 'eggs***<br>']

As you can see, I'm getting duplicate jobs and the job name from the 1st email is missing.

Desired output:

Quote:
    ['eggs***']

I tried

    regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)'

but got

    ['eggs***\r', 'eggs***']
    ['eggs***\r', 'eggs***']
    ['\r']

RE: Help with regular expressions - buran - Jan-10-2019

cross-posted on SO

Please, when you post on multiple sites (cross-posting), let us know. You don't want to waste the time of someone who tries to help you when you have already got an answer elsewhere.

RE: Help with regular expressions - stullis - Jan-10-2019

Try something simpler:

    (?<=MACHINE: )\s*[\w\d]*

With your examples, that regex works. It will drop the line breaks too.

RE: Help with regular expressions - dragan979 - Jan-10-2019

People from SO helped a lot:

    regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)|$'
    machine = re.findall(regex2, email_body)[0]
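
For anyone landing on this thread later, here is a minimal sketch of how the pattern from the last post behaves. The three sample bodies and the helper name machine_name are made up for illustration and only loosely follow the quotes in the first post, so the real emails may look different. The .strip() call is an addition on top of the posted answer: the character class in the pattern does not exclude "\r", so with CRLF line endings the captured name can keep a trailing carriage return.

```python
import re

# Pattern from the last post. The "|$" alternative adds an empty match at the
# end of the string, so findall(...)[0] does not raise IndexError when no
# MACHINE line is present.
PATTERN = re.compile(r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)|$')


def machine_name(body):
    """Return the first machine name found in body, or '' if there is none."""
    # strip() is an assumption-driven extra: it drops a trailing "\r" that the
    # capture group picks up when lines end in CRLF.
    return PATTERN.findall(body)[0].strip()


# Hypothetical bodies (assumed shapes, not the poster's real data).
wrapped = "MACHINE: =\r\neggs<br>\r\n"  # name pushed to the next line
plain = "MACHINE: eggs\r\n"             # name on the same line
missing = "no machine line here\r\n"    # no MACHINE entry at all

for body in (wrapped, plain, missing):
    print(repr(machine_name(body)))
```

Taking only the first match is what keeps the result to one name per email when the same MACHINE line shows up more than once in the body.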