Help with regular expressions - SOLVED - dragan979 - Jan-10-2019

I have emails that contain the following in the email body:

1st email:

Event demon log entry:

[10/01/2019 08:13:44] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2_batch_excel_quants_fx_daily_vol_check_0800 MACHINE: ldnmdsbatchxl01 EXITCODE: 268438455

2nd email:

Event demon log entry:

[10/01/2019 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p2_credit_qv_curve_snap MACHINE: p2prog06

3rd email:

Event demon log entry:

[10/01/2019 08:15:03] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p1_credit_qv_curve_snap MACHINE: p1prog06
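Each log entry line above is a series of `KEY: value` pairs, so one way to pull out the job and machine names is to parse the whole line into a dictionary. This is a minimal sketch, assuming every key is an upper-case word followed by a colon and every value contains no spaces, which holds for the three sample lines. It works on a clean, already decoded line, not on the raw message source. `parse_entry` is a hypothetical helper name.

```python
import re

# One "Event demon log entry" line, copied from the 1st email above.
LINE = ("[10/01/2019 08:13:44] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE "
        "JOB: p2_batch_excel_quants_fx_daily_vol_check_0800 "
        "MACHINE: ldnmdsbatchxl01 EXITCODE: 268438455")

# Assumption: every field is an upper-case KEY, a colon, then a value without spaces.
FIELD_RE = re.compile(r'\b([A-Z]+):\s*(\S+)')

def parse_entry(text):
    """Return the KEY: value pairs of one log entry as a dict (hypothetical helper)."""
    return dict(FIELD_RE.findall(text))

fields = parse_entry(LINE)
print(fields["JOB"], fields["MACHINE"])
```

The timestamp and the `CAUAJM_I_40245` message ID are skipped because neither contains an upper-case word directly followed by a colon.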

These emails also have attachments that contain the job names. I want to extract the name only once per email.

import email
import re

# conn is an already logged-in imaplib connection and items a list of message UIDs (set up elsewhere)
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1].decode('utf-8')
        mail = email.message_from_string(email_body)
        # only handle emails with "PA1" or "PA2" in the subject
        # ("in" also catches a match at position 0, which find(...) > 0 missed)
        subject = mail["Subject"] or ""
        if "PA1" in subject or "PA2" in subject:
            # search the email body for the machine name (the string after "MACHINE:")
            regex1 = r'(?<!^)MACHINE:\s*(\S+)'
            a = re.findall(regex1, email_body)
            print(a)
Example of the message body from the 1st email, around the MACHINE section:

MACHINE: =^M

Email body of the 2nd email:

MACHINE: p2prog06^M
MACHINE: p2prog06<br>^M

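The difference is the line break in the 1st email body: the line ends with "=" right after "MACHINE:", so the machine name is pushed onto the next line.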
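An `=` at the very end of a line is how quoted-printable encoding (RFC 2045) marks a soft line break: a long line was wrapped for transport, and the `=` plus the line break should disappear once the text is decoded. Assuming the 1st email is quoted-printable encoded (its `Content-Transfer-Encoding` header would confirm this), decoding the body before running the regex puts the machine name back on the `MACHINE:` line. A minimal sketch with the standard `quopri` module; the raw bytes are a made-up stand-in, and placing `ldnmdsbatchxl01` on the continuation line is an assumption based on the 1st email's log entry.

```python
import quopri
import re

# Hypothetical raw quoted-printable text: "=" followed by CRLF is a soft line break.
raw = (b"JOB: p2_batch_excel_quants_fx_daily_vol_check_0800 MACHINE: =\r\n"
       b"ldnmdsbatchxl01 EXITCODE: 268438455\r\n")

# decodestring drops the "=\r\n" soft break, joining the two physical lines.
decoded = quopri.decodestring(raw).decode("utf-8")

machine = re.search(r"MACHINE:\s*(\S+)", decoded).group(1)
print(machine)
```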

Current output:

['p1prog06', 'p1prog06<br>']
['p2prog06', 'p2prog06<br>']
['=', '=']

As you can see, I'm getting every name twice, and the name from the 1st email is missing (only "=" is captured).
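The second copy ending in `<br>` suggests the raw RFC822 source contains the text twice, probably once as a `text/plain` part and once as a `text/html` part (the attachments mentioned above could add further copies). If that is the case, letting the `email` package pick only the first plain-text part and undo its transfer encoding with `get_payload(decode=True)` yields each machine name once and also removes the soft line break. The raw message below is a hypothetical stand-in for one fetched email, and `plain_text_body` is a hypothetical helper; in the original loop, `email_body` would take the place of `raw_message`.

```python
import email
import re

# Hypothetical raw message: the same log text as plain text and as HTML,
# both quoted-printable encoded, with a soft line break after "MACHINE: ".
raw_message = (
    "Subject: PA1 test\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    "\r\n"
    "--XYZ\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "JOB: p1_credit_qv_curve_snap MACHINE: =\r\n"
    "p1prog06\r\n"
    "--XYZ\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "JOB: p1_credit_qv_curve_snap MACHINE: =\r\n"
    "p1prog06<br>\r\n"
    "--XYZ--\r\n"
)

def plain_text_body(msg):
    """Return the decoded text of the first text/plain part, or "" if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            # decode=True undoes quoted-printable/base64 and returns bytes
            return part.get_payload(decode=True).decode(charset, errors="replace")
    return ""

mail = email.message_from_string(raw_message)
text = plain_text_body(mail)
machines = re.findall(r"\bMACHINE:\s*(\S+)", text)
print(machines)
```

If a real attachment is itself `text/plain`, it could come first in `walk()`; checking `part.get_content_disposition()` for `"attachment"` would be one way to skip it.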

Desired output: one machine name per email, with no duplicates.

I also tried this pattern:

regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)'

but got:

['p1prog06\r', 'p1prog06']
['p2prog06\r', 'p2prog06']
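The stray `\r` comes from the Windows-style `\r\n` line endings (shown as `^M` above). The class `[^<^\n ]` excludes `<`, a literal `^` (a caret is only special in first position inside a class), newline and space, but not the carriage return, so `\r` is swallowed into the match. A minimal sketch of one possible fix, excluding all whitespace with `\s`; the sample string is an assumed stand-in for the 2nd email's body. This removes the `\r` but not the duplicate, which comes from the second copy of the line.

```python
import re

# Stand-in for the 2nd email body quoted above, with "\r\n" (^M) line endings.
body = "MACHINE: p2prog06\r\nMACHINE: p2prog06<br>\r\n"

original = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)'
# [^<\s] stops at "<" and at any whitespace, including the "\r" before "\n".
fixed = r'\bMACHINE:\s*(?:=.*)?\s*([^<\s]+)'

print(re.findall(original, body))
print(re.findall(fixed, body))
```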

RE: Help with regular expressions - buran - Jan-10-2019

Cross-posted on Stack Overflow.

Please let us know when you post on multiple sites (cross-posting). You don't want to waste the time of someone who is trying to help you when you have already got an answer elsewhere.

RE: Help with regular expressions - stullis - Jan-10-2019

Try something simpler: (?<=MACHINE: )\s*[\w\d]*

With your examples, that regex works. It will drop the line breaks too.
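A quick way to check a suggestion like this is to run it against copies of the quoted bodies. The strings below are assumed stand-ins; `ldnmdsbatchxl01` on the continuation line of the 1st email is a guess at what follows the `=`. Because the pattern has no capturing group, `re.findall` returns the whole match. It does drop the `\r` and `<br>` endings, but where `MACHINE: ` is followed directly by `=` it matches an empty string, so the wrapped body would still need decoding first, and the duplicate copy is still returned.

```python
import re

pattern = r'(?<=MACHINE: )\s*[\w\d]*'

# Assumed stand-ins for the quoted bodies ("\r\n" is the ^M line ending).
body_2nd = "MACHINE: p2prog06\r\nMACHINE: p2prog06<br>\r\n"
body_1st = "MACHINE: =\r\nldnmdsbatchxl01\r\n"

print(re.findall(pattern, body_2nd))
print(re.findall(pattern, body_1st))
```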

RE: Help with regular expressions - dragan979 - Jan-10-2019

People on Stack Overflow helped a lot. This is what I ended up using:

regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)|$'
machine = re.findall(regex2, email_body)[0]
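In this pattern, `|$` adds a second alternative that matches the empty end of the string. `re.findall` therefore always returns at least one item (an empty string when the body has no `MACHINE:`), so `[0]` cannot raise `IndexError`, and `[0]` keeps only the first occurrence, which removes the duplicate. The class `[^<^\n ]` still lets a trailing `\r` through, so for the 2nd email the first item is `'p2prog06\r'`. The same intent can be expressed with `re.search`, which stops at the first match; the sketch below does that and uses `[^<\s]` so the `\r` is not captured. `first_machine` and the sample strings are hypothetical.

```python
import re

# [^<\s]+ stops at "<" and at any whitespace, so "\r" and "<br>" are not captured;
# (?:=.*)? skips a trailing "=" so a wrapped value on the next line is still found.
MACHINE_RE = re.compile(r'\bMACHINE:\s*(?:=.*)?\s*([^<\s]+)')

def first_machine(body):
    """Return the first machine name in body, or "" if there is none (hypothetical helper)."""
    match = MACHINE_RE.search(body)
    return match.group(1) if match else ""

print(first_machine("MACHINE: p2prog06\r\nMACHINE: p2prog06<br>\r\n"))
print(first_machine("MACHINE: =\r\nldnmdsbatchxl01\r\n"))
print(repr(first_machine("no machine here")))
```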