Help with regular expressions-SOLVED
#1
I already posted this on Stack Overflow

I have emails with the following in the email body:

Email 1:

Event demon log entry:

[10/01/2019 08:13:44] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2_batch_excel_quants_fx_daily_vol_check_0800 MACHINE: ldnmdsbatchxl01 EXITCODE: 268438455

Email 2:

Event demon log entry:

[10/01/2019 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p2_credit_qv_curve_snap MACHINE: p2prog06

Email 3:

Event demon log entry:

[10/01/2019 08:15:03] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p1_credit_qv_curve_snap MACHINE: p1prog06

These emails have attachments which also contain job names. I want to get the job name only once for every email.

import email
import re

# conn (an IMAP connection) and items (message UIDs) are set up earlier
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1].decode('utf-8')
        mail = email.message_from_string(email_body)
        # get all emails with "PA1" or "PA2" in the subject
        subject = mail["Subject"] or ""
        if "PA1" in subject or "PA2" in subject:
            # search the email body for the machine name (the string after "MACHINE:")
            regex1 = r'(?<!^)MACHINE:\s*(\S+)'
            a = re.findall(regex1, email_body)
            print(a)

Example of the message body from the 1st email, MACHINE section:

MACHINE: =^M
ldnmdsbatchxl01

Email body for the 2nd email:

MACHINE: p2prog06^M
MACHINE: p2prog06<br>^M

The difference is the line break in the 1st email body.
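
The `=^M` at the end of the first line looks like a quoted-printable soft line break (`=` followed by CRLF), and the `<br>` copy suggests the message also carries an HTML part. If that assumption holds, decoding the text/plain part before matching may sidestep both problems. A minimal sketch using only the standard library; the sample message and the helper name `first_machine` are made up for illustration:

```python
import email
import re

# Hypothetical raw message, loosely modeled on the 1st email in this thread.
# The "=\r\n" after "MACHINE: " is a quoted-printable soft line break.
RAW = (
    "Subject: PA1 alarm\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "EVENT: ALARM JOB: some_job MACHINE: =\r\n"
    "ldnmdsbatchxl01 EXITCODE: 268438455\r\n"
)

MACHINE_RE = re.compile(r"\bMACHINE:\s*(\S+)")


def first_machine(raw_message):
    """Return the first machine name found in a text/plain part, or None."""
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        # Skip HTML parts and attachments; only look at plain text.
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the Content-Transfer-Encoding (e.g. quoted-printable).
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        match = MACHINE_RE.search(text)
        if match:
            return match.group(1)
    return None


print(first_machine(RAW))
```

Stopping at the first match in the first text/plain part is a design choice that gives one name per email; whether the real messages are laid out this way is not confirmed by the thread.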

Current output:

['p1prog06', 'p1prog06<br>']
['p2prog06', 'p2prog06<br>']
['=', '=']

As you can see, I'm getting duplicate entries, and the name from the 1st email is missing.

Desired output:

['p1prog06']
['p2prog06']
['ldnmdsbatchxl01']

I tried
regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)'
but got:

['p1prog06\r', 'p1prog06']
['p2prog06\r', 'p2prog06']
['\r']
#2
cross-posted on SO

Please, when you post on multiple sites (cross-posting), let us know. You don't want to waste the time of someone who tries to help you when you have already got an answer elsewhere.
#3
Try something simpler: (?<=MACHINE: )\s*[\w\d]*

With your examples, that regex works. It will drop the line breaks too.
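
A quick way to check this pattern is to run it against strings shaped like the ones in the first post. The sample strings below are copied or adapted from that post, and the variable names are only illustrative. Note that `\d` is already covered by `\w`, and that on the wrapped `MACHINE: =` form the pattern appears to return an empty string, so it may only fit the unwrapped lines:

```python
import re

pattern = re.compile(r"(?<=MACHINE: )\s*[\w\d]*")

# Plain log line, as in the 2nd email of the first post.
log_line = (
    "[10/01/2019 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM "
    "JOB: p2_credit_qv_curve_snap MACHINE: p2prog06"
)
# HTML-flavoured copy of the same field.
html_line = "MACHINE: p2prog06<br>\r"
# Wrapped form from the 1st email ("=" followed by a line break).
wrapped = "MACHINE: =\r\nldnmdsbatchxl01"

plain_result = pattern.findall(log_line)
html_result = pattern.findall(html_line)
wrapped_result = pattern.findall(wrapped)

print(plain_result, html_result, wrapped_result)
```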
#4
People from SO helped a lot:


regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)|$'
machine = re.findall(regex2, email_body)[0]
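
For anyone reading later, here is a small sketch of what the `|$` alternative and the `[0]` index seem to do. The `|$` branch always matches at the end of the text, so `re.findall` never returns an empty list and `[0]` cannot raise `IndexError`; when no `MACHINE:` is present, the result is an empty string. The helper name `machine_or_empty` and the sample strings are assumptions for illustration, and the `.strip()` call is an addition, since a trailing `\r` can otherwise stay in the captured name:

```python
import re

regex2 = r'\bMACHINE:\s*(?:=.*)?\s*([^<^\n ]+)|$'


def machine_or_empty(text):
    """Return the first captured machine name, or '' if none is found."""
    # [0] takes the first match only, which also drops later duplicates.
    return re.findall(regex2, text)[0].strip()


# Duplicate plain/HTML lines, shaped like the 2nd email in the first post.
duplicated = "MACHINE: p2prog06\r\nMACHINE: p2prog06<br>\r\n"
# Wrapped form, shaped like the 1st email in the first post.
wrapped = "MACHINE: =\r\nldnmdsbatchxl01"
# Text with no MACHINE field at all.
nothing = "no machine here"

print(machine_or_empty(duplicated))
print(machine_or_empty(wrapped))
print(repr(machine_or_empty(nothing)))
```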