Python Forum
capture next block of text after finding error in file - Printable Version




capture next block of text after finding error in file - kdefilip2 - Nov-26-2019

I have some code to find an error message in an error log file. When the error is found, the very next block of text will be a path. I need to capture that path.

In other words, I am searching a text file for "reported errors in the". When that string is found in the file, I need the next block of text which will be something like /var/logs/[filename]. Not sure how to accomplish this.

My current code to find the error string is:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# set variables
strFileExt=""
strFile=""
strError =""

# import required modules
import datetime
import os
import subprocess
now = datetime.datetime.now()

# for testing time format
#print (now.strftime("%m%d%y"))

strWhere = "/var/logs/error.log."+(now.strftime("%m%d%y"))
#print (strFileExt)

strWhat = "reported errors in the"
#print (strWhat)
#print (strWhere)
strResult = 0



# read file
try:
    with open(strWhere, "r") as file:
        lines = file.readlines()
except IOError:
    strError = 10
except FileNotFoundError:
    strError = 11
except Exception:
    strError = 12
    if (strError )>5:
        print ( strError )

for line in lines:
    line = line.strip()
    if line.find( strWhat )!= -1:
            strResult = strResult + 1
else:   #do this when the loop is finished
# display results
    print (strResult)
So basically,

if strResult = 1:

Grab the next contiguous block of text after strWhat

Not sure if that is clear, but thanks for any help in advance.
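
One way to read "the next block of text after strWhat" is "the first whitespace-delimited token that follows the phrase on the same line". The sketch below takes that reading and uses str.partition. The helper name next_token_after and the sample line are made up for illustration, and the real log layout may differ. It is written for Python 3.

```python
def next_token_after(line, phrase):
    """Return the first whitespace-delimited token after phrase, or None."""
    # partition() splits the line around the first occurrence of phrase.
    before, found, after = line.partition(phrase)
    if not found:
        return None
    tokens = after.split()
    return tokens[0] if tokens else None


# Hypothetical log line, not taken from a real file.
sample = "check reported errors in the /var/logs/example/ database."
print(next_token_after(sample, "reported errors in the"))
```

The function returns None both when the phrase is missing and when nothing follows it, so the caller can tell "no path captured" apart from a real path.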


Capture next word after search string - kdefilip2 - Nov-27-2019

Python 2.7

I am searching a file for an error string "reported errors in the".

When this phrase is found, I need to capture the NEXT word immediately after the search string.

Is there a way in Python to accomplish this? In my attempts thus far, I am only able to split and capture words which are in my original search string.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

strFileExt=""
strFile=""
strError = 0
strFoundwords = ""



# import required modules
import datetime
import os
import subprocess
now = datetime.datetime.now()

# for testing time format
#print (now.strftime("%m%d%y"))


strWhere = "/var/logs/error.log."+(now.strftime("%m%d%y"))
#print (strWhere)

strWhat = "reported errors in the"
#print (strWhat)
#print (strWhere)
strResult = 0



# read file
try:
    with open(strWhere, "r") as Myfile:
        lines = Myfile.readlines()
except IOError:
    strError = 10
except FileNotFoundError:
    strError = 11
except Exception:
    strError = 12
    if (strError )>5:
        print ( strError )

for line in lines:
    line = line.strip()
    if line.find( strWhat )!= -1:
            strResult = strResult + 1
            s = "reported errors in the /var/logs/"
            q = 'reported'
            res = s[s.find(q)+len(q):].split()[+3]
else:   #do this when the loop is finished
# display results
    print (strResult)
    print ( strError)
    print ( res )
The best I have been able to do is collect words that are part of the search string. I need the next word after "/var/logs/".

As is, split()[+3] gives me what I already know.

Output:
1
0
/var/logs/
If I try +4, I get an "index out of range" error.

Error:
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
IndexError: list index out of range
Any advice would be appreciated.


RE: Capture next word after search string - ThomasL - Nov-27-2019

look at line 50, are you sure you want to slice something out of variable s ?


RE: Capture next word after search string - kdefilip2 - Nov-27-2019

(Nov-27-2019, 05:21 PM)ThomasL Wrote: look at line 50, are you sure you want to slice something out of variable s ?

When it comes to Python, I'm not sure of anything ;)

The way I read that, s = "my entire search string" and q = the starting point of my search string. The split()[+3] takes me to /var/logs/, but it is the next segment of the path that I need to collect, as that segment will always vary and be unpredictable.
I think I'm coming to the realization that I cannot advance past the end of the search string (s).
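
A possible reading of the hint about line 50: s is a hard-coded string that ends at "/var/logs/", so slicing it can never return anything from the file. After cutting at "reported", s splits into only four words, which is why index +3 returns "/var/logs/" and +4 is out of range. The sketch below applies the same find/slice/split idea to the line read from the file instead. The sample line is hypothetical, and the real log format is an assumption.

```python
strWhat = "reported errors in the"

# Hypothetical line standing in for one read from the log file.
line = "check reported errors in the /var/logs/example/ database."

pos = line.find(strWhat)
res = None
if pos != -1:
    # Slice the line from the file, not a hard-coded string,
    # starting just past the end of the search phrase.
    words = line[pos + len(strWhat):].split()
    if words:
        res = words[0]

print(res)
```

Because the slice starts after the whole phrase rather than after "reported", the wanted word is at index 0 and no counting of words inside the phrase is needed.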


RE: capture next block of text after finding error in file - buran - Nov-27-2019

Just to say that showing a sample of the input text/file would help us enormously to help you. At the moment we can only guess what your text looks like.


RE: capture next block of text after finding error in file - kdefilip2 - Nov-27-2019

(Nov-27-2019, 06:25 PM)buran Wrote: Just to say that showing a sample of the input text/file would help us enormously to help you. At the moment we can only guess what your text looks like.

Thanks Buran.
Here is a segment of the log I am checking; the last line is the error condition I am looking for. The file is much larger than this snippet, but it shows both conditions: error lines and non-error lines.


Output:
2019-11-18 01:46:47:INFO:3496839:/var/logs/apc/ : no errors detected.
2019-11-18 01:46:47:INFO:3496839:/var/logs/xyz/ : no errors detected.
2019-11-18 01:46:47:ERROR:3496839:check reported errors in the /var/logs/jkl/ database. These should be rechecked to verify if the errors are accurate.



RE: capture next block of text after finding error in file - buran - Nov-27-2019

error.log
Output:
2019-11-18 01:46:47:INFO:3496839:/var/logs/apc/ : no errors detected.
2019-11-18 01:46:47:INFO:3496839:/var/logs/xyz/ : no errors detected.
2019-11-18 01:46:47:ERROR:3496839:check reported errors in the /var/logs/jkl/ database. These should be rechecked to verify if the errors are accurate.
2019-11-18 01:46:47:ERROR:3496839:check reported errors in the /var/logs/jkl/spam database. These should be rechecked to verify if the errors are accurate.
using just str methods
log_file = 'error.log'

with open(log_file) as lf:
    for line in lf:
        log_date_hour, log_minute, log_seconds, log_type, some_code, info, *rest = line.split(':')
        if log_type == 'ERROR':
            print(info.split(' ')[5])
Output:
/var/logs/jkl/
/var/logs/jkl/spam
using regex

import re

log_file = 'error.log'  # same file as in the str-methods snippet
regex = re.compile(r'reported errors in the (?P<path>\S*)', flags=re.MULTILINE)
with open(log_file) as lf:
    logs = lf.read()

paths = regex.findall(logs)
print(paths)
Output:
['/var/logs/jkl/', '/var/logs/jkl/spam']
If the file is huge, it may be better to read it line by line (as in the first example) and use the regex to parse each line. It may also be possible to write a better regex pattern.
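
A minimal sketch of that line-by-line variant, assuming the phrase and the path always sit on the same line. The names PATH_AFTER_PHRASE and iter_error_paths are made up for illustration, and io.StringIO stands in for the open log file so the example runs on its own.

```python
import io
import re

# Same phrase as above; \S+ requires at least one non-space character.
PATH_AFTER_PHRASE = re.compile(r"reported errors in the (?P<path>\S+)")


def iter_error_paths(lines):
    """Yield the captured path from each line that contains the phrase."""
    for line in lines:
        match = PATH_AFTER_PHRASE.search(line)
        if match:
            yield match.group("path")


# io.StringIO is a stand-in for open(log_file); only one line is held at a time.
sample_log = io.StringIO(
    "2019-11-18 01:46:47:INFO:3496839:/var/logs/xyz/ : no errors detected.\n"
    "2019-11-18 01:46:47:ERROR:3496839:check reported errors in the /var/logs/jkl/ database.\n"
)
print(list(iter_error_paths(sample_log)))
```

With a real file, passing the open file object (for example the lf from the snippets above) to iter_error_paths should work the same way, since file objects iterate line by line.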

Also, both snippets will fail if there is a space in the path. If you expect such paths, you will have to adjust the code accordingly.
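
One possible adjustment for paths containing spaces, sketched under the assumption that the path is always followed by " database." as in the sample log. If the real messages end differently, the trailing marker has to change. The names PATH_WITH_SPACES and path_with_spaces are hypothetical.

```python
import re

# Assumption: the path is always followed by " database." as in the sample log.
# .+? is non-greedy, so the match stops at the first " database." it finds.
PATH_WITH_SPACES = re.compile(r"reported errors in the (?P<path>.+?) database\.")


def path_with_spaces(line):
    """Return the text between the phrase and ' database.', or None."""
    match = PATH_WITH_SPACES.search(line)
    return match.group("path") if match else None


# Hypothetical line with a space inside the path.
print(path_with_spaces("check reported errors in the /var/logs/my dir/ database."))
```

A path that itself contains " database." would still be cut short, so this only moves the limitation rather than removing it.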