Posts: 414
Threads: 111
Joined: Apr 2020
Greetings!
I’m parsing log files. 95% of the files have the lines I’m looking for.
The other 5% do not, so for those I must get the first line (it starts with a date and time, e.g. “2022-08-14 14:37:46”)
and the last line (it also starts with a date and time, e.g. “2022-08-14 14:39:00”).
The problem is that not every last line has a date and time at the start of the string.
I need to read the lines before the last line, going backwards, until I find one that starts with the date and time.
Here is what I have so far; it does not work as I wanted:
import re
with open(r"C:/01/last_line.txt") as mfiler:
    frt_ln = mfiler.readline()
    print(f" Fl -> {frt_ln}")
    for rn_l in mfiler:
        if 'Start' in rn_l :
            continue
        # do something with the lines
    last_line = rn_l
    print(f" Last Line ->{last_line}")
    if not re.search(r'^\d+', last_line) :
        next
    else :
        print(f" Line with the DateTime -> {last_line}")
        #break
Here is a short example of the file:
2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:
[DS_DK] Bac Test Result : Passed
[DS_DK] Bac Iteration Result : Passed
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet
Could you help me with this?
Thank you.
Posts: 6,779
Threads: 20
Joined: Feb 2020
import io
import re
data = io.StringIO("""
This is not the first line
2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:
[DS_DK] Bac Test Result : Passed
[DS_DK] Bac Iteration Result : Passed
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet
""")
date_pattern = re.compile(r"^\d{4}-\d{2}-\d{2}")
first = last = None
for line in data:
    if re.search(date_pattern, line):
        last = line
        if first is None:
            first = line
print(first)
print(last)
Output:
2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
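The same loop can be wrapped in a small function so it can be called on any iterable of lines, including an already open file. This is only a sketch: the name first_last_dated and the slightly stricter pattern (date plus time) are assumptions for illustration, not part of the original answer.

```python
import re

# Assumed pattern: a line "has a date" if it starts with YYYY-MM-DD HH:MM:SS
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def first_last_dated(lines):
    """Return (first, last) lines that start with a date/time, or (None, None)."""
    first = last = None
    for line in lines:
        if DATE_PATTERN.match(line):
            if first is None:
                first = line
            last = line
    return first, last


sample = [
    "This is not the first line\n",
    '2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:\n',
    "[DS_DK] Bac Test Result : Passed\n",
    "2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :\n",
    "[Bac Setup] USB 3.0 to GB Ethernet\n",
]
first, last = first_last_dated(sample)
print(first, end="")
print(last, end="")
```

Because the function only iterates once and keeps two variables, it should also work when passed a file object directly instead of a list.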
Posts: 2,125
Threads: 11
Joined: May 2017
Here is an example with a start date and an end date. It's unclear what should happen with trailing lines, for example when the end date has been detected but it is followed by lines without a date. This example prints the remaining undated lines until the next line with a date is detected.
import io
import re
from datetime import datetime as DateTime
log_file_like = io.StringIO(
"""2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:
[DS_DK] Bac Test Result : Passed
[DS_DK] Bac Iteration Result : Passed
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet
2022-08-14 14:39:00.033,15060 <-- this line should be excluded from results
"""
)
DATE_REGEX = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2,8})")
def parse_date(line):
    # Returns a datetime if the line contains a timestamp, otherwise None
    if match := DATE_REGEX.search(line):
        return DateTime.fromisoformat(match.group(1))
def find(fd, start_date: DateTime, end_date: DateTime):
    """
    Read the file line by line and check each line for a date.
    Once a date is found that is not earlier than start_date, lines are yielded.
    Once a date reaches end_date, the remaining lines up to the next dated line are yielded.
    """
    start_found = end_found = False
    for line in fd:
        date = parse_date(line)
        if end_found and date:
            return
        # date could be None
        # assigning date to last_date
        # for comparison
        if date is not None:
            start_found = True
            last_date = date
        if start_found and last_date >= end_date:
            end_found = True
            yield line
        elif start_found and last_date >= start_date:
            yield line
# int, int, int, int, int, int, int
# datetime(year, month, day, hour, minute, second, microsecond)
start_date = DateTime(2022, 8, 14, 14, 37, 46, 524 * 1000)
end_date = DateTime(2022, 8, 14, 14, 39, 0, 32 * 1000)
for line in find(log_file_like, start_date, end_date):
    print(line, end="")
print()
Output:
[andre@andre-Fujitsu-i5 ~]$ python xxxxxxxx.py
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet
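The idea behind this example, that a line without a date belongs to the dated line before it, can also be sketched as a grouping step. The helper name group_records and the assumption that every timestamp has exactly three fractional digits (as in the sample log) are mine, not from the post above.

```python
import re
from datetime import datetime

# Assumption: timestamps always look like "2022-08-14 14:37:46.523"
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def group_records(lines):
    """Yield (timestamp, [lines]) pairs; undated lines join the preceding record."""
    stamp, block = None, []
    for line in lines:
        match = STAMP.match(line)
        if match:
            if block:
                yield stamp, block
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            block = [line]
        elif block:
            # Continuation line; lines before the first timestamp are dropped
            block.append(line)
    if block:
        yield stamp, block


sample = [
    "2022-08-14 14:37:46.524,17784 ,Warning\n",
    "Condition: NO Condition\n",
    "Set Type: Hard\n",
    "2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :\n",
    "[Bac Setup] USB 3.0 to GB Ethernet\n",
]
records = list(group_records(sample))
for stamp, block in records:
    print(stamp, len(block))
```

With records grouped this way, the last dated line of a file is simply the first line of the final record, whether or not undated lines follow it.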
Posts: 414
Threads: 111
Joined: Apr 2020
I really appreciate the snippets you shared!
Both examples look great.
But I'm not sure how to use it.
The script I'm using has about 300 lines already.
The problem with the "last line" is at the end of it.
The file I'm parsing is already open and a lot of things have happened to it by the time I come to the "last string".
I need to make changes here:
if not re.search(r'^\d+', last_line) :
    next
else :
    print(f" Line with the DateTime -> {last_line}")
    #break
Sorry about that!
Tester_v
Posts: 6,779
Threads: 20
Joined: Feb 2020
Without any information about what kind of processing your 300-line script performs, it is difficult to determine whether either example will work for your problem. This is a wild guess (WAG) at how to use my solution.
for line in data:
    if re.search(date_pattern, line):
        last = line
        if first is None:
            first = line
        # Additional processing for lines that start with date/time goes here
    else:
        # Processing for lines that don't start with date/time goes here
        pass
Or maybe it would work like this:
for line in data:
    if re.search(date_pattern, line):
        last = line
        if first is None:
            first = line
    # Additional processing for any line goes here
Posts: 414
Threads: 111
Joined: Apr 2020
Thank you guys!
I appreciate your help!
I'm really not that good with programming.
I came up with a simple (for me) solution that seems to work fine.
I probably reinvented the wheel...
Here is the code:
import re
import linecache
fl = 'C:/01/last_line.txt'
with open(fl) as mfiler:
    lnc = 1 # <-- Line Number, starts at 1 because the first line is read just below
    frt_ln = mfiler.readline() # <------------------ First Line
    print(f" Fl -> {frt_ln}")
    for rn_l in mfiler:
        lnc+=1
        if 'Start' in rn_l :
            continue
        # do something with the lines
    la_line = rn_l
    while lnc > 0 :
        print(f" Number of Ln in the File = {lnc}") # < --- Current line number, counting down from the end
        if re.search(r'^\d+', la_line) :
            print(f" LN has DT --> {la_line} ")
            break
        else :
            lnc = lnc - 1
            la_line = linecache.getline(fl, lnc) # linecache line numbers are 1-based
Thank you again!
I love this forum
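For readers who want the backward search without linecache, here is a minimal sketch that walks a list of lines from the end. It assumes the whole file fits in memory (for example via mfiler.readlines()); the function name last_dated_line and the date pattern are illustrative choices, not taken from the thread.

```python
import re

# Assumed pattern: a line "has a date" if it starts with YYYY-MM-DD HH:MM:SS
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def last_dated_line(lines):
    """Walk the lines from the end and return the first one that starts with a date/time."""
    for line in reversed(lines):
        if DATE_PATTERN.match(line):
            return line
    return None  # no dated line anywhere


# In a real script this list could come from: lines = mfiler.readlines()
lines = [
    "2022-08-14 14:37:46.524,17784 ,Warning\n",
    "Set Type: Hard\n",
    "2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :\n",
    "[Bac Setup] Ethernet Connection -2\n",
    "[Bac Setup] USB 3.0 to GB Ethernet\n",
]
result = last_dated_line(lines)
print(result, end="")
```

The search stops at the first match from the end, so only the trailing undated lines are examined.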