Python Forum
First line with digits before last line




First line with digits before last line - tester_V - Aug-21-2022

Greetings!
I’m parsing log files. 95% of the files have the lines I’m looking for.
The other 5% do not, and for those I must get the first line (it starts with a date time, “2022-08-14 14:37:46”)
and the last line (it starts with a date time, “2022-08-14 14:39:00”).
The problem is that not all last lines start with a date time.
I need to read backwards from the last line until I find one that starts with a date time...

Here is what I have so far; it does not work as I wanted:
import re

with open(r"C:/01/last_line.txt") as mfiler:
    frt_ln = mfiler.readline()
    print(f" Fl -> {frt_ln}")
    
    for rn_l in mfiler: 
        if 'Start' in rn_l :
            continue        
            # do something with the lines
        last_line = rn_l
    print(f" Last Line ->{last_line}")
    if not re.search('^\d+', last_line) :
        next
    else :
        print(f" Line with the DateTime -> {last_line}")
        #break
Here is a short example of the file:

2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:
[DS_DK] Bac Test Result : Passed
[DS_DK] Bac Iteration Result : Passed
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet

Could you help me with this?
Thank you.


RE: First line with digits before last line - deanhystad - Aug-22-2022

import io
import re

data = io.StringIO("""
This is not the first line
2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:
[DS_DK] Bac Test Result : Passed
[DS_DK] Bac Iteration Result : Passed
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet
""")

date_pattern = re.compile(r"^\d{4}-\d{2}-\d{2}")  # raw string avoids invalid-escape warnings

first = last = None
for line in data:
    if re.search(date_pattern, line):
        last = line
        if first is None:
            first = line

print(first)
print(last)
Output:
2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
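
The loop above can also be wrapped in a function that accepts any iterable of lines, so it would work with an already open file as well as with io.StringIO. This is only a minimal sketch: the name first_and_last_dated and the stricter date-time pattern are assumptions for illustration, not part of the original post.

```python
import io
import re

# Assumed pattern: a line "starts with a date time" if it begins with
# YYYY-MM-DD HH:MM:SS, as in the sample log.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def first_and_last_dated(lines):
    """Return (first, last) lines that start with a date time, or None for each."""
    first = last = None
    for line in lines:
        if DATE_PATTERN.match(line):  # match() only matches at the start of the line
            last = line
            if first is None:
                first = line
    return first, last


# Shortened sample taken from the log excerpt in this thread.
sample = io.StringIO(
    "This is not the first line\n"
    '2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:\n'
    "[DS_DK] Bac Test Result : Passed\n"
    "2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :\n"
    "[Bac Setup] USB 3.0 to GB Ethernet\n"
)
first, last = first_and_last_dated(sample)
print(first, end="")
print(last, end="")
```

Because the function only iterates over its argument once, it could be called with the file object itself instead of the StringIO sample.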



RE: First line with digits before last line - DeaD_EyE - Aug-22-2022

An example with a start and an end date. It's unclear what should happen with trailing lines, for example when the end date has been detected but lines without a date follow it. This example prints the remaining undated lines until the next line with a date is detected.

import io
import re
from datetime import datetime as DateTime


log_file_like = io.StringIO(
    """2022-08-14 14:37:46.523,17784 ,Information,"==================== Bac Start Run ===================="
2022-08-14 14:37:46.523,17784 ,Information,"Bac Info:
[DS_DK] Bac Test Result : Passed
[DS_DK] Bac Iteration Result : Passed
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet
2022-08-14 14:39:00.033,15060 <-- this line should be excluded from results
"""
)


DATE_REGEX = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2,8})")


def parse_date(line):
    if match := DATE_REGEX.search(line):
        return DateTime.fromisoformat(match.group(1))


def find(fd, start_date: DateTime, end_date: DateTime):
    """
    Read the file line by line and check each line for a date.
    Once a date at or after start_date has been found, the lines are yielded.
    Once a date reaches end_date, the remaining lines up to the next dated line are yielded.
    """
    start_found = end_found = False

    for line in fd:
        date = parse_date(line)

        if end_found and date:
            return

        # date could be None
        # assigning date to last_date
        # for comparison
        if date is not None:
            start_found = True
            last_date = date

        if start_found and last_date >= end_date:
            end_found = True
            yield line
        elif start_found and last_date >= start_date:
            yield line


#          int,  int,   int, int,  int,    int,    int
# datetime(year, month, day, hour, minute, second, microsecond)

start_date = DateTime(2022, 8, 14, 14, 37, 46, 524 * 1000)
end_date = DateTime(2022, 8, 14, 14, 39, 0, 32 * 1000)


for line in find(log_file_like, start_date, end_date):
    print(line, end="")

print()
Output:
[andre@andre-Fujitsu-i5 ~]$ python xxxxxxxx.py
2022-08-14 14:37:46.524,17784 ,Warning
Condition: NO Condition
Allowed Stages: Any Stage
Set Type: Hard
2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :
[Bac Setup] Ethernet Connection -2
[Bac Setup] USB 3.0 to GB Ethernet
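
As a small check of the date handling used above: the timestamps in the log carry milliseconds, while the datetime constructor takes microseconds, which is why the example multiplies by 1000. A short sketch, using only the standard library and a timestamp from the sample log:

```python
from datetime import datetime

# fromisoformat accepts a space between date and time and a 3-digit fraction.
parsed = datetime.fromisoformat("2022-08-14 14:37:46.524")

# 524 milliseconds expressed as microseconds for the constructor.
start_date = datetime(2022, 8, 14, 14, 37, 46, 524 * 1000)

print(parsed == start_date)  # True
print(parsed.microsecond)    # 524000
```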



RE: First line with digits before last line - tester_V - Aug-22-2022

I really appreciate the snippets you shared!
Both examples look great.

But I'm not sure how to use them.
The script I'm using has about 300 lines already.
The problem with the "last line" is at the end of it.
The file I'm parsing is already open and a lot of things have happened to it by the time I come to the "last string".
I need to make changes here:
    if not re.search('^\d+', last_line) :
        next
    else :
        print(f" Line with the DateTime -> {last_line}")
        #break
Sorry about that!

Tester_v


RE: First line with digits before last line - deanhystad - Aug-22-2022

Without any information about what kind of processing is performed in your 300 line script it is difficult to determine if either example will work for your problem. This is a WAG for how to use my solution.
for line in data:
    if re.search(date_pattern, line):
        last = line
        if first is None:
            first = line
        # Additional processing for lines that start with date/time goes here
    else:
        # Processing for lines that don't start with date/time goes here
        pass  # placeholder so the else block is valid Python
Or maybe it would work like this:
for line in data:
    if re.search(date_pattern, line):
        last = line
        if first is None:
            first = line
    # Additional processing for any line goes here



RE: First line with digits before last line - tester_V - Aug-22-2022

Thank you guys!
I appreciate your help!
I'm really not that good with programming.
I came up with a simple (for me) solution that seems to work fine.
I probably reinvented the wheel...

Here is the code:

import re
import linecache

fl = r"C:/01/last_line.txt"
with open(fl) as mfiler:
    lnc = 1  # <-- Line number of the line read last (1-based, as linecache expects)
    frt_ln = mfiler.readline()  # <------------------ First Line
    print(f" Fl -> {frt_ln}")
    la_line = frt_ln  # defined even if the file has only one line

    for rn_l in mfiler:
        lnc += 1
        la_line = rn_l  # keep la_line in step with lnc, even for 'Start' lines
        if 'Start' in rn_l:
            continue
        # do something with the lines

print(f" Number of Ln in the File = {lnc}")  # < --- Number of lines in the file
while lnc > 0:
    if re.search(r'^\d+', la_line):
        print(f"  LN has DT --> {la_line} ")
        break
    # Step back exactly one line; getline returns '' once lnc reaches 0
    lnc -= 1
    la_line = linecache.getline(fl, lnc)
Thank you again!
I love this forum!
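
For comparison, the backward search can also be done without linecache by keeping the lines in a list and walking it in reverse. This is a sketch under assumptions: the helper name last_dated_line is hypothetical, the pattern assumes the YYYY-MM-DD HH:MM:SS prefix seen in the sample, and the whole file is assumed to fit in memory.

```python
import re

# Assumed pattern for a line that starts with a date time.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def last_dated_line(lines):
    """Return the last line that starts with a date time, or None if there is none."""
    for line in reversed(lines):
        if DATE_RE.match(line):  # match() only matches at the start of the line
            return line
    return None


# Tail of the sample log from this thread; with a real file one could use
# lines = open_file.readlines() instead.
lines = [
    "2022-08-14 14:37:46.524,17784 ,Warning\n",
    "Set Type: Hard\n",
    "2022-08-14 14:39:00.032,15060 ,Information, Available network interfaces :\n",
    "[Bac Setup] Ethernet Connection -2\n",
    "[Bac Setup] USB 3.0 to GB Ethernet\n",
]
print(last_dated_line(lines), end="")
```

Walking the list in reverse stops at the first dated line found from the end, so no line counter has to be kept in step with the file.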