Python Forum
[SOLVED] Find last occurrence of pattern in text file?




[SOLVED] Find last occurrence of pattern in text file? - Winfried - Aug-13-2021

Hello,

In a multiline text file, I need to find the last occurrence of a pattern, i.e. the Python equivalent of "tail".

None of the following works:

import re

INPUTFILE = "log.txt" 

with open(INPUTFILE) as reader:
    content = reader.read()

#====================
#very slow, even on 130KB file
p = re.compile("(?s:.*)^Some pattern.+$")
m = p.search(content)
if m.group(0):
	print(m.group(0))

#====================
#IndexError: list index out of range
p = re.compile("^Some pattern.+$")
m = p.findall(content)[-1]
if m.group(0):
	print(m.group(0))

#====================
#AttributeError: 'NoneType' object has no attribute 'group'
p = re.compile("^Some pattern.+$")
for line in content:
	m = p.search(line)
	if m.group(0):
		print(m.group(0))

#====================
#AttributeError: 'str' object has no attribute 'readlines'
for line in content.readlines():
	m = p.search(line)
	if m.group(0):
		print(m.group(0))
Does anyone know?

Thank you.


RE: Find last occurrence of pattern in text file? - bowlofred - Aug-13-2021

tail doesn't select on patterns or show "the last occurrence" of something, so I'm not sure I understand what you're looking for.

Do you have to support patterns (just a string match is insufficient)? Can the pattern span lines, or will it always be in a single line? Do you want to display the entire line the pattern matches, or just the result of the match?

Your first one looks like you have a slow pattern. It's possible to construct a pattern that requires backtracking. If you require pattern support across the entire file, it will be possible to supply a pattern that is slow. But if the pattern only has to match within a line, that will usually limit the problems that can arise.
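
As a rough sketch of the whole-file approach, assuming the pattern only ever needs to match within a single line: without re.MULTILINE, ^ and $ anchor only to the start and end of the whole string, which is most likely why findall() came back empty (hence the IndexError). Note also that findall() returns strings rather than match objects, and that iterating over a str yields single characters, not lines. The name last_match and the sample text are made up for illustration.

```python
import re

def last_match(content, pattern):
    """Return the last match object for pattern in content, or None."""
    # re.MULTILINE makes ^ and $ match at every line boundary,
    # not only at the start and end of the whole string.
    regex = re.compile(pattern, re.MULTILINE)
    last = None
    for match in regex.finditer(content):
        last = match  # keep overwriting; the final value is the last match
    return last

# Hypothetical sample text, only to exercise the function.
sample = "Some pattern one\nunrelated line\nSome pattern two\n"
m = last_match(sample, r"^Some pattern.+$")
if m:  # test the match object itself before calling .group()
    print(m.group(0))
```

Because "." does not cross newlines by default, each attempt is confined to one line, which keeps backtracking cheap.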

If you don't need full pattern support, and you want to see the line of last occurrence, I'd probably suggest something like:
INPUTFILE = "log.txt"

target = "print"

with open(INPUTFILE) as reader:
    last_line = None
    for line in reader.read().splitlines():
        if target in line:
            last_line = line
if last_line:
    print(last_line)
else:
    print("No match")



RE: Find last occurrence of pattern in text file? - Winfried - Aug-13-2021

Sorry for the confusion. I was trying to turn a Windows batch script that used grep + sed + tail into Python, but you're right, the meat was in the grep + sed.

The following works to find two similar but slightly different patterns, searching from the end of the file:

pattern = r"^START_A.+to (.+?) \(.+$"
p = re.compile(pattern)
for line in reversed(list(open(INPUTFILE))):
	m = p.search(line.rstrip())
	if m:
		print(m.group(1))
		break

pattern = r"^START_B.+to (.+?) \(.+$"
p = re.compile(pattern)
for line in reversed(list(open(INPUTFILE))):
	m = p.search(line.rstrip())
	if m:
		print(m.group(1))
		break
I'll see if I can refine it so as to avoid needless copy/pasting.

Thank you!


RE: Find last occurrence of pattern in text file? - bowlofred - Aug-13-2021

If you can show the original grep/sed/tail, that might be useful.

Also, how big are the files? Reversing a MB file seems unnecessary, but acceptable. If you're scanning GB files, that starts to get silly.
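
For larger files, one possible alternative to reversing is a single forward pass that remembers the most recent hit for each pattern. This is only a sketch: the function name last_captures, the dict keys "A" and "B", and the sample log lines are all invented here, since the real log format hasn't been shown.

```python
import os
import re
import tempfile

def last_captures(path, patterns):
    """Scan path once, top to bottom; return {name: last captured group 1}."""
    compiled = {name: re.compile(p) for name, p in patterns.items()}
    last = {name: None for name in patterns}
    # Iterating over the file object reads one line at a time,
    # so memory use does not grow with the file size.
    with open(path) as reader:
        for line in reader:
            line = line.rstrip("\n")
            for name, regex in compiled.items():
                m = regex.search(line)
                if m:
                    last[name] = m.group(1)  # later hits overwrite earlier ones
    return last

# Hypothetical log lines, only to exercise the function.
sample = (
    "START_A moved from 00:00:01,000 to 00:00:05,250 (first)\n"
    "START_B moved from 00:00:02,000 to 00:00:09,500 (first)\n"
    "START_A moved from 00:01:00,000 to 00:01:30,750 (second)\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

result = last_captures(tmp.name, {
    "A": r"^START_A.+to (.+?) \(.+$",
    "B": r"^START_B.+to (.+?) \(.+$",
})
os.remove(tmp.name)
print(result)
```

The trade-off is that every line gets tested against every pattern, whereas the reversed scan can stop at the first hit. For a file of a few hundred KB either way should be fine.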


RE: Find last occurrence of pattern in text file? - Winfried - Aug-13-2021

It's just a ~100KB file, so it's fast enough.

I simplified the script with a function:

import re
#pip install pyperclip
import pyperclip

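def SearchAndTell(MYFILE,mypattern):
	p = re.compile(mypattern)
	#read all lines, closing the file promptly, then scan from the end
	with open(MYFILE) as reader:
		lines = reader.readlines()
	for line in reversed(lines):
		m = p.search(line.rstrip())
		if m:
			#return already leaves the function, so no break is needed
			return m.group(1)
	return None #no line matched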

INPUTFILE = "log.txt" 
clipb = None

pattern = r"^START_A.+to (.+?) \(.+$"
clipb = f"-ss {SearchAndTell(INPUTFILE,pattern)} ".replace(",",".")
pattern = r"^START_B.+to (.+?) \(.+$"
clipb += f"-to {SearchAndTell(INPUTFILE,pattern)}".replace(",",".")

pyperclip.copy(clipb)
FWIW, here's the batch script:
grep -Poha "^START_A.+$" log.txt | sed -r "s@^.+ to (.+?) \(.+$@-ss \1@" | sed -r "s@,@.@g" | tail -1
grep -Poha "^START_B.+$" log.txt | sed -r "s@^.+ to (.+?) \(.+$@-to \1@" | sed -r "s@,@.@g" | tail -1