Python Forum

[SOLVED] Find last occurrence of pattern in text file?
Hello,

In a multiline text file, I need to find the last occurrence of a pattern, i.e. the Python equivalent of "tail".

None of the following works:

import re

INPUTFILE = "log.txt" 

with open(INPUTFILE) as reader:
    content = reader.read()

#====================
#very slow, even on 130KB file
p = re.compile("(?s:.*)^Some pattern.+$")
m = p.search(content)
if m.group(0):
	print(m.group(0))

#====================
#IndexError: list index out of range
p = re.compile("^Some pattern.+$")
m = p.findall(content)[-1]
if m.group(0):
	print(m.group(0))

#====================
#AttributeError: 'NoneType' object has no attribute 'group'
p = re.compile("^Some pattern.+$")
for line in content:
	m = p.search(line)
	if m.group(0):
		print(m.group(0))

#====================
#AttributeError: 'str' object has no attribute 'readlines'
for line in content.readlines():
	m = p.search(line)
	if m.group(0):
		print(m.group(0))
Does anyone know?

Thank you.
tail doesn't select on patterns or show "the last occurrence" of something, so I'm not sure I understand what you're looking for.

Do you have to support patterns (i.e., is a plain string match insufficient)? Can the pattern span lines, or will it always be in a single line? Do you want to display the entire line the pattern matches, or just the result of the match?

Your first attempt looks like it uses a slow pattern: the leading (?s:.*) consumes the whole file and then has to backtrack. It's always possible to construct a pattern that requires heavy backtracking, so if you need pattern support across the entire file, someone can supply a pattern that is slow. If the pattern only has to match within a line, that usually limits the problems that can arise.
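
To illustrate the per-line idea, here is a minimal sketch, assuming the pattern never needs to span lines. The function name last_matching_line and the sample lines are made up for illustration; the pattern is the one from the question.

```python
import re

def last_matching_line(lines, pattern):
    """Return the last line matched by pattern, or None if no line matches."""
    compiled = re.compile(pattern)
    last = None
    for line in lines:
        line = line.rstrip("\n")
        # Each search only ever sees one line, so backtracking stays bounded
        # by the line length rather than the file size.
        if compiled.search(line):
            last = line
    return last

# Hypothetical sample data standing in for the lines of log.txt
sample = ["Some pattern one\n", "other text\n", "Some pattern two\n"]
print(last_matching_line(sample, r"^Some pattern.+$"))
```

Passing an open file object instead of the list should work the same way, since iterating a text file yields one line at a time.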

If you don't need full pattern support, and you want to see the line of last occurrence, I'd probably suggest something like:
INPUTFILE = "log.txt"

target = "print"

with open(INPUTFILE) as reader:
    last_line = None
    for line in reader.read().splitlines():
        if target in line:
            last_line = line
if last_line:
    print(last_line)
else:
    print("No match")
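If you do need a regular expression over text that has already been read into one string, here's a rough sketch. Without re.MULTILINE, ^ and $ only anchor at the start and end of the whole string, which is probably why findall() came back empty in your second attempt and [-1] raised IndexError. Note also that findall() returns strings rather than match objects, so the sketch uses finditer(). The name last_match and the sample content are hypothetical.

```python
import re

def last_match(content, pattern):
    """Return the last match object for pattern in content, or None."""
    # re.MULTILINE makes ^ and $ match at every line boundary.
    compiled = re.compile(pattern, re.MULTILINE)
    last = None
    for match in compiled.finditer(content):
        last = match  # keep overwriting; the final value is the last match
    return last

# Hypothetical sample text standing in for the contents of log.txt
content = "Some pattern A\nnoise\nSome pattern B\n"
m = last_match(content, r"^Some pattern.+$")
if m is not None:  # test the match object itself before calling .group()
    print(m.group(0))
```

Checking the match object itself (rather than m.group(0)) also avoids the AttributeError on None when nothing matches.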
Sorry for the confusion. I was trying to turn a Windows batch script that used grep + sed + tail into Python, but you're right, the meat was in the grep + sed.

The following works to find two similar but slightly different patterns, searching from the end of the file:

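# Raw strings keep \( intact for the regex engine (no invalid-escape warning)
pattern = r"^START_A.+to (.+?) \(.+$"
p = re.compile(pattern)
with open(INPUTFILE) as reader:
    lines = reader.readlines()
for line in reversed(lines):
    m = p.search(line.rstrip())
    if m:
        print(m.group(1))
        break

pattern = r"^START_B.+to (.+?) \(.+$"
p = re.compile(pattern)
with open(INPUTFILE) as reader:
    lines = reader.readlines()
for line in reversed(lines):
    m = p.search(line.rstrip())
    if m:
        print(m.group(1))
        break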
I'll see if I can refine it so as to avoid needless copy/pasting.

Thank you!
If you can show the original grep/sed/tail, that might be useful.

Also, how big are the files? Reversing an MB-sized file seems unnecessary but acceptable. If you're scanning GB-sized files, it starts to get silly.
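
For larger files, one option is a single forward pass that remembers only the most recent match, so the whole file never has to sit in memory as a list. This is just a sketch: the function name last_group_streaming and the demo log lines are invented, and the real log layout may differ.

```python
import os
import re
import tempfile

def last_group_streaming(path, pattern):
    """Scan path line by line and return group 1 of the last match, or None."""
    compiled = re.compile(pattern)
    result = None
    with open(path) as reader:
        for line in reader:  # the file object yields one line at a time
            m = compiled.search(line.rstrip())
            if m:
                result = m.group(1)  # later matches overwrite earlier ones
    return result

# Demo on a throwaway file with made-up log lines (assumed format).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("START_A from x to 00:00:01,000 (first)\n")
    tmp.write("unrelated line\n")
    tmp.write("START_A from x to 00:00:09,500 (second)\n")
    demo_path = tmp.name

demo_result = last_group_streaming(demo_path, r"^START_A.+to (.+?) \(.+$")
os.remove(demo_path)
print(demo_result)
```

The trade-off is that the scan always reads to the end of the file, whereas the reversed approach can stop at the first hit.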
It's just a ~100KB file, so it's fast enough.

I simplified the script with a function:

import re
# pip install pyperclip
import pyperclip

def SearchAndTell(MYFILE, mypattern):
    """Return group 1 of the last line in MYFILE matching mypattern, or None."""
    p = re.compile(mypattern)
    with open(MYFILE) as reader:
        lines = reader.readlines()
    for line in reversed(lines):
        m = p.search(line.rstrip())
        if m:
            # return already exits the loop, so no break is needed
            return m.group(1)
    return None

INPUTFILE = "log.txt"

# Raw strings keep \( intact for the regex engine
pattern = r"^START_A.+to (.+?) \(.+$"
clipb = f"-ss {SearchAndTell(INPUTFILE, pattern)} ".replace(",", ".")
pattern = r"^START_B.+to (.+?) \(.+$"
clipb += f"-to {SearchAndTell(INPUTFILE, pattern)}".replace(",", ".")

pyperclip.copy(clipb)
FWIW, here's the batch script:
grep -Poha "^START_A.+$" log.txt | sed -r "s@^.+ to (.+?) \(.+$@-ss \1@" | sed -r "s@,@.@g" | tail -1
grep -Poha "^START_B.+$" log.txt | sed -r "s@^.+ to (.+?) \(.+$@-to \1@" | sed -r "s@,@.@g" | tail -1
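
For comparison, the stages of that pipeline could be mapped onto one Python pass roughly like this: the regex search stands in for grep plus the first sed, replace() for the second sed, and overwriting the stored value for tail -1. It is only a sketch; PATTERNS, build_options, and the sample lines are hypothetical, and the log layout is assumed from the patterns above.

```python
import re

# One compiled pattern per output option (patterns taken from the thread).
PATTERNS = {
    "-ss": re.compile(r"^START_A.+to (.+?) \(.+$"),
    "-to": re.compile(r"^START_B.+to (.+?) \(.+$"),
}

def build_options(lines):
    """Return e.g. '-ss X -to Y' built from the last match of each pattern."""
    found = {}
    for line in lines:
        for option, compiled in PATTERNS.items():
            m = compiled.search(line.rstrip())
            if m:
                # Overwriting keeps only the last match, like 'tail -1';
                # replace() mirrors the comma-to-dot sed step.
                found[option] = m.group(1).replace(",", ".")
    # Emit options in the order they are declared in PATTERNS.
    return " ".join(f"{opt} {found[opt]}" for opt in PATTERNS if opt in found)

# Made-up sample lines in the assumed log format
sample_lines = [
    "START_A clip to 00:00:01,000 (old)",
    "START_B clip to 00:00:05,000 (old)",
    "START_A clip to 00:01:02,250 (new)",
    "START_B clip to 00:01:30,750 (new)",
]
print(build_options(sample_lines))
```

Unlike the two-call version, this reads the input once and simply leaves out an option whose pattern never matched, instead of inserting the text "None".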