Python Forum
Matching between two files + using next() - Printable Version


Matching between two files + using next() - gwilliamson11 - Jun-18-2020

Hey all,

Hoping one of you folks can steer me in the right direction. I have a pretty big log file and a second file containing a list of unique event identifiers. I need to go through the log file and look for any line that contains an identifier from the second file. When one is found, I want to print not the line the event is on, but the line two lines below it (in the log file). I'm pretty novice, so I'm sure I'm missing something obvious. Here's where I'm at so far, after many hours of fidgeting around and trying different methods found in forums. Any hint would be greatly appreciated.

with open('XXXXXXXXXXX', 'r+') as f1:
    with open('XXXXXXXXXXX', 'r+') as f2:
        writelines = f2.readlines()
        alines = f1.readlines()
        lines = iter(alines)
        lines2 = iter(writelines)
        for line in lines:
            for line2 in lines2:
                if line2 in line:
                    print(line)
                    break

You can see at this point I'm not even trying to get two lines below the event ID, just trying to pull the event ID line as a starting point. Eventually I need to tackle the next() function, which I'm pretty iffy on.

Example Input + Expected Output:

f2
----------
11111
22222
33333
44444

f1
---------
asdfasldfkjas11111aervoiuer
asdlojaer;lg
aekrjnvetk22222
asldkaer
aa;lckr44444rldfvetr

expected output
----------
asdfasldfkjas11111aervoiuer
aekrjnvetk22222
aa;lckr44444rldfvetr

current output
---------------------


RE: Matching between two files + using next() - DPaul - Jun-18-2020

Hi,

What I would do is read all of the file2 identifiers into a list (using append).
You will end up with something like identif = ['11111', '22222', '33333', ...]
Close file2.
Now you read file1 line by line and iterate over identif to look for a match.
It is not clear to me from your data whether there is another identifier match two lines below,
or what it is you want to do next.
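
The approach described above could be sketched roughly as follows. This is only a minimal illustration: the function names load_identifiers and matching_lines are made up for the example, and io.StringIO stands in for the two real files, using the sample data from the first post.

```python
import io


def load_identifiers(id_file):
    """Read one identifier per line into a list, skipping blank lines."""
    identif = []
    for raw in id_file:
        ident = raw.strip()  # drop the trailing newline so substring tests can match
        if ident:
            identif.append(ident)
    return identif


def matching_lines(log_file, identif):
    """Yield each log line that contains at least one identifier."""
    for line in log_file:
        for ident in identif:
            if ident in line:
                yield line.rstrip("\n")
                break  # one hit is enough; move on to the next log line


# Sample data from the first post; io.StringIO stands in for the real files.
ids = io.StringIO("11111\n22222\n33333\n44444\n")
log = io.StringIO(
    "asdfasldfkjas11111aervoiuer\n"
    "asdlojaer;lg\n"
    "aekrjnvetk22222\n"
    "asldkaer\n"
    "aa;lckr44444rldfvetr\n"
)

identif = load_identifiers(ids)
result = list(matching_lines(log, identif))
for line in result:
    print(line)
```

Stripping the newline from each identifier matters here: a raw line such as '11111\n' would only match a log line that ends in exactly that identifier.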

Paul


RE: Matching between two files + using next() - DeaD_EyE - Jun-18-2020

How big are the two files? Do they fit into memory?

If the events fit in memory, you can load them all, strip the newlines, and put them into a set. A set holds only unique elements and does not preserve order.

The second file (the log) can be bigger, because you can iterate over the file object line by line, which saves memory.

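# naive approach

with open("f1.txt") as f1:
    # f1.txt holds the event IDs: strip newlines and skip blank lines
    # (an empty string is a substring of every line)
    events = {line.strip() for line in f1 if line.strip()}

with open("f2.txt") as f2:
    # f2.txt is the log: iterate lazily instead of loading it all
    for f2_line in f2:
        # any() stops at the first matching event, so a line prints once
        if any(event in f2_line for event in events):
            print(f2_line, end="")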
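
For the original goal of printing the line two lines below each match, one possible sketch uses next() on the same iterator that drives the loop. The function name lines_after_match, the offset parameter, and the sample log text are all invented for illustration. One caveat: the lines consumed by next() are skipped, so an identifier sitting on one of them would not be checked.

```python
import io


def lines_after_match(log_file, events, offset=2):
    """Yield the line `offset` lines below each line that contains an event."""
    lines = iter(log_file)
    for line in lines:
        if any(event in line for event in events):
            target = None
            for _ in range(offset):
                # next() with a default returns None at end of file
                # instead of raising StopIteration
                target = next(lines, None)
                if target is None:
                    break
            if target is not None:
                yield target.rstrip("\n")


# Made-up sample log; io.StringIO stands in for the real file.
sample_log = io.StringIO(
    "start 11111 event\n"
    "detail a\n"
    "detail b\n"
    "noise\n"
    "start 44444 event\n"
    "detail c\n"
    "detail d\n"
)

found = list(lines_after_match(sample_log, {"11111", "44444"}))
for line in found:
    print(line)
```

Because the for loop and next() share one iterator, the loop resumes right after the line that was just yielded. If a match is too close to the end of the file, nothing is printed for it.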



RE: Matching between two files + using next() - gwilliamson11 - Jun-19-2020

@DeaD_EyE Thanks, that is 100% exactly what I needed.