Python Forum
Matching between two files + using next()
#1
Hey all,

Hoping one of you folks might steer me in the right direction. I have a pretty big log file, and another file with a list of unique event identifiers. I need to go through the log file and look for any line that contains one of the identifiers from the second file. When one is found, I want to print not the line the event is on, but the line two below it (in the log file).

I'm pretty novice, so I'm sure I'm missing something obvious. Here's where I'm at so far, after many hours of fidgeting around and trying different methods found in forums. Any hint would be greatly appreciated.

with open('XXXXXXXXXXX', 'r+') as f1:
    with open('XXXXXXXXXXX', 'r+') as f2:
        writelines = f2.readlines()
        alines = f1.readlines()
        lines = iter(alines)
        lines2 = iter(writelines)
        for line in lines:
            for line2 in lines2:
                if line2 in line:
                    print(line)
                    break
You can see at this point I'm not even trying to get two lines below the event ID, just trying to pull the event ID line as a starting point. Eventually I need to tackle the next() function, which I'm pretty iffy on.

Example Input + Expected Output:

f2
----------
11111
22222
33333
44444

f1
---------
asdfasldfkjas11111aervoiuer
asdlojaer;lg
aekrjnvetk22222
asldkaer
aa;lckr44444rldfvetr

expected output
----------
asdfasldfkjas11111aervoiuer
aekrjnvetk22222
aa;lckr44444rldfvetr

current output
---------------------
#2
Hi,

What I would do is read all of the identifiers from file2 into a list (using append).
You will end up with something like ident = ['11111', '22222', '33333', ...]
Close file2.
Now read file1 line by line and iterate over ident to look for a match.
It is not clear to me from your data whether there is another identifier match two lines below,
or what it is you want to do next.
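
A minimal sketch of those steps, with the sample data from the first post standing in for the two files. The function names load_identifiers and matching_lines are made up for illustration, and stripping the newline from each identifier is an assumption about what the file looks like. With real files you could pass the open file objects in place of the lists, since iterating over a file yields its lines.

```python
def load_identifiers(lines):
    """Collect the non-empty identifiers into a list, without newlines."""
    identifiers = []
    for line in lines:
        ident = line.strip()
        if ident:  # skip blank lines; "" would be found in every log line
            identifiers.append(ident)
    return identifiers


def matching_lines(log_lines, identifiers):
    """Yield each log line that contains at least one identifier."""
    for line in log_lines:
        for ident in identifiers:
            if ident in line:
                yield line
                break  # one hit is enough for this line


# Sample data from the first post (f2 = identifiers, f1 = log).
id_lines = ["11111\n", "22222\n", "33333\n", "44444\n"]
log_lines = [
    "asdfasldfkjas11111aervoiuer\n",
    "asdlojaer;lg\n",
    "aekrjnvetk22222\n",
    "asldkaer\n",
    "aa;lckr44444rldfvetr\n",
]

ident = load_identifiers(id_lines)
for hit in matching_lines(log_lines, ident):
    print(hit, end="")
```

On the sample data this prints the three lines listed under "expected output" in the first post.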

Paul
#3
How big are the two files? Do they fit into memory?

If the events fit in memory, you can load them all, strip the newlines and put them into a set. A set holds only unique elements and does not preserve the order.

The second file (the log) can be bigger, because you can iterate line by line over the file object, which keeps only one line in memory at a time.

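# naive approach

# f1.txt holds the event identifiers, f2.txt is the log
with open("f1.txt") as f1:
    # skip blank lines: an empty string would be found in every log line
    events = set(line.strip() for line in f1 if line.strip())

with open("f2.txt") as f2:
    for f2_line in f2:
        for event in events:
            if event in f2_line:
                print(f2_line, end="")
                # stop after the first hit so a line is not printed twice
                break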
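
For the line two below each match, which is what the first post eventually asks for, one option is to call next() on the same iterator the loop is running over. This is only a sketch: the function name lines_after_matches and the offset parameter are made up for illustration, and the sample data from the first post stands in for the files.

```python
def lines_after_matches(log_lines, events, offset=2):
    """Yield the line `offset` lines below each log line containing an event."""
    lines = iter(log_lines)  # a file object would work here as well
    for line in lines:
        if any(event in line for event in events):
            target = None
            for _ in range(offset):
                # next() with a default returns None instead of raising
                # StopIteration when the log ends before the target line.
                target = next(lines, None)
            if target is not None:
                yield target


events = {"11111", "22222", "33333", "44444"}
log_lines = [
    "asdfasldfkjas11111aervoiuer\n",
    "asdlojaer;lg\n",
    "aekrjnvetk22222\n",
    "asldkaer\n",
    "aa;lckr44444rldfvetr\n",
]

for hit in lines_after_matches(log_lines, events):
    print(hit, end="")
```

One thing to be aware of: the lines pulled with next() are consumed, so the loop never checks them for an identifier of their own. In the sample data the 22222 line is fetched as "two below" the 11111 line and is not examined again. If every match has to count, remembering the target line numbers with enumerate() instead of calling next() would avoid that.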
#4
@DeaD_EyE Thanks, that is 100% exactly what I needed.