Python Forum
Matching two files based on split elements
#1
Greetings!
I'm trying to find lines from one file in another file.
I have to do that by matching timestamp.
Lines look like this:
File 1
\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
File 2
\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02
\\D1376\d$\logs\BBLog.10.xml,05/29/2021 17:02:11
I have to split the lines in File-1 and use the split elements to match lines in File-2.
For some reason, I cannot print only the lines from File-1 that are not in File-2.

Here is the code I have:
tod1376 = 'C:\\01\\fileD1376.txt'  
yrd1376 = open ('C:\\02\\fileY1376.txt','r') 
yrd = yrd1376.readlines()

with open (tod1376,'r') as tod :
    for lntod in tod :
        lntod=lntod.strip()
        *whocares,sp_lntod = lntod.split(",")
        sp_EL1 = sp_lntod
        sp_EL1=str(sp_EL1).strip()

        for lnyrd in yrd :          
            lnyrd=lnyrd.strip()
            found = re.search(sp_EL1,lnyrd)  
            print(f" Fund matched- -{lntod}")
            if not found :
               print(f" Not Matched ++ {lntod}")
Thank you!
#2
You are printing "matched" even when there is no match.
Proper syntax:
        for lnyrd in yrd :          
            lnyrd=lnyrd.strip()
            found = re.search(sp_EL1,lnyrd)
            if found:
                print(f" Fund matched- -{lntod}")
            else:
                print(f" Not Matched ++ {lntod}")
#3
Thanks Larz60+ !
It still does not work.
Here is what it prints out:
 Fund matched- -\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Fund matched- -\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
I need to get only the lines from File-1 that are not in File-2, see below:
\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
#4
You should not print "Matched" / "Not Matched" after each line of file 2. You must read ALL of the lines of file 2 and if one matches, then the total result is True.
import re

tod1376 = 'file1.txt'
yrd1376 = open('file2.txt', 'r')
yrd = yrd1376.readlines()
yrd1376.close()

with open(tod1376, 'r') as tod:
    for lntod in tod:
        lntod = lntod.strip()
        *whocares, sp_lntod = lntod.split(",")
        sp_EL1 = sp_lntod
        sp_EL1 = str(sp_EL1).strip()

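        # Scan every line of file 2 and remember whether any of them matched.
        foundline = False
        for lnyrd in yrd:
            lnyrd = lnyrd.strip()
            found = re.search(sp_EL1, lnyrd)
            if found:
                foundline = True
        # Report once per line of file 1, after the inner loop has finished.
        if foundline:
            print(f" Fund matched- -{lntod}")
        else:
            print(f" Not Matched ++ {lntod}")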
Output:
Fund matched- -\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
Fund matched- -\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
Not Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
Not Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
The algorithm is not very efficient, but I did not change that. I would prefer to create a set or some structure like that of file 2 so you can quickly check if a key of file 1 is in that set.
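A minimal sketch of that set-based idea, assuming the timestamp is always the text after the last comma on a line. The names `timestamp_of` and `missing_by_timestamp` are hypothetical, and the in-memory sample lists stand in for the two files so the example runs on its own:

```python
def timestamp_of(line):
    """Return the text after the last comma, assumed to be the timestamp."""
    return line.strip().rsplit(",", 1)[-1].strip()


def missing_by_timestamp(lines1, lines2):
    """Return the stripped lines of lines1 whose timestamp is not in lines2."""
    # Build the lookup set once; each membership test is then O(1) on average.
    seen = {timestamp_of(line) for line in lines2 if line.strip()}
    return [line.strip() for line in lines1
            if line.strip() and timestamp_of(line) not in seen]


# Sample data copied from the first post (raw strings keep the backslashes).
file1 = [
    r"\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23",
    r"\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03",
    r"\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15",
]
file2 = [
    r"\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23",
    r"\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02",
    r"\\D1376\d$\logs\BBLog.10.xml,05/29/2021 17:02:11",
]

for line in missing_by_timestamp(file1, file2):
    print(f"Not Matched ++ {line}")
```

With real files, the two lists could be replaced by open file objects, since the function only iterates over lines.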
#5
This can be simplified, and regex is not needed for this task.
with open('1.txt') as f1, open('2.txt') as f2:
    # zip() pairs the lines by position, so both files must list their
    # entries in the same order for this comparison to be meaningful.
    for line1, line2 in zip(f1, f2):
        # Compare only the timestamp, i.e. the part after the comma.
        if line1.strip().split(',')[1] == line2.strip().split(',')[1]:
            print(f"Found matched -- {line1.strip()}")
        else:
            print(f"Not matched ++ {line1.strip()}")
Output:
Found matched -- \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
Found matched -- \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
Not matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
Not matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
Sets, as mentioned by ibreeden, will also give the correct result with no modification such as split().
It may need a small change if you must match against the timestamp only.
from pprint import pprint

with open('1.txt') as f1,open('2.txt') as f2:
    diff = set(f1).difference(f2)
    pprint(diff)
Output:
{'\\\\D1376\\d$\\logs\\BBLog.1.xml,05/28/2021 17:02:03\n', '\\\\D1376\\d$\\logs\\BBLog.10.xml,05/27/2021 15:22:15'}
#6
You guys are awesome!
Each time I have a problem with my code I come here, and I always get help without the 'attitude' of StackOverflow.
And even more, I get much-needed coaching...

To ibreeden:
Thank you! I was confused about flags and totally missed it.

The files I'm working with change their names each time a new file is added to the directory, but the timestamp stays the same.
The file "\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03" will be "\\D1376\d$\logs\BBLog.5.xml,05/28/2021 17:02:03" tomorrow or "\\D1376\d$\logs\BBLog.25.xml,05/28/2021 17:02:03" in a week.
To find it tomorrow or in a week, I must keep yesterday's list of files with their timestamps. That is why I'm using the timestamp, not the filename, as the identifier of a file. I'm not sure it is the best way to do this kind of search, but this is what I came up with.
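One way to picture the timestamp acting as the identifier is a dictionary keyed by timestamp. The sketch below is only an illustration: the helper names `index_by_timestamp` and `renamed_files` are hypothetical, it assumes every line has the form path,timestamp and that each timestamp appears at most once per list, and the sample lines reuse the rename from the example above.

```python
def index_by_timestamp(lines):
    """Map timestamp -> file path, assuming 'path,timestamp' lines."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, stamp = line.rsplit(",", 1)
        index[stamp.strip()] = path
    return index


def renamed_files(yesterday, today):
    """Return (old_path, new_path, timestamp) for files whose name changed."""
    old = index_by_timestamp(yesterday)
    new = index_by_timestamp(today)
    return [(old[ts], new[ts], ts)
            for ts in old
            if ts in new and old[ts] != new[ts]]


# Hypothetical lists standing in for yesterday's and today's files.
yesterday = [r"\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03"]
today = [r"\\D1376\d$\logs\BBLog.5.xml,05/28/2021 17:02:03"]

for old_path, new_path, ts in renamed_files(yesterday, today):
    print(f"{ts}: {old_path} -> {new_path}")
```

If two files could ever share a timestamp, the dictionary would keep only the last one, so that assumption is worth checking against real data.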

To snippsat:
Thank you for the snippet!
I see your point, and I was thinking about using 'set', but I'm not sure it will do the job when the names of the files I'm working with are changing and only the timestamp stays the same.
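# Read both files into sets of stripped lines; "with" closes the files.
with open('tod1376.txt') as f_tod, open('yrd1376.txt') as f_yrd:
    tod = {line.strip() for line in f_tod}
    yrd = {line.strip() for line in f_yrd}

# Note: this compares whole lines (file name and timestamp together).
with open('File_diff.txt', 'w') as diff:
    for line in tod:
        if line not in yrd:
            diff.write(line + '\n')  # strip() removed the newline, so add it back
            print(line)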
Thank you again!