Python Forum
Matching two files based on a spited elements - Printable Version




Matching two files based on a spited elements - tester_V - May-29-2021

Greetings!
I'm trying to find lines from one file in another file.
I have to do that by matching timestamp.
Lines look like this:
File 1
\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
File 2
\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02
\\D1376\d$\logs\BBLog.10.xml,05/29/2021 17:02:11
I have to split the lines in File 1 and use the split-off timestamp to match lines in File 2.
For some reason, I cannot print only the lines from File 1 that have no match in File 2.

Here is the code I have:
import re

tod1376 = 'C:\\01\\fileD1376.txt'
yrd1376 = open ('C:\\02\\fileY1376.txt','r') 
yrd = yrd1376.readlines()

with open (tod1376,'r') as tod :
    for lntod in tod :
        lntod=lntod.strip()
        *whocares,sp_lntod = lntod.split(",")
        sp_EL1 = sp_lntod
        sp_EL1=str(sp_EL1).strip()

        for lnyrd in yrd :          
            lnyrd=lnyrd.strip()
            found = re.search(sp_EL1,lnyrd)  
            print(f" Fund matched- -{lntod}")
            if not found :
               print(f" Not Matched ++ {lntod}")
Thank you!


RE: Matching two files based on a spited elements - Larz60+ - May-29-2021

You are printing "matched" even when there is no match.
Corrected logic:
        for lnyrd in yrd:
            lnyrd = lnyrd.strip()
            found = re.search(sp_EL1, lnyrd)
            if found:
                print(f" Fund matched- -{lntod}")
            else:
                print(f" Not Matched ++ {lntod}")



RE: Matching two files based on a spited elements - tester_V - May-30-2021

Thanks Larz60+ !
It still does not work.
Here is what it prints out:
 Fund matched- -\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Fund matched- -\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
I need to get only the lines from File 1 that are not in File 2, see below:
\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15



RE: Matching two files based on a spited elements - ibreeden - May-30-2021

You should not print "Matched" / "Not Matched" after each line of file 2. You must check ALL of the lines of file 2 first; if any one of them matches, the overall result is True.
import re

tod1376 = 'file1.txt'
yrd1376 = open('file2.txt', 'r')
yrd = yrd1376.readlines()
yrd1376.close()

with open(tod1376, 'r') as tod:
    for lntod in tod:
        lntod = lntod.strip()
        *whocares, sp_lntod = lntod.split(",")
        sp_EL1 = sp_lntod
        sp_EL1 = str(sp_EL1).strip()

        foundline = False
        for lnyrd in yrd:
            lnyrd = lnyrd.strip()
            found = re.search(sp_EL1, lnyrd)
            if found:
                foundline = True
        if foundline:
            print(f" Fund matched- -{lntod}")
        else:
            print(f" Not Matched ++ {lntod}")
Output:
Fund matched- -\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
Fund matched- -\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
Not Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
Not Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
The algorithm is not very efficient, but I did not change that. I would prefer to create a set or some structure like that of file 2 so you can quickly check if a key of file 1 is in that set.
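The set idea mentioned above could look roughly like the following. This is a minimal sketch rather than a drop-in replacement: the helper names `timestamp_of` and `unmatched_lines` are made up for illustration, and it assumes every non-empty line ends with `,<timestamp>` as in the sample data. The sample lines from the first post are used in place of real files.

```python
def timestamp_of(line):
    """Return the text after the last comma, i.e. the timestamp field."""
    return line.strip().rsplit(",", 1)[-1].strip()


def unmatched_lines(lines1, lines2):
    """Return the lines of lines1 whose timestamp appears nowhere in lines2."""
    # Build the lookup set once; each membership test is then O(1) on average,
    # instead of rescanning all of file 2 for every line of file 1.
    seen = {timestamp_of(line) for line in lines2 if line.strip()}
    return [line.strip() for line in lines1
            if line.strip() and timestamp_of(line) not in seen]


# Sample data from the first post (raw strings keep the backslashes literal).
file1 = [
    r"\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23",
    r"\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03",
    r"\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15",
]
file2 = [
    r"\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23",
    r"\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02",
    r"\\D1376\d$\logs\BBLog.10.xml,05/29/2021 17:02:11",
]

for line in unmatched_lines(file1, file2):
    print(line)
```

With real files, the two lists can be replaced by open file objects, since both functions only iterate over lines. A list is returned instead of a set so that the order of file 1 is preserved.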


RE: Matching two files based on a spited elements - snippsat - May-30-2021

This can be simplified, and regex is not needed for this task.
with open('1.txt') as f1, open('2.txt') as f2:
    # zip() pairs the files line by line, so this assumes both files
    # list their entries in the same order.
    for line1, line2 in zip(f1, f2):
        if line1.strip().split(',')[1] == line2.strip().split(',')[1]:
            print(f"Found matched -- {line1.strip()}")
        else:
            print(f"Not matched ++ {line1.strip()}")
Output:
Found matched -- \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
Found matched -- \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
Not matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
Not matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
Sets, as mentioned by ibreeden, will also give the correct result without any modification such as split().
This may need a small change if you must match against the timestamp only.
from pprint import pprint

with open('1.txt') as f1,open('2.txt') as f2:
    diff = set(f1).difference(f2)
    pprint(diff)
Output:
{'\\\\D1376\\d$\\logs\\BBLog.1.xml,05/28/2021 17:02:03\n',
 '\\\\D1376\\d$\\logs\\BBLog.10.xml,05/27/2021 15:22:15'}



RE: Matching two files based on a spited elements - tester_V - May-30-2021

You guys are awesome!
Each time I have a problem with my code I come here, and I always get help without the 'attitude' of StackOverflow.
And even more, I get much-needed coaching...

To ibreeden:
Thank you! I was confused about flags and totally missed that.

The files I'm working with change names each time a new file is added to the directory, but the timestamp stays the same.
The file "\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03" will be "\\D1376\d$\logs\BBLog.5.xml,05/28/2021 17:02:03" tomorrow or "\\D1376\d$\logs\BBLog.25.xml,05/28/2021 17:02:03" in a week.
To find it tomorrow or in a week, I must keep yesterday's file list with the timestamps. That is why I'm using the timestamp, not the filename, as the identifier of a file. I'm not sure it is the best way to do this kind of search, but this is what I came up with.
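One way to picture the "timestamp as identifier" approach is a dictionary keyed by timestamp, so that yesterday's and today's lists can be compared even after files are renamed. The sketch below is only an illustration under assumptions: the function names `index_by_timestamp` and `compare_snapshots` are hypothetical, every line is assumed to have the form `path,timestamp`, and each timestamp is assumed to be unique within one list (if two files shared a timestamp, the later line would overwrite the earlier one).

```python
def index_by_timestamp(lines):
    """Map timestamp -> path for lines of the form 'path,timestamp'."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma only, in case a path contains commas.
        path, stamp = line.rsplit(",", 1)
        index[stamp.strip()] = path
    return index


def compare_snapshots(yesterday_lines, today_lines):
    """Return (renamed, added) for today's list relative to yesterday's."""
    old = index_by_timestamp(yesterday_lines)
    new = index_by_timestamp(today_lines)
    # Same timestamp but a different path: the file was renamed.
    renamed = {stamp: (old[stamp], new[stamp])
               for stamp in old.keys() & new.keys()
               if old[stamp] != new[stamp]}
    # Timestamp present only today: a file that was not in yesterday's list.
    added = {stamp: new[stamp] for stamp in new.keys() - old.keys()}
    return renamed, added


yesterday = [r"\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03"]
today = [
    r"\\D1376\d$\logs\BBLog.5.xml,05/28/2021 17:02:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02",
]

renamed, added = compare_snapshots(yesterday, today)
print("renamed:", renamed)
print("added:", added)
```

Here the 05/28 file is reported as renamed from BBLog.1.xml to BBLog.5.xml, and the 05/29 entry is reported as new even though it reuses the name BBLog.1.xml.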

To snippsat:
Thank you for the snippet!
I see your point, and I was thinking about using 'set', but I'm not sure it will do the job when the names of the files I'm working with change and only the timestamp stays the same.
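# Read each file into a set of stripped lines; 'with' closes the files.
with open('tod1376.txt') as f:
    tod = {line.strip() for line in f}
with open('yrd1376.txt') as f:
    yrd = {line.strip() for line in f}

with open('File_diff.txt', 'w') as diff:
    for line in tod:
        if line not in yrd:
            # strip() removed the newline, so add it back when writing.
            diff.write(line + '\n')
            print(line)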
Thank you again!