Python Forum
Matching two files based on split elements
#1
Greetings!
I'm trying to find lines from one file in another file.
I have to do that by matching timestamp.
Lines look like this:
File 1
\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
File 2
\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02
\\D1376\d$\logs\BBLog.10.xml,05/29/2021 17:02:11
I have to split the lines in File 1 and use the split elements to match lines in File 2.
For some reason, I cannot print only the lines that are not in both files.

Here is the code I have:
import re

tod1376 = 'C:\\01\\fileD1376.txt'
yrd1376 = open ('C:\\02\\fileY1376.txt','r') 
yrd = yrd1376.readlines()

with open (tod1376,'r') as tod :
    for lntod in tod :
        lntod=lntod.strip()
        *whocares,sp_lntod = lntod.split(",")
        sp_EL1 = sp_lntod
        sp_EL1=str(sp_EL1).strip()

        for lnyrd in yrd :          
            lnyrd=lnyrd.strip()
            found = re.search(sp_EL1,lnyrd)  
            print(f" Fund matched- -{lntod}")
            if not found :
               print(f" Not Matched ++ {lntod}")
Thank you!
#2
You are printing "matched" even when there is no match.
Corrected logic:
        for lnyrd in yrd :          
            lnyrd=lnyrd.strip()
            found = re.search(sp_EL1,lnyrd)
            if found:
                print(f" Fund matched- -{lntod}")
            else:
                print(f" Not Matched ++ {lntod}")
#3
Thanks Larz60+ !
It still does not work.
Here is what it prints:
 Fund matched- -\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Fund matched- -\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
 Not -------------------Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
I need only the lines from File 1 that are not in File 2, see below:
\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
#4
You should not print "Matched" / "Not Matched" after each line of file 2. You must read ALL of the lines of file 2 and if one matches, then the total result is True.
import re

tod1376 = 'file1.txt'
yrd1376 = open('file2.txt', 'r')
yrd = yrd1376.readlines()
yrd1376.close()

with open(tod1376, 'r') as tod:
    for lntod in tod:
        lntod = lntod.strip()
        *whocares, sp_lntod = lntod.split(",")
        sp_EL1 = sp_lntod
        sp_EL1 = str(sp_EL1).strip()

        foundline = False
        for lnyrd in yrd:
            lnyrd = lnyrd.strip()
            found = re.search(sp_EL1, lnyrd)
            if found:
                foundline = True
        if foundline:
            print(f" Fund matched- -{lntod}")
        else:
            print(f" Not Matched ++ {lntod}")
Output:
Fund matched- -\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
Fund matched- -\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
Not Matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
Not Matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
The algorithm is not very efficient, because it rescans all of file 2 for every line of file 1, but I did not change that. I would prefer to build a set (or a similar structure) from file 2, so you can quickly check whether a key from file 1 is in it.
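The set idea could look roughly like the sketch below. It is only an illustration, not code from this thread: the helper names `timestamp_of` and `lines_not_in_other` are made up here, and it assumes every line has the shape `path,timestamp` with the timestamp after the last comma. The sample lists simply repeat the File 1 and File 2 lines from the first post.

```python
def timestamp_of(line):
    """Return the text after the last comma (assumed to be the timestamp)."""
    return line.strip().rsplit(",", 1)[-1].strip()


def lines_not_in_other(lines1, lines2):
    """Return the lines of lines1 whose timestamp does not occur in lines2."""
    # Build the lookup set once; each membership test is then O(1) on average,
    # instead of rescanning all of lines2 for every line of lines1.
    seen = {timestamp_of(line) for line in lines2 if line.strip()}
    return [line.strip() for line in lines1
            if line.strip() and timestamp_of(line) not in seen]


# Sample data copied from the first post (raw strings keep the backslashes).
file1 = [
    r"\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23",
    r"\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03",
    r"\\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15",
]
file2 = [
    r"\\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23",
    r"\\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02",
    r"\\D1376\d$\logs\BBLog.10.xml,05/29/2021 17:02:11",
]

for line in lines_not_in_other(file1, file2):
    print(line)
```

With real files, the two lists can be replaced by open file objects, since both arguments only need to be iterables of lines. The result keeps the order of file 1, which a plain set difference would not.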
#5
You can simplify this, and regex is not needed for this task. Note that zip() pairs the files line by line, so this assumes both files list their entries in the same order.
with open('1.txt') as f1, open('2.txt') as f2:
    for line1, line2 in zip(f1, f2):
        # Compare only the timestamp (the part after the comma).
        if line1.strip().split(',')[1] == line2.strip().split(',')[1]:
            print(f"Matched -- {line1.strip()}")
        else:
            print(f"Not matched ++ {line1.strip()}")
Output:
Matched -- \\D1376\d$\logs\BBLog.100.xml,05/13/2021 00:01:23
Matched -- \\D1376\d$\logs\BBLog.11.xml,05/27/2021 15:11:03
Not matched ++ \\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03
Not matched ++ \\D1376\d$\logs\BBLog.10.xml,05/27/2021 15:22:15
Sets, as mentioned by ibreeden, will also give the correct result here without any modification such as split().
It may need a small change if you must match against the timestamp only.
from pprint import pprint

with open('1.txt') as f1,open('2.txt') as f2:
    diff = set(f1).difference(f2)
    pprint(diff)
Output:
{'\\\\D1376\\d$\\logs\\BBLog.1.xml,05/28/2021 17:02:03\n', '\\\\D1376\\d$\\logs\\BBLog.10.xml,05/27/2021 15:22:15'}
#6
You guys are awesome!
Each time I have a problem with my code I come here, and I always get help without the 'attitude' of StackOverflow.
And even more, I get much-needed coaching...

To ibreeden:
Thank you! I was confused about the flag and totally missed it.

The files I'm working with change their names each time a new file is added to the directory, but the timestamp stays the same.
The file "\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03" will be "\\D1376\d$\logs\BBLog.5.xml,05/28/2021 17:02:03" tomorrow or "\\D1376\d$\logs\BBLog.25.xml,05/28/2021 17:02:03" in a week.
To find it tomorrow or in a week, I must keep yesterday's file list with the timestamps. That is why I'm using the timestamp, not the filename, as the identifier of a file. I'm not sure it is the best way to do this kind of search, but this is what I came up with.
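A minimal sketch of using the timestamp as the identifier, under the assumption that every timestamp is unique within one listing (two files sharing a timestamp would overwrite each other in the dictionary). The function names `index_by_timestamp` and `compare_listings` and the two sample listings are hypothetical; only the BBLog.1 to BBLog.5 rename is taken from the example above.

```python
def index_by_timestamp(lines):
    """Map timestamp -> path for lines shaped like 'path,timestamp'."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST comma, so commas inside the path do no harm.
        path, _, stamp = line.rpartition(",")
        index[stamp.strip()] = path
    return index


def compare_listings(yesterday, today):
    """Return (renamed, added, gone) dictionaries keyed by timestamp."""
    old = index_by_timestamp(yesterday)
    new = index_by_timestamp(today)
    # Same timestamp in both listings but a different path: the file was renamed.
    renamed = {ts: (old[ts], new[ts])
               for ts in old.keys() & new.keys() if old[ts] != new[ts]}
    # Timestamp only in today's listing: a new file.
    added = {ts: new[ts] for ts in new.keys() - old.keys()}
    # Timestamp only in yesterday's listing: the file is no longer there.
    gone = {ts: old[ts] for ts in old.keys() - new.keys()}
    return renamed, added, gone


# Hypothetical listings for two consecutive days.
yesterday = [
    r"\\D1376\d$\logs\BBLog.1.xml,05/28/2021 17:02:03",
    r"\\D1376\d$\logs\BBLog.2.xml,05/27/2021 15:11:03",
]
today = [
    r"\\D1376\d$\logs\BBLog.5.xml,05/28/2021 17:02:03",
    r"\\D1376\d$\logs\BBLog.2.xml,05/27/2021 15:11:03",
    r"\\D1376\d$\logs\BBLog.1.xml,05/29/2021 18:04:02",
]

renamed, added, gone = compare_listings(yesterday, today)
print("renamed:", renamed)
print("added:", added)
print("gone:", gone)
```

Keying a dictionary on the timestamp keeps the current filename available after the match, which a bare set of timestamps would lose.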

To snippsat:
Thank you for the snippet!
I see your point, and I was thinking about using a set, but I'm not sure it will do the job when the names of the files change and only the timestamp stays the same.
with open('tod1376.txt') as f_tod, open('yrd1376.txt') as f_yrd:
    tod = {line.strip() for line in f_tod}
    yrd = {line.strip() for line in f_yrd}

with open('File_diff.txt', 'w') as diff:
    for line in tod:
        if line not in yrd:
            # strip() removed the newline, so add it back when writing
            diff.write(line + '\n')
            print(line)
Thank you again!