Python Forum
Compare 2 files for duplicates and save the differences
#1
We have a scenario where we need to compare 2 files and create a 3rd file containing the duplicates.

I can't provide the actual files due to the sensitivity of the data, but here is the situation:

We need to compare these 2 files and generate 2 new files for reprocessing.
File #1: the original (cannot be edited or manipulated in any way)
File #2: a copy of the original that was accidentally updated and now contains duplicate records mixed in with all the other records

File #3: needs to contain only the records that were duplicated in both files (for verification purposes)
File #4: needs to be the clean version with NO duplicates. (This file should be identical to file #1, with the possibility of extra records. That would allow us to say that the original records exist, that there are X new records, and that they can be safely reprocessed.)
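
For illustration only, here is a minimal sketch of how the four files relate, using just the standard library. It rests on assumptions that are not stated above: each record is one line of text, a "duplicate" is a record that occurs more than once in file #2, and records are compared as exact strings. The function name `split_duplicates` and its return values are made up for this sketch.

```python
from collections import Counter


def split_duplicates(original_lines, updated_lines):
    """Split the records of the updated file (file #2).

    Returns three lists, all in first-seen order:
      duplicates  - records occurring more than once in the updated file (file #3)
      clean       - every distinct record of the updated file, once each (file #4)
      new_records - records in the clean list that are absent from the original
    Assumes one record per string and exact string comparison.
    """
    counts = Counter(updated_lines)
    seen = set()
    duplicates = []
    clean = []
    for record in updated_lines:
        if record in seen:
            continue  # already handled the first occurrence
        seen.add(record)
        clean.append(record)
        if counts[record] > 1:
            duplicates.append(record)
    original = set(original_lines)
    new_records = [record for record in clean if record not in original]
    return duplicates, clean, new_records


original = ["a", "b", "c"]
updated = ["a", "b", "a", "c", "d", "b"]
duplicates, clean, new_records = split_duplicates(original, updated)
print(duplicates)   # records for file #3
print(clean)        # records for file #4
print(new_records)  # the "X new records"
```

If a record spans several lines, or if two records should count as equal when only some fields match, the comparison key would have to change accordingly.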

What I'm looking for is direction on the best way to accomplish this. I have used pandas before; is that the best or easiest way to do it?
Is there a better package or tool to do this efficiently?

I'm not looking for examples at this time, just guidance on the proper tools to use and consider.
One thing to keep in mind, though, is that there could be a double-digit or triple-digit number of files to process quickly. On a good day there could be just a few files to compare; on a really bad day there could be over 50, and in the worst cases 100+.
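
To give a sense of what the volume means in practice, here is a rough sketch of processing a whole folder in one pass. Everything about the layout is an assumption: the `*.txt` pattern, the `_duplicates` and `_clean` output names, UTF-8 encoding, and one record per line are all hypothetical, as are the function names `dedupe_file` and `process_folder`.

```python
from pathlib import Path


def dedupe_file(updated_path, duplicates_path, clean_path):
    """Stream one updated file, writing each distinct record once to the
    clean file and each repeated record once to the duplicates file."""
    seen = set()      # records already written to the clean file
    reported = set()  # records already written to the duplicates file
    with open(updated_path, encoding="utf-8") as src, \
         open(duplicates_path, "w", encoding="utf-8") as dup_out, \
         open(clean_path, "w", encoding="utf-8") as clean_out:
        for line in src:
            record = line.rstrip("\n")
            if record in seen:
                if record not in reported:
                    reported.add(record)
                    dup_out.write(record + "\n")
            else:
                seen.add(record)
                clean_out.write(record + "\n")


def process_folder(updated_dir, output_dir):
    """Run dedupe_file on every *.txt file in updated_dir (hypothetical layout).

    Returns the names of the files that were processed, in sorted order.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(Path(updated_dir).glob("*.txt")):
        dedupe_file(
            path,
            output_dir / f"{path.stem}_duplicates.txt",
            output_dir / f"{path.stem}_clean.txt",
        )
        processed.append(path.name)
    return processed
```

Each file is read line by line and only the distinct records are held in memory, so a hundred or so files in a loop like this should mostly be limited by disk speed.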
#2
I believe difflib (part of the Python standard library) will allow you to do this.
doc here
Even though you didn't ask for them, examples here.
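
As a small sketch of the difflib route, assuming one record per line: `difflib.SequenceMatcher` can report which lines of the updated file have no counterpart at the same position in the original. The helper name `added_lines` is made up for this example.

```python
import difflib


def added_lines(original_lines, updated_lines):
    """Return the lines of updated_lines that difflib reports as inserted
    or replaced relative to original_lines, in file order."""
    matcher = difflib.SequenceMatcher(
        None, original_lines, updated_lines, autojunk=False
    )
    added = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(updated_lines[j1:j2])
    return added


original = ["a", "b", "c"]
updated = ["a", "b", "a", "c", "d"]
print(added_lines(original, updated))
```

Note that difflib is position-sensitive: the result mixes re-inserted duplicates with genuinely new records, so a separate membership check against the original is still needed to tell them apart.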
#3
Thanks, I'll take a look. I don't think I have used that before, so it's definitely worth looking at.