Sep-12-2024, 02:34 PM
So we have a scenario where we need to compare 2 files and create a 3rd file of the duplicates.
I can't provide actual files due to the sensitivity of the data, but here is the situation.
Need to compare these 2 files and generate 2 new files for reprocessing.
#1 File: Original (cannot be edited or manipulated in any way)
#2 File: Original, but it was accidentally updated and now contains duplicate records mixed in with all the records
#3 File: Needs to contain only the records that were duplicated in both files (for verification purposes)
#4 File: Needs to be the clean version with NO duplicates. (This file should be identical to file #1, with the possibility of extra records.) This would then allow us to say that the original records exist and there are X number of new records that can be safely reprocessed.
What I'm looking for is direction on the best way to accomplish this. I have used pandas before; is that the best or easiest way to accomplish this task?
Is there a better package or tool to do this efficiently?
Not looking for examples at this time, just guidance on the proper tools to use and consider.
The one thing to note is that there could potentially be double-digit or triple-digit numbers of files to process quickly. Meaning on a good day it could be just a few files to compare, on a really bad day it could be over 50, and in really bad cases 100+.
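For reference, pandas can express this split in a few lines. Here is a minimal sketch with made-up column names and data (the real files can't be shared); it assumes whole-row equality defines a duplicate, which may need adjusting to the real record layout:

```python
import pandas as pd

def split_duplicates(updated: pd.DataFrame):
    """Split the accidentally-updated file into the duplicate rows
    (file #3) and a de-duplicated clean version (file #4)."""
    # Rows that are repeats of an earlier row -- the accidental
    # duplicates. This assumes a duplicate means the entire row
    # matches; pass subset=[...] to duplicated()/drop_duplicates()
    # if only certain key columns define identity.
    duplicates = updated[updated.duplicated(keep="first")]
    # File #4: the updated file with duplicates dropped -- this should
    # equal the original file plus any genuinely new records.
    clean = updated.drop_duplicates()
    return duplicates, clean

# Tiny illustration: original had ids 1-3; the updated file gained a
# duplicate of id 1 plus one genuinely new record (id 4).
updated = pd.DataFrame({"id": [1, 1, 2, 3, 4],
                        "value": ["a", "a", "b", "c", "d"]})
dupes, clean = split_duplicates(updated)
```

Since each file pair is processed independently, scaling to 50-100+ files is mostly a matter of running this per pair, e.g. with `concurrent.futures.ProcessPoolExecutor`; polars is a faster drop-in alternative if file sizes become a bottleneck.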