Python Forum
Compare two large CSV files for a match - Printable Version




Compare two large CSV files for a match - Python_Newbie9 - Apr-22-2019

Hello, I am very new to Python and am trying to solve the issue below.

We have two .csv files.

For example:
File Master:
Column_A  Column_B  Column_C  .....  Column Z
123       XYZ       Z                1X
234       PQR       Y                2X

File New:
Column_C  Column_A  Column_B
X         001       PQR
Y         123       XYZ
Y         234       PQR

Each file has similar data, but the columns and rows are not in the same order. When a row in the Master file matches a row in the New file, the Master file needs to be updated by adding a new column populated with Match or No Match. It should also add a weight: for example, 1 if ALL columns and values match, 0.5 for a partial match, and 0 otherwise.
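
To illustrate the column-order point, here is a minimal sketch, assuming the files are comma-separated with a header row (the sample above is shown space-separated, so the delimiter is an assumption). `master_text`, `new_text`, and `read_rows` are hypothetical names used only for this example. Reading each row as a dict keyed by column name makes the comparison independent of column order.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two files, built from the sample rows.
# Assumption: comma-separated values with a header line.
master_text = "Column_A,Column_B,Column_C\n123,XYZ,Z\n234,PQR,Y\n"
new_text = "Column_C,Column_A,Column_B\nX,001,PQR\nY,123,XYZ\nY,234,PQR\n"


def read_rows(text):
    """Return every row as a plain dict keyed by column name."""
    # dict(row) keeps the comparison order-insensitive on older Python versions,
    # where DictReader yields OrderedDict objects.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


master_rows = read_rows(master_text)
new_rows = read_rows(new_text)

# Same values under the same column names, in a different column order.
print(master_rows[1] == new_rows[2])
# Column_C differs (Z in Master, Y in New), so these rows are not equal.
print(master_rows[0] == new_rows[1])
```

For real files, `csv.DictReader` can be given an open file object instead of `io.StringIO`, which reads one row at a time rather than loading everything into memory.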

These files are large, running into several GB each.

Please help!


RE: Compare two large CSV files for a match - ichabod801 - Apr-22-2019

Some clarification is needed here. What counts as a partial match? Matching one column? Matching two columns? You say that when there's a match, the master needs to be updated with match or no match. Why would you ever update with no match if there is a match? Or do you want to add a column to the master file for every row in the new file with the degree of matchiness?

Also, what have you tried? We're not big on writing code for people here, but we would be happy to help you fix your code when you run into problems. When you do run into problems, please post your code in Python tags, and clearly explain the problem you are having, including the full text of any errors.


RE: Compare two large CSV files for a match - Python_Newbie9 - Apr-22-2019

(Apr-22-2019, 05:40 PM)ichabod801 Wrote: Some clarification is needed here. What counts as a partial match? Matching one column? Matching two columns? You say that when there's a match, the master needs to be updated with match or no match. Why would you ever update with no match if there is a match? Or do you want to add a column to the master file for every row in the new file with the degree of matchiness?

Also, what have you tried? We're not big on writing code for people here, but we would be happy to help you fix your code when you run into problems. When you do run into problems, please post your code in Python tags, and clearly explain the problem you are having, including the full text of any errors.

Thank you for your reply.

A match on at least one column value would be considered a partial match. If ALL columns match, it is a full match.
Ideally there would be two new columns (Match and Weight) in the master file. When there is a match, it will display Match and then its weight. I hope that makes sense.
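
One possible reading of this rule, as a rough sketch rather than a finished solution: scan the New file once to build lookup sets, then stream the Master file row by row and append the two columns. Several things here are assumptions, not part of the original description: the files are comma-separated, "ALL columns" means the columns the two files share, a partial match is labelled Match with weight 0.5, and the lookup sets built from the New file fit in memory. The function names `build_index`, `score`, and `annotate` are made up for this example.

```python
import csv
import io


def build_index(new_file, columns):
    """Scan the New file once; collect full rows and (column, value) pairs."""
    full_rows, single_values = set(), set()
    for row in csv.DictReader(new_file):
        full_rows.add(tuple(row[c] for c in columns))
        single_values.update((c, row[c]) for c in columns)
    return full_rows, single_values


def score(row, columns, full_rows, single_values):
    """Return (label, weight) for one Master row."""
    if tuple(row[c] for c in columns) in full_rows:
        return "Match", 1.0   # every shared column matches one New row
    if any((c, row[c]) in single_values for c in columns):
        return "Match", 0.5   # at least one column value appears in New
    return "No Match", 0.0


def annotate(master_file, new_file, out_file, columns):
    """Stream the Master file and write it back with Match and Weight added."""
    full_rows, single_values = build_index(new_file, columns)
    reader = csv.DictReader(master_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + ["Match", "Weight"])
    writer.writeheader()
    for row in reader:
        row["Match"], row["Weight"] = score(row, columns, full_rows, single_values)
        writer.writerow(row)


# Demo on the sample rows (in-memory text stands in for the real files).
shared = ["Column_A", "Column_B", "Column_C"]
master = io.StringIO("Column_A,Column_B,Column_C,Column_Z\n123,XYZ,Z,1X\n234,PQR,Y,2X\n")
new = io.StringIO("Column_C,Column_A,Column_B\nX,001,PQR\nY,123,XYZ\nY,234,PQR\n")
out = io.StringIO()
annotate(master, new, out, shared)
print(out.getvalue())
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`. If the New file is too large for its lookup sets to fit in memory, this approach would need a disk-backed store instead, which is outside the scope of this sketch.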

Here is a link I found with code that I am trying to modify for my needs. It is giving me all sorts of errors.

https://python-forum.io/Thread-How-to-compare-two-files-and-Display-different-results-for-text-and-for-INT?highlight=compare+two+files


RE: Compare two large CSV files for a match - ichabod801 - Apr-22-2019

Then post the code with the modifications you made, and the full text of any errors you got, as I described in my previous post.