Python Forum
Compare two large CSV files for a match
#1
Hello, I am very new to Python and am trying to solve the issue below.

We have two .csv files.

For example:
File Master:
Column_A  Column_B  Column_C  .....  Column_Z
123       XYZ       Z                1X
234       PQR       Y                2X

File New:
Column_C  Column_A  Column_B
X         001       PQR
Y         123       XYZ
Y         234       PQR

Each file has similar data, but the columns and rows are not in the same order. When there is a match between the Master file and the New file, the Master file needs to be updated by adding a new column populated with Match or No Match. It also needs a weight: for example, 1 if ALL columns and values match, 0.5 for a partial match, and 0 otherwise.

These files are large, running to several GB.

Please help!
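
Because the two files list their columns in a different order, one possible starting point is to read each file by header name instead of by position. The sketch below is only an illustration under assumptions: it assumes the files are comma-delimited with a header row, that the three columns shown above are the ones both files share, and the helper name iter_rows is hypothetical. csv.DictReader reads one row at a time, so the whole file never has to be loaded at once.

```python
import csv
import io


def iter_rows(handle, columns):
    """Yield one tuple per CSV row, with values ordered by `columns`."""
    reader = csv.DictReader(handle)  # lazy: reads one row at a time
    for row in reader:
        yield tuple(row[name].strip() for name in columns)


# Tiny in-memory stand-ins for the two files (comma delimiter is an assumption).
master = io.StringIO("Column_A,Column_B,Column_C\n123,XYZ,Z\n234,PQR,Y\n")
new = io.StringIO("Column_C,Column_A,Column_B\nX,001,PQR\nY,123,XYZ\nY,234,PQR\n")

# Assumed list of columns present in both files.
shared = ["Column_A", "Column_B", "Column_C"]

master_rows = list(iter_rows(master, shared))
new_rows = list(iter_rows(new, shared))
```

With real files, the io.StringIO objects would be replaced by open(path, newline=""), and the rows would be consumed in a loop instead of being collected into a list.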
Reply
#2
Some clarification is needed here. What counts as a partial match? Matching one column? Matching two columns? You say that when there's a match, the master needs to be updated with match or no match. Why would you ever update with no match if there is a match? Or do you want to add a column to the master file for every row in the new file with the degree of matchiness?

Also, what have you tried? We're not big on writing code for people here, but we would be happy to help you fix your code when you run into problems. When you do run into problems, please post your code in Python tags, and clearly explain the problem you are having, including the full text of any errors.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#3
(Apr-22-2019, 05:40 PM)ichabod801 Wrote: Some clarification is needed here. What counts as a partial match? Matching one column? Matching two columns? You say that when there's a match, the master needs to be updated with match or no match. Why would you ever update with no match if there is a match? Or do you want to add a column to the master file for every row in the new file with the degree of matchiness?

Also, what have you tried? We're not big on writing code for people here, but we would be happy to help you fix your code when you run into problems. When you do run into problems, please post your code in Python tags, and clearly explain the problem you are having, including the full text of any errors.

Thank you for your reply.

A match on at least one column value would be considered a partial match. If ALL columns match, it is a full match.
Ideally there would be two new columns (Match and Weight) in the master file. When there is a match, the row will show Match and then its weight. I hope that makes sense.

Here is the link I found with code that I am trying to modify for my needs. It is giving me all sorts of errors.
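
To make the rule above concrete, here is a minimal sketch of one way it could be read. It is not the code from the linked thread, and several things are assumptions: the files are comma-delimited with a header row, the New file's values fit in memory as sets, a partial match means any single value appears in the same column anywhere in the New file (not necessarily in one row), and a partial match is labelled Match with weight 0.5. The function names are hypothetical.

```python
import csv
import io


def build_index(new_handle, columns):
    """Read the New file once; collect full rows and per-column value sets."""
    full_rows = set()
    seen = {name: set() for name in columns}
    for row in csv.DictReader(new_handle):
        values = tuple(row[name].strip() for name in columns)
        full_rows.add(values)
        for name, value in zip(columns, values):
            seen[name].add(value)
    return full_rows, seen


def score(values, columns, full_rows, seen):
    """Return (label, weight) for one Master row under the assumed rule."""
    if values in full_rows:
        return "Match", 1.0
    if any(value in seen[name] for name, value in zip(columns, values)):
        return "Match", 0.5  # assumption: partial matches are labelled Match
    return "No Match", 0.0


def annotate_master(master_handle, new_handle, out_handle, columns):
    """Stream the Master file and write it back with Match and Weight added."""
    full_rows, seen = build_index(new_handle, columns)
    reader = csv.DictReader(master_handle)
    fieldnames = list(reader.fieldnames) + ["Match", "Weight"]
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:  # one row at a time, so Master is never fully loaded
        values = tuple(row[name].strip() for name in columns)
        row["Match"], row["Weight"] = score(values, columns, full_rows, seen)
        writer.writerow(row)


# Small in-memory demo; the third Master row is made up to show No Match.
master = io.StringIO(
    "Column_A,Column_B,Column_C\n123,XYZ,Z\n234,PQR,Y\n999,AAA,Q\n"
)
new = io.StringIO("Column_C,Column_A,Column_B\nX,001,PQR\nY,123,XYZ\nY,234,PQR\n")
out = io.StringIO()
annotate_master(master, new, out, ["Column_A", "Column_B", "Column_C"])
rows = list(csv.DictReader(io.StringIO(out.getvalue())))
```

The Master file is written to a new output instead of being edited in place, which keeps the original intact. If the New file is too large to index in memory, a different approach (for example a database or sorted files) would be needed.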

https://python-forum.io/Thread-How-to-co...+two+files
Reply
#4
Then post the code with the modifications you made, and the full text of any errors you got, as I described in my previous post.
Reply

