Apr-23-2022, 04:19 AM
(This post was last modified: Apr-23-2022, 04:19 AM by georgebijum.)
I am quite new to Python programming and would appreciate your help with the problem below.
Use case: I have customer files from two sources. Each file is huge, with around 1 million records. I need to compare only selected attributes and, where they differ, print the data from both files to a log file. I am looking for something similar to a database join with a WHERE clause. What is the best Python approach for this, given that row-by-row processing doesn't look ideal? Example: in the files below, CustomerID is the joining key.
File: CustomerDetails1
CustomerID  Active  Country  Industry  Column5  Column6  Column7
1           Y       SGP      BNK
2           Y       SGP      MFG
3           N       SGP      BNK
File: CustomerDetails2
CustomerID  Active  Country  Industry  Column5  Column6  Column7
1           Y       SGP      BNK
2           N       SGP      MFG
3           N       SGP      BNK
4           Y       JPY      BNK
Expected output in an Excel file:
Tab #1: all customer records that have matching values for Active, Country, and Industry. The output should have all the attributes from the customer file.
Tab #2: all customer records that have differing values in any of the Active, Country, or Industry fields. The output should capture the values from both files side by side.
i.e. how do I simulate an SQL join like this:
select t1.* from t1 join t2 on t1.CustomerID = t2.CustomerID and (t1.Active <> t2.Active OR t1.Country <> t2.Country)
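For anyone reading along, here is a minimal sketch of one vectorized way such a join could be expressed, assuming pandas is installed and the files are comma-separated (the post does not state the delimiter, so that is a guess). The sample rows are inlined so the snippet runs on its own; the names t1, t2, compare_cols, matching and varying are only illustrative. Unlike the SQL above, it also compares Industry, as the tab descriptions ask for.

```python
import io
import pandas as pd

# Sample rows from the post, inlined so the sketch is self-contained.
# Assumption: the real files are CSV and would be read with pd.read_csv(path).
file1 = io.StringIO(
    "CustomerID,Active,Country,Industry\n"
    "1,Y,SGP,BNK\n"
    "2,Y,SGP,MFG\n"
    "3,N,SGP,BNK\n"
)
file2 = io.StringIO(
    "CustomerID,Active,Country,Industry\n"
    "1,Y,SGP,BNK\n"
    "2,N,SGP,MFG\n"
    "3,N,SGP,BNK\n"
    "4,Y,JPY,BNK\n"
)

# Read everything as text and keep empty fields as "" rather than NaN,
# so that two blank values compare as equal.
t1 = pd.read_csv(file1, dtype=str, keep_default_na=False)
t2 = pd.read_csv(file2, dtype=str, keep_default_na=False)

compare_cols = ["Active", "Country", "Industry"]

# Inner join on the key; columns present in both files get _1 / _2 suffixes.
joined = t1.merge(t2, on="CustomerID", how="inner", suffixes=("_1", "_2"))

# Row-wise flag: True if any compared attribute differs between the files.
left = joined[[c + "_1" for c in compare_cols]].to_numpy()
right = joined[[c + "_2" for c in compare_cols]].to_numpy()
differs = (left != right).any(axis=1)

# Tab #2 candidate: values from both files side by side.
varying = joined[differs]

# Tab #1 candidate: all attributes from the first file for matching customers.
matching_ids = joined.loc[~differs, "CustomerID"]
matching = t1[t1["CustomerID"].isin(matching_ids)]

print(matching)
print(varying)
```

The two frames could then be written to separate sheets with pd.ExcelWriter and DataFrame.to_excel(writer, sheet_name=...), which needs an Excel engine such as openpyxl to be installed.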
I am mainly looking for the best approach, as row-by-row processing seems expensive for 1 million records.
Thanks in advance.