Python Forum
Need help working with two excel file with different formats - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Need help working with two excel file with different formats (/thread-26114.html)



Need help working with two excel file with different formats - mikey3580 - Apr-21-2020

Like many one here I am new to python and would greatly appreciate any help.
I need some guidance on how to create an excel output off of two different excel files, but the files I am working with have different formats.

Input Files:
File 1 is "Holdings" and has 30 columns - ID column is in Column E, Quantity Column is in Column H

File 2 is "Tax Lot Holdings" and has 20 columns - ID column is in Column U, Quantity Column is in Column D

Output File:
4 columns are needed: ID Column, Holdings Quantity, Tax Lot Holdings Quantity, and Difference between Holdings Quantity and Tax Lot Holdings Quantity.

Where the ID column matches between the two files is where each of the holdings for each fiel would pull in and show a difference. If there is an ID on one file that is not on another, I would expect it to pull in on one side but show zero on other and show a difference in difference column.

Below is what I have code wise but the middle part of how to get different columns to intersect and get and output that has columns I need and output I am lost. If anyone can provide some direction, I would greatly appreciate it as I am just lost. I found code online that shows differences if the format of the files are the same, but that doesn't help me since these files are different formats.

Thank you

import pandas as pd
import numpy as np
import os

##os.getcwd() + '\\' + is for current working directory where python file is saved
df1=pd.read_excel(os.getcwd() + '\\' + 'Holdings.xlsx')
df2=pd.read_excel(os.getcwd() + '\\' + 'Tax Lot Holdings.xlsx')

###NEED CODE TO RECONCILE TWO FILES


df1.to_excel(os.getcwd() + '\\' + 'Excel_diff.xlsx',index=False,header=True)



RE: Need help working with two excel file with different formats - DPaul - Apr-22-2020

For simplicity sake, i propose a giant detour :
a) sort both spreadsheets on ID
b) read the 2 business columns of each one into two 2D python lists
c) Iterate over list 1[id] and seek a match in list 2[id] -> calculate or put zero if not found -> print or store in list 3
d) you'll need a trick to remember the items in list 2 that were omitted, e.g. you could set the ones that matched to -1.
e) just list the ones from list 2 that did not match

All this if i understood your problem correctly :-)
Paul