Python Forum
Python create a spreadsheet with column and row header - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Python create a spreadsheet with column and row header (/thread-37658.html)



Python create a spreadsheet with column and row header - ouruslife - Jul-06-2022

I want to compare three common columns(ID) in three different dataframes(USA, Canada, Mexico) and output if an ID matches in at least two data frames. How can I accomplish this?

So here is some sample data.

Below are the mockup spreadsheets

USA

| ID | COl A |
| -------- | ------- |
| 42345 | Test1 |
| 681593 | Test2 |
| 331574 | Test3 |
| 15786 | Test4 |

Mexico

| ID | COl A |
| -------- | ------- |
| 93512 | Chk1 |
| 681593 | Chk2 |
| 331574 | Chk3 |
| 89153 | Chk4 |


Canada

| ID | COl A |
| -------- | ------- |
| 42345 | Val1 |
| 93512 | Val2 |
| 331574 | Val3 |
| 76543 | Val4 |

Result Table

I want to print an "x" in the Result spreadsheet if more than one country has an ID match.

| ID | USA | Mexico | CANADA|
| -------- | ------- | ------- | ------- |
| 42345 | x | | x |
| 93512 | | x | x |
| 331574 | x | x | x |
| 681593 | x | x | |


RE: Python create a spreadsheet with column and row header - Pedroski55 - Jul-07-2022

Where are you getting your dataframes from? Are you loading Excel files? From an SQL database?


RE: Python create a spreadsheet with column and row header - ouruslife - Jul-07-2022

(Jul-07-2022, 08:59 AM)Pedroski55 Wrote: Where are you getting your dataframes from? Are you loading Excel files? From an SQL database?

I am loading it from excel files


RE: Python create a spreadsheet with column and row header - Larz60+ - Jul-07-2022

see post: https://python-forum.io/thread-37667-post-159234.html#pid159234


RE: Python create a spreadsheet with column and row header - Pedroski55 - Jul-09-2022

I never use pandas for my puny needs, so I have no experience, but I was interested.

Perhaps the experts here can improve this! I'd like to know how to do this better!

I think this does what you want.

For small amounts of data, I would do this with openpyxl, I understand that better!

import pandas as pd
import glob

savepath = '/home/pedro/myPython/pandas/xl_files/'
saveoutputpath = '/home/pedro/myPython/pandas/outputxl/'
# I put an extra column in each excel file named country with either Canada, Mexico or USA
filelist = glob.glob(savepath + "*.xlsx")
 
df = pd.concat((pd.read_excel(f) for f in filelist))
#df = df.fillna('') # replace nan
#df_usa = pd.read_excel(csv_file)

# change the indexes to get unique indexes
# df.index.size gets how many indexes there are
indexes = []
for i in range(df.index.size):
    indexes.append(i)
# now change the indexes pass a list to df.index
# never good to have 2 indexes the same
df.index = indexes

# this gets the dictionary ok
mydict = {i:[df.iloc[i, 0], df.iloc[i, 1], df.iloc[i,2]] for i in indexes}
print(len(mydict))
# get a list of the ids
mylist = df["id"].values.tolist()
# get a set of the unique ids
myset = set(mylist)

#create new DataFrame
df_out = pd.DataFrame(columns=['id', 'Canada', 'Mexico', 'USA'], index=range(0, len(myset)))

# make a list of unique ids
id_names = list(myset)
id_names.sort()
# populate the id column with id_names
df_out["id"] = id_names
# see how many rows and columns
print(df_out.shape)

# mydict[key][0] is the id column , mydict[key][2]]is the country
for key in mydict.keys():    
    df_out.loc[df_out["id"] == mydict[key][0], mydict[key][2]] = "X"

# put zeroes in all cells with NaN
df_out.fillna(0)
df_out.to_excel(saveoutputpath + 'my_saved_data.xlsx', sheet_name='mydata', index=True) 
df_out.fillna(0) seems to work in the Python shell, but is not saved to Excel, I don't know why!