Python Forum
Python create a spreadsheet with column and row header
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Python create a spreadsheet with column and row header
#1
I want to compare three common columns(ID) in three different dataframes(USA, Canada, Mexico) and output if an ID matches in at least two data frames. How can I accomplish this?

So here is some sample data.

Below are the mockup spreadsheets

USA

| ID | COl A |
| -------- | ------- |
| 42345 | Test1 |
| 681593 | Test2 |
| 331574 | Test3 |
| 15786 | Test4 |

Mexico

| ID | COl A |
| -------- | ------- |
| 93512 | Chk1 |
| 681593 | Chk2 |
| 331574 | Chk3 |
| 89153 | Chk4 |


Canada

| ID | COl A |
| -------- | ------- |
| 42345 | Val1 |
| 93512 | Val2 |
| 331574 | Val3 |
| 76543 | Val4 |

Result Table

I want to print an "x" in the Result spreadsheet if more than one country has an ID match.

| ID | USA | Mexico | CANADA|
| -------- | ------- | ------- | ------- |
| 42345 | x | | x |
| 93512 | | x | x |
| 331574 | x | x | x |
| 681593 | x | x | |
Reply
#2
Where are you getting your dataframes from? Are you loading Excel files? From an SQL database?
Reply
#3
(Jul-07-2022, 08:59 AM)Pedroski55 Wrote: Where are you getting your dataframes from? Are you loading Excel files? From an SQL database?

I am loading it from excel files
Reply
#4
see post: https://python-forum.io/thread-37667-pos...#pid159234
Reply
#5
I never use pandas for my puny needs, so I have no experience, but I was interested.

Perhaps the experts here can improve this! I'd like to know how to do this better!

I think this does what you want.

For small amounts of data, I would do this with openpyxl, I understand that better!

import pandas as pd
import glob

savepath = '/home/pedro/myPython/pandas/xl_files/'
saveoutputpath = '/home/pedro/myPython/pandas/outputxl/'
# I put an extra column in each excel file named country with either Canada, Mexico or USA
filelist = glob.glob(savepath + "*.xlsx")
 
df = pd.concat((pd.read_excel(f) for f in filelist))
#df = df.fillna('') # replace nan
#df_usa = pd.read_excel(csv_file)

# change the indexes to get unique indexes
# df.index.size gets how many indexes there are
indexes = []
for i in range(df.index.size):
    indexes.append(i)
# now change the indexes pass a list to df.index
# never good to have 2 indexes the same
df.index = indexes

# this gets the dictionary ok
mydict = {i:[df.iloc[i, 0], df.iloc[i, 1], df.iloc[i,2]] for i in indexes}
print(len(mydict))
# get a list of the ids
mylist = df["id"].values.tolist()
# get a set of the unique ids
myset = set(mylist)

#create new DataFrame
df_out = pd.DataFrame(columns=['id', 'Canada', 'Mexico', 'USA'], index=range(0, len(myset)))

# make a list of unique ids
id_names = list(myset)
id_names.sort()
# populate the id column with id_names
df_out["id"] = id_names
# see how many rows and columns
print(df_out.shape)

# mydict[key][0] is the id column , mydict[key][2]]is the country
for key in mydict.keys():    
    df_out.loc[df_out["id"] == mydict[key][0], mydict[key][2]] = "X"

# put zeroes in all cells with NaN
df_out.fillna(0)
df_out.to_excel(saveoutputpath + 'my_saved_data.xlsx', sheet_name='mydata', index=True) 
df_out.fillna(0) seems to work in the Python shell, but is not saved to Excel, I don't know why!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Returning Column and Row Data From Spreadsheet knight2000 0 449 Oct-22-2023, 07:07 AM
Last Post: knight2000
  Editing spreadsheet/csv BSDevo 6 1,043 Sep-01-2023, 05:47 PM
Last Post: BSDevo
  Looking to automate updating a spreadsheet with image from email cubangt 2 983 Feb-14-2023, 03:43 PM
Last Post: cubangt
  Import XML file directly into Excel spreadsheet demdej 0 859 Jan-24-2023, 02:48 PM
Last Post: demdej
  create new column based on condition arvin 12 2,255 Dec-13-2022, 04:53 PM
Last Post: jefsummers
  python pandas sql table with header mg24 3 1,971 Dec-08-2022, 08:31 PM
Last Post: Larz60+
  updating Google spreadsheet with gspread mgallotti 0 1,101 Sep-30-2022, 11:26 PM
Last Post: mgallotti
  Modify LibreOffice's ods spreadsheet Pavel_47 0 1,189 Jul-13-2022, 11:28 AM
Last Post: Pavel_47
  Reshaping a single column in to multiple column using Python sahar 7 2,077 Jun-20-2022, 12:35 PM
Last Post: deanhystad
  Insert a multiple constant value after header in csv file using python shantanu97 1 1,159 Apr-24-2022, 10:04 AM
Last Post: Pedroski55

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020