Python create a spreadsheet with column and row header - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: Python create a spreadsheet with column and row header (/thread-37658.html) |
Python create a spreadsheet with column and row header - ouruslife - Jul-06-2022 I want to compare three common columns(ID) in three different dataframes(USA, Canada, Mexico) and output if an ID matches in at least two data frames. How can I accomplish this? So here is some sample data. Below are the mockup spreadsheets USA | ID | COl A | | -------- | ------- | | 42345 | Test1 | | 681593 | Test2 | | 331574 | Test3 | | 15786 | Test4 | Mexico | ID | COl A | | -------- | ------- | | 93512 | Chk1 | | 681593 | Chk2 | | 331574 | Chk3 | | 89153 | Chk4 | Canada | ID | COl A | | -------- | ------- | | 42345 | Val1 | | 93512 | Val2 | | 331574 | Val3 | | 76543 | Val4 | Result Table I want to print an "x" in the Result spreadsheet if more than one country has an ID match. | ID | USA | Mexico | CANADA| | -------- | ------- | ------- | ------- | | 42345 | x | | x | | 93512 | | x | x | | 331574 | x | x | x | | 681593 | x | x | | RE: Python create a spreadsheet with column and row header - Pedroski55 - Jul-07-2022 Where are you getting your dataframes from? Are you loading Excel files? From an SQL database? RE: Python create a spreadsheet with column and row header - ouruslife - Jul-07-2022 (Jul-07-2022, 08:59 AM)Pedroski55 Wrote: Where are you getting your dataframes from? Are you loading Excel files? From an SQL database? I am loading it from excel files RE: Python create a spreadsheet with column and row header - Larz60+ - Jul-07-2022 see post: https://python-forum.io/thread-37667-post-159234.html#pid159234 RE: Python create a spreadsheet with column and row header - Pedroski55 - Jul-09-2022 I never use pandas for my puny needs, so I have no experience, but I was interested. Perhaps the experts here can improve this! I'd like to know how to do this better! I think this does what you want. For small amounts of data, I would do this with openpyxl, I understand that better! import pandas as pd import glob savepath = '/home/pedro/myPython/pandas/xl_files/' saveoutputpath = '/home/pedro/myPython/pandas/outputxl/' # I put an extra column in each excel file named country with either Canada, Mexico or USA filelist = glob.glob(savepath + "*.xlsx") df = pd.concat((pd.read_excel(f) for f in filelist)) #df = df.fillna('') # replace nan #df_usa = pd.read_excel(csv_file) # change the indexes to get unique indexes # df.index.size gets how many indexes there are indexes = [] for i in range(df.index.size): indexes.append(i) # now change the indexes pass a list to df.index # never good to have 2 indexes the same df.index = indexes # this gets the dictionary ok mydict = {i:[df.iloc[i, 0], df.iloc[i, 1], df.iloc[i,2]] for i in indexes} print(len(mydict)) # get a list of the ids mylist = df["id"].values.tolist() # get a set of the unique ids myset = set(mylist) #create new DataFrame df_out = pd.DataFrame(columns=['id', 'Canada', 'Mexico', 'USA'], index=range(0, len(myset))) # make a list of unique ids id_names = list(myset) id_names.sort() # populate the id column with id_names df_out["id"] = id_names # see how many rows and columns print(df_out.shape) # mydict[key][0] is the id column , mydict[key][2]]is the country for key in mydict.keys(): df_out.loc[df_out["id"] == mydict[key][0], mydict[key][2]] = "X" # put zeroes in all cells with NaN df_out.fillna(0) df_out.to_excel(saveoutputpath + 'my_saved_data.xlsx', sheet_name='mydata', index=True)df_out.fillna(0) seems to work in the Python shell, but is not saved to Excel, I don't know why! |