Python and pandas: Aggregate lines form Excel sheet

Pedroski55 · (This post was last modified: Oct-14-2023, 06:46 AM by Pedroski55.)

Please have a look to see if this starts what you need correctly.

I would save the output to a new Excel file. I added 2 columns:
1. How many rows joined here
2. Next source row to process

The purpose of 2. is, if you later return with a longer version of the source file, you know which row to start at.

Using the Excel you posted with 104 rows, the way I see it, you need 9 rows for different permutations of deposit, 4 rows for different permutations of withdraw and 4 rows for different permutations of order, if I understand what you want correctly.

Is that correct so far?

If this is correct, I can add similar rows, that is, rows having the same permutation of the important columns easily.

import openpyxl
# import os
# these 4 can help you format cells in openpyxl
# check the documentation online
from openpyxl.styles import PatternFill
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter
from openpyxl.styles import Alignment

# set the paths
path2file = '/home/pedro/myPython/openpyxl/xlsx_files/Beispieltabelle.xlsx'
savepath = '/home/pedro/myPython/openpyxl/xlsx_files/condensed_Beispieltabelle.xlsx'
# get the source XL
sourceFile = openpyxl.load_workbook(path2file)
sourceFilesheets = sourceFile.sheetnames
# show the available sheets
for sheet in sourceFilesheets:
    print('These are the sheets:', sheet)
# get the sheet you want to process
mysheet = input('What sheet do you want to process? Copy and paste the sheet name ... ')
# get the maximum row and column numbers for later loops
smaxRow = sourceFile[mysheet].max_row
smaxCol = sourceFile[mysheet].max_column
# each entry in column 2 type is either deposit, withdraw or order
# deposit may have different boughtCurrency, col 4, boughtCurrencyId, col 5, soldCurrency, col 7, feeCurrency, col 10, classification, col 12,
# location, col 13, area, col 14 (and of course boughtCurrencyId, col 5, txId, col 15 and id, col 16, and comment, col 17)
# collect those different permutations
type_dict = {'deposit': [], 'withdraw': [], 'order': []}

for rowNum in range(2, smaxRow + 1):
    mylist = [sourceFile[mysheet].cell(row=rowNum, column=t).value for t in [4, 7, 10, 12, 13, 14]]
    # no doubles
    if not mylist in type_dict[sourceFile[mysheet].cell(row=rowNum, column=2).value]:
        type_dict[sourceFile[mysheet].cell(row=rowNum, column=2).value].append(mylist)
    
# these are the different permutations of the types deposit, withdraw and order
for key in type_dict.keys():
    print(key, type_dict[key])
    print('This list is', len(type_dict[key]), 'long')

"""
deposit [['Stone', None, 'Stone', 'mined', 'quarry', 'Kingstown'], ['Copper', None, None, 'add', 'quarry', 'Kingstown'], ['Iron', None, None, None, 'quarry', 'Kingstown'], ['Iron', None, None, 'mined', 'quarry', 'Kingstown'], ['Stone', None, None, 'internal', 'quarry', 'Kingstown'], ['Stone', None, None, 'internal', 'warehouse', 'Kingstown'], ['wood', None, None, None, 'warehouse', 'Kingstown'], ['gold', None, None, 'rent_income', 'tenement houses', 'Belize'], ['gold', None, None, 'rent_income', 'tenement houses', 'Kingstown']]
This list is 9 long
withdraw [[None, 'Stone', 'Stone', 'payment', 'quarry', 'Kingstown'], [None, 'Stone', 'Stone', 'remove', 'quarry', 'Kingstown'], [None, 'Stone', 'Stone', 'internal', 'quarry', 'Kingstown'], [None, 'Stone', 'Stone', 'internal', 'warehouse', 'Kingstown']]
This list is 4 long
order [['Stone', 'Copper', 'Copper', None, 'quarry', 'Kingstown'], ['Iron', 'Stone', 'Copper', None, 'quarry', 'Kingstown'], ['wool', 'silver', 'gold', None, 'market', 'Kingstown'], ['VAI', 'silver', 'gold', None, 'market', 'Kingstown']]
This list is 4 long

"""
# The above are the different combinations of type, boughtCurrency or soldCurrency, classification, location and area, 
# of course there may be more when the XL file is much longer
# at the moment you need 9 amalgamated rows for deposit and 4 amalgamated rows for withdraw and order
# make a new XL to take the results of processing
targetFile = openpyxl.Workbook()
# a new workbook only has 1 sheet called Sheet
# give the sheet a new name
tsheet = targetFile.active
# rename the sheet
tsheet.title = mysheet + '_condensed data'
targetFilesheets = targetFile.sheetnames
# for some reason I get an error here: the sheet is recognised as <Worksheet "condensed data">  not just "condensed data"
# so assign sheet directly
tsheet = targetFilesheets[0]
# save the basic file or reload the existing file
targetFile.save(savepath)
### reload the file because I added a column: How many rows joined here
### targetFile = openpyxl.load_workbook(savepath)
##tmaxRow = targetFile[sheet].max_row
##tmaxCol = targetFile[sheet].max_column
# get a list of tuples of format (column number, column name)
# will be useful later
column_numbers_names = []
count = 1
for colNum in range(1, smaxCol + 1):
    # write the column headers to the target file
    data1 = sourceFile[mysheet].cell(row=1, column=colNum).value
    # get 1 row of data from the sourceFile
    data2 = sourceFile[mysheet].cell(row=2, column=colNum).value
    # save the column_numbers_names for later
    column_numbers_names.append((count, data1))
    # put the headers in the target file
    targetFile[tsheet].cell(row=1, column=colNum).value=data1
    # put the first row of data in the targetFile
    targetFile[tsheet].cell(row=2, column=colNum).value=data2
    count +=1

# have a look at column_numbers_names any time you need to
# just run showColumn_data()
def showColumn_data():
    for tup in column_numbers_names:
        print(tup)

showColumn_data()
# save the basic file or reload the existing file
targetFile.save(savepath)

The above does not add similar rows yet. I would first like to know if I have understood what you want correctly.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	New on python. Needs help with Google sheet	jeromep	1	303	Apr-25-2024, 06:47 PM Last Post: deanhystad
	What are these python lines for? What are tey doing?	Led_Zeppelin	7	1,681	Feb-13-2023, 03:08 PM Last Post: deanhystad
	how to read txt file, and write into excel with multiply sheet	jacklee26	14	10,502	Jan-21-2023, 06:57 AM Last Post: jacklee26
	Start Putty into Python Form	Schlazen	5	5,594	Dec-13-2022, 06:28 AM Last Post: divya130
	Need Help! Pandas EXCEL PIVOT	psb3958	1	985	Nov-13-2022, 10:37 PM Last Post: deanhystad
	export into excel, how to implement pandas into for-loop	deneme2	6	2,605	Sep-01-2022, 05:44 AM Last Post: deneme2
	Help with Integration Pandas excel - Python	Gegemendes	5	1,906	Jun-05-2022, 09:46 PM Last Post: Gegemendes
	df column aggregate and group by multiple columns	SriRajesh	0	1,090	May-06-2022, 02:26 PM Last Post: SriRajesh
	Reading Excel file and use a wildcard in file name and sheet name	randolphoralph	6	7,362	Jan-13-2022, 10:20 PM Last Post: randolphoralph
	Problem in saving .xlsm (excel) file using pandas dataframe in python	shantanu97	2	4,433	Aug-29-2021, 12:39 PM Last Post: snippsat

Python and pandas: Aggregate lines form Excel sheet

User Panel Messages

Announcements