Oct-14-2023, 06:46 AM
(This post was last modified: Oct-14-2023, 06:46 AM by Pedroski55.)
Please have a look to see if this starts what you need correctly.
I would save the output to a new Excel file. I added 2 columns:
1. How many rows joined here
2. Next source row to process
The purpose of 2. is, if you later return with a longer version of the source file, you know which row to start at.
Using the Excel you posted with 104 rows, the way I see it, you need 9 rows for different permutations of deposit, 4 rows for different permutations of withdraw and 4 rows for different permutations of order, if I understand what you want correctly.
Is that correct so far?
If this is correct, I can add similar rows, that is, rows having the same permutation of the important columns easily.
I would save the output to a new Excel file. I added 2 columns:
1. How many rows joined here
2. Next source row to process
The purpose of 2. is, if you later return with a longer version of the source file, you know which row to start at.
Using the Excel you posted with 104 rows, the way I see it, you need 9 rows for different permutations of deposit, 4 rows for different permutations of withdraw and 4 rows for different permutations of order, if I understand what you want correctly.
Is that correct so far?
If this is correct, I can add similar rows, that is, rows having the same permutation of the important columns easily.
import openpyxl # import os # these 4 can help you format cells in openpyxl # check the documentation online from openpyxl.styles import PatternFill from openpyxl.styles import Font from openpyxl.utils import get_column_letter from openpyxl.styles import Alignment # set the paths path2file = '/home/pedro/myPython/openpyxl/xlsx_files/Beispieltabelle.xlsx' savepath = '/home/pedro/myPython/openpyxl/xlsx_files/condensed_Beispieltabelle.xlsx' # get the source XL sourceFile = openpyxl.load_workbook(path2file) sourceFilesheets = sourceFile.sheetnames # show the available sheets for sheet in sourceFilesheets: print('These are the sheets:', sheet) # get the sheet you want to process mysheet = input('What sheet do you want to process? Copy and paste the sheet name ... ') # get the maximum row and column numbers for later loops smaxRow = sourceFile[mysheet].max_row smaxCol = sourceFile[mysheet].max_column # each entry in column 2 type is either deposit, withdraw or order # deposit may have different boughtCurrency, col 4, boughtCurrencyId, col 5, soldCurrency, col 7, feeCurrency, col 10, classification, col 12, # location, col 13, area, col 14 (and of course boughtCurrencyId, col 5, txId, col 15 and id, col 16, and comment, col 17) # collect those different permutations type_dict = {'deposit': [], 'withdraw': [], 'order': []} for rowNum in range(2, smaxRow + 1): mylist = [sourceFile[mysheet].cell(row=rowNum, column=t).value for t in [4, 7, 10, 12, 13, 14]] # no doubles if not mylist in type_dict[sourceFile[mysheet].cell(row=rowNum, column=2).value]: type_dict[sourceFile[mysheet].cell(row=rowNum, column=2).value].append(mylist) # these are the different permutations of the types deposit, withdraw and order for key in type_dict.keys(): print(key, type_dict[key]) print('This list is', len(type_dict[key]), 'long') """ deposit [['Stone', None, 'Stone', 'mined', 'quarry', 'Kingstown'], ['Copper', None, None, 'add', 'quarry', 'Kingstown'], ['Iron', None, None, None, 'quarry', 'Kingstown'], ['Iron', None, None, 'mined', 'quarry', 'Kingstown'], ['Stone', None, None, 'internal', 'quarry', 'Kingstown'], ['Stone', None, None, 'internal', 'warehouse', 'Kingstown'], ['wood', None, None, None, 'warehouse', 'Kingstown'], ['gold', None, None, 'rent_income', 'tenement houses', 'Belize'], ['gold', None, None, 'rent_income', 'tenement houses', 'Kingstown']] This list is 9 long withdraw [[None, 'Stone', 'Stone', 'payment', 'quarry', 'Kingstown'], [None, 'Stone', 'Stone', 'remove', 'quarry', 'Kingstown'], [None, 'Stone', 'Stone', 'internal', 'quarry', 'Kingstown'], [None, 'Stone', 'Stone', 'internal', 'warehouse', 'Kingstown']] This list is 4 long order [['Stone', 'Copper', 'Copper', None, 'quarry', 'Kingstown'], ['Iron', 'Stone', 'Copper', None, 'quarry', 'Kingstown'], ['wool', 'silver', 'gold', None, 'market', 'Kingstown'], ['VAI', 'silver', 'gold', None, 'market', 'Kingstown']] This list is 4 long """ # The above are the different combinations of type, boughtCurrency or soldCurrency, classification, location and area, # of course there may be more when the XL file is much longer # at the moment you need 9 amalgamated rows for deposit and 4 amalgamated rows for withdraw and order # make a new XL to take the results of processing targetFile = openpyxl.Workbook() # a new workbook only has 1 sheet called Sheet # give the sheet a new name tsheet = targetFile.active # rename the sheet tsheet.title = mysheet + '_condensed data' targetFilesheets = targetFile.sheetnames # for some reason I get an error here: the sheet is recognised as <Worksheet "condensed data"> not just "condensed data" # so assign sheet directly tsheet = targetFilesheets[0] # save the basic file or reload the existing file targetFile.save(savepath) ### reload the file because I added a column: How many rows joined here ### targetFile = openpyxl.load_workbook(savepath) ##tmaxRow = targetFile[sheet].max_row ##tmaxCol = targetFile[sheet].max_column # get a list of tuples of format (column number, column name) # will be useful later column_numbers_names = [] count = 1 for colNum in range(1, smaxCol + 1): # write the column headers to the target file data1 = sourceFile[mysheet].cell(row=1, column=colNum).value # get 1 row of data from the sourceFile data2 = sourceFile[mysheet].cell(row=2, column=colNum).value # save the column_numbers_names for later column_numbers_names.append((count, data1)) # put the headers in the target file targetFile[tsheet].cell(row=1, column=colNum).value=data1 # put the first row of data in the targetFile targetFile[tsheet].cell(row=2, column=colNum).value=data2 count +=1 # have a look at column_numbers_names any time you need to # just run showColumn_data() def showColumn_data(): for tup in column_numbers_names: print(tup) showColumn_data() # save the basic file or reload the existing file targetFile.save(savepath)The above does not add similar rows yet. I would first like to know if I have understood what you want correctly.