Jul-01-2020, 04:56 PM
Good day
I've been attempting to group rows in an excel spreadsheet, the data is a bit of mess and I need to tidy it up.
The idea is to link all the 'A numbers' (column A) by their common alternate numbers columns G to M so that I can create one 'master A' number that links to several common part numbers.
the difficulty is that the common numbers are spread across columns and not every A number has the same common number but is interlinked with another common number in the columns.
example of completed (highlighted colours are 'common'):
A01090102299 W Y 0 0 TX0255 FDB55
A01090200456 A N 0 38 D3522 FDB554 ATE519
A01090200140 A N 0 10 D3078 FDB554 ATE519
A01090102409 W Y 0 0 TX0369 FDB554 D3078 ATE519 D3060/D3522
A01090400350 A N 2 206 LP2711 FDB554 D3078 D3522 ATE519 FDB1854 P83092
A01090400423 A N 13 18 LP989 FDB554 D3078 D3522 LP2711
A01090100489 A N 0 6 ATE519 FDB554 D3078 TX0369
A01090500073 W N 0 0 BD1108OE FDB554 D3078
A01090500071 A N 0 30 BD1108 FDB554 D3522 D3078 D3522 FDB1854
A01090100145 A N 0 13 098650A184 FDB554 D3522 D3078
A01090500072 W N 0 0 BD1108CE FDB554
A01090101813 A N 0 258 FDB554 BD1108 D3078 D3522 FDB1854
A01090200455 A N 0 19 D3522 ATE519 FDB554 D3522T 04465-YZZE9
A01090101814 A N 0 0 FDB554Q BD1108Q D3078 FVR1884
A01090500074 W N 0 0 BD1108Q FDB554Q D3078 FVR1884
A01090100099 A N 0 1 04465-YZZE9
A01090103032 W Y 0 0 TX1078 FDB1854
I'm trying the duplicated method to no avail also my data keeps coming up with NaN:
I've been attempting to group rows in an excel spreadsheet, the data is a bit of mess and I need to tidy it up.
The idea is to link all the 'A numbers' (column A) by their common alternate numbers columns G to M so that I can create one 'master A' number that links to several common part numbers.
the difficulty is that the common numbers are spread across columns and not every A number has the same common number but is interlinked with another common number in the columns.
example of completed (highlighted colours are 'common'):
A01090102299 W Y 0 0 TX0255 FDB55
A01090200456 A N 0 38 D3522 FDB554 ATE519
A01090200140 A N 0 10 D3078 FDB554 ATE519
A01090102409 W Y 0 0 TX0369 FDB554 D3078 ATE519 D3060/D3522
A01090400350 A N 2 206 LP2711 FDB554 D3078 D3522 ATE519 FDB1854 P83092
A01090400423 A N 13 18 LP989 FDB554 D3078 D3522 LP2711
A01090100489 A N 0 6 ATE519 FDB554 D3078 TX0369
A01090500073 W N 0 0 BD1108OE FDB554 D3078
A01090500071 A N 0 30 BD1108 FDB554 D3522 D3078 D3522 FDB1854
A01090100145 A N 0 13 098650A184 FDB554 D3522 D3078
A01090500072 W N 0 0 BD1108CE FDB554
A01090101813 A N 0 258 FDB554 BD1108 D3078 D3522 FDB1854
A01090200455 A N 0 19 D3522 ATE519 FDB554 D3522T 04465-YZZE9
A01090101814 A N 0 0 FDB554Q BD1108Q D3078 FVR1884
A01090500074 W N 0 0 BD1108Q FDB554Q D3078 FVR1884
A01090100099 A N 0 1 04465-YZZE9
A01090103032 W Y 0 0 TX1078 FDB1854
I'm trying the duplicated method to no avail also my data keeps coming up with NaN:
import pandas as pd df = pd.read_excel(r'/home/rus/DAR/Sheet_complete_1.xlsx', sheet_name='Brake Pad') def main(): duplicateRowsDF = df[df.duplicated(['Manufacturers Product Code', 'ALT 1','ALT 2','ALT 3','ALT 4','ALT 5','ALT 6'])] print(duplicateRowsDF) duplicateRowsDF.to_excel(r'/home/rus/DAR/Sheet_complete_pandas.xlsx', index=False) if __name__ == '__main__': main()