Hi Team,
I have 20 xlsb files, my task to combine all xlsb files data into single csv files , with Pipe delimiter
later this CSV I need to import it into SQL Table.
Challenges
18 xlsb files contains 30 headers , all header are same sequence
2 xlsb files contains 31 headers , all header are same , but one extra header for last column
I wanted to add one extra header in 18 xlsb header file
to make it 31 same headers. Extra header name I want to give word 'IFRS'
Expected output
In my output csv files ,
I want 31 header , extra header name is IFRS.
my attempted code is as below.
I have 20 xlsb files, my task to combine all xlsb files data into single csv files , with Pipe delimiter
later this CSV I need to import it into SQL Table.
Challenges
18 xlsb files contains 30 headers , all header are same sequence
2 xlsb files contains 31 headers , all header are same , but one extra header for last column
I wanted to add one extra header in 18 xlsb header file
to make it 31 same headers. Extra header name I want to give word 'IFRS'
Expected output
In my output csv files ,
I want 31 header , extra header name is IFRS.
my attempted code is as below.
<python from pathlib import Path import pandas as pd foldername = "your_folder_name" output_folder = "output_folder" # Define the folder where you want to save the output CSV files extra_header = "Extra_Header" columns_to_fill = ['year1', 'year2', 'year3', 'year4', 'year5', 'Total'] files_combine = Path(f"{foldername}").rglob("*.xlsx") header_written = False for input_xls in files_combine: # Attempt to read from the 'Data' sheet; if it doesn't exist, read from the active sheet try: df = pd.read_excel(input_xls, sheet_name='Data') except Exception: df = pd.read_excel(input_xls, sheet_name=None, engine='openpyxl') # Read all sheets using openpyxl # Select the first non-empty sheet as the active sheet for sheet_name, sheet_data in df.items(): if not sheet_data.empty: df = sheet_data break # Check if "IFRS" column already exists, and if not, add it if 'IFRS' not in df.columns: df['IFRS'] = extra_header # Replace blank cells in the specified columns with 0 df[columns_to_fill] = df[columns_to_fill].fillna(0) # Extract the existing headers from the first row existing_headers = list(df.columns) # Add the extra header "IFRS" to the list of existing headers existing_headers.append(extra_header) # Update the DataFrame columns with the new list of headers df.columns = existing_headers # Insert an additional column for 'FileName' in the DataFrame df.insert(0, 'FileName', input_xls.name) # Define the output CSV file name based on the input file name output_csv_file = f"{output_folder}/{input_xls.stem}_combined.csv" # Write the headers only if header_written is False if not header_written: df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=True, mode='w') header_written = True else: # Append the workbook data to the CSV file without writing headers df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=False, mode='a') /python>