merge all xlsb files into csv

mg24 · (This post was last modified: Nov-13-2023, 08:25 AM by mg24.)

Hi Team,

I have 20 xlsb files, my task to combine all xlsb files data into single csv files , with Pipe delimiter
later this CSV I need to import it into SQL Table.

Challenges
18 xlsb files contains 30 headers , all header are same sequence
2 xlsb files contains 31 headers , all header are same , but one extra header for last column

I wanted to add one extra header in 18 xlsb header file
to make it 31 same headers. Extra header name I want to give word 'IFRS'

Expected output
In my output csv files ,

I want 31 header , extra header name is IFRS.

my attempted code is as below.

<python
from pathlib import Path
import pandas as pd

foldername = "your_folder_name"
output_folder = "output_folder"  # Define the folder where you want to save the output CSV files
extra_header = "Extra_Header"
columns_to_fill = ['year1', 'year2', 'year3', 'year4', 'year5', 'Total']

files_combine = Path(f"{foldername}").rglob("*.xlsx")

header_written = False

for input_xls in files_combine:
    # Attempt to read from the 'Data' sheet; if it doesn't exist, read from the active sheet
    try:
        df = pd.read_excel(input_xls, sheet_name='Data')
    except Exception:
        df = pd.read_excel(input_xls, sheet_name=None, engine='openpyxl')  # Read all sheets using openpyxl
        
        # Select the first non-empty sheet as the active sheet
        for sheet_name, sheet_data in df.items():
            if not sheet_data.empty:
                df = sheet_data
                break
    
    # Check if "IFRS" column already exists, and if not, add it
    if 'IFRS' not in df.columns:
        df['IFRS'] = extra_header
    
    # Replace blank cells in the specified columns with 0
    df[columns_to_fill] = df[columns_to_fill].fillna(0)
    
    # Extract the existing headers from the first row
    existing_headers = list(df.columns)
    
    # Add the extra header "IFRS" to the list of existing headers
    existing_headers.append(extra_header)
    
    # Update the DataFrame columns with the new list of headers
    df.columns = existing_headers
    
    # Insert an additional column for 'FileName' in the DataFrame
    df.insert(0, 'FileName', input_xls.name)
    
    # Define the output CSV file name based on the input file name
    output_csv_file = f"{output_folder}/{input_xls.stem}_combined.csv"
    
    # Write the headers only if header_written is False
    if not header_written:
        df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=True, mode='w')
        header_written = True
    else:
        # Append the workbook data to the CSV file without writing headers
        df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=False, mode='a')
/python>

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Merge htm files with shutil library (TypeError: 'module' object is not callable)	Melcu54	7	3,474	Mar-09-2025, 04:25 PM Last Post: Pedroski55
	I'm trying to merge 2 .csv files with no joy!	Sick_Stigma	3	1,000	Aug-03-2024, 03:20 PM Last Post: mariadsouza362
	merge two csv files into output.csv using Subprocess	mg24	9	3,626	Dec-11-2022, 09:58 PM Last Post: Larz60+
	Merge all json files in folder after filtering	deneme2	10	4,637	Sep-18-2022, 10:32 AM Last Post: deneme2
	How to merge all the files in a directory in to one file	sutra	3	3,499	Dec-10-2020, 12:09 AM Last Post: sutra
	Merge JSON Files	Ugo	4	5,919	Aug-20-2020, 06:25 AM Last Post: ndc85430
	How to read multiple csv files and merge data	rajeshE	0	2,574	Mar-28-2020, 04:01 PM Last Post: rajeshE
	error merge text files	ledgreve	3	3,491	Nov-18-2019, 12:41 PM Last Post: DeaD_EyE
	merge files	AGC	4	5,034	Oct-04-2017, 08:54 PM Last Post: AGC

merge all xlsb files into csv

User Panel Messages

Announcements