Python Forum
merge all xlsb files into csv
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
merge all xlsb files into csv
#1
Hi Team,

I have 20 xlsb files, my task to combine all xlsb files data into single csv files , with Pipe delimiter
later this CSV I need to import it into SQL Table.


Challenges
18 xlsb files contains 30 headers , all header are same sequence
2 xlsb files contains 31 headers , all header are same , but one extra header for last column

I wanted to add one extra header in 18 xlsb header file
to make it 31 same headers. Extra header name I want to give word 'IFRS'


Expected output
In my output csv files ,

I want 31 header , extra header name is IFRS.

my attempted code is as below.

<python
from pathlib import Path
import pandas as pd

foldername = "your_folder_name"
output_folder = "output_folder"  # Define the folder where you want to save the output CSV files
extra_header = "Extra_Header"
columns_to_fill = ['year1', 'year2', 'year3', 'year4', 'year5', 'Total']

files_combine = Path(f"{foldername}").rglob("*.xlsx")

header_written = False

for input_xls in files_combine:
    # Attempt to read from the 'Data' sheet; if it doesn't exist, read from the active sheet
    try:
        df = pd.read_excel(input_xls, sheet_name='Data')
    except Exception:
        df = pd.read_excel(input_xls, sheet_name=None, engine='openpyxl')  # Read all sheets using openpyxl
        
        # Select the first non-empty sheet as the active sheet
        for sheet_name, sheet_data in df.items():
            if not sheet_data.empty:
                df = sheet_data
                break
    
    # Check if "IFRS" column already exists, and if not, add it
    if 'IFRS' not in df.columns:
        df['IFRS'] = extra_header
    
    # Replace blank cells in the specified columns with 0
    df[columns_to_fill] = df[columns_to_fill].fillna(0)
    
    # Extract the existing headers from the first row
    existing_headers = list(df.columns)
    
    # Add the extra header "IFRS" to the list of existing headers
    existing_headers.append(extra_header)
    
    # Update the DataFrame columns with the new list of headers
    df.columns = existing_headers
    
    # Insert an additional column for 'FileName' in the DataFrame
    df.insert(0, 'FileName', input_xls.name)
    
    # Define the output CSV file name based on the input file name
    output_csv_file = f"{output_folder}/{input_xls.stem}_combined.csv"
    
    # Write the headers only if header_written is False
    if not header_written:
        df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=True, mode='w')
        header_written = True
    else:
        # Append the workbook data to the CSV file without writing headers
        df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=False, mode='a')
/python>
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  merge two csv files into output.csv using Subprocess mg24 9 1,803 Dec-11-2022, 09:58 PM
Last Post: Larz60+
  Merge all json files in folder after filtering deneme2 10 2,375 Sep-18-2022, 10:32 AM
Last Post: deneme2
  Merge htm files with shutil library (TypeError: 'module' object is not callable) Melcu54 5 1,608 Aug-28-2022, 07:11 AM
Last Post: Melcu54
  How to merge all the files in a directory in to one file sutra 3 2,656 Dec-10-2020, 12:09 AM
Last Post: sutra
  Merge JSON Files Ugo 4 4,642 Aug-20-2020, 06:25 AM
Last Post: ndc85430
  How to read multiple csv files and merge data rajeshE 0 1,957 Mar-28-2020, 04:01 PM
Last Post: rajeshE
  error merge text files ledgreve 3 2,711 Nov-18-2019, 12:41 PM
Last Post: DeaD_EyE
  merge files AGC 4 3,921 Oct-04-2017, 08:54 PM
Last Post: AGC

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020