Nov-09-2023, 10:56 AM
(This post was last modified: Nov-09-2023, 12:59 PM by deanhystad.)
Hi Team,
This is code I am using for cosolidation of xlsx files into Main file, Do ou see any error or suggest please
mg
This is code I am using for cosolidation of xlsx files into Main file, Do ou see any error or suggest please
from pathlib import Path import pandas as pd foldername = "your_folder_name" output_folder = "output_folder" # Define the folder where you want to save the output CSV files extra_header = "IFRS" columns_to_fill = ['year1', 'year2', 'year3', 'year4', 'year5', 'Total'] files_combine = Path(f"{foldername}").rglob("*.xlsx") header_written = False for input_xls in files_combine: # Attempt to read from the 'Data' sheet; if it doesn't exist, read the first active sheet try: df = pd.read_excel(input_xls, sheet_name='Data') # Check if the header contains "reports" if 'reports' in df.columns.str.lower(): df_dict = {input_xls.name: df} else: raise ValueError("Header does not contain 'reports'") except Exception: df_dict = pd.read_excel(input_xls, sheet_name=None, engine='openpyxl') # Read all sheets into a dictionary # Select the first non-empty sheet as the active sheet for sheet_name, sheet_data in df_dict.items(): if not sheet_data.empty: # Check if the header contains "reports" if 'reports' in sheet_data.columns.str.lower(): df = sheet_data df_dict = {input_xls.name: df} break else: raise ValueError("Header does not contain 'reports'") # Check if "IFRS" column already exists, and if not, add it if 'IFRS' not in df.columns: df['IFRS'] = extra_header # Replace blank cells in the specified columns with 0 df[columns_to_fill] = df[columns_to_fill].fillna(0) # Convert 'reporting date' column to yyyy-mm-dd format df['reporting date'] = pd.to_datetime(df['reporting date']).dt.strftime('%Y-%m-%d') # Extract the existing headers from the first row existing_headers = list(df.columns) # Add the extra header "IFRS" to the list of existing headers existing_headers.append(extra_header) # Update the DataFrame columns with the new list of headers df.columns = existing_headers # Insert an additional column for 'FileName' in the DataFrame df.insert(0, 'FileName', input_xls.name) # Define the output CSV file name based on the input file name output_csv_file = f"{output_folder}/{input_xls.stem}_combined.csv" # Check if the header has been written, and if not, write it if not header_written: header_written = True df.to_csv(output_csv_file, index=False, sep='\t', encoding='utf-8', float_format='%.6f', header=True, mode='a') else: # Append the workbook data to the CSV file without writing the header again df.to_csv(output_csv_file, index=False, sep='\t', encoding='utf-8', float_format='%.6f', header=False, mode='a')Thanks
mg
deanhystad write Nov-09-2023, 12:59 PM:
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.