Python Forum
python Read each xlsx file and write it into csv with pipe delimiter
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
python Read each xlsx file and write it into csv with pipe delimiter
#1
Hi Team,

I am trying to Read xlsx file and I want to write it into CSV File with pipe delimited.all decimal values wanted to round up 6 decimal places.

Challenges I am facing for Negative Numbers

xlsx values
-2816151988.789466

in CSV File values are getting written as , negative symbol getting missed.
2816151988.789466


how to fix it.



Thanks
mg



import os
import pandas as pd

# Specify the source directory containing the xlsx files
xlsx_directory = "path/to/source/directory"

# Specify the destination directory for saving the csv files
csv_destination = "path/to/destination/directory"

# Create the destination directory if it doesn't exist
if not os.path.exists(csv_destination):
    os.makedirs(csv_destination)

# Get a list of all xlsx files in the source directory
xlsx_files = [f for f in os.listdir(xlsx_directory) if f.endswith('.xlsx')]

# Loop through the xlsx files
for xlsx_file in xlsx_files:
    # Construct the full path of the xlsx file
    xlsx_path = os.path.join(xlsx_directory, xlsx_file)
    
    # Load the xlsx file into a pandas DataFrame
    df = pd.read_excel(xlsx_path)
    
    # Construct the path for the CSV file in the destination directory
    csv_file = os.path.splitext(xlsx_file)[0] + '.csv'
    csv_path = os.path.join(csv_destination, csv_file)
    
    # Save the DataFrame as a CSV file with pipe delimiter
    df.to_csv(csv_path, sep='|', index=False, float_format='%.6f')

    print(f"Converted {xlsx_file} to {csv_file}")
Larz60+ write Aug-30-2023, 03:31 AM:
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Reply
#2
Just a suggestion, I don't know pandas very well:

Using pandas save the excel as csv.

df.to_csv(savepathXL+savenameCSV, index=False)
Then you have a text file. Open the csv as a text file and change things using re, or .replace()

Pandas docs here.

Found this on stackoverflow

   
carecavoador likes this post
Reply
#3
This is the way... And to use the pipe | as the separator just pass '|' as the sep argument to df.to_csv():

df.to_csv(savepathXL+savenameCSV, index=False, sep='|')
Reply
#4
Is there a negative number to print? Verify that df has a negative number. If so, the problem is writing the csv. Try writing without the format. If there is not a negative number in df, verify there is a negative number in the spreadsheet.

Your code works for me.
Reply
#5
Hi Team,

This is code I am using for cosolidation of xlsx files into Main file, Do ou see any error or suggest please




from pathlib import Path
import pandas as pd

foldername = "your_folder_name"
output_folder = "output_folder"  # Define the folder where you want to save the output CSV files
extra_header = "IFRS"
columns_to_fill = ['year1', 'year2', 'year3', 'year4', 'year5', 'Total']

files_combine = Path(f"{foldername}").rglob("*.xlsx")

header_written = False

for input_xls in files_combine:
    # Attempt to read from the 'Data' sheet; if it doesn't exist, read the first active sheet
    try:
        df = pd.read_excel(input_xls, sheet_name='Data')
        
        # Check if the header contains "reports"
        if 'reports' in df.columns.str.lower():
            df_dict = {input_xls.name: df}
        else:
            raise ValueError("Header does not contain 'reports'")
    except Exception:
        df_dict = pd.read_excel(input_xls, sheet_name=None, engine='openpyxl')  # Read all sheets into a dictionary
        
        # Select the first non-empty sheet as the active sheet
        for sheet_name, sheet_data in df_dict.items():
            if not sheet_data.empty:
                # Check if the header contains "reports"
                if 'reports' in sheet_data.columns.str.lower():
                    df = sheet_data
                    df_dict = {input_xls.name: df}
                    break
                else:
                    raise ValueError("Header does not contain 'reports'")
    
    # Check if "IFRS" column already exists, and if not, add it
    if 'IFRS' not in df.columns:
        df['IFRS'] = extra_header
    
    # Replace blank cells in the specified columns with 0
    df[columns_to_fill] = df[columns_to_fill].fillna(0)
    
    # Convert 'reporting date' column to yyyy-mm-dd format
    df['reporting date'] = pd.to_datetime(df['reporting date']).dt.strftime('%Y-%m-%d')
    
    # Extract the existing headers from the first row
    existing_headers = list(df.columns)
    
    # Add the extra header "IFRS" to the list of existing headers
    existing_headers.append(extra_header)
    
    # Update the DataFrame columns with the new list of headers
    df.columns = existing_headers
    
    # Insert an additional column for 'FileName' in the DataFrame
    df.insert(0, 'FileName', input_xls.name)
    
    # Define the output CSV file name based on the input file name
    output_csv_file = f"{output_folder}/{input_xls.stem}_combined.csv"
    
    # Check if the header has been written, and if not, write it
    if not header_written:
        header_written = True
        df.to_csv(output_csv_file, index=False, sep='\t', encoding='utf-8', float_format='%.6f', header=True, mode='a')
    else:
        # Append the workbook data to the CSV file without writing the header again
        df.to_csv(output_csv_file, index=False, sep='\t', encoding='utf-8', float_format='%.6f', header=False, mode='a')
Thanks
mg
deanhystad write Nov-09-2023, 12:59 PM:
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  What does .flush do? How can I change this to write to the file? Pedroski55 3 229 Apr-22-2024, 01:15 PM
Last Post: snippsat
  Last record in file doesn't write to newline gonksoup 3 446 Jan-22-2024, 12:56 PM
Last Post: deanhystad
  Recommended way to read/create PDF file? Winfried 3 2,901 Nov-26-2023, 07:51 AM
Last Post: Pedroski55
  write to csv file problem jacksfrustration 11 1,551 Nov-09-2023, 01:56 PM
Last Post: deanhystad
Question Special Characters read-write Prisonfeed 1 631 Sep-17-2023, 08:26 PM
Last Post: Gribouillis
  read file txt on my pc to telegram bot api Tupa 0 1,129 Jul-06-2023, 01:52 AM
Last Post: Tupa
  parse/read from file seperated by dots giovanne 5 1,126 Jun-26-2023, 12:26 PM
Last Post: DeaD_EyE
  Formatting a date time string read from a csv file DosAtPython 5 1,300 Jun-19-2023, 02:12 PM
Last Post: DosAtPython
  How do I read and write a binary file in Python? blackears 6 6,701 Jun-06-2023, 06:37 PM
Last Post: rajeshgk
  Context-sensitive delimiter ZZTurn 9 1,514 May-16-2023, 07:31 AM
Last Post: Gribouillis

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020