Python Forum
Creating csv files from Excel file
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Creating csv files from Excel file
#1
I want to transfer worksheets in a workbook into individual csv files per worksheet (I am experimenting with 1 worksheet at the moment). I found the following code on the internet, but I am getting errors. I see an empty file called 'file1.csvSheet1.csv'. I expected the file to be file1.csv.

import csv
import openpyxl

def xls_to_csv(xls_name, csv_name) -> None:
    wb=openpyxl.load_workbook(xls_name)

    for sheet in wb.sheetnames:
        with open(f'{csv_name}{sheet.title()}.csv','w') as csv_file:
            writer=csv.writer(csv_name)
            xls_sheet=wb[sheet]
            maxRow=xls_sheet.max_row+1
            maxCol=xls_sheet.max_column+1
            headers=(xls_sheet.cell(row=1,column=col).value for col in range(1, maxCol))
            writer.writerow(headers)

            for r in range(2,maxRow):
                xls_row=(xls_sheet.cell(row=r,column=col).value for col in range(1, maxCol))
                writer.writerrow(xls_row)

if __name__ == '__main__':
    import sys
    import pathlib

    with pathlib.Path("D:/Power BI & Python/ExcelData3.xlsx") as xls_file:
        if xls_file.is_file():
            xls_to_csv("D:/Power BI & Python/ExcelData3.xlsx","file1.csv")
"D:\Power BI & Python\venv\Scripts\python.exe" "D:/Power BI & Python/Test.py" 
Traceback (most recent call last):
  File "D:\Power BI & Python\Test.py", line 26, in <module>
    xls_to_csv("D:/Power BI & Python/ExcelData3.xlsx","file1.csv")
  File "D:\Power BI & Python\Test.py", line 9, in xls_to_csv
    writer=csv.writer(csv_name)
TypeError: argument 1 must have a "write" method

Process finished with exit code 1
Reply
#2
You can do this easily using pandas.
Here's some code that does this.
the sample excel file used is available here

import pandas as pd
from pathlib import Path
import os 


class ExcelToCsv:
    def __init__(self):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        self.homepath = Path('.')

    def excel_to_csv(self, workbook, sheetname, startrow, endrow, csvfile, index=False):
        df = pd.read_excel(workbook, sheet_name=sheetname, skiprows=startrow-1, nrows=endrow-startrow)
        df.to_csv(csvfile, index=index)


def Testit():
    etc = ExcelToCsv()
    excelfile = etc.homepath / 'tab01.xls'
    csvfile = etc.homepath / 'tab01.csv'
    df = etc.excel_to_csv(excelfile, sheetname='places2000', startrow=17, endrow=260, csvfile=csvfile)


if __name__ == '__main__':
    Testit()
Partial results (tab01.csv):
Output:
Place Name,Area,"April 1, 2000","April 1, 1990",Number,Percent Abilene city,TX,115930,106654,9276,8.697282802332778 Akron city,OH,217074,223019,-5945,-2.6656921607576036 Albuquerque city,NM,448607,384736,63871,16.601253846793647
Reply
#3
csv_writer needs a file, not a file name.
writer=csv.writer(csv_file)  # File object, not str object (csv_name)
This is creating a file for each sheet in your spreadsheent.
        with open(f'{csv_name}{sheet.title()}.csv','w') as csv_file:
This is also why you get odd filenames.
But I'm with Larz60+, writing something to do this when there are tools, widely used tools, to do all the work is kooky.
Reply
#4
That worked, thanks.

How can I adjust the code so it loops through all the worksheets and each worksheet created a csv file with the name of the worksheet e.g. Sheet1.csv, Sheet2.csv, etc.
Reply
#5
You have all the pieces

You know how to open an excel spreadsheet and get the sheets. <- From your program
You know how to get a sheet from a workbook and write it to a csv file. <- From Larz60+ program
You'll need to loop through the sheets in the workbook, converting a page at a time.

The only part you don't quite have is making up a good name for the CSV file(s). You had a good start with your first post, but you'll need to do some work with removing extensions and getting the sheet name. Minor details that won't take long to iron out. Look at pathlib.Path. With Path it is easy to get directory, filename and extension from a file path.

I don't know that Excel_to_Csv needs a class. I think some boilerplate got copied over from a project where you want to change the working directory (maybe making zip files). Not needed here.
Some of the arguments to excel_to_csv could also use default values. Most of the time you want to convert the entire sheet and many workbooks have a single sheet. I made a few small changes.
def excel_to_csv(workbook, csvfile=None, sheet=0, start=None, end=None, index=False):
    """Convert excel spreadsheet (.xlsx) to csv file

    workbook : Name of excel file to read.
    csvfile  : Name of csv file to write.  Default to workbook with .csv extension
    sheet    : Name of sheet in workbook to convert.  Defaults to first sheet
    start    : Start reading at this row.  Default is start reading at row 0.
    end      : Stop reading at this row.  Default is read to end.
    index    : Write index column if True.  Default is False
    """
    if csvfile is None:
        csvfile = Path(workbook).with_suffix(".csv")
    skiprows = max(0, start - 1) if start is not None else None
    nrows = end - start if start and end else None
    df = pd.read_excel(workbook, sheet_name=sheet, skiprows=skiprows, nrows=nrows)
    df.to_csv(csvfile, index=index)
Reply
#6
I get an error in the amended code, which I don't understand.

import pandas as pd
from pathlib import Path
import os
import csv
import openpyxl

class ExcelToCsv:
    def __init__(self):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        self.homepath = Path('.')

    def excel_to_csv(self, workbook, sheetname, startrow, endrow, csvfile, index=False):
        df = pd.read_excel(workbook, sheet_name=sheetname, skiprows=startrow - 1, nrows=endrow - startrow)
        df.to_csv(csvfile, index=index)

def Testit():
    xls_name = 'ExcelData3.xlsx'
    wb = openpyxl.load_workbook(xls_name)

    for sheet in wb.sheetnames:
        xls_sheet = wb[sheet]
        maxRow = xls_sheet.max_row
        csv_file = f'{xls_sheet.title}.csv'
        sheet_name = xls_sheet
        start_row = 1
        end_row = maxRow
        etc = ExcelToCsv()
        df = excel_to_csv(xls_name, sheetname=sheet_name, startrow=start_row, endrow=end_row, csvfile=csv_file)

if __name__ == '__main__':
    Testit()
Error:
"D:\Power BI & Python\venv\Scripts\python.exe" "D:/Power BI & Python/Test.py" Traceback (most recent call last): File "D:\Power BI & Python\Test.py", line 33, in <module> Testit() File "D:\Power BI & Python\Test.py", line 30, in Testit df = excel_to_csv(xls_name, sheetname=sheet_name, startrow=start_row, endrow=end_row, csvfile=csv_file) NameError: name 'excel_to_csv' is not defined. Did you mean: 'ExcelToCsv'? Process finished with exit code 1
Reply
#7
        ...
        etc = ExcelToCsv()
        df = excel_to_csv(xls_name, sheetname=sheet_name, startrow=start_row, endrow=end_row, csvfile=csv_file)
        ...
You have created an instance of ExcelToCsv but not called its method excel_to_csv
change it to
        ...
        etc = ExcelToCsv()
        df = etc.excel_to_csv(xls_name, sheetname=sheet_name, startrow=start_row, endrow=end_row, csvfile=csv_file)
        ...
Reply
#8
you instanciate an instance of ExcelToCsv() on line 27, but then don't use it.
change line 28 to read: ext.excel_to_csv(xls_name, sheetname=sheet_name, startrow=start_row, endrow=end_row, csvfile=csv_file)
notice I removed the df = because ExcelToCsv.excel_to_csv returns nothing (None actually)
sorry Yortz, we posted at same time
Yoriz likes this post
Reply
#9
Got it working.

from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(r"ExcelData3.xlsx")

for ws in wb.worksheets:
    sheetname=ws.title
    csv_file=f'{ws.title}.csv'
    start_row=1
    max_row=ws.max_row
    df=pd.read_excel(io="ExcelData3.xlsx", sheet_name=sheetname, usecols="A:J", skiprows=start_row, nrows=max_row)
    df.to_csv(csv_file,index=False)
Reply
#10
Sorry guys I only saw the last two posts after my last post.
Thanks everyone for your contributions.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Python openyxl not updating Excel file MrBean12 1 315 Mar-03-2024, 12:16 AM
Last Post: MrBean12
  Copy Paste excel files based on the first letters of the file name Viento 2 423 Feb-07-2024, 12:24 PM
Last Post: Viento
  Search Excel File with a list of values huzzug 4 1,216 Nov-03-2023, 05:35 PM
Last Post: huzzug
  Updating sharepoint excel file odd results cubangt 1 821 Nov-03-2023, 05:13 PM
Last Post: noisefloor
  Help creating shell scrip for python file marciokoko 10 1,346 Sep-16-2023, 09:46 PM
Last Post: snippsat
  Reading data from excel file –> process it >>then write to another excel output file Jennifer_Jone 0 1,089 Mar-14-2023, 07:59 PM
Last Post: Jennifer_Jone
  Save and Close Excel File avd88 0 3,008 Feb-20-2023, 07:19 PM
Last Post: avd88
  Trying to access excel file on our sharepoint server but getting errors cubangt 0 805 Feb-16-2023, 08:11 PM
Last Post: cubangt
  Import XML file directly into Excel spreadsheet demdej 0 837 Jan-24-2023, 02:48 PM
Last Post: demdej
  how to read txt file, and write into excel with multiply sheet jacklee26 14 9,895 Jan-21-2023, 06:57 AM
Last Post: jacklee26

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020