Python Forum

CSV file created is huge in size. How to reduce the size?
I use the code below to create a CSV file:

import csv
# Raw string so the backslashes in the Windows path are not read as escape sequences.
with open(r"C:\Users\XXXX\Expenses.csv", "r") as srcFile:
    data = srcFile.read()

srcList = data.split("\n")

colNames = srcList[0].split(",")
header = srcList[1]

sheetData = srcList[2:]
final_data = []
expense_data = []

for row in sheetData:
    split_row = row.split(",")
    final_data.append(split_row)

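# len(final_data)-1 skips the last entry, which is an empty row if the source file ends with a newline.
for row in range(0, len(final_data)-1):
    # These three values depend only on the row, so read them once per row.
    loc = final_data[row][0]
    opsheet = final_data[row][1]
    rowNum = final_data[row][2]
    for col in range(3, 30):
        colName = colNames[col]

        # Raw string so the backslashes in the network path are kept literally.
        str1 = r"='\\xxxxxxx\xxxx\xxx\[APR_"
        str2 = "_DATA.xlsm]"
        str3 = "'!"
        str4 = "$"
        finalStr = str1 + loc + str2 + opsheet + str3 + colName + str4 + rowNum
        # Store the formula in the cell (finalStr was built but never used before).
        final_data[row][col] = finalStr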

# Python 3: open in text mode with newline="" for the csv module (Python 2 used "wb").
with open(r"c:\users\xxxxx\new_expenses.csv", "w", newline="") as f:
    fwriter = csv.writer(f)
    fwriter.writerows(final_data)
I'm a beginner in Python, and I know this code is not idiomatic or up to the usual standards. But it works and creates a CSV file.

The problem is that this file was originally created by someone else, and I'm recreating it with new data for other purposes. The existing file was only 30 MB, but the file created by this method is 180 MB and takes an enormous amount of time to open.

The original file is just a single worksheet with 51,000 rows and 20 columns, with each cell referencing a cell in a workbook at a network location.
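
As a rough sanity check on where the size could come from, the expected file size can be estimated from the row count, the column count, and the length of one formula cell. This is only a sketch: `estimate_csv_bytes` is a hypothetical helper, and `"LOC"`, `"Sheet1"` and `"AB"` are placeholder values, since the real path, location and sheet names are not shown here. Note that the loop in the code above fills columns 3 to 29 (27 formula cells per row), while the original file is described as having 20 columns.

```python
def estimate_csv_bytes(n_rows, n_cols, sample_cell):
    """Rough CSV size if every cell looks like sample_cell."""
    # +1 per cell for the comma or line ending that follows it.
    return n_rows * n_cols * (len(sample_cell) + 1)

# Hypothetical cell: "LOC", "Sheet1" and "AB" stand in for values not shown in the post.
sample = (r"='\\xxxxxxx\xxxx\xxx\[APR_" + "LOC" + "_DATA.xlsm]"
          + "Sheet1" + "'!" + "AB" + "$" + "51000")

# Compare the described 20 columns with the 27 columns the loop fills.
for n_cols in (20, 27):
    size_mb = estimate_csv_bytes(51000, n_cols, sample) / 1e6
    print(n_cols, "columns:", round(size_mb, 1), "MB")
```

Substituting a real formula string for `sample` should give a figure that can be compared against the 30 MB and 180 MB files.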

Any idea how to reduce the size, or what is causing it to be so large?
Also, as a beginner, I'd really appreciate any suggestions or feedback on improving the code. Thanks in advance!
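
For reference, one possible tidier shape for the same logic is sketched below, assuming Python 3. It is not a definitive version: `build_formula` and `fill_formulas` are hypothetical names, and the `min(last_col, len(row))` guard is an addition that the code above does not have. It streams rows with `csv.reader` instead of splitting the whole file by hand, and, like the code above, it does not write the first two lines to the output.

```python
import csv
import io

def build_formula(prefix, loc, sheet, col_name, row_num):
    """Build one external-reference formula from its parts."""
    return prefix + loc + "_DATA.xlsm]" + sheet + "'!" + col_name + "$" + row_num

def fill_formulas(src, dst, prefix, first_col=3, last_col=30):
    """Copy data rows from src to dst, replacing columns first_col..last_col-1 with formulas."""
    reader = csv.reader(src)
    col_names = next(reader)  # first line: column names
    next(reader)              # second line: header, not used
    writer = csv.writer(dst)
    for row in reader:
        if not row:           # csv.reader yields [] for blank lines
            continue
        loc, sheet, row_num = row[0], row[1], row[2]
        # Guard added here (an assumption): stop at the end of short rows.
        for col in range(first_col, min(last_col, len(row))):
            row[col] = build_formula(prefix, loc, sheet, col_names[col], row_num)
        writer.writerow(row)

# Small in-memory demo with made-up data; real use would pass two open files.
demo_src = io.StringIO("loc,sheet,row,A,B\nh1,h2,h3,h4,h5\nNY,Ops,5,x,y\n")
demo_dst = io.StringIO()
fill_formulas(demo_src, demo_dst, prefix="='[APR_", first_col=3, last_col=5)
print(demo_dst.getvalue())
```

With real files, both would be opened with `newline=""` (the output in `"w"` mode) and the prefix passed as a raw string such as `r"='\\xxxxxxx\xxxx\xxx\[APR_"`.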