Python Forum
Exporting a huge dataFrame - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Exporting a huge dataFrame (/thread-32988.html)



Exporting a huge dataFrame - stylingpat - Mar-21-2021

Hey guys I am trying to export a pretty big data frame. I am using this code

df = pd.DataFrame(historicalData)
df.to_csv(r'dataframe.csv')
Now my dataframe has 5 million rows in it.

When I open the CSV file, it says "not loaded completely" and only 1 million rows are loaded.

If I use this code

df = pd.DataFrame(historicalData)
df.to_excel(r'dataframe.xlsx', header=True)
I get the ValueError : This sheet is too large! Your sheet size is: 5218863, 8 Max sheet size is: 1048576, 16384

Anyone know how to solve this problem?


RE: Exporting a huge dataFrame - perfringo - Mar-21-2021

Just thinking aloud.

If you say 'when I open the CSV file' then how & what program you use? Is this the result while reading csv file data back using pandas.read_csv? Point being - can't it be some text editor/OS limiting factor and not file problem.

Can you use other formats instead of csv? One untested idea: convert df to to numpy array (pandas.DataFrame.to_numpy) and then use numpy.save to save into numpy.lib.format binary file.


RE: Exporting a huge dataFrame - stylingpat - Mar-21-2021

The CSV file is being opened by Excel


RE: Exporting a huge dataFrame - perfringo - Mar-21-2021

Excel can't handle 5 mln lines (Excel specification and limits).


RE: Exporting a huge dataFrame - supuflounder - Mar-22-2021

(Mar-21-2021, 02:34 PM)stylingpat Wrote: Hey guys I am trying to export a pretty big data frame. I am using this code

df = pd.DataFrame(historicalData)
df.to_csv(r'dataframe.csv')
Now my dataframe has 5 million rows in it.

When I open the CSV file, it says "not loaded completely" and only 1 million rows are loaded.

If I use this code

df = pd.DataFrame(historicalData)
df.to_excel(r'dataframe.xlsx', header=True)
I get the ValueError : This sheet is too large! Your sheet size is: 5218863, 8 Max sheet size is: 1048576, 16384

Anyone know how to solve this problem?
You have hit a documented limitation of the object you are using. So you can't use an Excel sheet to hold all the data. You can't put ten pounds of flour in a five-pound sack. So you have to figure out how to get two five-pound sacks. Or enough cells in a spreadsheet to handle your load. You might try multiple sheets, but you might also just flat run out of memory to hold that much data. You need a different tool.
Welcome to the Wonderful World of Programming.


RE: Exporting a huge dataFrame - stylingpat - Mar-23-2021

Ha thanks, I figured it out. It was kinda simple. Here's the solution for anyone else facing the same problem.

df1 = pd.DataFrame(historicalData[0:1000000])
df2 = pd.DataFrame(historicalData[1000000:2000000])
df3 = pd.DataFrame(historicalData[2000000:3000000])
df4 = pd.DataFrame(historicalData[3000000:4000000])
df5 = pd.DataFrame(historicalData[4000000:5000000])
df6 = pd.DataFrame(historicalData[5000000:6000000])
writer = pd.ExcelWriter('15 Year Backtest.xlsx', engine='xlsxwriter')
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')
df4.to_excel(writer, sheet_name='Sheet4')
df5.to_excel(writer, sheet_name='Sheet5')
df6.to_excel(writer, sheet_name='Sheet6')

writer.save()
Now if I can just sort out this threading problem... Its like after an hour, the program forgets its running... Or to be more accurate, one of the threads just forgets what its supposed to do after a period of time.