Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Exporting a huge dataFrame
#1
Hey guys I am trying to export a pretty big data frame. I am using this code

df = pd.DataFrame(historicalData)
df.to_csv(r'dataframe.csv')
Now my dataframe has 5 million rows in it.

When I open the CSV file, it says "not loaded completely" and only 1 million rows are loaded.

If I use this code

df = pd.DataFrame(historicalData)
df.to_excel(r'dataframe.xlsx', header=True)
I get the ValueError : This sheet is too large! Your sheet size is: 5218863, 8 Max sheet size is: 1048576, 16384

Anyone know how to solve this problem?
Reply
#2
Just thinking aloud.

If you say 'when I open the CSV file' then how & what program you use? Is this the result while reading csv file data back using pandas.read_csv? Point being - can't it be some text editor/OS limiting factor and not file problem.

Can you use other formats instead of csv? One untested idea: convert df to to numpy array (pandas.DataFrame.to_numpy) and then use numpy.save to save into numpy.lib.format binary file.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#3
The CSV file is being opened by Excel
Reply
#4
Excel can't handle 5 mln lines (Excel specification and limits).
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#5
(Mar-21-2021, 02:34 PM)stylingpat Wrote: Hey guys I am trying to export a pretty big data frame. I am using this code

df = pd.DataFrame(historicalData)
df.to_csv(r'dataframe.csv')
Now my dataframe has 5 million rows in it.

When I open the CSV file, it says "not loaded completely" and only 1 million rows are loaded.

If I use this code

df = pd.DataFrame(historicalData)
df.to_excel(r'dataframe.xlsx', header=True)
I get the ValueError : This sheet is too large! Your sheet size is: 5218863, 8 Max sheet size is: 1048576, 16384

Anyone know how to solve this problem?
You have hit a documented limitation of the object you are using. So you can't use an Excel sheet to hold all the data. You can't put ten pounds of flour in a five-pound sack. So you have to figure out how to get two five-pound sacks. Or enough cells in a spreadsheet to handle your load. You might try multiple sheets, but you might also just flat run out of memory to hold that much data. You need a different tool.
Welcome to the Wonderful World of Programming.
Reply
#6
Ha thanks, I figured it out. It was kinda simple. Here's the solution for anyone else facing the same problem.

df1 = pd.DataFrame(historicalData[0:1000000])
df2 = pd.DataFrame(historicalData[1000000:2000000])
df3 = pd.DataFrame(historicalData[2000000:3000000])
df4 = pd.DataFrame(historicalData[3000000:4000000])
df5 = pd.DataFrame(historicalData[4000000:5000000])
df6 = pd.DataFrame(historicalData[5000000:6000000])
writer = pd.ExcelWriter('15 Year Backtest.xlsx', engine='xlsxwriter')
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')
df4.to_excel(writer, sheet_name='Sheet4')
df5.to_excel(writer, sheet_name='Sheet5')
df6.to_excel(writer, sheet_name='Sheet6')

writer.save()
Now if I can just sort out this threading problem... Its like after an hour, the program forgets its running... Or to be more accurate, one of the threads just forgets what its supposed to do after a period of time.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Exporting Stock Fundamental Data to a CSV file with yahoo_fin DustinKlent 2 4,637 Aug-01-2022, 06:08 PM
Last Post: paulyan
  Exporting dataframes to excel without loosing datetime format Rafa 0 1,212 Oct-27-2021, 10:42 AM
Last Post: Rafa
  exporting all lines YazeedbnMohmmed 2 2,075 Feb-24-2021, 03:29 AM
Last Post: YazeedbnMohmmed
  Pip prints huge error when installing p5 hayden2s 1 2,217 Aug-08-2020, 02:30 PM
Last Post: snippsat
  Huge CSV Reading and Sorting for comparison akshaynimkar 3 2,427 Aug-04-2020, 11:20 AM
Last Post: Larz60+
  Python code for exporting table using Selenium gj31980 4 2,932 Aug-04-2020, 01:29 AM
Last Post: gj31980
  Exporting data from python into excel Zankawah 5 3,307 Jun-02-2020, 03:17 AM
Last Post: buran
  How to scan huge files and make it in chunks ampai 2 2,542 May-28-2020, 08:20 PM
Last Post: micseydel
  convert huge xml to csv using python srikanta_p 2 1,985 Feb-08-2020, 07:16 PM
Last Post: srikanta_p
  Exporting list with dictionary to Excel veromi22 0 3,011 Oct-15-2019, 12:54 AM
Last Post: veromi22

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020