Python Forum
pd.read_csv timing out
#1
Hi

I am trying to read a large NASA data file into a pandas DataFrame. It was working fine yesterday, but overnight it stopped working, giving errors like the one below:

IncompleteRead: IncompleteRead(1079967499 bytes read, 740902801 more expected)

I am using a very basic read_csv call to import the .TAB file, which was working fine before. Is the issue with the file itself not opening, or with my browser, PC, or internet connection? I can't even open the file in a browser, as it never reaches the end of the file, which leads me to think it's not a Python setting (i.e. a timeout that needs extending) but rather my internet connection or the host site having problems. It always seems to be the same amount read and the same amount missing. The problem occurs whether I ask for a single column of data or all of them.

df = pd.read_csv("https://hirise-pds.lpl.arizona.edu/PDS/INDEX/EDRCUMINDEX.TAB", header=None, usecols=col_list)

Thanks if you can help me narrow down the issue. I have also asked the hosts at NASA whether there is a problem with the web page.
#2
My guess is that you don't have enough memory to read the entire file at once.
The fact that it runs for a long time is most likely because the OS is trying to swap memory out to disk (paging on Windows),
and it can keep doing this for quite a while until the paging file, or swap on Linux, fills up.
There are some ways around this; see: https://stackoverflow.com/a/48407838
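One common approach from that answer is to read the file in chunks with read_csv's chunksize parameter, so only a slice of the file is in memory at a time. A minimal sketch below, using a small in-memory CSV as a stand-in for the remote .TAB file (the real call would pass the URL instead of the StringIO object):

```python
import io
import pandas as pd

# Stand-in for the large remote file; the real code would pass the URL here.
csv_data = io.StringIO("\n".join(f"row{i},{i}" for i in range(10)))

# chunksize makes read_csv return an iterator of DataFrames,
# each holding at most `chunksize` rows.
chunks = []
for chunk in pd.read_csv(csv_data, header=None, chunksize=4):
    # Filter or aggregate each chunk here to keep memory usage low.
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
print(len(df))
```

Note this only helps with the memory side; if the server itself is cutting the connection short (which the IncompleteRead suggests), you'd still need to download the file successfully first.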

