Python Forum

Full Version: pd.read_csv timing out
Hi

I am trying to read a large NASA data file into a pandas DataFrame. It was working OK yesterday, but overnight it stopped working and now gives the error below:

IncompleteRead: IncompleteRead(1079967499 bytes read, 740902801 more expected)

I am using a very basic pd.read_csv call to import the .TAB file, and it was working OK before. Is the issue with the file itself, my browser, my PC, or my internet connection? I can't even open the file in a browser, because the download never gets to the end of the file. That makes me think it is not a Python setting (i.e. a timeout that needs extending) but either my internet connection or the host site having problems. The number of bytes read and the number still expected seem to be the same every time. The problem occurs whether I ask for a single column or for all columns.

df = pd.read_csv("https://hirise-pds.lpl.arizona.edu/PDS/INDEX/EDRCUMINDEX.TAB", header=None, usecols=col_list)

Thanks if you can help me narrow down the issue. I have also asked the hosts at NASA whether there is a problem with the web page.
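
For reference, IncompleteRead is raised by Python's http.client when the connection closes before the number of bytes announced in Content-Length has arrived. One way to separate a download problem from a parsing problem is to fetch the file to disk first and then point pd.read_csv at the local copy. The sketch below is only an illustration: the function name download_with_resume and its parameters are hypothetical, and it assumes the server honours HTTP Range requests, which has not been checked for this host.

```python
import http.client
import os
import urllib.request


def download_with_resume(url, dest, chunk_size=1 << 20, max_attempts=5,
                         timeout=60, opener=urllib.request.urlopen):
    """Download url to dest, resuming from the bytes already on disk.

    Hypothetical helper: assumes the server answers Range requests
    with status 206 (Partial Content).
    """
    for _ in range(max_attempts):
        have = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url)
        if have:
            # Ask only for the part of the file we do not have yet.
            request.add_header("Range", f"bytes={have}-")
        try:
            with opener(request, timeout=timeout) as response:
                # 206 means the Range was honoured; otherwise start over.
                resumed = have > 0 and response.status == 206
                start = have if resumed else 0
                length = response.headers.get("Content-Length")
                expected = start + int(length) if length is not None else None
                with open(dest, "ab" if resumed else "wb") as out:
                    while True:
                        block = response.read(chunk_size)
                        if not block:
                            break
                        out.write(block)
        except (http.client.IncompleteRead, OSError):
            # Connection dropped or timed out: retry from the current size.
            continue
        size = os.path.getsize(dest)
        if expected is None or size >= expected:
            return size
    raise RuntimeError(f"download still incomplete after {max_attempts} attempts")
```

If the download keeps failing at the same byte count even with resuming, that points at the host or the network path rather than at pandas. Once the file is on disk, the same read_csv call can be given the local path instead of the URL.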
My guess is that you don't have enough memory to read the entire file at once.
The fact that it can run for a long time is most likely because the OS is swapping memory out to disk (paging on Windows),
which it can do for quite a while until either the paging file or, on Linux, the swap space fills up.
There are some ways around this; see: https://stackoverflow.com/a/48407838
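
If memory does turn out to be the limit, one common workaround is to pass chunksize to pd.read_csv so that only part of the file is held in memory at a time. The following is a minimal sketch, not necessarily what the linked answer recommends: load_filtered and the keep predicate are hypothetical names, and the filter is only an example of shrinking each chunk before the pieces are combined.

```python
import io

import pandas as pd


def load_filtered(source, usecols, keep, chunksize=100_000):
    """Read source in chunks, keeping only the rows selected by keep.

    keep is a hypothetical callable that takes a chunk (a DataFrame)
    and returns a boolean mask of the rows to retain.
    """
    kept = []
    # With chunksize set, read_csv returns an iterator of DataFrames.
    reader = pd.read_csv(source, header=None, usecols=usecols,
                         chunksize=chunksize)
    for chunk in reader:
        kept.append(chunk[keep(chunk)])
    return pd.concat(kept, ignore_index=True)


# Small in-memory stand-in for a large file.
sample = io.StringIO("1,a,10\n2,b,20\n3,c,30\n4,d,40\n5,e,50\n")
result = load_filtered(sample, usecols=[0, 2],
                       keep=lambda c: c[2] >= 30, chunksize=2)
```

This only helps if each chunk can be reduced (filtered, aggregated, or written out) before the next one is read. Concatenating every chunk unchanged needs as much memory as reading the whole file in one go.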