Python Forum
Loading large .csv file with pandas - Printable Version




Loading large .csv file with pandas - hangejj - Jun-07-2020

Hello, I am trying to load a .csv file that has around 3 million lines. The file uses "|" as its delimiter because the data itself contains commas, so I didn't want to use a comma as the delimiter. I'm running into issues getting the data to load so that I can clean it up for use in SQL. The spreadsheet programs I know of can generally handle only a little over 1 million rows, so I have a few questions. Please note that I'm looking for guidance, not for someone to do the work for me.

1. Since I'm using a different delimiter than the file type, would it be better to save the file as a .txt file?
2. With both of the code examples below that I've tried, I'm getting a tokenizing error.
import pandas as pd
csv="/home/file.csv"
c_size = 500

for chunk in pd.read_csv(csv,chunksize=c_size):
    print(chunk)
    
Quote:ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4

import pandas as pd
csv="/home/joe/study.csv"
c_size = 500

for chunk in pd.read_csv(csv,chunksize=c_size):
    print(chunk.shape)
Quote:ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4



RE: Loading large .csv file with pandas - ndc85430 - Jun-07-2020

(Jun-07-2020, 03:56 PM)hangejj Wrote: 1. Since I'm using a different delimiter than the file type, would it be better to save the file as a .txt file?

No, at least on Unix, file extensions aren't particularly meaningful. The read_csv function has a parameter that lets you specify the delimiter. See the docs here.
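
A minimal sketch of what that could look like, assuming the parameter meant here is `sep` (which `read_csv` also accepts under the alias `delimiter`). The sample data, the column names, and the `load_in_chunks` helper are made up for illustration and stand in for the real file, which isn't shown in the thread:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: pipe-delimited,
# with commas appearing inside the data itself.
sample = (
    "id|name|note\n"
    "1|Smith, John|ok\n"
    "2|Doe, Jane|late, rescheduled\n"
    "3|Lee|ok\n"
)


def load_in_chunks(source, c_size=2):
    """Read pipe-delimited data chunk by chunk and return the list of chunks."""
    # sep="|" makes read_csv split on pipes instead of the default comma,
    # so commas inside a field no longer change the field count.
    return list(pd.read_csv(source, sep="|", chunksize=c_size))


chunks = load_in_chunks(io.StringIO(sample))
df = pd.concat(chunks, ignore_index=True)
print(df.shape)
```

With the real file, the path would be passed in place of the `io.StringIO` object and `c_size` set to something like the 500 used above. Collecting every chunk into a list is only reasonable here because the sample is tiny; for 3 million lines it is probably better to process each chunk inside the loop.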


RE: Loading large .csv file with pandas - hangejj - Jun-08-2020

(Jun-07-2020, 04:08 PM)ndc85430 Wrote:

No, at least on Unix, file extensions aren't particularly meaningful. The read_csv function has a parameter that lets you specify the delimiter. See the docs here.

Thank you. Once I solve this, I'll post the solution here in case anyone else comes across it.