Jun-07-2020, 03:56 PM
Hello, I am trying to load a .csv file that has around 3 million lines. The file uses "|" as its delimiter because the data I need contains commas, so I didn't want to use a comma as the delimiter. I'm running into issues getting the data to load so that I can clean it up for use in SQL. The spreadsheet programs I know of can handle only a little over 1 million rows, so I have a few questions. Please note I'm looking for guidance, not someone to do the work for me.
1. Since I'm using a different delimiter than the .csv file type implies, would it be better to save the file as a .txt file?
2. With both of the code examples below that I've tried, I get a tokenizing error.
import pandas as pd

csv = "/home/file.csv"
c_size = 500
for chunk in pd.read_csv(csv, chunksize=c_size):
    print(chunk)
Quote:ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4
import pandas as pd

csv = "/home/joe/study.csv"
c_size = 500
for chunk in pd.read_csv(csv, chunksize=c_size):
    print(chunk.shape)
Quote:ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4
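For reference, neither snippet above tells pandas about the "|" delimiter, and pd.read_csv splits on commas unless sep is given. That would be consistent with the error: a line holding an extra comma in its data gets counted as 4 fields instead of 3. The file extension itself should not matter to read_csv. Below is a minimal sketch of the chunked read with the delimiter passed explicitly. The helper name count_rows and its defaults are made up for illustration, and it assumes the file really is pipe-delimited throughout.

```python
import pandas as pd


def count_rows(path, sep="|", chunksize=500):
    """Read a delimited file in chunks; return (total_rows, n_columns).

    Hypothetical helper for illustration only. It assumes every data
    line uses `sep` as its field separator.
    """
    total_rows = 0
    n_columns = None
    # sep="|" overrides the default comma separator, so commas inside
    # the data are kept as ordinary characters within a field.
    for chunk in pd.read_csv(path, sep=sep, chunksize=chunksize):
        total_rows += len(chunk)
        n_columns = chunk.shape[1]
    return total_rows, n_columns
```

Calling count_rows("/home/file.csv") would walk the whole file 500 rows at a time without holding all 3 million lines in memory. If the tokenizing error still appears with sep="|", the reported line number is a good place to look for a stray "|" in the data.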