Python Forum
Loading large .csv file with pandas
#1
Hello, I am trying to load a .csv file that has around 3 million lines. The file uses "|" as its delimiter because the data itself contains commas, so I didn't want to use a comma as the delimiter. I'm running into issues getting the data to load so that I can clean it up for use in SQL. The spreadsheet programs I know of can only handle a little over 1 million rows, so I have a few questions. Please note that I'm looking for guidance, not for someone to do the work for me.

1. Since I'm using a different delimiter than the file type, would it be better to save the file as a .txt file?
2. With both of the code examples below, I'm getting a tokenizing error.
import pandas as pd

csv = "/home/file.csv"
c_size = 500

for chunk in pd.read_csv(csv, chunksize=c_size):
    print(chunk)

Quote: ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4

import pandas as pd

csv = "/home/joe/study.csv"
c_size = 500

for chunk in pd.read_csv(csv, chunksize=c_size):
    print(chunk.shape)

Quote: ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4
#2
(Jun-07-2020, 03:56 PM)hangejj Wrote: 1. Since I'm using a different delimiter than the file type, would it be better to save the file as a .txt file?

No. At least on Unix, file extensions aren't particularly meaningful. The read_csv function has a sep parameter (delimiter is an alias for it) that lets you specify the delimiter. See the docs here.
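
As a rough sketch of what that looks like: read_csv splits on commas by default, so commas inside the data would plausibly account for the "Expected 3 fields ... saw 4" error. Passing sep="|" tells it to split on the pipe character instead. The sample data, the count_rows helper, and the column names below are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: pipe-delimited,
# with commas inside some of the field values.
sample = io.StringIO(
    "id|name|city\n"
    "1|Smith, John|Austin\n"
    "2|Doe, Jane|Portland, OR\n"
    "3|Lee|Boston\n"
)


def count_rows(source, chunk_size=2):
    """Read a pipe-delimited source in chunks; return (row count, column names)."""
    total_rows = 0
    columns = None
    # sep="|" makes pandas split fields on the pipe instead of the comma.
    for chunk in pd.read_csv(source, sep="|", chunksize=chunk_size):
        total_rows += len(chunk)
        columns = list(chunk.columns)
    return total_rows, columns


rows, columns = count_rows(sample)
print(rows, columns)
```

With a file on disk, the same call would take the path in place of the buffer, and chunksize can stay at the 500 used in the original post.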
#3
Thank you. Once I solve this, I'll post the solution here in case anyone else comes across this.