Python Forum
Loading large .csv file with pandas
#1
Hello, I am trying to load a .csv file that has around 3 million lines. The file uses "|" as its delimiter because the data itself contains commas, so I didn't want to use a comma as the delimiter. I'm running into issues loading the data so that I can clean it up for use in SQL. The spreadsheet programs I know of can handle only a little over 1 million rows, so I have a few questions. Please note I'm looking for guidance, not for someone to do the work for me.

1. Since I'm using a different delimiter than the file type, would it be better to save the file as a .txt file?
2. With both of the code examples below, I'm getting a tokenizing error. What am I doing wrong?
import pandas as pd
csv="/home/file.csv"
c_size = 500

for chunk in pd.read_csv(csv,chunksize=c_size):
    print(chunk)
    
Quote:ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4

import pandas as pd
csv="/home/joe/study.csv"
c_size = 500

for chunk in pd.read_csv(csv,chunksize=c_size):
    print(chunk.shape)
Quote:ParserError: Error tokenizing data. C error: Expected 3 fields in line 94909, saw 4
#2
(Jun-07-2020, 03:56 PM)hangejj Wrote: 1. Since I'm using a different delimiter than the file type, would it be better to save the file as a .txt file?

No. At least on Unix, file extensions aren't particularly meaningful. The read_csv function has a sep parameter (delimiter is an alias) that lets you specify the delimiter. See the docs here.
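
As a rough sketch of what passing the delimiter might look like: the sample data, column names, and the count_rows helper below are made up for illustration and are not from the original file. If the tokenizing error comes from commas inside the data being treated as separators, something along these lines may resolve it, though a malformed row in the real file could still raise the same error.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real file.
# Note the commas inside the "name" field.
sample = io.StringIO(
    "id|name|city\n"
    "1|Smith, John|Boston\n"
    "2|Doe, Jane|Austin\n"
    "3|Roe, Sam|Denver\n"
)

def count_rows(source, chunk_size=2):
    """Read a pipe-delimited source in chunks; return (total rows, column names)."""
    total = 0
    columns = None
    # sep="|" tells the parser to split on pipes instead of the default comma.
    for chunk in pd.read_csv(source, sep="|", chunksize=chunk_size):
        total += len(chunk)
        columns = list(chunk.columns)
    return total, columns

total_rows, column_names = count_rows(sample)
print(total_rows, column_names)
```

For the real file, the same call should accept the path string in place of the in-memory sample, with a larger chunk size.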
#3
Thank you. Once I solve this, I'll post the solution here in case anyone else comes across the same problem.