Python Forum
Too big CSV file management
#1
Hey!

I am pretty new to pandas in Python and would like to ask for some help. I don't think it's complicated; I just can't figure it out. I have a huge CSV file (around 2 gigabytes, 4.4 million lines) that Excel can't open fully. I need only a very small part of it, and everything else can be deleted: the rows where "PUBLIC LIMITED COMPANY" or "PLC" appears as a substring in column A (I need the whole row wherever it appears). The matching rows could be written to a new CSV/Excel file, or everything else could be deleted from this file. The filename is "AllCompanies.csv".

Thank you for your help!
#2
Here's an outline of the code you would want:

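# Stream the input so the whole file is never held in memory.
with open('AllCompanies.csv') as in_file:
    with open('PLCCompanies.csv', 'w') as out_file:
        for line in in_file:
            # Keep the line if it contains either search string.
            if 'PLC' in line or 'PUBLIC LIMITED COMPANY' in line:
                out_file.write(line)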
The for loop will read the file one line at a time, so it doesn't clog your memory.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
#3
Maybe you can convert it to SQLite with csv-to-sqlite (-x $'\t' sets the delimiter to a tab):

csv-to-sqlite -x $'\t' -f /path/to/file.csv -o /path/to/file.db
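
If csv-to-sqlite is not at hand, roughly the same idea can be sketched with Python's built-in csv and sqlite3 modules. This is a swapped-in approach, not what csv-to-sqlite does internally. The table name companies, the column name name, and both helper functions are made up for illustration, and the sketch assumes a comma-delimited file with a header row and loads only column A.

```python
import csv
import sqlite3

def load_first_column(conn, csv_path):
    """Load column A of the CSV into a one-column table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS companies (name TEXT)")
    with open(csv_path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # assumption: the first row is a header
        # The generator feeds rows one at a time, so the file is streamed.
        conn.executemany(
            "INSERT INTO companies (name) VALUES (?)",
            ((row[0],) for row in reader if row),
        )
    conn.commit()

def find_plc_names(conn):
    """Return the names containing either search string, in insertion order."""
    # instr() is case-sensitive, like Python's `in`; LIKE would ignore ASCII case.
    cur = conn.execute(
        "SELECT name FROM companies "
        "WHERE instr(name, 'PLC') > 0 "
        "OR instr(name, 'PUBLIC LIMITED COMPANY') > 0 "
        "ORDER BY rowid"
    )
    return [row[0] for row in cur]
```

Once the data is in a database, the same file can be queried again with different conditions without rereading the 2 GB CSV.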
#4
That was so simple I started to wonder how it didn't occur to me, lol. Thank you, it worked like a charm. I had to add UTF-8 encoding; in the end it looked like this:

with open('AllCompanies.csv', encoding="utf-8") as in_file:
    with open('PLCCompanies.csv', 'w', encoding="utf-8") as out_file:
        for line in in_file:
            # line is already a str, so no str() conversion is needed
            if "PLC" in line or "PUBLIC LIMITED COMPANY" in line:
                out_file.write(line)
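
One caveat with the loop above: it matches "PLC" anywhere in the line, not only in column A. If that matters, a minimal sketch using the standard csv module could look like the following. It assumes column A is the first field, that the file is comma-delimited, and that the first row is a header; the function name filter_plc_rows is made up for illustration.

```python
import csv

KEYWORDS = ("PLC", "PUBLIC LIMITED COMPANY")

def filter_plc_rows(in_path, out_path, keywords=KEYWORDS):
    """Copy the header plus every row whose first column contains a keyword.

    Returns the number of data rows written.
    """
    written = 0
    # newline="" lets the csv module handle line endings and quoted fields.
    with open(in_path, encoding="utf-8", newline="") as in_file, \
         open(out_path, "w", encoding="utf-8", newline="") as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        header = next(reader, None)  # assumption: the first row is a header
        if header is not None:
            writer.writerow(header)
        for row in reader:
            # row[0] is column A; empty rows are skipped.
            if row and any(keyword in row[0] for keyword in keywords):
                writer.writerow(row)
                written += 1
    return written
```

Calling filter_plc_rows("AllCompanies.csv", "PLCCompanies.csv") would still read one row at a time, so memory use stays small. The match is case-sensitive, as in the loop above.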