Python Forum
Too big CSV file management - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Too big CSV file management (/thread-20985.html)



Too big CSV file management - CaptainCsaba - Sep-09-2019

Hey!

I am pretty new to "pandas" in python and I would like to ask for some help. I don't think it's complicated, I just can't figure it out. I have a huge CSV file (around 2 gigabytes, 4,4 million lines), excel cant open it fully. There is a very small part of it that I need, and everything else could be deleted. I only need the rows where "PUBLIC LIMITED COMPANY" or "PLC" appears as a substring in column A (I need the whole row where it does appear). These could be added to a new csv/excel file or it could be done in a way that everything else gets deleted in this one besides the ones we need. The filename is "AllCompanies.csv".

Thank you for your help!


RE: Too big CSV file management - ichabod801 - Sep-09-2019

Here's an outline of the cod you would want:

with open('AllCompanies.csv') as in_file:
    with open('PLCCompanies.csv', 'w') as out_file:
        for line in in_file:
            if line_matches_criteria:
                out_file.write(line)
The for loop will read the file one line at a time, so it doesn't clog your memory.


RE: Too big CSV file management - Axel_Erfurt - Sep-09-2019

maybe you can cobvert it to sqlite (-x $'\t' means delimiter tab) with csv-to-sqlite

csv-to-sqlite -x $'\t' -f /path/to/file.csv -o /path/to/file.db


RE: Too big CSV file management - CaptainCsaba - Sep-10-2019

That was so simple I started to wonder how that not came to my mind lol. Thank you, it worked like a charm. Had to add UTF8 encoding to it ,in the end it looked like this:

with open('AllCompanies.csv', encoding="utf-8") as in_file:
    with open('PLCCompanies.csv', 'w', encoding="utf-8") as out_file:
        for line in in_file:
            if "PLC" in str(line) or "PUBLIC LIMITED COMPANY" in str(line):
                out_file.write(line)