Too big CSV file management - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Too big CSV file management (/thread-20985.html)
Too big CSV file management - CaptainCsaba - Sep-09-2019

Hey! I am pretty new to pandas in Python and I would like to ask for some help. I don't think it's complicated, I just can't figure it out. I have a huge CSV file (around 2 gigabytes, 4.4 million lines); Excel can't open it fully. There is a very small part of it that I need, and everything else could be deleted. I only need the rows where "PUBLIC LIMITED COMPANY" or "PLC" appears as a substring in column A (I need the whole row where it appears). These rows could be written to a new CSV/Excel file, or everything else could be deleted from this one so only the rows we need remain. The filename is "AllCompanies.csv". Thank you for your help!

RE: Too big CSV file management - ichabod801 - Sep-09-2019

Here's an outline of the code you would want:

    with open('AllCompanies.csv') as in_file:
        with open('PLCCompanies.csv', 'w') as out_file:
            for line in in_file:
                if line_matches_criteria:  # replace with your actual test
                    out_file.write(line)

The for loop reads the file one line at a time, so it doesn't clog your memory.

RE: Too big CSV file management - Axel_Erfurt - Sep-09-2019

Maybe you can convert it to SQLite (-x $'\t' means the delimiter is a tab) with csv-to-sqlite:

    csv-to-sqlite -x $'\t' -f /path/to/file.csv -o /path/to/file.db
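If you go the SQLite route, the filtering could then be done with Python's built-in sqlite3 module. Below is a minimal sketch; the table name "AllCompanies" and the column name "CompanyName" are assumptions (csv-to-sqlite derives them from the file name and header row, so check the actual schema first).

    import sqlite3

    # Connect to the database produced by csv-to-sqlite.
    conn = sqlite3.connect('/path/to/file.db')
    cur = conn.cursor()

    # LIKE with % wildcards does a case-insensitive substring match.
    # Table and column names here are assumed -- adjust to the real schema.
    cur.execute(
        "SELECT * FROM AllCompanies "
        "WHERE CompanyName LIKE '%PUBLIC LIMITED COMPANY%' "
        "   OR CompanyName LIKE '%PLC%'"
    )
    for row in cur.fetchall():
        print(row)

    conn.close()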
RE: Too big CSV file management - CaptainCsaba - Sep-10-2019

That was so simple I started to wonder how it didn't come to my mind, lol. Thank you, it worked like a charm. I had to add UTF-8 encoding; in the end it looked like this:

    with open('AllCompanies.csv', encoding="utf-8") as in_file:
        with open('PLCCompanies.csv', 'w', encoding="utf-8") as out_file:
            for line in in_file:
                if "PLC" in str(line) or "PUBLIC LIMITED COMPANY" in str(line):
                    out_file.write(line)
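Since the original question mentioned pandas and column A specifically, here is a minimal pandas sketch of the same filter using chunked reading, which also keeps memory use low. It assumes the file has a header row and that column A is the first column; adjust if not.

    import pandas as pd

    # Read AllCompanies.csv in chunks so the whole 2 GB file never sits in
    # memory, keeping only rows whose first column (column A) contains
    # either search string.
    first_chunk = True
    for chunk in pd.read_csv('AllCompanies.csv', encoding='utf-8', chunksize=100_000):
        mask = chunk.iloc[:, 0].astype(str).str.contains(
            'PUBLIC LIMITED COMPANY|PLC', na=False)
        # Append each filtered chunk, writing the header only once.
        chunk[mask].to_csv('PLCCompanies.csv',
                           mode='w' if first_chunk else 'a',
                           header=first_chunk, index=False, encoding='utf-8')
        first_chunk = False

Unlike the line-by-line version above, this only matches the search strings in column A rather than anywhere in the row, which is closer to what the original post asked for.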