Python Forum

Hi all,

I'm trying to work with a large CSV file to effectively return what a filter would in excel....
So when the value in column 4 = a defined number then return that entire row.
Seems pointless, but some of the csv files won't open with excel as they have too many rows, so this would be quite useful for just looking at 1 customer.
Also, I plan on expanding this a lot further, this is just the 1st step!

What I have so far is:

import csv
import sys


number = '1'

csv_file = csv.reader(open('File.csv', "r", encoding="Latin1"))
filename  = open("Result.csv",'w')
sys.stdout =filename
print(next(csv_file))


for row in csv_file:

    if number == row[4]:
       print(','.join(row))

This is getting me some of the way there, however there are a few cases where cells have ',' within them, this results in the results from here going into 2 different columns.

I was trying to use quotechar="'" but I am using it wrong as my file returns nothing with this.

Also, using the

print(','.join(row))

it changed 'text' to text (which is what I want) however the header row still has 'text'.

Anyone know a good way of fixing these two issues?
Guessing it's quite straight forward, but anything I find online I think I end up using wrong as it returns errors, or it runs but the result file only has the header row, or nothing at all!

Thanks in advance,

N

I would look closely at your csv file. Confirm that you have the right quotechar. See how the file handles quotechars within quoted text (the escapechar setting in csv.reader).

I would also use filename.write rather than redirecting sys.stdout, in the interests of keeping it simple.

Why not simply use pandas dataframe to read the csv and filter out the row(s)? Something like this:

import pandas as pd
import csv

df = pd.read_csv('C:\Test\Test.csv')
df_filtered = df[df['Column'] == number]
df_filtered

from a long term perspective (given you want to expand, etc.) the best approach would be to load the csv fie into database.
for start you can look into sqlite3 - builtin support comes with python. It will be scalable solution and you can upgrade to something like MySQL, etc. in the future

Thanks for all the help.
I tried a few of the solutions, and at the moment for the simple part of the overall task the Pandas solution worked a treat!
I will no doubt have many more questions while i'm getting up to speed in python, so thanks again for the quick and helpful replies!

NHeav

ichabod801

sd_0912

buran

NHeav