Python Forum

Full Version: Help iterating through DictReader loaded from csv
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
I have a comma delimited csv file loaded using DictReader.

csv...
person_id,type,arrive_date,leave_date
90,check_in,2/15/2018,2/15/2018
90,brunch,2/15/2018,2/15/2018
90,lunch ,2/15/2018,2/15/2018
90,cancelled,2/16/2018,2/16/2018
90,breakfast ,2/15/2018,2/22/2018
80,,,
40,check_in,2/15/2018,2/15/2018
50,check_in,2/15/2018,3/1/2018
50,breakfast ,2/15/2018,2/26/2018
50,lunch ,2/15/2018,3/1/2018
60,check_in,2/15/2018,2/15/2018
60,dinner,2/15/2018,2/15/2018
60,lunch ,2/21/2018,2/21/2018
60,breakfast ,3/15/2018,3/15/2018
35,check_in,3/15/2018,3/15/2018
35,cancelled,3/20/2018,3/20/2018
35,cancelled,3/21/2018,3/21/2018

Code to read...
import csv
with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'rt') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
Looks like...
OrderedDict([('person_id', '90'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '90'), ('type', 'brunch'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
etc...
Header: ['person_id', 'type', 'arrive_date', 'leave_date']

I need 2 counts…
eat_count
no_eat_count

If a person_id has a visit entry other than check_in and other than cancelled, the eat_count is incremented. A person_id with no entries in the other columns does not count. person_id 80 has no values in the other columns and does not count here.

If a person_id has only visits equal check_in and/or cancelled, the no_eat_count is incrememented. person_id 80 counts here.

The output should be…
eat_count = 3
no_eat_count = 3

This is the breakdown for each person_id for clarification. It’s not needed in the output.
90 - eat_count (has types other than check_in and cancelled)
80 - no_eat_count (blank entries)
40 - no_eat_count (check_in only)
50 - eat_count
60 - eat_count
35 - no_eat_count (check_in and cancelled only)


I need something like the below code. I'd like an inner loop through each person_id within the for row in reader loop.
import csv
with open(r'C:\eat.csv', 'rt') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        #I need something like the below
        #this is not correct
        for k in row['person_id']:
            if 'check_in' not in row ['type']:
                if 'cancelled' not in row ['type']:
                    eat_count += 1
                    break
       else:
           no_eat_count += 1
    print('eat count = ' + str(eat_count))
    print('no eat count = ' + str(no_eat_count))
This is wordier than I would like. I tried to make it concise. I need the two counts and help with that inner loop on person_id.
Thanks.

Additional comment... the two date columns are not used in this code. I probably should have excluded them in this question but will use them later.
This is much easier to deal with as a plain list.
Here's the two methods compared:
import csv

def read_file1(filename):
    with open(filename, 'r') as f:
    # with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'rt') as f:
        reader = csv.DictReader(f, delimiter=',')
        for row in reader:
            print(row)

def read_file2(filename):
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

if __name__ == '__main__':
    print('\n\nUsing read_file1')
    read_file1('eat.csv')
    print('\n\nUsing read_file2')
    read_file2('eat.csv')
results:
Output:
Using read_file1 OrderedDict([('person_id', '90'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')]) OrderedDict([('person_id', '90'), ('type', 'brunch'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')]) OrderedDict([('person_id', '90'), ('type', 'lunch '), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')]) OrderedDict([('person_id', '90'), ('type', 'cancelled'), ('arrive_date', '2/16/2018'), ('leave_date', '2/16/2018')]) OrderedDict([('person_id', '90'), ('type', 'breakfast '), ('arrive_date', '2/15/2018'), ('leave_date', '2/22/2018')]) OrderedDict([('person_id', '80'), ('type', ''), ('arrive_date', ''), ('leave_date', '')]) OrderedDict([('person_id', '40'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')]) OrderedDict([('person_id', '50'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '3/1/2018')]) OrderedDict([('person_id', '50'), ('type', 'breakfast '), ('arrive_date', '2/15/2018'), ('leave_date', '2/26/2018')]) OrderedDict([('person_id', '50'), ('type', 'lunch '), ('arrive_date', '2/15/2018'), ('leave_date', '3/1/2018')]) OrderedDict([('person_id', '60'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')]) OrderedDict([('person_id', '60'), ('type', 'dinner'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')]) OrderedDict([('person_id', '60'), ('type', 'lunch '), ('arrive_date', '2/21/2018'), ('leave_date', '2/21/2018')]) OrderedDict([('person_id', '60'), ('type', 'breakfast '), ('arrive_date', '3/15/2018'), ('leave_date', '3/15/2018')]) OrderedDict([('person_id', '35'), ('type', 'check_in'), ('arrive_date', '3/15/2018'), ('leave_date', '3/15/2018')]) OrderedDict([('person_id', '35'), ('type', 'cancelled'), ('arrive_date', '3/20/2018'), ('leave_date', '3/20/2018')]) OrderedDict([('person_id', '35'), ('type', 'cancelled'), ('arrive_date', '3/21/2018'), ('leave_date', '3/21/2018')]) Using read_file2 ['person_id', 'type', 'arrive_date', 'leave_date'] ['90', 'check_in', '2/15/2018', '2/15/2018'] ['90', 'brunch', '2/15/2018', '2/15/2018'] ['90', 'lunch ', '2/15/2018', '2/15/2018'] ['90', 'cancelled', '2/16/2018', '2/16/2018'] ['90', 'breakfast ', '2/15/2018', '2/22/2018'] ['80', '', '', ''] ['40', 'check_in', '2/15/2018', '2/15/2018'] ['50', 'check_in', '2/15/2018', '3/1/2018'] ['50', 'breakfast ', '2/15/2018', '2/26/2018'] ['50', 'lunch ', '2/15/2018', '3/1/2018'] ['60', 'check_in', '2/15/2018', '2/15/2018'] ['60', 'dinner', '2/15/2018', '2/15/2018'] ['60', 'lunch ', '2/21/2018', '2/21/2018'] ['60', 'breakfast ', '3/15/2018', '3/15/2018'] ['35', 'check_in', '3/15/2018', '3/15/2018'] ['35', 'cancelled', '3/20/2018', '3/20/2018'] ['35', 'cancelled', '3/21/2018', '3/21/2018']
Thanks, Larz60+.

I'll try the list.
I'm new to Python and am still figuring out how to specify the columns in the code. I don't know if I use header name or an index at this point.

import csv
eat_count = 0
no_eat_count = 0
with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        #the inner loop is still incorrect
        #I'm not sure if I can use the column names instead of indexes with lists
        for k in row['person_id']:
            if 'check_in' not in row ['type']:
                if 'cancelled' not in row ['type']:
                    eat_count += 1
                    break
        else:
           no_eat_count += 1
print('eat count = ' + str(eat_count))
print('no eat count = ' + str(no_eat_count))
I'll appreciate any suggestions using either the list or OrderedDict. I'll later try using Pandas.

OK, I see how to specify the columns using indexes... row [1].

Now I have...
import csv
eat_count = 0
no_eat_count = 0
with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the headers
    for row in reader:
        #the inner loop is still incorrect
        #I don't know how to loop through each person_id
        for k in row[0]:
            if 'check_in' not in row [1]:
                if 'cancelled' not in row [1]:
                    if row [1] != "":
                        eat_count += 1
                        print(row)
                        break
        else:
           no_eat_count += 1
print('eat count = ' + str(eat_count))
print('no eat count = ' + str(no_eat_count))
The incorrect output...
eat count = 8
no eat count = 9

It should be...
eat count = 3
no eat count = 3

The rows counted in eat count...
['90', 'brunch', '2/15/2018', '2/15/2018']
['90', 'lunch ', '2/15/2018', '2/15/2018']
['90', 'breakfast ', '2/15/2018', '2/22/2018']
['50', 'breakfast ', '2/15/2018', '2/26/2018']
['50', 'lunch ', '2/15/2018', '3/1/2018']
['60', 'dinner', '2/15/2018', '2/15/2018']
['60', 'lunch ', '2/21/2018', '2/21/2018']
['60', 'breakfast ', '3/15/2018', '3/15/2018']

Each person_id should only be counted once. I'm not sure how to do that inner loop on person_id in Python. I was attempting to loop through and then break out of the inner loop once the eat_count is incremented.
Can I get any suggestions?
To separate into columns, you could do something like:
import csv

def read_file(filename):
    column_titles = None
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            for col in row:
                print('{0:<15} '.format(col), end='')
            print()

if __name__ == '__main__':
    read_file('eat.csv')
which will yield:
Output:
person_id type arrive_date leave_date 90 check_in 2/15/2018 2/15/2018 90 brunch 2/15/2018 2/15/2018 90 lunch 2/15/2018 2/15/2018 90 cancelled 2/16/2018 2/16/2018 90 breakfast 2/15/2018 2/22/2018 80 40 check_in 2/15/2018 2/15/2018 50 check_in 2/15/2018 3/1/2018 50 breakfast 2/15/2018 2/26/2018 50 lunch 2/15/2018 3/1/2018 60 check_in 2/15/2018 2/15/2018 60 dinner 2/15/2018 2/15/2018 60 lunch 2/21/2018 2/21/2018 60 breakfast 3/15/2018 3/15/2018 35 check_in 3/15/2018 3/15/2018 35 cancelled 3/20/2018 3/20/2018 35 cancelled 3/21/2018 3/21/2018
Thanks for the replies. They will be helpful in the long run.
I'm still not sure how to handle the loops to get the totals I need...

eat_count - A person_id has a visit entry other than check_in and other than cancelled, the eat_count is incremented. A person_id with no entries in the other columns does not count. person_id 80 has no values in the other columns and does not count here.

no_eat_count a person_id has only visits equal check_in and/or cancelled, the no_eat_count is incremented. person_id 80 counts here.

Each person_id should only have one count.

The output should be…
eat_count = 3
no_eat_count = 3

I know it should be simple, but I don't yet grasp how to work with lists in Python.
Can anyone help?
Thanks.