Python Forum

I have a comma delimited csv file loaded using DictReader.

csv...
person_id,type,arrive_date,leave_date
90,check_in,2/15/2018,2/15/2018
90,brunch,2/15/2018,2/15/2018
90,lunch ,2/15/2018,2/15/2018
90,cancelled,2/16/2018,2/16/2018
90,breakfast ,2/15/2018,2/22/2018
80,,,
40,check_in,2/15/2018,2/15/2018
50,check_in,2/15/2018,3/1/2018
50,breakfast ,2/15/2018,2/26/2018
50,lunch ,2/15/2018,3/1/2018
60,check_in,2/15/2018,2/15/2018
60,dinner,2/15/2018,2/15/2018
60,lunch ,2/21/2018,2/21/2018
60,breakfast ,3/15/2018,3/15/2018
35,check_in,3/15/2018,3/15/2018
35,cancelled,3/20/2018,3/20/2018
35,cancelled,3/21/2018,3/21/2018

Code to read...

import csv
with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'rt') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:

Looks like...
OrderedDict([('person_id', '90'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '90'), ('type', 'brunch'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
etc...
Header: ['person_id', 'type', 'arrive_date', 'leave_date']

I need 2 counts…
eat_count
no_eat_count

If a person_id has a visit entry other than check_in and other than cancelled, the eat_count is incremented. A person_id with no entries in the other columns does not count. person_id 80 has no values in the other columns and does not count here.

If a person_id has only visits equal check_in and/or cancelled, the no_eat_count is incrememented. person_id 80 counts here.

The output should be…
eat_count = 3
no_eat_count = 3

This is the breakdown for each person_id for clarification. It’s not needed in the output.
90 - eat_count (has types other than check_in and cancelled)
80 - no_eat_count (blank entries)
40 - no_eat_count (check_in only)
50 - eat_count
60 - eat_count
35 - no_eat_count (check_in and cancelled only)

I need something like the below code. I'd like an inner loop through each person_id within the for row in reader loop.

import csv
with open(r'C:\eat.csv', 'rt') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        #I need something like the below
        #this is not correct
        for k in row['person_id']:
            if 'check_in' not in row ['type']:
                if 'cancelled' not in row ['type']:
                    eat_count += 1
                    break
       else:
           no_eat_count += 1
    print('eat count = ' + str(eat_count))
    print('no eat count = ' + str(no_eat_count))

This is wordier than I would like. I tried to make it concise. I need the two counts and help with that inner loop on person_id.
Thanks.

Additional comment... the two date columns are not used in this code. I probably should have excluded them in this question but will use them later.

This is much easier to deal with as a plain list.
Here's the two methods compared:

import csv

def read_file1(filename):
    with open(filename, 'r') as f:
    # with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'rt') as f:
        reader = csv.DictReader(f, delimiter=',')
        for row in reader:
            print(row)

def read_file2(filename):
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

if __name__ == '__main__':
    print('\n\nUsing read_file1')
    read_file1('eat.csv')
    print('\n\nUsing read_file2')
    read_file2('eat.csv')

results:

Output:Using read_file1
OrderedDict([('person_id', '90'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '90'), ('type', 'brunch'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '90'), ('type', 'lunch '), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '90'), ('type', 'cancelled'), ('arrive_date', '2/16/2018'), ('leave_date', '2/16/2018')])
OrderedDict([('person_id', '90'), ('type', 'breakfast '), ('arrive_date', '2/15/2018'), ('leave_date', '2/22/2018')])
OrderedDict([('person_id', '80'), ('type', ''), ('arrive_date', ''), ('leave_date', '')])
OrderedDict([('person_id', '40'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '50'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '3/1/2018')])
OrderedDict([('person_id', '50'), ('type', 'breakfast '), ('arrive_date', '2/15/2018'), ('leave_date', '2/26/2018')])
OrderedDict([('person_id', '50'), ('type', 'lunch '), ('arrive_date', '2/15/2018'), ('leave_date', '3/1/2018')])
OrderedDict([('person_id', '60'), ('type', 'check_in'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '60'), ('type', 'dinner'), ('arrive_date', '2/15/2018'), ('leave_date', '2/15/2018')])
OrderedDict([('person_id', '60'), ('type', 'lunch '), ('arrive_date', '2/21/2018'), ('leave_date', '2/21/2018')])
OrderedDict([('person_id', '60'), ('type', 'breakfast '), ('arrive_date', '3/15/2018'), ('leave_date', '3/15/2018')])
OrderedDict([('person_id', '35'), ('type', 'check_in'), ('arrive_date', '3/15/2018'), ('leave_date', '3/15/2018')])
OrderedDict([('person_id', '35'), ('type', 'cancelled'), ('arrive_date', '3/20/2018'), ('leave_date', '3/20/2018')])
OrderedDict([('person_id', '35'), ('type', 'cancelled'), ('arrive_date', '3/21/2018'), ('leave_date', '3/21/2018')])


Using read_file2
['person_id', 'type', 'arrive_date', 'leave_date']
['90', 'check_in', '2/15/2018', '2/15/2018']
['90', 'brunch', '2/15/2018', '2/15/2018']
['90', 'lunch ', '2/15/2018', '2/15/2018']
['90', 'cancelled', '2/16/2018', '2/16/2018']
['90', 'breakfast ', '2/15/2018', '2/22/2018']
['80', '', '', '']
['40', 'check_in', '2/15/2018', '2/15/2018']
['50', 'check_in', '2/15/2018', '3/1/2018']
['50', 'breakfast ', '2/15/2018', '2/26/2018']
['50', 'lunch ', '2/15/2018', '3/1/2018']
['60', 'check_in', '2/15/2018', '2/15/2018']
['60', 'dinner', '2/15/2018', '2/15/2018']
['60', 'lunch ', '2/21/2018', '2/21/2018']
['60', 'breakfast ', '3/15/2018', '3/15/2018']
['35', 'check_in', '3/15/2018', '3/15/2018']
['35', 'cancelled', '3/20/2018', '3/20/2018']
['35', 'cancelled', '3/21/2018', '3/21/2018']

Thanks, Larz60+.

I'll try the list.
I'm new to Python and am still figuring out how to specify the columns in the code. I don't know if I use header name or an index at this point.

import csv
eat_count = 0
no_eat_count = 0
with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        #the inner loop is still incorrect
        #I'm not sure if I can use the column names instead of indexes with lists
        for k in row['person_id']:
            if 'check_in' not in row ['type']:
                if 'cancelled' not in row ['type']:
                    eat_count += 1
                    break
        else:
           no_eat_count += 1
print('eat count = ' + str(eat_count))
print('no eat count = ' + str(no_eat_count))

I'll appreciate any suggestions using either the list or OrderedDict. I'll later try using Pandas.

OK, I see how to specify the columns using indexes... row [1].

Now I have...

import csv
eat_count = 0
no_eat_count = 0
with open(r'C:\Users\delliott\Desktop\pythoncsv\Q3\eat.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the headers
    for row in reader:
        #the inner loop is still incorrect
        #I don't know how to loop through each person_id
        for k in row[0]:
            if 'check_in' not in row [1]:
                if 'cancelled' not in row [1]:
                    if row [1] != "":
                        eat_count += 1
                        print(row)
                        break
        else:
           no_eat_count += 1
print('eat count = ' + str(eat_count))
print('no eat count = ' + str(no_eat_count))

The incorrect output...
eat count = 8
no eat count = 9

It should be...
eat count = 3
no eat count = 3

The rows counted in eat count...
['90', 'brunch', '2/15/2018', '2/15/2018']
['90', 'lunch ', '2/15/2018', '2/15/2018']
['90', 'breakfast ', '2/15/2018', '2/22/2018']
['50', 'breakfast ', '2/15/2018', '2/26/2018']
['50', 'lunch ', '2/15/2018', '3/1/2018']
['60', 'dinner', '2/15/2018', '2/15/2018']
['60', 'lunch ', '2/21/2018', '2/21/2018']
['60', 'breakfast ', '3/15/2018', '3/15/2018']

Each person_id should only be counted once. I'm not sure how to do that inner loop on person_id in Python. I was attempting to loop through and then break out of the inner loop once the eat_count is incremented.
Can I get any suggestions?

To separate into columns, you could do something like:

import csv

def read_file(filename):
    column_titles = None
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            for col in row:
                print('{0:<15} '.format(col), end='')
            print()

if __name__ == '__main__':
    read_file('eat.csv')

which will yield:

Output:person_id       type            arrive_date     leave_date      
90              check_in        2/15/2018       2/15/2018       
90              brunch          2/15/2018       2/15/2018       
90              lunch           2/15/2018       2/15/2018       
90              cancelled       2/16/2018       2/16/2018       
90              breakfast       2/15/2018       2/22/2018       
80                                                              
40              check_in        2/15/2018       2/15/2018       
50              check_in        2/15/2018       3/1/2018        
50              breakfast       2/15/2018       2/26/2018       
50              lunch           2/15/2018       3/1/2018        
60              check_in        2/15/2018       2/15/2018       
60              dinner          2/15/2018       2/15/2018       
60              lunch           2/21/2018       2/21/2018       
60              breakfast       3/15/2018       3/15/2018       
35              check_in        3/15/2018       3/15/2018       
35              cancelled       3/20/2018       3/20/2018       
35              cancelled       3/21/2018       3/21/2018

Thanks for the replies. They will be helpful in the long run.
I'm still not sure how to handle the loops to get the totals I need...

eat_count - A person_id has a visit entry other than check_in and other than cancelled, the eat_count is incremented. A person_id with no entries in the other columns does not count. person_id 80 has no values in the other columns and does not count here.

no_eat_count a person_id has only visits equal check_in and/or cancelled, the no_eat_count is incremented. person_id 80 counts here.

Each person_id should only have one count.

The output should be…
eat_count = 3
no_eat_count = 3

I know it should be simple, but I don't yet grasp how to work with lists in Python.
Can anyone help?
Thanks.

Huck

Larz60+

Huck

Larz60+

Huck