Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Reset csv.reader
#11
I attached the csv:
client_id visittype_id visittype intake_date visit_date services_needed
90062526 109 Engagement 7/12/2018 7/12/2018
90063330 830 CM Eval 7/17/2018 7/17/2018 Yes
90063330 34 Case Management7/17/2018 7/25/2018 Yes
90065363 109 Engagement 7/16/2018 7/16/2018 No
90066364 123 GAP Case Mangmt Reg 7/12/2018 7/18/2018 Yes
90066364 123 GAP Case Mangmt Reg 7/12/2018 7/23/2018 Yes
90066364 122 GAP Case Mangmt High 7/12/2018 7/24/2018 Yes
90069663 Yes
90096343 233 Psychotherapy 7/5/2018 7/5/2018 No
90096343 233 Psychotherapy 7/5/2018 7/19/2018 No
90096592 436 Comprehensive Eval 7/5/2018 7/5/2018 Yes
90099646 830 CM Eval 7/16/2018 7/16/2018 Yes
90905234 481 No Show Cancelation 7/10/2018 7/19/2018 Yes
90905234 127 Group Orientation 7/10/2018 7/24/2018 Yes
91222222 481 No Show Cancelation 7/12/2018 7/23/2018 Yes
91222222 481 No Show Cancelation 7/12/2018 7/24/2018 Yes

For each client ID, we are tracking who had a followup and who had no followup after an initial evaluation.

The criteria for a with_followup:
services_needed = ‘Yes’.
There must be a visittype other than ‘Eval’ (anything that contains ‘Eval’).
There must be a visittype other than ‘No Show Cancelation’
If the visittype = ‘Evaluation’, the visit_date must be greater than intake_date.
If only one of the rows fits that criteria, it qualifies as a followup regardless of the values in the other rows for a client.

The criteria for no_followup:
services_needed = ‘Yes’.
If none of the criteria for a followup exists for a client, that client had no_followup.

The criteria for cancellation_only:
services_needed = ‘Yes’.
The client only has visittypes = ‘No Show Cancelation’.
This would also qualify for no_followup.

The sum of the percentages for with_followup and no_followup will be 100%.
The percentage for cancellation_only is separate.

The following works:

import csv
import xlwt
import xlsxwriter
from datetime import datetime
workbook = xlsxwriter.Workbook('followups.xlsx')
worksheet = workbook.add_worksheet()
fs=open("followups.txt" , "w+")
ns=open("nofollowups.txt", "w+")
cs=open("cancellations.txt", "w+")
with_followup = []
no_followup = []
cancellation_only = []
yes_service = []
no_service = []
followup_count = 0
no_followup_count = 0
cancellation_only_count = 0
total_count = 0
avg_followup = 0
avg_no_followup = 0
avg_no_show_cancel = 0
yes_service_count = 0
no_service_count = 0
with open(r'C:\mockup.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the headers
    for row in reader:
        for k in row[0]:
            if 'Yes' in row [5]:
                if 'Eval' not in row [2]:
                    if 'No Show Cancelation' not in row [2]:
                        if 'Engagement' not in row [2]:
                            if row [2] != "":
                                if row [0] not in with_followup:
                                    followup_count += 1
                                    total_count += 1
                                    with_followup.append(row[0])
                                    fs.write(row[0] + "\n")
                                    break
    f.seek(0)
    next(reader, None)  # skip the headers
    for row in reader:
        for k in row[0]:
            if 'Yes' in row [5]:
                if 'Engagement' in row [2]:
                    if (row [4]) > (row [3]):
                        if row [0] not in with_followup:
                            followup_count += 1
                            total_count += 1
                            with_followup.append(row[0])
                            fs.write(row[0] + "\n")
                            break

    f.seek(0)
    next(reader, None)  # skip the headers
    for row in reader:
        for k in row[0]:
            if 'Yes' in row [5]:
                if row[0] not in with_followup:
                    if row[0] not in no_followup:
                        no_followup_count += 1
                        total_count += 1
                        no_followup.append(row[0])
                        ns.write(row[0] + "\n")
                        break

    f.seek(0)
    next(reader, None)  # skip the headers
    for row in reader:
        for k in row[0]:
            if 'Yes' in row [5]:
                if 'No Show Cancelation' in row [2]:
                    if row[0] not in with_followup:
                        if row[0] not in cancellation_only:
                            cancellation_only_count += 1
                            cancellation_only.append(row[0])
                            cs.write(row[0] + "\n")
                        break

    f.seek(0)
    next(reader, None)  # skip the headers
    for row in reader:
        for k in row[0]:
            if 'Yes' in row [5]:
                if row[0] not in yes_service:
                    yes_service_count += 1
                    yes_service.append(row[0])
                break
            elif 'No' in row [5]:
                if row[0] not in no_service:
                    no_service_count += 1
                    no_service.append(row[0])
                break

    avg_followup = followup_count/total_count
    avg_no_followup = no_followup_count/total_count
    avg_no_show_cancel = cancellation_only_count/total_count
    avg_followup = round(avg_followup,2)
    avg_no_followup = round(avg_no_followup,2)
    avg_no_show_cancel = round(avg_no_show_cancel,2)
    print (with_followup)
    print (no_followup)
    print (cancellation_only)
    print ('followup ' + str(followup_count))
    print ('no followup ' + str(no_followup_count))
    print ('cancellation only ' + str(cancellation_only_count))
    print ('total count ' + str(total_count))
    print ('average followup ' + str(avg_followup))
    print ('average no followup ' + str(avg_no_followup))
    print ('average no show / cancellation ' + str(avg_no_show_cancel))
    print ('yes service count ' + str(yes_service_count))
    print ('no service count ' + str(no_service_count))
    row = 0
    col = 0
    for item in (with_followup):
        worksheet.write(row,col,item)
        row += 1
    fs.close()
    ns.close()
    cs.close()
    workbook.close()
        
I made the criteria as concise as possible. I understand that some people might not want to wade through all of that. Please let me know if you need any clarifications.

I could do a lot, if not all, of this in a query. I don't want to make the query too complicated and slow. I need to do as much outside of the query as possible. Also, I am unable to have the Python interact with the database. It is prepackaged software. I can write queries but am not allowed to write outside code to run the queries.

I realize this is not ideal Python code (understatement). I still have a lot to learn.

I'll only be on sporadically until tomorrow.

Thanks to anyone that bothers to look this over.

Output:
followup 3 no followup 4 cancellation only 1 total count 7 average followup 0.43 average no followup 0.57 average no show / cancellation 0.14

.csv   mockup.csv (Size: 889 bytes / Downloads: 10)
Reply
#12
1. Loop over the file once.
2. Store the information in the memory in a dict, where id is key, i.e. one record per id. Values will be list, that holds all visits for given id.
3. Loop over the dict for each analyze the information for each id and classify accordingly

I will not provide specific code, because I'm not sure I fully understand the conditions as well there are records that don't fit - e.g. missing service_need or missing visit type/dates info. I would expect you know how to deal with such cases.

You can go OOP and define class for Visit and Client. Or at least you can use namedtuple from collections module to make your code more readable.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#13
Thanks, Buran. I'll try those suggestion. I was loading into a dict at one point. You're right that I should make my code more OOP.
I'll start another thread if I get stuck.
Reply
#14
    for row in reader:
        for k in row[0]:
            if 'Yes' in row [5]:
If you have more decisions in one row, then call out to another function, where the logic is handled.
Work row by row. Don't loop over and over again the same data. In some cases you may need to know the last row or two rows before, but your calculation does not depend on other data than the current row.
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#15
His decision depends on info for given id from entire file. That's why I suggest to read once in a dict then analyze the info for each id
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#16
Yes, reading the whole file is more pragmatic :-D
I guess the file is not very big.
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#17
Thanks, DeaD_EyE.

I'll keep your suggestions in mind.

I'll experiment with using a dictionary later. As suggested, I also need to start using functions and making my code more object oriented.

Pandas is another thing I need to learn.

This file will never be that large. It will be about 3x larger than my sample file. I should still attempt to write code that will handle humongous files as quickly and efficiently as possible. I'll work on that.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to optimize analog gauge reader? kadink 0 753 May-19-2023, 08:58 PM
Last Post: kadink
  vtk reader paul18fr 1 1,558 Feb-07-2022, 10:21 PM
Last Post: Larz60+
  Frequency Reader? mekha 8 5,277 Jul-11-2018, 04:04 AM
Last Post: mekha

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020