Python Forum

Suggestions for a simple data analysis program
Hi guys,
I'm starting a new project and would like to know what some of you more seasoned Python coders think of my plans.

I need to create a program that cross-references personal data from a spreadsheet to check for conflicts of interest between the clients of 3 different law firms.

So it will read in data from several columns of a spreadsheet (name, age, address, date of incident, law firm).

I'm thinking that I will parse this info into one dictionary per client, and then put each of those into 1 of 3 "superset" dictionaries, one for each law firm being looked at. I then need to compare the clients across firms and check for any individual who is a client at more than one of them.

To determine a match, I would first find any entries that share a last name and then, among those, look for any that share a birthday.
So if 2 entries have the same last name and birthday, they will be considered a match.
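
To make the matching rule concrete, here is a minimal sketch in plain Python. The field names (`last_name`, `birthday`, `firm`), the sample rows, and the helper `find_conflicts` are all made up for illustration, and lower-casing the last name before comparing is an extra assumption that goes beyond the rule as stated. It groups entries by (last name, birthday) and keeps only the groups that span more than one firm.

```python
from collections import defaultdict

# Hypothetical sample rows; the keys are assumptions, not real spreadsheet headers.
clients = [
    {"last_name": "Smith", "birthday": "1980-04-12", "firm": "Firm A"},
    {"last_name": "smith", "birthday": "1980-04-12", "firm": "Firm B"},
    {"last_name": "Jones", "birthday": "1975-09-30", "firm": "Firm A"},
    {"last_name": "Jones", "birthday": "1975-09-30", "firm": "Firm A"},
    {"last_name": "Lee", "birthday": "1990-01-05", "firm": "Firm C"},
]


def find_conflicts(rows):
    """Return {(last_name, birthday): [firms]} for people seen at 2+ firms."""
    firms_by_person = defaultdict(set)
    for row in rows:
        # Assumption: ignore case and stray whitespace in the last name.
        key = (row["last_name"].strip().lower(), row["birthday"])
        firms_by_person[key].add(row["firm"])
    # A repeat within a single firm is not a conflict, so require 2+ distinct firms.
    return {
        key: sorted(firms)
        for key, firms in firms_by_person.items()
        if len(firms) > 1
    }


conflicts = find_conflicts(clients)
print(conflicts)
```

With these sample rows only the Smith entry is reported, because the two Jones rows belong to the same firm. Keying a single dictionary on the match fields avoids comparing every client against every other client.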

Questions:
1) Does this sound like a good project for pandas? I've never used pandas before, but from what I've heard, this is exactly the type of thing it would be useful for. Are there any other modules that might help in accomplishing this?

2) Does anybody foresee any issues with the procedure I'm planning to use to "store and sort" the data sets, i.e. the "dictionaries within larger dictionaries" idea? Would it be better to have a list of dictionaries for each law firm instead? Do you think there's a better/easier way to do this?
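
On question 1, a rough idea of what the pandas version could look like, as a sketch only. The column names and sample data are invented, and in a real run the frame would presumably come from something like `pd.read_excel` on the actual spreadsheet. It keeps everything in one table with a `law_firm` column instead of three separate containers, and counts the distinct firms for each (last name, birthday) pair.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet; column names are assumptions.
df = pd.DataFrame(
    {
        "last_name": ["Smith", "Smith", "Jones", "Jones", "Lee"],
        "birthday": ["1980-04-12", "1980-04-12", "1975-09-30", "1975-09-30", "1990-01-05"],
        "law_firm": ["Firm A", "Firm B", "Firm A", "Firm A", "Firm C"],
    }
)

# For every row, count how many distinct firms share its (last_name, birthday) pair.
firm_counts = df.groupby(["last_name", "birthday"])["law_firm"].transform("nunique")

# Keep only the rows whose pair appears at more than one firm.
matches = df[firm_counts > 1].sort_values(["last_name", "birthday", "law_firm"])
print(matches)
```

Here the two Smith rows come back as a match, while the Jones rows do not because both are at the same firm.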

Any input or comments at all would be appreciated. I just want to make sure I have a sound plan going into this project and that there aren't any major issues with my ideas.

Thank you!