Feb-22-2021, 10:13 PM
I'm creating an application to automatically categorize personal financial transactions based on the description (I know some institutions already do this, but none of the ones I use do). My plan for categorization is to use a preassigned list of key phrases and categories. Let's say there are n=6000 transactions and p=300 key phrase / category combinations. I don't want a nested loop that performs n*p phrase-in-description checks.
My current thinking, in SQL terms, is to have a single loop of size p, where each iteration runs a query with WHERE DESCRIPTION LIKE '%phrase%'. That way I only have a single loop, and it is only as long as my key phrase list. Pandas probably has a similar capability.
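Since the plan is phrased in SQL, here is a minimal sketch of that single loop using Python's built-in sqlite3 module with an in-memory table. The table name `txn`, the sample descriptions, and the rules are hypothetical. Each UPDATE still compares its phrase against every row, so the total work remains on the order of n × p substring checks; the difference is that the per-row loop runs inside the database engine rather than in Python. SQLite's LIKE is case-insensitive for ASCII by default, and a `%` or `_` inside a phrase would act as a wildcard.

```python
import sqlite3

# Hypothetical sample data standing in for the real transaction export.
transactions = [
    (1, "AMERICAN AIRLINES 0012345"),
    (2, "AMERICAN EXPRESS ACH PMT"),
    (3, "DELTA AIR LINES"),
    (4, "SHELL OIL 57442"),
    (5, "NETFLIX.COM"),
]

# Hypothetical key phrase / category list (the p combinations).
rules = [
    ("AMERICAN AIRLINES", "Travel"),
    ("DELTA AIR", "Travel"),
    ("AMERICAN EXPRESS", "Credit Card Payment"),
    ("SHELL OIL", "Fuel"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, description TEXT, category TEXT)")
conn.executemany("INSERT INTO txn (id, description) VALUES (?, ?)", transactions)

# One loop of size p: each UPDATE scans every row with LIKE '%phrase%'.
# "category IS NULL" means the first rule that matches a row wins.
for phrase, category in rules:
    conn.execute(
        "UPDATE txn SET category = ? "
        "WHERE category IS NULL AND description LIKE '%' || ? || '%'",
        (category, phrase),
    )

for row in conn.execute("SELECT id, description, category FROM txn ORDER BY id"):
    print(row)
```

Because of the `category IS NULL` condition, the order of the rule list matters. In pandas, the per-phrase test would be `Series.str.contains(phrase, regex=False)` used as a boolean mask in the same loop.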
Q1: Is there a more efficient approach than what I describe?
This morning I started creating the key phrase list. My goal is to automatically categorize 90% of the transactions. After an hour I was only through the D's; this is a far more difficult problem than I first thought. Example: AMERICAN could be Travel (an airline) or a payment to an AMERICAN EXPRESS credit card. For each phrase I'm considering, I run a sample search to see how well it captures the desired transactions, then adjust based on the result. This is turning out to be a monumental task.
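One possible way to handle the AMERICAN vs. AMERICAN EXPRESS ambiguity, sketched under the assumption that plain substring matching is enough: try longer, more specific phrases first and let the first match win, then measure coverage after each change instead of working through the alphabet. The rules, sample descriptions, and helper names (`categorize`, `preview`) are hypothetical. `preview` stands in for the sample search described above, and counting the leading word of each still-uncategorized description hints at which new phrases would raise coverage fastest toward the 90% goal.

```python
from collections import Counter

# Hypothetical rules; mapping bare "AMERICAN" to Travel is only for illustration.
rules = {
    "AMERICAN EXPRESS": "Credit Card Payment",
    "AMERICAN": "Travel",
    "DELTA AIR": "Travel",
    "SHELL OIL": "Fuel",
}

# Hypothetical transaction descriptions.
descriptions = [
    "AMERICAN AIRLINES 0012345",
    "AMERICAN EXPRESS ACH PMT",
    "DELTA AIR LINES",
    "SHELL OIL 57442",
    "NETFLIX.COM",
    "NETFLIX.COM",
    "KROGER #123",
]

# Longer (more specific) phrases are tried first, so AMERICAN EXPRESS
# is checked before AMERICAN.
ordered = sorted(rules.items(), key=lambda kv: len(kv[0]), reverse=True)


def categorize(description):
    """Return the category of the first (longest) matching phrase, or None."""
    text = description.upper()
    for phrase, category in ordered:
        if phrase in text:
            return category
    return None


def preview(phrase, limit=10):
    """List descriptions a candidate phrase would capture (a 'sample search')."""
    return [d for d in descriptions if phrase in d.upper()][:limit]


categories = [categorize(d) for d in descriptions]
coverage = sum(c is not None for c in categories) / len(descriptions)
uncategorized = [d for d, c in zip(descriptions, categories) if c is None]

# Most frequent leading words among uncategorized rows: candidates for new phrases.
next_candidates = Counter(d.split()[0] for d in uncategorized).most_common(5)

print(f"coverage: {coverage:.0%}")
print("candidates:", next_candidates)
print("AMERICAN matches:", preview("AMERICAN"))
```

Sorting by length is only a heuristic; a rule list kept in an explicit priority order would work the same way with `ordered = list(rules.items())`.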
Q2: Any thoughts on how to create a good key phrase list?