Python Forum
Efficient method to find phrase in string
I'm creating an application that automatically categorizes personal financial transactions based on their descriptions (I know some institutions already do this, but none of the ones I use). My plan is to use a preassigned list of key phrase / category pairs. Let's say there are n = 6000 transactions and p = 300 key phrase / category combinations. I don't want a nested loop that performs n*p phrase-in-string checks.
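For scale, here is a minimal sketch of the nested loop in question, using hypothetical phrases, categories and descriptions (none of them come from a real bank export). At n = 6000 and p = 300 the worst case is 6000 × 300 = 1,800,000 substring checks, which plain Python will likely get through in around a second or less, so the straightforward version may be fast enough for a personal tool.

```python
# Hypothetical key phrase list: (phrase, category) pairs, checked in order,
# so more specific phrases should come before more general ones.
KEY_PHRASES = [
    ("AMERICAN EXPRESS", "Credit Card Payment"),
    ("AMERICAN AIRLINES", "Travel"),
    ("SHELL OIL", "Fuel"),
]

def categorize(description, key_phrases):
    """Return the category of the first key phrase found in description, else None."""
    text = description.upper()
    for phrase, category in key_phrases:
        if phrase in text:  # plain substring test
            return category  # stop at the first match
    return None

def categorize_all(descriptions, key_phrases):
    # Outer loop over n descriptions, inner loop (inside categorize) over p phrases:
    # at most n * p substring checks in total.
    return [categorize(d, key_phrases) for d in descriptions]

# Hypothetical sample descriptions.
transactions = [
    "AMERICAN EXPRESS ACH PMT 1234",
    "AMERICAN AIRLINES 0012345678",
    "SHELL OIL 57444 HOUSTON TX",
    "TRANSFER TO SAVINGS",
]
print(categorize_all(transactions, KEY_PHRASES))
# ['Credit Card Payment', 'Travel', 'Fuel', None]
```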

My current thinking, in SQL terms, is a single loop over the p key phrases, each iteration running a query with WHERE DESCRIPTION LIKE '%phrase%'. That way I have only one loop, and it is only as big as my key phrase list. Pandas probably has a similar capability.
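A minimal sketch of that loop using Python's built-in sqlite3 module, assuming a hypothetical transactions table with description and category columns and made-up sample rows. Each UPDATE still scans all n rows, so the database performs roughly the same n × p comparisons internally; the difference is that the scanning happens inside the engine rather than in a Python loop. Two SQLite details worth knowing: LIKE is case-insensitive for ASCII letters by default, and a % or _ inside a phrase acts as a wildcard.

```python
import sqlite3

# Hypothetical schema: one row per transaction; category starts out NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, description TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO transactions (description) VALUES (?)",
    [("AMERICAN EXPRESS ACH PMT",), ("American Airlines 001234",), ("TRANSFER TO SAVINGS",)],
)

# Hypothetical key phrases, most specific first: earlier phrases win because
# each UPDATE only touches rows that are still uncategorized.
key_phrases = [
    ("AMERICAN EXPRESS", "Credit Card Payment"),
    ("AMERICAN", "Travel"),
]

# One statement per key phrase (p statements in total).
for phrase, category in key_phrases:
    conn.execute(
        "UPDATE transactions SET category = ? "
        "WHERE category IS NULL AND description LIKE '%' || ? || '%'",
        (category, phrase),
    )
conn.commit()

for row in conn.execute("SELECT description, category FROM transactions ORDER BY id"):
    print(row)
```

The `category IS NULL` condition is a design choice: it makes the order of the phrase list act as a priority, so a specific phrase placed early is not overwritten by a broader one later.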

Q1: Is there a more efficient approach than what I describe?

This morning I started creating the key phrase list. My goal is to automatically categorize 90% of the transactions. After an hour I was only through the D's. This is turning out to be a far harder problem than I first thought. For example, AMERICAN could mean Travel (as in the airline) or a payment to an AMERICAN EXPRESS credit card. For each phrase I'm considering, I run a sample search to see how well it captures the desired transactions, then adjust based on the result. It's a monumental task.
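One way to cope with overlapping phrases like AMERICAN and AMERICAN EXPRESS, sketched here under the assumption of a hypothetical phrase table, is to sort the phrases longest first and compile them into a single regular expression alternation. Python's `re` tries the alternatives left to right at each position, so when two phrases start at the same spot the longer one wins. The lookarounds `(?<!\w)` and `(?!\w)` keep a phrase from matching inside a longer word. This only helps when a more specific phrase exists; a bare AMERICAN is still a guess.

```python
import re

# Hypothetical phrase -> category table (keys in upper case).
KEY_PHRASES = {
    "AMERICAN EXPRESS": "Credit Card Payment",
    "AMERICAN": "Travel",
    "SHELL OIL": "Fuel",
}

def build_matcher(key_phrases):
    """Compile all phrases into one case-insensitive alternation, longest first."""
    phrases = sorted(key_phrases, key=len, reverse=True)
    # re.escape stops characters such as '.', '*' or '&' from acting as regex syntax.
    alternation = "|".join(re.escape(p) for p in phrases)
    # Require a non-word character (or string edge) on both sides of the phrase.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

def categorize(description, matcher, key_phrases):
    """Return the category of the leftmost match (longest phrase at that spot), else None."""
    match = matcher.search(description)
    if match is None:
        return None
    return key_phrases[match.group(0).upper()]

matcher = build_matcher(KEY_PHRASES)
# Hypothetical sample descriptions.
examples = [
    "AMERICAN EXPRESS ACH PMT 1234",
    "American Airlines 0012345678",
    "SHELL OIL 57444",
    "SHELLFISH SHACK",
]
print([categorize(d, matcher, KEY_PHRASES) for d in examples])
# ['Credit Card Payment', 'Travel', 'Fuel', None]
```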

Q2: Any thoughts on how to create a good key phrase list?
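One possible way to prioritize, sketched with hypothetical data: instead of working through the alphabet, count the leading words of the descriptions that are still uncategorized and write phrases for the most frequent ones first. The `categorize` argument below is a hypothetical hook for whatever matcher is already in place; it just needs to return a category or None.

```python
from collections import Counter

def top_prefixes(descriptions, categorize, words=2, limit=10):
    """Count the first `words` words of descriptions that categorize() leaves as None."""
    counts = Counter()
    for d in descriptions:
        if categorize(d) is None:
            prefix = " ".join(d.upper().split()[:words])
            if prefix:
                counts[prefix] += 1
    return counts.most_common(limit)

# Hypothetical descriptions and a tiny existing phrase list.
descs = ["KROGER #123 HOUSTON", "KROGER #456 KATY", "SHELL OIL 57444", "NETFLIX.COM 866-579"]
known = {"SHELL OIL": "Fuel"}

def simple_categorize(d):
    return next((c for p, c in known.items() if p in d.upper()), None)

print(top_prefixes(descs, simple_categorize, words=1))
# [('KROGER', 2), ('NETFLIX.COM', 1)]
```

Rerunning this after each batch of new phrases shows which merchants account for the largest share of what remains, which is a direct way to track progress toward the 90% goal.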


Messages In This Thread
Efficient method to find phrase in string - by Tuxedo - Feb-22-2021, 10:13 PM

