Python Forum
Strategy for data extraction
#1
I am trying to come up with a strategy for extracting key data from generic letters sent for different clients. The attached PDF shows the format of the letter I want to parse. It should look the same for every client, although there may be minor layout differences. First, I want to extract the addressee of the letter (redacted in the attachment). Second, I want to extract the name of the client, which appears at the right-hand margin after "Re:". Then there are two items I want from the main body of the letter: the time period of the records requested, which is in the first sentence under the heading "What We Need From You", and the date in the first sentence of the third paragraph under that heading, for example "Please respond by May 26, 2023".

I have considered a regex approach, but would an NLP tool such as spaCy be a better fit? Thanks for any advice - I really appreciate it!
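
To make the regex option concrete, here is a minimal sketch of what it might look like once the PDF text has been extracted to a plain string. Everything specific in it is an assumption: the sample text is made up, the addressee is assumed to follow a "Dear ...:" salutation, the period is assumed to be written as two "Month D, YYYY" dates joined by "to" or "through", and the names SAMPLE, DATE, PATTERNS and extract_fields are only illustrative. The real letter may well need different patterns.

```python
import re

# Hypothetical stand-in for text already extracted from the PDF.
# The real letter's wording and layout may differ.
SAMPLE = """\
Dear Dr. Jane Doe:
                                        Re: John Q. Public

What We Need From You

Please send all medical records from January 1, 2020 to December 31, 2022.

Second paragraph of the section.

Please respond by May 26, 2023. Thank you.
"""

# Assumed date format: "Month D, YYYY".
DATE = r"[A-Z][a-z]+ \d{1,2}, \d{4}"

PATTERNS = {
    # Assumes the addressee follows a "Dear ...:" or "Dear ...," salutation.
    "addressee": re.compile(r"^Dear\s+(.+?)[:,]\s*$", re.M),
    # Client name: the rest of the line after "Re:".
    "client": re.compile(r"Re:\s*(.+?)\s*$", re.M),
    # First pair of dates after the heading, joined by "to" or "through".
    "period": re.compile(
        rf"What We Need From You\s+.*?({DATE})\s+(?:to|through)\s+({DATE})",
        re.S,
    ),
    # Deadline: the date right after "Please respond by".
    "respond_by": re.compile(rf"Please respond by\s+({DATE})"),
}


def extract_fields(text):
    """Return a dict of field name -> match (None if the pattern fails)."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            result[name] = None  # flag for manual review
        elif len(match.groups()) == 1:
            result[name] = match.group(1)
        else:
            result[name] = match.groups()
    return result


if __name__ == "__main__":
    for field, value in extract_fields(SAMPLE).items():
        print(f"{field}: {value}")
```

Returning None for a field that fails to match makes it easy to spot letters whose layout deviates from the template, which is probably the main weakness of a pure regex approach.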

Attached Files

.pdf   MedRequestTemplate_Redacted-min.pdf (Size: 176.14 KB / Downloads: 5)


Messages In This Thread
Strategy for data extraction - by standenman - Feb-22-2024, 10:52 PM
RE: Strategy for data extraction - by carecavoador - Mar-11-2024, 01:44 PM
