Feb-22-2024, 10:52 PM
(This post was last modified: Feb-22-2024, 10:53 PM by standenman.)
I am trying to come up with a strategy for extracting key data from generic letters sent for different clients. The attached file shows the format of the letter I want to parse. It should look the same for every client, although there may be minor layout differences. I want four pieces of data. First, the addressee of the letter (redacted in the attachment). Second, the name of the client, which sits over at the right-hand margin after "Re:". Third, the time period of the records requested, which is in the first sentence after the heading "What We Need From You". Fourth, the date in the first sentence of the third paragraph under that heading, e.g. "Please respond by May 26, 2023".
I have considered a regex approach, but I wonder whether an NLP tool like spaCy would be a better fit. Thanks for any advice - I really appreciate it!
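For reference, here is a rough sketch of the regex approach I have in mind. It assumes the PDF text has already been extracted into a plain string; the sample text, the extract_fields name, the field names, and the guess that the addressee is the first non-empty line are all placeholders of mine, not the real letter layout. Only the "Re:", "What We Need From You", and "Please respond by" anchors come from the actual letter.

```python
import re

# Hypothetical stand-in for the text extracted from one letter.
SAMPLE = """\
Dr. Jane Doe
123 Main Street
Springfield, IL 62701
                                        Re: John Q. Smith

What We Need From You

Please send us all medical records from January 1, 2022 to March 31, 2023.

Use the enclosed envelope to return the records.

Please respond by May 26, 2023. If you cannot, please call us.
"""

# Dates written like "May 26, 2023" (assumed format).
DATE = r"[A-Z][a-z]+\s+\d{1,2},\s+\d{4}"


def extract_fields(text):
    """Pull the four fields out of one letter; missing fields come back as None."""
    # Assumption: the addressee is the first non-empty line of the letter.
    addressee = next((ln.strip() for ln in text.splitlines() if ln.strip()), None)

    # Client name: whatever follows "Re:" on the same line.
    m = re.search(r"Re:\s*(.+)", text)
    client = m.group(1).strip() if m else None

    # First sentence after the heading, then every date inside that sentence.
    m = re.search(r"What We Need From You\s+(.+?\.)(?:\s|$)", text, re.DOTALL)
    period = re.findall(DATE, m.group(1)) if m else None

    # Deadline: the date that follows "Please respond by".
    m = re.search(r"Please respond by\s+(" + DATE + r")", text)
    respond_by = m.group(1) if m else None

    return {
        "addressee": addressee,
        "client": client,
        "period": period,
        "respond_by": respond_by,
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Since the layout is supposed to be nearly identical for every client, fixed anchors like these seem workable, but I am not sure how well they hold up once the layout shifts, which is why I am asking about spaCy.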
Attached Files
MedRequestTemplate_Redacted-min.pdf (Size: 176.14 KB / Downloads: 5)