NLP Issue

standenman · Jul-08-2019, 12:03 AM

I had an upwork freelancer working on a Django Python App. Now over $3K in, we have failure, and I suspect he is not up to the task. The app does the following: (i) extracts text from a html doc that is the name of a medical source (doctor) (ii) extracts contact information (address) from a third source (iii) uses a Word template to create a letter to that medical source requesting medical records. The problem came with the sloppiness with which the medical entity name text was input. This is an html doc created by a government agency. So for example proper name "Presbyterian Hospital of Dallas" might be "Dallas Presbyterian" or "Presbyterian Hosp." or even "Presbyterian Medical Center".

We were making API queries to public sources of hospital and doctor data, and if getting no response, doing some NLP processing. Our testing was consistently failing. He seemed resistant to my ideas on how to solve the problem. While I am without NLP experience, he seemed to be rejecting my ideas more out of pride than giving me solid reasons whey they were wrong.

I would really appreciate some honest feedback on whether I indeed don't know what I am talking about.

The issue to be solved, of course, is what did my lazy data entry clerk "really mean" with their slopping data entry. in considering how to do that, I seems to me very significant that we are dealing with a finite universe of possible answers. That is, we could have a database of all the hospitals and all the doctors in the US - that is the group I would be dealing with. And it also seems significant to me that in that database, lets say, of hospital names, certain words are going to be very frequent and others quite rare. So we know words like "Hospital" and "Medical Center" don't mean squat. Words like "Presbyterian" or "Lutheran" less so. But some words could be quite "identifying" like "Mayo" or "Sloan-Ketterling".

Moreover, our app does harvest geo-location data from the html doc, so we can assume that the correct medi al source is near that location.

Could you not take every word in a database of Hospital names in the US and rate there frequency in this database? Then apply some type of decision making based upon a words number rating? For example, if the word "Parkland" appears only once in the DB, bingo, it does not matter if the data entry person called "Parkland Memorial Hospital" the "Parkland Center for Certain Death", we have found the match. You could also apply this to word patterns such that if "Parkland Hospital" appears only once in the DB, we have a match.

My developer has told that this is "too hard". Any advice would be most and sincerely appreciated!!

NLP Issue

User Panel Messages

Announcements