Python Forum
NLP Issue
#1
I had an Upwork freelancer working on a Django Python app. Now, over $3K in, the project is failing, and I suspect he is not up to the task. The app does the following: (i) extracts the name of a medical source (e.g. a doctor) from an HTML doc, (ii) extracts contact information (address) from a third source, and (iii) uses a Word template to create a letter to that medical source requesting medical records. The problem is the sloppiness with which the medical entity names were entered. The HTML doc is created by a government agency, so, for example, the proper name "Presbyterian Hospital of Dallas" might appear as "Dallas Presbyterian" or "Presbyterian Hosp." or even "Presbyterian Medical Center".

We were making API queries to public sources of hospital and doctor data and, if we got no response, doing some NLP processing. Our testing was consistently failing. He seemed resistant to my ideas on how to solve the problem. While I have no NLP experience, he seemed to be rejecting my ideas more out of pride than by giving me solid reasons why they were wrong.

I would really appreciate some honest feedback on whether I indeed don't know what I am talking about.

The issue to be solved, of course, is what my lazy data entry clerk "really meant" with their sloppy data entry. In considering how to do that, it seems very significant to me that we are dealing with a finite universe of possible answers. That is, we could have a database of all the hospitals and all the doctors in the US, and that is the group I would be dealing with. It also seems significant to me that in that database of, let's say, hospital names, certain words are going to be very frequent and others quite rare. So we know words like "Hospital" and "Medical Center" don't mean squat. Words like "Presbyterian" or "Lutheran" are less common and so tell us a bit more. And some words could be quite "identifying", like "Mayo" or "Sloan Kettering".

Moreover, our app does harvest geo-location data from the HTML doc, so we can assume that the correct medical source is near that location.
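
A rough sketch of how the location could narrow the candidates before any name matching, assuming each database record carries a latitude and longitude. The record layout, the coordinates, the `nearby` helper and the 50 km radius are all illustrative assumptions, not part of the app described above.

```python
import math

# Toy records; names and coordinates are illustrative only.
HOSPITALS = [
    {"name": "Presbyterian Hospital of Dallas", "lat": 32.88, "lon": -96.76},
    {"name": "Parkland Memorial Hospital", "lat": 32.81, "lon": -96.84},
    {"name": "Example Presbyterian Hospital", "lat": 29.76, "lon": -95.37},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby(candidates, lat, lon, max_km=50.0):
    """Keep only the candidates within max_km of the given point."""
    return [c for c in candidates
            if haversine_km(lat, lon, c["lat"], c["lon"]) <= max_km]


# Location harvested from the HTML doc (hypothetical values).
local = nearby(HOSPITALS, 32.78, -96.80)
print([c["name"] for c in local])
```

Filtering first means that a common word like "Presbyterian" only has to be distinctive within the local area, not nationwide.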

Could you not take every word in a database of hospital names in the US and rate its frequency in that database? Then apply some type of decision making based upon a word's numeric rating? For example, if the word "Parkland" appears only once in the DB, bingo: it does not matter if the data entry person called "Parkland Memorial Hospital" the "Parkland Center for Certain Death", we have found the match. You could also apply this to word patterns, such that if "Parkland Hospital" appears only once in the DB, we have a match.
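
One way to sketch the frequency idea is inverse document frequency (IDF) weighting, a standard information-retrieval technique: each word is scored by how rare it is across the name list, and a candidate is ranked by the summed rarity of the words it shares with the clerk's entry. The name list below is a toy sample and `build_idf` and `rank` are hypothetical helper names; a real list would come from the hospital database.

```python
import math
import re
from collections import Counter

# Toy sample standing in for the full database of hospital names.
NAMES = [
    "Parkland Memorial Hospital",
    "Presbyterian Hospital of Dallas",
    "Presbyterian Medical Center",
    "Lutheran Medical Center",
    "Lutheran Memorial Hospital",
    "Mayo Clinic",
]


def tokens(name):
    """Lower-case a name and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def build_idf(names):
    """Map each word to log(N / number of names containing it)."""
    doc_freq = Counter(tok for name in names for tok in tokens(name))
    return {tok: math.log(len(names) / count)
            for tok, count in doc_freq.items()}


def rank(query, names, idf):
    """Return (score, name) pairs, best first; rare shared words count most."""
    query_tokens = tokens(query)
    scored = []
    for name in names:
        shared = query_tokens & tokens(name)
        scored.append((sum(idf[tok] for tok in shared), name))
    return sorted(scored, reverse=True)


idf = build_idf(NAMES)
print(rank("Parkland Center for Certain Death", NAMES, idf)[0][1])
print(rank("Dallas Presbyterian", NAMES, idf)[0][1])
```

Abbreviations such as "Hosp." would not match "Hospital" here, so some normalization step would probably be needed as well, and a close second-place score is a sign the match should not be trusted automatically.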

My developer has told me that this is "too hard". Any advice would be most sincerely appreciated!
#2
Get the data entry clerk to enter the correct information or put something in place that only allows the correct information to be entered.
#3
The data entry is done by a government clerk who does not work for me. I have no control over their work.
#4
You could do your plan, but I don't think it's going to be very effective. We had the same sort of problem with business names where I used to work, which had been entered by dozens of different people with no real standards in place. We could only get about 70% of the names mapped programmatically, and we would have had to get people to do the rest of the mapping by hand.
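
A minimal sketch of that split between automatic and manual mapping, using the standard library's `difflib.get_close_matches`. This is an assumption about how such a pipeline could look, not the method that project used; the `route` helper, the sample names and the 0.8 cutoff are made up for illustration.

```python
import difflib

# Toy list of canonical names; a real one would come from the database.
CANONICAL = [
    "Parkland Memorial Hospital",
    "Presbyterian Hospital of Dallas",
    "Lutheran Memorial Hospital",
]


def route(raw_name, canonical, cutoff=0.8):
    """Return ("auto", match) for a confident match, else ("manual", None)."""
    by_lower = {name.lower(): name for name in canonical}
    hits = difflib.get_close_matches(
        raw_name.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    if hits:
        return ("auto", by_lower[hits[0]])
    # Nothing similar enough: leave it for a person to map by hand.
    return ("manual", None)


print(route("Parkland Memorial Hosptal", CANONICAL))
print(route("Dallas Presbyterian", CANONICAL))
```

Character-level similarity like this catches typos but not reordered words, which is one reason a fair share of entries can end up in the manual pile.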
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#5
70% would be better than I have been getting. What was the strategy?
#6
I don't know. I wasn't a developer on that project; I was a user. They were trying to merge my division's business database with another division's to create a unified database.