Python Forum
Use regular expression to return 5 words before and after target word.


Use regular expression to return 5 words before and after target word. - steve1040 - Sep-28-2017

Hello,
I've extracted doctors' notes from our organization's database (SQL Server) into a CSV file.

I'd like to use Python to add a new field containing just a few words from the notes (I have Anaconda installed on my PC).

Every row in my CSV contains the word "diagnosis", and the ICD-10 code appears within 5 words of it.

Example of data:

Row1: Hey doctor Who is here - I will add R45.1 as diagnosis, then some other medical terms and stuff about the patient. ffdsas ,dsfd tsdsf
Row2: Some other medical terms and stuff diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd

What I want:
Row1: I will add R45.1 as diagnosis, then some other medical terms
Row2: other medical terms and stuff diagnosis of R45.2 was entered for

I'm open to other suggestions if a regular expression isn't the best approach.

Thanks
Steve
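Since a regular expression is optional here, one non-regex option is to split each note on whitespace and slice out the words around the target. A minimal sketch, assuming words are separated by whitespace and that punctuation stuck to the target (like the comma after "diagnosis" in Row1) should be ignored when locating it; words_around is a hypothetical helper name:

```python
import string


def words_around(text, target="diagnosis", n=5):
    """Return up to n whitespace-separated words on each side of target, or None."""
    words = text.split()
    for i, word in enumerate(words):
        # Ignore attached punctuation and case, so "diagnosis," still matches
        if word.strip(string.punctuation).lower() == target:
            # Slice from n words before to n words after the first occurrence
            return " ".join(words[max(0, i - n): i + n + 1])
    # The target word does not appear in this note
    return None
```

On the two sample rows this returns the same text as the desired output. Punctuation attached to neighbouring words is kept as-is, and only the first occurrence of the target is used.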


RE: Use regular expression to return 5 words before and after target word. - snippsat - Sep-28-2017

If each row is a single string and you want 5 words before and 5 words after "diagnosis", you could do it like this:
>>> import re

>>> s1 = 'Hey doctor Who is here - I will add R45.1 as diagnosis, then some other medical terms and stuff about the patient. ffdsas ,dsfd tsdsf'
>>> s2 = 'Some other medical terms and stuff diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd'
>>> r1 = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}diagnosis(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", s1)
>>> r2 = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}diagnosis(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", s2)
>>> r1.group()
'I will add R45.1 as diagnosis, then some other medical terms'
>>> r2.group()
'other medical terms and stuff diagnosis of R45.2 was entered for'
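To fill the new field for every row of the CSV file, the same pattern can be applied row by row with the standard-library csv module. A minimal sketch, assuming the notes live in a column called notes and the new field should be called diagnosis_context (both column names and the add_context_column helper are hypothetical; use your real header names). Rows without "diagnosis" get an empty field:

```python
import csv
import re

# The pattern from the session above: up to 5 "words" on either side of "diagnosis"
PATTERN = re.compile(
    r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}diagnosis(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}"
)


def add_context_column(src, dst, text_col="notes", new_col="diagnosis_context"):
    """Copy CSV rows from src to dst, adding new_col with the text around 'diagnosis'.

    text_col and new_col are hypothetical column names; adjust them to your header.
    """
    reader = csv.DictReader(src)
    # Keep the original columns and append the new one at the end
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + [new_col])
    writer.writeheader()
    for row in reader:
        match = PATTERN.search(row[text_col])
        # Leave the new field empty when a row has no "diagnosis"
        row[new_col] = match.group() if match else ""
        writer.writerow(row)
```

With real files, open both with newline="" as the csv module documentation recommends, for example open("notes.csv", newline="") for the input and open("notes_out.csv", "w", newline="") for the output, then pass the two file objects to add_context_column. The file names are placeholders.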



RE: Use regular expression to return 5 words before and after target word. - nilamo - Sep-28-2017

I think the word "diagnosis" is unimportant here. Just scrape all the codes from the notes. It's a well-defined format, and should be easy to match.

I don't know anything about the code format itself, but all of yours have looked like "R[digit][digit].[digit]", which as a regex would be R\d{2}\.\d. Parsing the notes for codes that happen to sit near a particular word seems needlessly roundabout when you can go straight to the codes.
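A minimal sketch of that approach, using the pattern from this post with word boundaries added. It assumes every code looks like the samples (the letter R, two digits, a dot, one digit); real ICD-10 codes are not limited to the letter R or to a single digit after the dot, so treat the pattern as a starting point to widen for your data. extract_codes is a hypothetical helper name:

```python
import re

# R\d{2}\.\d from the post, wrapped in \b so it does not match inside longer tokens.
# Assumption: codes look exactly like the samples ("R45.1"); widen for other shapes.
CODE_RE = re.compile(r"\bR\d{2}\.\d\b")


def extract_codes(note):
    """Return every code-shaped substring in the note, in order of appearance."""
    return CODE_RE.findall(note)
```

Joining the result, for example ", ".join(extract_codes(note)), gives a single value for the new CSV field, and a note that mentions several codes keeps all of them.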