Use regular expression to return 5 words before and after target word.
Python Forum (https://python-forum.io), Forum: General Coding Help (https://python-forum.io/forum-8.html), Thread: /thread-5318.html
Use regular expression to return 5 words before and after target word. - steve1040 - Sep-28-2017

Hello,

I've extracted doctor notes from our organization's database (SQL Server) into a CSV file. I'd like to use Python to add a new field containing just a few words from the notes (I have Anaconda installed on my PC).

Every row in my CSV contains the word "diagnosis", and the ICD10 code appears within 5 words of it.

Example of data:
Row1: Hey doctor Who is here - I will add R45.1 as diagnosis, then some other medical terms and stuff about the patient. ffdsas ,dsfd tsdsf
Row2: Some other medical terms and stuff diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd

I want:
Row1: I will add R45.1 as diagnosis, then some other medical terms
Row2: other medical terms and stuff diagnosis of R45.2 was entered for

I'm open to any suggestions if a regular expression is not the best approach.

Thanks
Steve

RE: Use regular expression to return 5 words before and after target word. - snippsat - Sep-28-2017

You could do it like this if it's one string and you want 5 words before "diagnosis" and 5 words after.

    >>> import re
    >>> s1 = 'Hey doctor Who is here - I will add R45.1 as diagnosis, then some other medical terms and stuff about the patient. ffdsas ,dsfd tsdsf'
    >>> s2 = 'Some other medical terms and stuff diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd'
    >>> r1 = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}diagnosis(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", s1)
    >>> r2 = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}diagnosis(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", s2)
    >>> r1.group()
    'I will add R45.1 as diagnosis, then some other medical terms'
    >>> r2.group()
    'other medical terms and stuff diagnosis of R45.2 was entered for'

RE: Use regular expression to return 5 words before and after target word. - nilamo - Sep-28-2017

I think the word "diagnosis" is unimportant here. Just scrape all the codes from the notes.
It's a well-defined format, and should be easy to get. I don't know anything about it, but your examples have all been "R[number][number].[number]", which as a regex would look like R\d{2}\.\d. Trying to parse the notes to get codes that happen to be near a word seems ridiculous when you can easily go straight to the codes.
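
For the CSV part of the original question, here is a minimal sketch that combines both replies: it adds one field with the words around "diagnosis" (snippsat's pattern) and one with the codes pulled out directly (nilamo's pattern). The column names `notes`, `context` and `codes`, and the helper names, are hypothetical and not from the thread. The code pattern only covers the `R45.1`-style examples shown above; real ICD10 codes may need a broader pattern.

```python
import csv
import io
import re

# snippsat's pattern: up to 5 "words" on each side of the target word.
# Note that digits count as separators here, so "R45.1 " is one word plus gap.
CONTEXT_RE = re.compile(
    r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}diagnosis(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}"
)

# nilamo's pattern: assumes every code looks like R45.1 (R, two digits, dot, digit).
CODE_RE = re.compile(r"R\d{2}\.\d")


def context_window(note):
    """Return the words around 'diagnosis', or '' if the word is absent."""
    match = CONTEXT_RE.search(note)
    return match.group() if match else ""


def extract_codes(note):
    """Return every code-like substring found in the note."""
    return CODE_RE.findall(note)


def add_fields(src, dst, notes_column="notes"):
    """Copy CSV rows from src to dst, adding 'context' and 'codes' columns."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames) + ["context", "codes"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        note = row[notes_column]
        row["context"] = context_window(note)
        row["codes"] = ";".join(extract_codes(note))
        writer.writerow(row)


if __name__ == "__main__":
    # In-memory stand-in for the exported file; with real files, open both
    # with newline="" as the csv module documentation recommends.
    sample = io.StringIO(
        "notes\n"
        '"Hey doctor Who is here - I will add R45.1 as diagnosis, then some '
        'other medical terms and stuff about the patient."\n'
    )
    out = io.StringIO()
    add_fields(sample, out)
    print(out.getvalue())
```

Keeping both fields makes it easy to check the two approaches against each other: a row where `codes` is empty but `context` is not is probably a code that does not fit the assumed format.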