Python Forum
Use regular expression to return 5 words before and after target word.
#1
Hello,
I've extracted doctor notes from our organization's database (SQL Server) into a CSV file.

I'd like to use Python to add a new field containing just a few words from the notes (I have Anaconda installed on my PC).

Every row in my CSV contains the word "diagnosis", and the ICD-10 code appears within 5 words of it.

Example of data:

Row1: Hey doctor Who is here - I will add R45.1 as diagnosis, then some other medical terms and stuff about the patient. ffdsas ,dsfd tsdsf
Row2: Some other medical terms and stuff diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd

I want
Row1: I will add R45.1 as diagnosis, then some other medical terms
Row2: other medical terms and stuff diagnosis of R45.2 was entered for

I'm open to any suggestions if a regular expression is not the best approach.

Thanks
Steve
#2
You could do it like this if each note is a single string and you want the 5 words before and the 5 words after "diagnosis".
>>> import re

>>> s1 = 'Hey doctor Who is here - I will add R45.1 as diagnosis, then some other medical terms and stuff about the patient. ffdsas ,dsfd tsdsf'
>>> s2 = 'Some other medical terms and stuff diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd'
>>> # Up to 5 "word + separator" pairs, then "diagnosis", then up to 5 "separator + word" pairs.
>>> pattern = r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}diagnosis(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}"
>>> r1 = re.search(pattern, s1)
>>> r2 = re.search(pattern, s2)
>>> r1.group()
'I will add R45.1 as diagnosis, then some other medical terms'
>>> r2.group()
'other medical terms and stuff diagnosis of R45.2 was entered for'
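To get from single strings to the new CSV field asked about in the first post, something along these lines might work. This is only a rough sketch: the column names `notes` and `diagnosis_context`, and the helpers `context_pattern` and `add_context_column`, are made up for illustration, and the in-memory sample stands in for the real export.

```python
import csv
import io
import re

WORD = r"[a-zA-Z'-]+"    # same "word" definition as the pattern above
GAP = r"[^a-zA-Z'-]+"    # anything between two words (spaces, digits, punctuation)

def context_pattern(target, n=5):
    """Build the same pattern as above for any target word and window size."""
    return re.compile(
        rf"(?:{WORD}{GAP}){{0,{n}}}{re.escape(target)}(?:{GAP}{WORD}){{0,{n}}}"
    )

def add_context_column(src, dst, notes_col="notes", new_col="diagnosis_context"):
    """Copy CSV rows from src to dst, appending the words around "diagnosis".

    notes_col and new_col are hypothetical column names; adjust to the real file.
    """
    pattern = context_pattern("diagnosis")
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [new_col])
    writer.writeheader()
    for row in reader:
        match = pattern.search(row[notes_col])
        # Leave the new field empty when the note has no "diagnosis".
        row[new_col] = match.group() if match else ""
        writer.writerow(row)

# In-memory stand-in for the exported file, using the two example rows.
sample = io.StringIO(
    "id,notes\n"
    '1,"Hey doctor Who is here - I will add R45.1 as diagnosis, then some other '
    'medical terms and stuff about the patient. ffdsas ,dsfd tsdsf"\n'
    '2,"Some other medical terms and stuff diagnosis of R45.2 was entered for '
    'this patient. Where did Doctor Who go? Then xxx feea fdsfd"\n'
)
out = io.StringIO()
add_context_column(sample, out)
print(out.getvalue())
```

With real files, pass objects opened with `open(path, newline="")` instead of the `io.StringIO` buffers. Note that the pattern is case-sensitive, so a note that only contains "Diagnosis" would get an empty field.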
#3
I think the word "diagnosis" is unimportant here. Just scrape all the codes from the notes. It's a well-defined format, and should be easy to get.

I don't know anything about ICD-10 codes, but your examples have all been "R[number][number].[number]", which as a regex would look like R\d{2}\.\d. Trying to parse the notes to get codes that happen to be near a word seems ridiculous when you can just go straight to the codes.
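A quick sketch of that idea, using the two example notes from the first post. `R_CODE` is the pattern above; `ANY_CODE` is a looser guess that assumes a code is one uppercase letter, two digits, and an optional decimal part, which may not cover every code in the real data. `extract_codes` is just a helper name picked for this example.

```python
import re

# Pattern from this post: "R", two digits, a dot, one digit.
R_CODE = re.compile(r"R\d{2}\.\d")

# Looser variant (an assumption about the format, not a full ICD-10 grammar):
# one uppercase letter, two digits, optional ".digits".
ANY_CODE = re.compile(r"\b[A-Z]\d{2}(?:\.\d+)?\b")

def extract_codes(note, pattern=R_CODE):
    """Return every code-like substring in the note, in order of appearance."""
    return pattern.findall(note)

s1 = ("Hey doctor Who is here - I will add R45.1 as diagnosis, then some other "
      "medical terms and stuff about the patient. ffdsas ,dsfd tsdsf")
s2 = ("Some other medical terms and stuff diagnosis of R45.2 was entered for "
      "this patient. Where did Doctor Who go? Then xxx feea fdsfd")

for note in (s1, s2):
    print(extract_codes(note))
# ['R45.1']
# ['R45.2']
```

`findall` returns a list, so a note with several codes (or none) is handled without special cases.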