Python Forum (https://python-forum.io), Data Science forum, thread /thread-5458.html
extract specific content in a pandas dataframe with a regex? - steve1040 - Oct-05-2017

I'm trying to extract a few words from a large text field and place the result in a new column. After creating the new column, I'll run another expression that looks for a numerical value between 1 and 29 on either side of the word m_m_s_e. This is a score that I need to capture in another column.

I'm stuck at stage 1. Please help.

The following code creates a new column, but the column contains everything except the target text I want:

```python
# str.replace deletes each match, so only the text *around* the target remains.
# regex=True is required on pandas 2.0+, where the default became regex=False.
MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.replace(
    r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}",
    "",
    regex=True,
)
```

So I tried the following, which throws an error:

```python
# Raises ValueError: str.extract returns the pattern's capture groups,
# and this pattern contains only non-capturing (?:...) groups.
MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.extract(
    r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}",
    expand=True,
)
```
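The error comes from how `Series.str.extract` works: it returns the text matched by the pattern's capture groups, and a pattern built only from non-capturing `(?:...)` groups has none, so pandas raises a `ValueError`. A minimal sketch of one fix is to wrap the whole pattern in a single capturing group. The sample notes below are made up, and this `MMSE_df` is only a stand-in for the real DataFrame.

```python
import pandas as pd

# Hypothetical sample data standing in for the real provider notes.
MMSE_df = pd.DataFrame({
    "cleannotetext": [
        "patient seen today and scored 24 on m_m_s_e which is stable since last visit",
        "no cognitive testing was done at this visit",
    ]
})

# Same pattern as before, wrapped in ONE capturing group (...) so that
# str.extract has something to return. The inner groups stay non-capturing.
pattern = (
    r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e"
    r"(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"
)

# With exactly one group, expand=False returns a Series; rows with no match
# come back as NaN, and fillna("") turns those into empty strings.
MMSE_df["targettext"] = (
    MMSE_df["cleannotetext"].str.extract(pattern, expand=False).fillna("")
)
print(MMSE_df["targettext"].tolist())
```

This keeps the work vectorized in pandas; rows without m_m_s_e end up as an empty string, the same value a plain-Python `re.search` helper would typically return for a miss.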
I got it working using the following:

```python
import re


# A function to get the target text from a provider note.
def get_targettext(notes):
    # Up to five words on either side of "m_m_s_e".
    note_search = re.search(
        r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}",
        notes,
    )
    # If a match exists, extract and return it.
    if note_search:
        return note_search.group()
    return ""
```
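For stage 2, a function like `get_targettext` can be applied row by row with `MMSE_df['cleannotetext'].apply(get_targettext)`, and a second regex can then pull the score out of the target text. A rough sketch follows. The helper name `get_score`, the column name `mmse_score`, the 20-character window, and the rule of checking after the keyword before checking in front of it are all hypothetical choices, not something from the thread; only the 1 to 29 range comes from the question.

```python
import re

import pandas as pd

# ASSUMPTION: the score is a whole number from 1 to 29 that is not part of a
# longer number, e.g. "24" in "24/30", but nothing in "124" or "30".
SCORE = r"(?<!\d)(2\d|1\d|[1-9])(?!\d)"

# ASSUMPTION: the score sits within 20 non-digit characters of "m_m_s_e".
# \D cannot cross a digit, so each pattern only considers the number
# closest to the keyword on that side.
AFTER = re.compile(r"m_m_s_e\D{0,20}?" + SCORE)   # e.g. "m_m_s_e 27/30"
BEFORE = re.compile(SCORE + r"\D{0,20}?m_m_s_e")  # e.g. "scored 24 on m_m_s_e"


def get_score(text):
    """Return the score as an int, or None when no 1-29 value is found."""
    # Prefer a number after the keyword; fall back to one before it.
    match = AFTER.search(text) or BEFORE.search(text)
    return int(match.group(1)) if match else None


# Hypothetical target text, e.g. what get_targettext would produce.
MMSE_df = pd.DataFrame({"targettext": [
    "seen today and scored 24 on m_m_s_e which is stable since last",
    "m_m_s_e 27/30 at this visit",
    "",
]})

# Nullable Int64 keeps missing scores as <NA> instead of turning them into floats.
MMSE_df["mmse_score"] = MMSE_df["targettext"].apply(get_score).astype("Int64")
print(MMSE_df)
```

The lookarounds in `SCORE` are what stop "124" from yielding "12" and "30" from yielding "3"; if the notes use other formats, the two patterns are the place to adjust.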