Python Forum
extract specific content in a pandas dataframe with a regex?
#1
I'm trying to extract a few words from a large text field and place the result in a new column.
After creating the new column, I'll run another expression looking for a numerical value between 1 and 29 on either side of the word m_m_s_e. This is a score that I need to capture in another column.

I'm stuck at stage 1 - Please help


The following code creates a new column, but it removes the target text I want instead of keeping it:
MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.replace(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", "")

So I tried the following, which throws an error:
MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.extract(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", expand=True)
Error:
ValueError                                Traceback (most recent call last)
<ipython-input-98-93cd99ac572d> in <module>()
      1 #MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.replace(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", "")
      2
----> 3 MMSE_df['targettext'] = MMSE_df['cleannotetext'].str.extract(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", expand=True)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in extract(self, pat, flags, expand)
   1706     @copy(str_extract)
   1707     def extract(self, pat, flags=0, expand=None):
-> 1708         return str_extract(self, pat, flags=flags, expand=expand)
   1709
   1710     @copy(str_extractall)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in str_extract(arr, pat, flags, expand)
    689         raise ValueError("expand must be True or False")
    690     if expand:
--> 691         return _str_extract_frame(arr._orig, pat, flags=flags)
    692     else:
    693         result, name = _str_extract_noexpand(arr._data, pat, flags=flags)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in _str_extract_frame(arr, pat, flags)
    581
    582     regex = re.compile(pat, flags=flags)
--> 583     groups_or_na = _groups_or_na_fun(regex)
    584     names = dict(zip(regex.groupindex.values(), regex.groupindex.keys()))
    585     columns = [names.get(1 + i, i) for i in range(regex.groups)]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\strings.py in _groups_or_na_fun(regex)
    524     """Used in both extract_noexpand and extract_frame"""
    525     if regex.groups == 0:
--> 526         raise ValueError("pattern contains no capture groups")
    527     empty_row = [np.nan] * regex.groups
    528

ValueError: pattern contains no capture groups
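The last line of the traceback points at the cause: every group in the pattern is written as (?:...), which is non-capturing, so str.extract has nothing to return. A minimal sketch of one possible fix is to wrap the whole pattern in a single pair of plain parentheses. The sample rows below are made up for illustration and only stand in for the real cleannotetext column.

```python
import pandas as pd

# Hypothetical sample data standing in for MMSE_df['cleannotetext'].
MMSE_df = pd.DataFrame({
    "cleannotetext": [
        "patient seen today and scored m_m_s_e 24 which is stable since last visit",
        "no cognitive testing done",
    ]
})

# Same pattern as in the post, wrapped in one capturing group so that
# str.extract has something to return.
pattern = (
    r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}"
    r"m_m_s_e"
    r"(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"
)

# expand=False returns a Series when the pattern has exactly one group.
# Rows without a match come back as NaN.
MMSE_df["targettext"] = MMSE_df["cleannotetext"].str.extract(pattern, expand=False)
print(MMSE_df["targettext"])
```

With expand=True the result is a one-column DataFrame instead of a Series, which is why expand=False is used here for a direct column assignment.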

I got it working using the following

# A function to get the target text from a provider's notes.
import re

def get_targettext(notes):
    # Match m_m_s_e together with up to five words on either side of it.
    note_search = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", notes)

    # If the target text exists, extract and return it.
    if note_search:
        return note_search.group()
    return ""
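For completeness, here is a rough sketch of how the function could be applied to the column, followed by one way to approach stage 2 (pulling a score from 1 to 29 out of the target text). The get_score helper, the score column name, and the sample notes are assumptions made for illustration. The score pattern simply takes the first standalone integer in range, so dates or decimals near m_m_s_e would need extra handling.

```python
import re
import pandas as pd

WINDOW = re.compile(
    r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}m_m_s_e(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}"
)
# A standalone integer from 1 to 29: not preceded or followed by another digit.
SCORE = re.compile(r"(?<!\d)(?:[12]\d|[1-9])(?!\d)")

def get_targettext(notes):
    """Return m_m_s_e plus up to five words on either side, or ''."""
    note_search = WINDOW.search(notes)
    return note_search.group() if note_search else ""

def get_score(targettext):
    """Hypothetical helper: first integer from 1 to 29 in the text, else None."""
    match = SCORE.search(targettext)
    return int(match.group()) if match else None

# Made-up notes for illustration only.
MMSE_df = pd.DataFrame({
    "cleannotetext": [
        "patient scored 24 on the m_m_s_e today",
        "m_m_s_e 30 out of 30",
        "no testing",
    ]
})

MMSE_df["targettext"] = MMSE_df["cleannotetext"].apply(get_targettext)
MMSE_df["score"] = MMSE_df["targettext"].apply(get_score)
print(MMSE_df)
```

Rows with no m_m_s_e mention, or with a number outside 1 to 29, end up with a missing score rather than a wrong one.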